AI, FUD, and You: Our Reckless Conversations
Written by R. Leigh Hennig
As a propaganda tactic, fear, uncertainty, and doubt—FUD—has a long and storied history. If it were a currency you could trade, media figures and politicians would be even more fabulously wealthy than they already are. Cynics have good reason to create and disseminate FUD as a means of turning our atavistic fear of the unknown and our instinct for groupthink against us: it works. As the old saw goes, a lie can travel halfway around the world before the truth has put its boots on.
But as rational, thinking people, we can usually identify FUD before it has a chance to sink its teeth into us. “This headline is inflammatory,” the skeptical reader says. “That article is playing on my fears,” or “their response doesn’t tell the whole story.” As fiction writers, we aim to manipulate the reader—their emotions, their expectations—convincing them, if only for a short time, of the world and characters within our stories. As a horror writer especially, I count FUD as part of my tradecraft. We should excel at spotting it from a distance.
So why do we seem to so easily fall victim to AI FUD?
I’ve already lost some of you. Before I lose too many more, allow me to be clear:
- Most of the AI tooling available (OpenAI’s ChatGPT, Google’s Bard, Microsoft’s Bing Chat, etc.) has been unethically trained on pirated or stolen material. These companies have not secured the appropriate licenses for the datasets used to train their products. Authors have not been properly compensated or attributed, making the use of such tools at the very least highly problematic. I hope the lawsuit Sarah Silverman has brought against Meta and OpenAI is resolved in the plaintiffs’ favor, and if possible, I’d like to see a class action lawsuit on behalf of all writers whose work has been misappropriated for the benefit of these corporations and their shareholders.
- When used to generate copy or fiction, these tools do an awful job. Hallucinations aside, the writing is just…bad. Really bad. AI isn’t going to replace good writers, nor should it, and efforts to make it do so should be shut down hard. Like many of you, I celebrate the protections the Writers Guild recently secured against AI-authored content eroding its members’ business and cheapening their work.
- Any effort to stop paying humans for their labor by replacing them with AI is immoral, unethical, and an affront to the fundamental human dignity that capitalism continues to maim and abuse.
However.
However.
Much of the current discourse about AI that I’ve seen among readers and writers is itself cause for concern: ill informed and lacking nuance. Put simply: we are spreading FUD. Let’s consider two of the biggest misconceptions that are falsely coloring our perspective:
- “AI can’t do anything useful but generate bland, crappy writing.”
(In case you’re wondering whether people are really making this claim: yes, they are.)
Often dropped as a “hot take” on Twitter/Mastodon/Bluesky/et al., the notion that AI tooling has no use cases beyond generating cheap copy, or only marginal ones, has somehow become popular. And like most hot takes, this one is superficial, inaccurate, and usually offered in bad faith, without any real intention of sparking thoughtful conversation.
Here are some of the things that AI is being used for:
- Predicting structures of proteins, aiding medicine and drug design
- Designing parts for spaceships
- Coping with ADHD and dyslexia
- Helping the World Bee Project save the bees
- Developing a novel malaria vaccine, shedding light on genetic variation and Parkinson’s disease, and combatting antibiotic resistance
- Reducing false positives in colorectal cancer screening, and searching for novel antibiotics
- Recognizing and fighting cyberattacks
- Aiding code completion, accelerating software development
While “AI” really needs to be considered a broad umbrella term covering many related technologies (machine learning, deep learning, natural language processing, etc.), the ways in which people are using it for the betterment of society are endless, from aiding individuals in day-to-day tasks for work and life, to more expansive initiatives undertaken by corporations, governments, academia, non-profits, and others.
- “AI has been trained on stolen material, therefore any use of AI is unethical.”
“Objection!” you say. “You said before that AI was unethical because it’s been trained on stolen data, so why are you also listing this as a misconception?”
Au contraire, I said that most of the AI tooling available has been unethically trained.
But what if we could use writing in the public domain to train AI tools and chatbots? What if there were models and datasets properly licensed for individual and commercial use? If an AI tool were built using licensed data, where copyrighted material was respected, would that make it morally acceptable to use such tooling? Would some of the ethical concerns regarding the source material, and subsequently the output such tools produce, thereby be mitigated? Actually…yes. I think it would.
Allow me to introduce you to Hugging Face.
Founded in 2016, Hugging Face is an American company that develops tools, libraries, models, and other relevant artifacts used in the development of AI. They host the Hugging Face Hub, a platform that allows hosting of code repositories, models, datasets (text, images, audio), and more. Per their website:
“We’re on a mission to democratize good machine learning, one commit at a time.
At Hugging Face, we build open-source resources to empower you to easily integrate AI into your products and workflows. We are convinced that AI can be accessible, optimized, and responsible
…
We have built the fastest-growing, open-source, library of pre-trained models in the world. With over 130M+ installs and 110K+ stars on GitHub, over 10 thousand companies are using HF technology in production, including leading AI organizations such as Google, Elastic, Salesforce, Algolia, and Grammarly.”
The open source part of this is key. According to their guidelines, restricted content includes any content “published without the explicit consent of the people represented,” and that “infringes or violates any rights of a third party or an applicable License.” Additionally:
“…we hold consent as a core value. While existing regulations do protect people’s rights to work, image, and data, new ways of processing information enabled by Machine Learning technology are posing new questions with respect to these rights. In this evolving legal landscape, prioritizing consent not only supports forethought and more empathy towards stakeholders but also encourages proactive measures to address cultural and contextual factors.”
Tools and guidelines for reporting content that violates these policies, as well as a means to engage Hugging Face in removing content that infringes on intellectual property rights, are also fundamental to the platform. Accordingly, all of the components needed to build various AI tools, everything from ChatGPT-style chatbots to big data analytics and machine learning systems, must also comply with their terms of service.
Whereas the makers of tools like ChatGPT have not disclosed their training sets, the data uploaded to Hugging Face is open and has to be legally available for use.
Take for instance the Wikipedia dataset, and its associated license information:
“Wikimedia’s mission is to create educational content that is freely available to all people. In keeping with that goal, all information on Wikimedia projects may be freely shared, copied, remixed, and used for any purpose (including commercial purposes!) in perpetuity.”
Another example is the databricks-dolly-15k dataset, an “open source dataset of instruction-following records generated by thousands of Databricks employees.” Their license states that “this dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.”
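None of this is abstract, either. Here’s a minimal sketch of pulling that openly licensed dataset and poking at it, assuming only the Hugging Face `datasets` Python library (`pip install datasets`); the column names come from the dolly-15k dataset card:

```python
# A minimal sketch: loading the openly licensed databricks-dolly-15k
# dataset with the Hugging Face `datasets` library (pip install datasets).
from datasets import load_dataset

# Distributed under CC BY-SA 3.0, per the license quoted above.
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

print(len(dolly))  # roughly 15,000 instruction-following records

record = dolly[0]
print(record["instruction"])  # the prompt a Databricks employee wrote
print(record["response"])     # the human-written answer
```

The same few lines work for any dataset on the Hub; the license, not the loading code, is what changes.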
There are over 46,900 datasets available on Hugging Face, all under various open source licenses. (If you’re partial to any license in particular, you can even sort and display the datasets by license: Apache 2.0, CC-BY-4.0, GPL, BSD-3, MIT, and others; a short sketch of doing this programmatically appears after the quote below.) With so many datasets available, it’s probable that stolen or otherwise problematic data is in there somewhere, but the point remains: a lot of folks have recognized the problem of training models on unethically sourced data, and they are making significant efforts to build AI tooling responsibly. AI tools can be modeled, trained, and built with appropriate licensing for individual and commercial use. You can even register a free account and try one yourself, right now, with Hugging Face Chat:
“The goal of this app is to showcase that it is now (May 2023) possible to build an open source alternative to ChatGPT.
For now, it’s running OpenAssistant’s latest LLaMA based model (which is one of the current best open source chat models), but the plan in the longer-term is to expose all good-quality chat models from the Hub.”
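And the license sorting mentioned above is exposed programmatically, too. Here’s a short sketch using the `huggingface_hub` client library; I’m assuming `pip install huggingface_hub` and a client version whose `list_datasets` accepts a `filter` string (the `license:apache-2.0` tag mirrors the license filter on the website):

```python
# A sketch of filtering Hub datasets by license with the huggingface_hub
# client library. Assumes list_datasets() accepts a filter string; the
# "license:apache-2.0" tag mirrors the license filter on the website.
from huggingface_hub import HfApi

api = HfApi()
for ds in api.list_datasets(filter="license:apache-2.0", limit=5):
    print(ds.id)  # repository IDs of Apache 2.0-licensed datasets
```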
(I have no affiliation whatsoever with Hugging Face, its affiliates, or any individual or organization that uploads data to its website or has built anything using what’s found there.)
Crucially, nothing I’ve said takes away from the many significant dangers and problems with AI. Even if we use AI built on models and datasets that are open source, properly licensed, and freely, legally, and ethically available for such use, we’ve mitigated only one of the common arguments against it.
“These systems can generate untruthful, biased and otherwise toxic information,” the NY Times notes, and “systems like GPT-4 get facts wrong and make up information, a phenomenon called ‘hallucination.’” There are risks to democracy in the ease with which AI can be used to spread disinformation (yes, AI can be used to create and spread FUD), concerns around privacy and surveillance, and serious consequences for the environment, as the computing power necessary to train these tools contributes to emissions and fossil fuel consumption.
That’s not all. In January, Time published a report detailing how OpenAI used Kenyan workers, paid paltry wages of less than $2 per hour, to help train ChatGPT, exposing them to explicit and traumatic content. AI “could further fuel an epidemic of child sexual abuse,” the UK’s top law enforcement agency recently warned.
Is AI a net positive to society? I don’t know, and neither do you. No one does, and as development advances and public discourse evolves, the question remains open. But what we know for certain is that it’s not going away. AI has problems—big ones—and we need to figure out how we’re going to deal with those problems. Instead of making bold declarative statements, we need to recognize that we are in a transition period. The landscape is shifting with such rapidity that it would be unwise to plant a flag, yet that’s what too many seem to be doing. This is reckless.
Corporations, not even ones that facilitate the ethical development of AI tooling, aren’t going to be of much help here; by their nature, they exist only to generate money for investors and shareholders. We need thoughtful regulation, and we need societal and cultural maturity. And as writers, we need to take responsibility for our part in that maturity by understanding the capabilities, dangers, and potential benefits of such tooling. Participating in the creation and dissemination of FUD is an abdication of those responsibilities.
About the Author: Leigh Hennig is an author, editor, and engineer. By day he serves as the Chief Network Architect for the largest datacenter in New England. Previously he worked as a systems/network engineer at MIT Lincoln Laboratory, Cisco Systems, and Amazon Web Services, where he specialized in network automation at scale. He is co-founder of Rocky Linux and the Rocky Enterprise Software Foundation. Recently he participated in a panel presenting to the US Copyright Office on AI in Literature. As a horror writer, his fiction has appeared in anthologies by Flame Tree Press, Crystal Lake Publishing, the 2022 HWA Poetry Showcase, British Fantasy Society, and MOTHER: TALES OF LOVE AND TERROR, a 2022 Bram Stoker Award finalist.
Published in issue #144 • Special AI Discovery Issue • July 2023
