I’m coming in hot from maternity leave with some half-baked thoughts about AI. Over the past year I’ve seen lots of takes, from authors on my social media feeds promising to never read a book or watch a movie if AI was involved, to authors openly discussing how they use AI to generate the text of their books. (For instance, this 8/14 interview in Jane Friedman’s The Hot Sheet.) In the pro-AI / AI-neutral discussions, AI is being called a “tool” for some aspect of the creative process, akin to using spell-check, Grammarly, or Thesaurus.com. One of the suggested uses of AI is for research, and at first, I thought this was probably the easiest use to defend, at least against the argument that writers are supposed to create their written work themselves.1 Writers don’t create the facts of the world that we dig up in research–we learn them. We already use search engines, Wikipedia, YouTube, and more to research everything from historical facts to modern slang. In terms of writing a novel armed with accurate information about the real world, AI should be one more defensible research tool. But then I actually tried it.
I wouldn’t normally sit around thinking about ChatGPT while I nurse my baby, but a lawyer friend reminded me of an ongoing saga where a law firm got slapped for citing cases generated by ChatGPT in a filing to the court…it turned out the cases were made up. The lawyers hadn’t checked the cases before filing, and it’s easy to say, well, obviously a competent lawyer would check a case cite before putting it in a court filing. But in the words of Tracy Jordan, to play devil’s avocado, I get how it happened. Lawyers are busy. Some lawyers are intimidated by legal research. And once my husband Ben (also a lawyer) showed me what it looked like on ChatGPT, I could see how someone might assume the “artificial intelligence” was giving him an actual case citation.
Ben typed a legal question into the program that he’d been researching at work, and it spat out an answer. He started to get excited–the answer was what he wanted it to be, but he hadn’t found any good case law himself. He asked ChatGPT for cases and it generated a handful. He searched the citations–none of them existed.
Just now, I made an account with my old author email (hello, old pen name Regan Rose) and recreated what happened, this time with a simple legal question I know something about:
That’s not a terrible answer, but it doesn’t actually state the standard as Maine courts have written it. ("Competence to stand trial sufficient to meet the requirements of due process means that the accused is capable of understanding the nature and object of the charges and proceedings against him . . . and of conducting in cooperation with his counsel his defense in a rational and reasonable manner." Thursby v. State, 223 A.2d 61, 66 (Me. 1966).) Not to be a lawyer about it, but the answer isn’t quite accurate.
Those look like real cases! I’ve even read a State v. Dyer case out of Maine, but I don’t remember if it had anything to do with competence. I checked the case cites on Lexis. Every single one is fabricated. The citation for Dyer, for example, dumps you in the middle of a Delaware case named Partners v. Beck, about a company called Boston Chicken, Inc. So. Not a case about “the constitutional standard for competency to stand trial.” (Meanwhile, the statutory citation is close. Section 101-B used to be the law, but it was repealed in 2009. Now Maine’s competence procedure is codified at 15 M.R.S. §101-D.)
ChatGPT must be working on the issue, because when Ben did the experiment, he asked the AI if the cases were real, and it responded yes. I wish I had a screenshot because I promise you, it doubled down! Then he asked about a specific (fake) case by name, and the AI spat out the canned apology.
We were stunned that the AI still hadn’t been trained to respond that it could not give a case citation when asked for one. (The people behind ChatGPT have to be aware of the situation with the lawyers; it’s made national news several times.) Even if they’ve improved the AI’s handling of the question “are these cases real” in the past few months, ChatGPT shouldn’t be making up cases in the first place.
When I told Andromeda about this, she told me that a coaching student had a similar issue in the context of her writing. She’d tried to use ChatGPT to research something, and it had given her fabricated sources. Luckily, Andromeda’s student went looking for the actual books and discovered that they don’t exist, rather than relying on the information without checking it.
So we know the “artificial intelligence” can give us fake facts. And as if that weren’t enough, another troubling thing happened.
That same night we were playing with ChatGPT, Ben asked the program if it had read The Damage by Caitlin Wahrer. It responded that it could not read the text of the book but could read information about the book from sources available online. Ben then asked it to summarize the novel. (We did take screenshots of this part.)
Half of the sentences are correct, but they’re also so general that they are probably correct for half of all suspense novels. (“The novel’s tension is driven by the exploration of secrets within relationships and the search for justice amidst the complexities of human behavior.”) The other half of the sentences contain random inaccuracies. The main character is a woman named Julia Hall, who is married to Tony (not Thomas), and they have two kids, Chloe and Sebastian (not Ethan). The man who’s arrested is Ray Walker, not Johnny Strayhorn, but dibs on that name for my next villain. The book takes place in Maine, not upstate New York. The assault happens in a motel, not a person’s home. And most importantly, the victim of the crime is a man named Nick, not a woman named Nicky. This last one is the reason I’m even telling this part of the story: given the parts of the summary that were close to the truth, I suspect ChatGPT did access some kind of book summary or review of my novel somewhere online. Not only did it make up a bunch of names and get basic details wrong, it also assumed my fictional victim was a woman. The bias of its programming, training, or the materials it regularly accesses is inherent in the summary. (For all that it made up, it assumed the main couple was heterosexual; the assailant male. Nick’s character is gay, but ChatGPT summarized him as a straight woman.)
I couldn’t stop thinking about the potential pitfalls here, not just for lawyers, but for writers who might be coming to a “tool” that’s being marketed as “intelligent.” Are we writers aware that the intelligent tool’s answers can be inaccurate? Affected by bias? Straight-up fabricated and presented as fact?
On my maternity leave, I read Ilyon Woo’s Master Slave Husband Wife, for which she won the Pulitzer Prize in Biography. There, Woo told the true story of Ellen and William Craft, an enslaved couple who emancipated themselves. The story involved details of race, colorism, sex and gender expression, disability, and more. The book was meticulously researched. At the sentence level, Woo took great pains to be accurate.2 She also exposed potential bias in historical sources.3 What if Woo had used ChatGPT for any of her research? Through her book, she educated me about an era of American history that I knew comparatively little about. What if that book had been based, even in part, on broad generalizations, fake sources, or false information that had been twisted by the bias inherent in the AI’s programming and/or the materials it had been exposed to? This might have done genuine harm, not only to the legacy of Ellen and William Craft, but also through me, the reader, who could have left the book mis-educated about something that continues to matter today.
I decided to run a final experiment for this piece. Back when I was writing The Damage, which involved the sexual assault of a male character, it took me a while to realize that my base of knowledge going into the book, and most of the sources I was reading in my research, had to do with the experiences of girls and women who had survived assault. Only when I realized this did I start seeking out more specific information about male survivors. I wondered now, if I had been using ChatGPT for my research, whether bias might creep into the answers I got. And I would say yes, it does.
I asked a question three ways, and you can see that ChatGPT answered my generic question about survivors with information that is more accurate (in ChatGPT’s base of knowledge) as to female survivors. A good example is comparing #1, #2, and #3—only the male-specific question includes the male-specific societal perception. This distinction is really important when you’re talking about what a man or male-presenting person might experience from the people around them after an assault has happened.
I don’t mean to be alarmist. I understand that you can use ChatGPT as a starting place for your research, check the sources it names, and otherwise attempt to verify the answer it gives you. I highly doubt that someone as conscientious as Ilyon Woo would rely on ChatGPT for her research, or that she’d be capable of producing a book as excellent as Master Slave Husband Wife if she had. But still, I worry, because (to be a bit of a critic) widely read books need not be “excellent” by my standards. I worry about writers who may trust the intelligent tool’s promise that it has given them accurate information, and who may be on deadline, or intimidated by research, and who may rely on the information without checking it elsewhere. And I think the implications are especially bad when the AI gives its user an answer founded in some kind of bias that the user may also implicitly share.
I’m not here to give black and white rules or make grand promises of “I’ll never use ChatGPT,” because, who knows, maybe one day it will be as common as Google. But I’m not gonna lie…it troubles me. And for now, at least, I personally don’t plan to use it as a tool for anything.
Please share your thoughts in the comments, or better yet, have ChatGPT write a comment for you.
And if reading this hasn’t been a long-enough work break, feel free to watch this old SNL sketch about insurance from robot attacks, which feels almost relevant enough to include.
I’m focusing on this part of the issue and not addressing the arguments that have to do with paid labor, etc. I just don’t have the time, people.
For example, there were sentences that clarified that something was only probable, or possible, not definite. There were scenes where Woo recounted multiple versions of a single event because historical sources conflicted.
An example that springs to mind: while discussing a British journalist’s reporting on American slavery, Woo paused to note that the journalist, although an abolitionist, had betrayed her own biases against Black people in some of her writings.
I've done my own fun experiments asking ChatGPT to tell me something I already know (for example, "What advice does Andromeda Romano-Lax give to other writers?") and got bland, generic, inaccurate answers in response. When I tuned the prompt by asking for quotes, I got a more accurate, limited answer, with sources. I worry that people's takeaway from Caitlin's post and ones like it will simply be, "AI will get more accurate with time--just wait and see," instead of recognizing that we are at the precipice of a brave (frightening) new world of misinformation...plus the outsourcing of creative jobs to AI. This is such a strange parallel to what we're seeing in politics right now. Unfortunately, it seems like half the country (at least) doesn't really care very much whether statements are truthful.
I had an editor at a company who used ChatGPT to analyze data from a survey they did as the basis for an article. ChatGPT spat out one point that the editor was very excited about, but the data didn't support it at all. This is how I discovered AI hallucinations, and what a fun way to have it happen! (No, the article was never published, but I got paid, so yay for them!)
Frankly, at this point we consider ChatGPT (and other programs like it) to be like the computer on the Starship Enterprise, a system with all of the answers. But really, it's just learning how to talk to people. Perhaps that's why they call them "Large Language Models" instead of "Large Piles of Useful Information"?
But with this piece, I think I'll stick to my regular old ways for research for a while longer.