These days, so-called generative AI can (allegedly) make art, write books, and compose poetry. Systems like Stable Diffusion, Midjourney, and ChatGPT are seemingly quite good at it. But for some artists, this creates problems. Namely, determining what legal rights they have when their work is scraped by these tools.
Faced with the rise of these systems, authors and artists are pushing back. The Writers Guild of America (WGA) is striking in part over the potential use of AI to write scripts, referring to such systems as “plagiarism machines.” Visual artists have penned open letters denouncing the use of AI to replace illustrators, calling it “the greatest art heist in history.” Getty sued Stability AI in January for copyright infringement.
But what if your work exists in a kind of in-between space—not work that you make a living doing, but still something you spent hours crafting, in a community that you care deeply about? And what if, within that community, there were a specific sex trope that would inadvertently unmask how models like ChatGPT scrape the web—and how that scraping impacts the writers who created it?
The trope in question is called “the Omegaverse,” which is perhaps best described as an act of collective sexual worldbuilding. It began in the (very active) fandom for the TV series Supernatural, but has now spread to almost every corner of the fan-fiction world. These stories are defined by a specific sexual hierarchy made up of Alphas, Betas, and Omegas in which Alphas and Omegas can smell one another in particular ways, experience “heats,” and (usually) mate for life. Most of these stories are heavy on smut, and bodily fluids are crucial to the whole genre.
Within the Omegaverse, there is also something called “knotting,” a phenomenon borrowed from animals in which a penis grows a bulb at the base to remain locked inside a vagina. If this all sounds overwhelming, you’re not alone. “I remember the first time I encountered it, and I will confess, my reaction was, ‘What is this? What is happening?’” says Hayley Krueger, a fan-fiction writer who published an Omegaverse 101 explainer earlier this year. But she says she quickly fell in love with the trope.
When characters in the Omegaverse mate, they become linked biologically. Different writers have different ways of showing or expressing this—anything from being able to smell your mate’s mood, to being able to actually communicate telepathically across distances. “I really like the dynamic between characters,” Krueger says. “It’s almost like soulmates, but you choose it and then you get all these perks that go with it.”
Because the Omegaverse has such specific terms and phrases associated with it, ones that are found within fan fiction and nowhere else, it’s an ideal way to test how generative AI systems are scraping the web. Determining what information has gone into a model like ChatGPT is almost impossible. OpenAI, the company behind the tool, has declined to make its training data sources public. The Washington Post did its own analysis of the model and created a way to peek at the websites that make up Google’s C4 dataset. But even people who build applications using ChatGPT have no insight into what the system is trained on.
In the absence of any list of sources, people have tinkered with other ways to explore what these models might know, and how they know it. One way to do that is to prompt the system with words or questions you know come from a certain source. So, for example, if you want to know whether the works of Shakespeare are being used in the model, you might give the system a few unique lines of a play and see if it comes back with iambic pentameter. Or, if you want to know whether these systems are trained on fan fiction, you might give the model a specific trope unique to fandom.
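The probing idea can be sketched in a few lines of code: seed a prompt with vocabulary unique to a niche community, then measure how much more of that vocabulary turns up in the model’s completion. The term list and scoring function below are purely illustrative assumptions—this is not the method the Reddit poster or any researcher actually used, and the naive substring matching is a deliberate simplification.

```python
# Hypothetical probe: score a model's completion by how much
# community-specific vocabulary it contains. The term list below is an
# illustrative sample, not an exhaustive or official one.
OMEGAVERSE_TERMS = {
    "alpha", "beta", "omega", "knotting", "heat",
    "scenting", "mate bond", "pack dynamics",
}

def probe_score(completion: str, vocabulary: set[str] = OMEGAVERSE_TERMS) -> float:
    """Return the fraction of the niche vocabulary found in the completion.

    A high score after a short, trope-specific prompt suggests the model
    has seen text from the community that coined these terms. (Plain
    substring matching is used here for simplicity; it would also match
    e.g. "alpha" inside "alphabet".)
    """
    text = completion.lower()
    hits = sum(1 for term in vocabulary if term in text)
    return hits / len(vocabulary)

# A completion that leans on trope vocabulary scores well above a generic one.
trope_heavy = "The alpha caught the omega's scent as her heat began"
generic = "The detective walked into the rain-soaked street"
```

In practice you would send the seeded prompt to the model under test and run the completion through a scorer like this, comparing against completions for generic prompts as a baseline.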
A few months ago, a fan-fiction writer with the handle kafetheresu did just that. In a very thorough post on Reddit, they pointed out that when they gave the writing tool Sudowrite (which runs on OpenAI’s GPT-3) specific words and phrases unique to the Omegaverse, the system readily filled in the rest in a way that suggested the AI knew all about this particular trope. (The Reddit poster declined to speak on the record.)