Our second Critical AI Theory Reading Group meeting was yesterday and so inspiring. We discussed Ryan Heuser’s article Generative Aesthetics: On Formal Stuckness in AI Verse, which was published in the Journal of Cultural Analytics last October.

Hybrid meetings are always a bit chaotic at the start, and our online participants came online just as those of us in the room were laughing at a story Rosa told us about the best strategy she’d heard for becoming a famous academic: “Find a very famous academic and disagree with them!” The idea is, you publish a paper about how wrong Famous Academic is, and everyone (including Famous Academic) cites you – maybe just to explain that you’re wrong, but that’s OK because now everyone is talking about you and you’re famous. I love academic folklore. I doubt many people do this as deliberate career planning, but disagreements do make for good debates. As Wendy Belcher writes in the chapter on argument in her useful book How to Write a Journal Article in Twelve Weeks, if nobody could disagree with the argument in your paper, you don’t actually have an argument, just a topic. And to make good theory, you might need to “Fuck nuance”, as Kieran Healy wrote.

Ryan Heuser disagrees with Emily Bender and Timnit Gebru’s stochastic parrot argument that LLM-generated texts aren’t meaningful because they lack authorial intentionality. He also disagrees with literary critics Steven Knapp and Walter Benn Michaels, who make the same argument. Authorial intentionality is a pretty outdated obsession in literary theory (see the intentional fallacy), and Heuser’s not afraid to say so, after calling out Bender, Gebru, Knapp and Michaels: “To recognize the critical and aesthetic absurdity generative texts present, one need not retreat, for example, to the abandoned grounds of authorial intentionality.”

Instead of retreating “to the abandoned grounds of authorial intentionality,” Heuser argues that LLMs should be understood as situated in historical and social contexts, and he proposes using the historical, materialist, formalist and Marxist theories of scholars like Fredric Jameson, as well as Sianne Ngai’s more recent theory of the gimmick.

“Always historicize!” – Fredric Jameson, The Political Unconscious, 1981.

For Jameson, the meaning of a text is not determined by what its author thought it should mean; instead, the text is a socially symbolic act – “a text is a way of doing something to the world” (Jameson 1981, page 61, qtd. by Heuser on page 3 of “Generative Aesthetics”). For Heuser, “AI reproduces the historical absences and biases of its training data not so much by parroting them back to us as by obscuring, conflating, and even ‘correcting’ them according to its own artificial logics and values” (page 3).

Heuser’s article is rather unusual in proposing a strong theoretical argument and then testing it with both rigorous digital humanities methods and close reading. Most of the paper is about his empirical experiment, where he generated poems and compared them to a historical dataset of human-authored poems. He found that LLM-generated poems are far more likely to rhyme and to have very regular metre than human-authored poems of any historical period, and even more so with instruct models (chatbots) than with base models. His argument is that this shows that LLMs aren’t just “parroting” or replicating, but altering texts.
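If you’re curious how this kind of measurement might be operationalised, here is a minimal sketch in Python – emphatically not Heuser’s actual pipeline, just an illustration – using the CMU Pronouncing Dictionary via the pronouncing library. The rough idea: two words rhyme if their phonemes match from the last stressed vowel onwards.

```python
# A toy end-rhyme counter: NOT Heuser's method, just an illustration of
# how rhyme can be operationalised with the CMU Pronouncing Dictionary.
# Requires: pip install pronouncing
import pronouncing

def rhyming_part(word):
    """Phonemes from the last stressed vowel on, for the word's first pronunciation."""
    phones = pronouncing.phones_for_word(word.lower().strip(".,;:!?\"'"))
    return pronouncing.rhyming_part(phones[0]) if phones else None

def end_rhyme_rate(poem):
    """Fraction of adjacent line pairs whose final words rhyme (couplet-style)."""
    last_words = [line.split()[-1] for line in poem.splitlines() if line.split()]
    parts = [rhyming_part(w) for w in last_words]
    pairs = list(zip(parts, parts[1:]))
    rhymed = sum(1 for a, b in pairs if a and b and a == b)
    return rhymed / len(pairs) if pairs else 0.0

print(end_rhyme_rate("The cat sat on the mat\nAnd wore a little hat"))  # 1.0
```

A real study would also have to handle ABAB schemes (comparing all line pairs, not just adjacent ones), slant rhyme, and out-of-dictionary words, which is part of why Heuser’s method section is so carefully designed.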

“individual generative texts lack meaning not because they lack an author, but because they lack a history” (Ryan Heuser, “Generative Aesthetics”)

Of course, Bender and Gebru have written extensively about AI bias, including structural bias, so I don’t think they’d disagree with this point. It’s possible that the “intentionality” point is more a literary theorists’ obsession than something Bender and Gebru would argue hard for. But Heuser is right that the “stochastic parrot” metaphor does promote the idea that LLMs are random. And they’re not.

Did you know, for example, that if you ask an LLM for a random number 1000 times, the number 42 will show up more than any other number? (Heuser, page 5) It’s important to know that computers never generate random numbers – they can only generate pseudorandom numbers. The chapter in 10 PRINT on randomness is my favourite explanation of how that works (here’s a PDF, it starts on page 118). But the non-randomness of the number produced by, for example, the BASIC command RND(1), is very different from the obviously culturally determined non-randomness of a “random” number generator usually coming up with the number 42, which is, of course, the meaning of life in the Hitchhiker’s Guide to the Galaxy.
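To make the pseudorandomness point concrete, here’s a tiny sketch of a linear congruential generator, the family of algorithms the 10 PRINT chapter describes (the constants below are the classic Park–Miller values, chosen purely for illustration). Given the same seed, it produces exactly the same “random” sequence every time:

```python
# A toy linear congruential generator (LCG): x -> (a*x + c) mod m.
# Deterministic all the way down -- the "randomness" is only apparent.
def lcg(seed, n, a=16807, c=0, m=2**31 - 1):
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        yield x % 100 + 1  # scale into 1..100, like asking for "a random number"

print(list(lcg(seed=7, n=5)))  # five "random" numbers
print(list(lcg(seed=7, n=5)))  # run it again: exactly the same five numbers
```

An LLM’s fondness for 42 is a different kind of non-randomness again: not a deterministic formula, but the cultural weight of its training data showing through.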

I kicked off our reading group discussion with a few comments. Basically, I love this article, both for its call to historicise and for its alternative approach to the intentional fallacy, so mostly I was praising it. A point I wanted to discuss, though, was Heuser’s use of the term idealisation to describe the way LLMs generate abstracted “smooth generalized patterns from underlying historical variation” (Heuser, 2025, p. 4). In my work, I’ve described AI as normalising, and have used Lennard Davis’s discussion of the difference between the ideal and the norm in his introduction to The Disability Studies Reader. Until the 1840s, Davis writes, the idea of a “norm” didn’t exist in our current sense, and the word normal wasn’t used the way it is today. Our current “generalized notion of the normal as an imperative” (Davis 2013, p. 3) came with statistics and data visualisations and Quetelet’s bell curves.

[Figure: a bell curve graph. In 1869, Quetelet published an influential book of data visualisations like this one, showing the height of Belgians from 18 to 20 years old.]

Here’s how Heuser discusses idealisation:

LLMs flatten historical variation into an idealized representation of poetic form, resurrecting conservative formal choices that contemporary verse has largely abandoned. That rhyme in LLM verse persists even when explicitly forbidden provides additional evidence of a deep formal compulsion—revealing how AI systems, for all their supposed flexibility, do not simply reproduce aesthetic constraints but selectively reinforce them. Moreover, that this formal rigidity cannot be fully explained by training data points to something deeper in the computational mechanics in these models. In their ahistorical reification of traditional forms, generative models betray not only technical limitations but also deep patterns in how their computational logic flattens and reifies cultural history. (Heuser, “Generative Aesthetics”, pages 18–19, my emphases)

I bristled at the use of idealisation because of my reading of Davis on the ideal as the opposite of the norm. But, as I realised during the discussion, my previous use of the term normalisation was based on image recognition algorithms, which aren’t really fine-tuned the way that LLMs are. They generate statistical predictions based on training data. But as we have often discussed, the fine-tuning (or post-training) of LLMs hugely impacts their output. The training data for post-training is human feedback on generated texts rather than human-authored texts, or benchmarking datasets that give examples of good and bad answers. For example, TruthfulQA is a list of statistically common misconceptions, with questions (“Did humans ever land on the moon?”) and common correct and false answers marked as such. Humans curated these benchmarking datasets. Then, humans ask the LLM questions and tell it if the answer is good or bad, and it learns to produce more of the “good” responses. When a ChatGPT user clicks thumbs up on an answer, that also helps train the model to produce more responses like that.
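To make that pipeline concrete, here is a purely conceptual sketch – the data and names are hypothetical, not any lab’s actual implementation – of how thumbs-up/thumbs-down feedback becomes training data: answers to the same prompt are paired into “chosen” and “rejected” examples, the format that preference-tuning methods such as RLHF and DPO train on.

```python
# A conceptual sketch of turning rater feedback into preference pairs.
# Hypothetical data and names -- not any actual lab's pipeline.
from collections import defaultdict
from itertools import product

feedback = [  # (prompt, response, thumbs_up) as raters might log them
    ("Write a poem about clouds",
     "Clouds drift white across the sky, / In quatrains soft they wander by", True),
    ("Write a poem about clouds",
     "cumulus. a smear of vapour. nothing rhymes here.", False),
]

by_prompt = defaultdict(lambda: {"chosen": [], "rejected": []})
for prompt, response, thumbs_up in feedback:
    by_prompt[prompt]["chosen" if thumbs_up else "rejected"].append(response)

# Every (liked, disliked) combination becomes one preference example.
preference_data = [
    {"prompt": p, "chosen": c, "rejected": r}
    for p, d in by_prompt.items()
    for c, r in product(d["chosen"], d["rejected"])
]
print(preference_data[0]["chosen"])  # the rhyming answer is what gets reinforced
```

Scale this up across thousands of raters and millions of clicks, and the model ends up optimising for a statistical picture of what hurried readers reward.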

Presumably normalisation happens here too – outliers from gig workers who answer differently from the average will probably be disregarded – but it’s normalisation of average human expectations or judgements rather than of a set of “ideal texts”. And it’s normalisation of how a stressed-out, underpaid, non-expert reader judges “poetry”. It’s not even fine-tuned on what kinds of poetry most people enjoy. It’s fine-tuned on a set of judgments of “good poetry” from people who would rather be doing something else and just want to get the job over with.

Our general conclusion was that Heuser’s (beautifully designed) method is too focused on the training data and not enough on the fine-tuning. In his discussion of why the instruct models are even more inclined to rhyme than raw text-completion models, Heuser raises this point: “For the model, is the poetic form of rhyme, its quatrains and “streams” and “clouds,” its perfect regularity in meter, a kind of formal consequence of a larger aesthetic drive or compulsion… to please?” (page 26). The data annotation and gig work that go into fine-tuning LLMs can certainly be analysed historically, materially and through a Marxist lens. But I can’t immediately think of a good method for studying how the rhyme and metre of LLM-generated poetry are affected by it.

So what I’m thinking now is that perhaps normalisation is the best term for discussing the statistical “norming” of what a base model will generate, just from the training data. But when you add in the instruct level and fine-tuning, which trains the model on reader expectations, perhaps that is closer to some kind of ideal, and idealisation is a reasonable word.

Here’s my attempt to diagram the difference between these two. It’s almost like the 1970s/80s transition from close reading of texts to also doing reception studies in literary studies. Credit here goes to the whole group. I took notes on particularly relevant comments by Ida Jahr, Marianne Gunderson, Jesper Juul, Rosa Markey and Mick Berland, but other people’s comments also informed the discussion and my thinking.

[Diagram: the left half shows an LLM trained on training data, where it learns that a poem is the average of texts described as poems – this is NORMALISATION. The right half shows a fine-tuned model that has been told by a human that a rhyming couplet looks like a poem. This fine-tuned model has learnt that a poem rhymes: it generates a poem based on a statistical analysis of human workers’ reception of the texts it has previously generated as poems. This relates to sycophancy in chatbots – they aim to please.]

I’m still not sure that idealisation is exactly the right word for what LLMs do. But paying attention to the shift from normalising training data to predicting what people will like is very interesting, and something I’ll keep thinking about.

Another point is that even if you’re only looking at training data, the training data is mostly not literary poems. Heuser’s experiment sets up historical published poems as the proxy for “poems” in the training data. A methodological problem here, as the fan fiction and algorithmic folklore scholar Marianne Gunderson pointed out, is that this ignores the masses of unpublished amateur poetry posted on the internet that might make up more of the training data than historical poems. Heuser does briefly address this (on pages 16–17) by noting that only 0.16% of the training data is from Project Gutenberg, while 84% is scraped from the internet, but he argues that this would lead to there being fewer rhymes in the training data, and thus make it more likely that generated poems would not rhyme. Marianne’s point is that there might be a lot of self-described “poems” in this uncurated mass of data, and that amateur poems might rhyme a lot more than literary, published poems.

Many other things were discussed in yesterday’s meeting. For example, Laura and Mick talked about how to combine Jameson’s historicizing with Donna Haraway’s implosion method. Ida noted that she recently started rereading Jameson and hadn’t previously realised he has written about sci-fi. Zahra mentioned Ngai’s ideas about cuteness as also relevant here. Oh, and I think there might be a freshly established mini reading group on phenomenology at CDN now. I loved the cross-disciplinarity of this group – we even had two mathematicians, which I hadn’t expected but was very happy about. Thank you to everyone – it is so inspiring to discuss theory with people who also want to discuss it!

If you’d like to join our next discussion, you can sign up for the mailing list where we’ll send out notifications about events. The signup page has the schedule too. Our next meeting is next Tuesday (17 March) at noon Bergen time, in the glass house at the CDN, with an online option. We’ll be discussing this paper:

Gunkel, David J. 2025. “The Différance Engine: Large Language Models and Poststructuralism.” AI & Society, September 25. https://doi.org/10.1007/s00146-025-02640-z

