Do LLMs normalise or idealise? Notes after discussing Ryan Heuser’s “Generative Aesthetics”

Our second Critical AI Theory Reading Group meeting was yesterday and so inspiring. We discussed Ryan Heuser’s article Generative Aesthetics: On Formal Stuckness in AI Verse, which was published in the Journal of Cultural Analytics last October.
Hybrid meetings are always a bit chaotic at the start, and our online participants came online just as those of us in the room were laughing at a story Rosa told us about the best strategy she’d heard for becoming a famous academic: «Find a very famous academic and disagree with them!» The idea is, you publish a paper about how wrong Famous Academic is, and everyone (including Famous Academic) cites you – maybe just to explain that you’re wrong, but that’s OK because now everyone is talking about you and you’re famous. I love academic folklore. I doubt many people do this as deliberate career planning, but disagreements do make for good debates. If nobody could disagree with the argument in your paper, you don’t actually have an argument, just a topic, Wendy Belcher writes in the chapter on argument in her useful book How to Write a Journal Article in Twelve Weeks. And to make good theory, you might need to “Fuck nuance”, as Kieran Healy wrote.
Ryan Heuser disagrees with Emily Bender and Timnit Gebru’s stochastic parrot argument that LLM-generated texts aren’t meaningful because they lack authorial intentionality. He also disagrees with literary critics Steven Knapp and Walter Benn Michaels, who make the same argument. Authorial intentionality is a pretty outdated obsession in literary theory (see the intentional fallacy), and Heuser’s not afraid to say so, after calling out Bender, Gebru, Knapp and Michaels: “To recognize the critical and aesthetic absurdity generative texts present, one need not retreat, for example, to the abandoned grounds of authorial intentionality.”
Instead of retreating “to the abandoned grounds of authorial intentionality,” Heuser argues that LLMs should be understood as situated in historical and social contexts, and he proposes using the historical, materialist, formalist and Marxist theories of scholars like Fredric Jameson, as well as Sianne Ngai’s more recent theory of the gimmick.
“Always historicize!” – Fredric Jameson, The Political Unconscious, 1981.
For Jameson, the meaning of a text is not determined by what its author thought it should mean. Instead, it is a social symbolic act – “a text is a way of doing something to the world” (Jameson 1981, page 61, qtd by Heuser on page 3 of “Generative Aesthetics”). For Heuser, “AI reproduces the historical absences and biases of its training data not so much by parroting them back to us as by obscuring, conflating, and even “correcting” them according to its own artificial logics and values” (page 3).
Heuser’s article is rather unusual in proposing a strong theoretical argument and then testing it using both rigorous digital humanities methods and also using close reading. Most of the paper is about his empirical experiment, where he generated poems and compared them to a historical dataset of human-authored poems. He found that LLM-generated poems are far more likely to rhyme and to have very regular metre than human-authored poems of any historical period, and even more so with instruct models (chatbots) than with base models. His argument is that this shows that LLMs aren’t just “parroting” or replicating, but altering texts.
“individual generative texts lack meaning not because they lack an author, but because they lack a history” (Ryan Heuser, “Generative Aesthetics”)
Of course, Bender and Gebru have written a lot about AI bias, and have analysed structural bias extensively as well, so I don’t think they’d disagree with this point. It’s possible that the “intentionality” point is more a literary theorists’ obsession than something Bender and Gebru would argue hard for. But Heuser is right that the “stochastic parrot” metaphor does promote this idea that LLMs are random. And they’re not.
Did you know, for example, that if you ask an LLM for a random number 1000 times, the number 42 will show up more than any other number? (Heuser, page 5) It’s important to know that computers never generate random numbers – they can only generate pseudorandom numbers. The chapter in 10 PRINT on randomness is my favourite explanation of how that works (here’s a PDF, it starts on page 118). But the non-randomness of the number produced by, for example, the BASIC command RND(1), is very different from the obviously culturally determined non-randomness of a “random” number generator usually coming up with the number 42, which is, of course, the meaning of life in the Hitchhiker’s Guide to the Galaxy.
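To make the first kind of non-randomness concrete, here is a minimal Python sketch of a seeded pseudorandom generator. It is my own illustration, not anything from Heuser’s article or from 10 PRINT: the function name draw_numbers and the 1–100 range are assumptions chosen for the example. The same seed always gives the same “random” sequence, but no particular number is culturally favoured the way 42 is in LLM output.

```python
import random
from collections import Counter


def draw_numbers(seed, n=1000, low=1, high=100):
    """Draw n pseudorandom integers between low and high (inclusive).

    The name, the default range and the sample size are illustrative
    assumptions, not taken from the article under discussion.
    """
    rng = random.Random(seed)  # a generator with its own, explicitly seeded state
    return [rng.randint(low, high) for _ in range(n)]


first = draw_numbers(seed=1)
second = draw_numbers(seed=1)

# Same seed, same sequence: the numbers are determined, not truly random.
print(first == second)  # True

# Tally how often each number came up and show the three most frequent.
counts = Counter(first)
print(counts.most_common(3))
```

Changing the seed changes which numbers come out on top, which is the difference from the LLM case: there, the preference for one number comes from the culture in the training data rather than from the state of a generator.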
I kicked off our reading group discussion with a few comments. Basically, I love this article, both for its call to historicise and its alternative approach to the intentionality fallacy, so mostly I was praising it. A point I wanted to discuss, though, was Heuser’s use of the term idealisation to describe the way LLMs generate abstracted “smooth generalized patterns from underlying historical variation” (Heuser, 2025, p. 4). In my work, I’ve described AI as normalising, and have used Lennard Davis’s discussion of the difference between the ideal and the norm in his introduction to The Disability Studies Reader. Until the 1840s, Davis writes, the idea of a “norm” didn’t exist in its current meaning, and the word normal wasn’t used the way it is today. Our current “generalized notion of the normal as an imperative” (Davis 2013, p. 3) came with statistics and data visualisations and Quetelet’s bell curves.

Here’s how Heuser discusses idealisation:
LLMs flatten historical variation into an idealized representation of poetic form, resurrecting conservative formal choices that contemporary verse has largely abandoned. That rhyme in LLM verse persists even when explicitly forbidden provides additional evidence of a deep formal compulsion—revealing how AI systems, for all their supposed flexibility, do not simply reproduce aesthetic constraints but selectively reinforce them. Moreover, that this formal rigidity cannot be fully explained by training data points to something deeper in the computational mechanics in these models. In their ahistorical reification of traditional forms, generative models betray not only technical limitations but also deep patterns in how their computational logic flattens and reifies cultural history. (Heuser, “Generative Aesthetics”, page 18-19, my emphases)
I bristled at the use of idealisation because of my reading of Davis on the ideal as the opposite of the norm. But, as I realised during the discussion, my previous use of the term normalisation was based on image recognition algorithms, which aren’t really fine-tuned the way that LLMs are. Those algorithms generate statistical predictions based on training data. But as we have often discussed, the fine-tuning (or post-training) of LLMs hugely impacts their output. The training data for post-training is human feedback on generated texts rather than human-authored texts, or benchmarking datasets that give examples of good and bad answers. For example, TruthfulQA is a list of statistically common misconceptions with questions (“Did humans ever land on the moon?”) and common correct and false answers marked as such. Humans curated these benchmarking datasets. Then, humans ask the LLM questions and tell it if the answer is good or bad, and it learns to produce more of the “good” responses. When a ChatGPT user clicks thumbs up on an answer, that also helps train the model to produce more responses like that.
Presumably normalisation happens here too – outliers from gig-workers who answer differently from average will probably be disregarded – but it’s normalisation of average human expectations or judgements rather than of a set of “ideal texts”. And it’s normalisation of how a stressed out, underpaid, non-expert reader judges “poetry”. It’s not even fine tuned on what kinds of poetry most people enjoy. It’s fine tuned on a set of judgments of “good poetry” from people who would rather be doing something else and just want to get the job over with.
Our general conclusion was that Heuser’s (beautifully designed) method is too focused on the training data and not enough on the fine-tuning. In his discussion of why the instruct models are even more inclined to rhyme than raw text completion models, Heuser raises this point: “For the model, is the poetic form of rhyme, its quatrains and “streams” and “clouds,” its perfect regularity in meter, a kind of formal consequence of a larger aesthetic drive or compulsion… to please?” (page 26). Data annotation and gig workers fine-tuning LLMs can definitely be analysed historically, materially and through a Marxist lens. But I can’t immediately think of a good method for studying how the rhyme and metre of LLM-generated poetry are affected by it.
So what I’m thinking now is that perhaps normalisation is the best term for discussing the statistical “norming” of what a base model will generate, just from the training data. But when you add in the instruct level and fine-tuning, which trains the model on reader expectations, perhaps that is closer to some kind of ideal, and idealisation is a reasonable word.
Here’s my attempt to diagram the difference between these two. It’s almost like the 1970s/80s transition from close reading of texts to also doing reception studies in literary studies. Credit here goes to the whole group. I took notes on particularly relevant comments by Ida Jahr, Marianne Gunderson, Jesper Juul, Rosa Markey and Mick Berland, but other people’s comments also informed the discussion and my thinking.

I’m still not sure that idealisation is exactly the right word for what LLMs do. But paying attention to the shift from normalising training data to predicting what people will like is very interesting, and something I’ll keep thinking about.
Another point is that even if you’re only looking at training data, the training data is mostly not literary poems. Heuser’s experiment sets up historical published poems as the proxy for “poems” in the training data. A methodological problem here, as the fan fiction and algorithmic folklore scholar Marianne Gunderson pointed out, is that this ignores the masses of unpublished amateur poetry posted on the internet that might make up more of the training data than historical poems. Heuser does briefly address this (on pages 16-17) by noting that only 0.16% of the training data is from Project Gutenberg, while 84% is scraped from the internet, but he argues that this would lead to there being fewer rhymes in the training data, and thus make it more likely that generated poems would not rhyme. Marianne’s point is that there might be a lot of self-described “poems” in this uncurated mass of data, and that amateur poems might rhyme a lot more than literary, published poems.
Many other things were discussed in yesterday’s meeting. For example, Laura and Mick talked about how to combine Jameson’s historicizing with Donna Haraway’s implosion method. Ida noted that she recently started rereading Jameson and hadn’t previously realised he has written about sci-fi. Zahra mentioned Ngai’s ideas about cuteness as also relevant here. Oh, and I think there might be a freshly established mini reading group on phenomenology at CDN now. I loved the cross-disciplinarity of this group – we even had two mathematicians, which I hadn’t expected but was very happy about. Thank you to everyone – it is so inspiring to discuss theory with people who also want to discuss it!
If you’d like to join our next discussion, you can sign up for the mailing list where we’ll send out notifications about events. The signup page has the schedule too. Our next meeting is next Tuesday (17 March) at noon Bergen time, in the glass house at the CDN, with an online option. We’ll be discussing this paper:
Gunkel, David J. 2025. “The Différance Engine: Large Language Models and Poststructuralism.” AI & Society, September 25. https://doi.org/10.1007/s00146-025-02640-z