That’s almost the title of a paper I read today, Do Multilingual LLMs Think In English?, which uses several methods to poke into what a language model actually does when responding to a prompt in a language other than English. Spoiler: it looks as though the model goes via English even if it’s trained on many languages, because there is more English training data. They have a couple of really nice examples that help explain how this works, which I’ll show you below.
Multilingual models are trained on data in many different languages, and since every token is “translated” into a vector (a list of numbers, or coordinates, that locates the concept/token in relation to others in a big multidimensional imaginary space), the model itself doesn’t exactly have a language. Multilingual models can for instance recognise that a news story titled “United Kingdom buries Queen Elizabeth II after state funeral” and a headline in another language (translated: “Her Majesty Queen Elizabeth II of the United Kingdom of Great Britain and Northern Ireland dies at 96”) are about the same event (the example is from here). But as you can see, the meaning isn’t actually precisely the same.
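The idea that two headlines in different languages can be “close” in the model’s imaginary space is usually measured with cosine similarity between their vectors. Here is a toy sketch of that measurement – the three-dimensional vectors and the headline labels are invented for illustration, not real model embeddings (real ones have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up "embeddings" for three headlines:
english_headline = [0.9, 0.1, 0.8]    # "United Kingdom buries Queen Elizabeth II..."
japanese_headline = [0.85, 0.2, 0.75] # the translated headline about the same event
unrelated_headline = [0.1, 0.9, 0.05] # e.g. a sports story

print(cosine_similarity(english_headline, japanese_headline))   # ≈ 0.996, very close
print(cosine_similarity(english_headline, unrelated_headline))  # ≈ 0.20, much further apart
```

The two headlines about the Queen end up with a much higher similarity than the unrelated one, even though, as noted above, their meanings aren’t precisely identical.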
It seems likely that if most of the training data is in English, then the relationships between different concepts might be culturally closer to the way concepts are thought about in English-speaking countries. We also know that LLMs are worse at smaller languages, even languages spoken by a lot of people. However, there are also people who argue that multilingual LLMs might be “language agnostic”.
The preprint (not yet peer-reviewed, as far as I know) titled “Do Multilingual LLMs Think In English?” found several types of evidence that models are in fact going “via” English even when they are prompted in other languages and generate responses in other languages.
The authors are Lisa Schut, a PhD student in machine learning at Oxford, her supervisor Yarin Gal, and Sebastian Farquhar, a senior research scientist at Google DeepMind working on “reducing the expected harm of catastrophically bad outcomes from AGI”. Farquhar also works with Gal and Schut in the Oxford Applied and Theoretical Machine Learning Group.
Schut and her co-authors used three different techniques to see whether a set of LLMs did indeed prioritise English. Here is a diagram showing the results of a logit lens for a model asked to complete the sentence Le bateau naviguait en douceur sur l’, which is French for “The boat sailed smoothly on the calm of the” – a direct translation doesn’t really work in English grammar. The logit lens is a way of seeing the intermediate steps the model takes to generate the response. In this figure, each row is one set of tokens the model generated on its way to the final output. You can see that the first iteration shown is mostly English words. The associations you can sort of see make sense – if we’re talking about boats floating, then concepts like river, water, lake, weather, sol (sun) seem related. Take a look at each line of words to see how the model moves associatively from one concept to the next.

Beyond the English words for lake, river and so on, notice that the place names are from the USA. Place names tend to attract lots of connections in LLMs because they often sit right next to rich descriptions. It’s obvious why Ontario might be associated with lakes: “Lake Ontario” would presumably count as two tokens that are very often next to each other in the training data. Westbrook is a mid-sized coastal town in Connecticut that is often mentioned in connection with boating, marinas and so on – there are lots of websites aimed at tourists about this. A bigger city like New York might have just as many associations with boating and marinas, but it would also be connected with a lot of other topics, so Westbrook is more closely associated with marine themes than New York would be. Here are some screenshots of websites that I am guessing are in the training data:




Also, isn’t it interesting that some of the tokens are censored?
Now these intermediate layers are “fuzzy”: the LLM hasn’t actually settled on a token yet. We’re also not seeing all the iterations. Maybe “Westbrook” was a semi-random token, there for a moment and gone the next. Clearly it’s not part of the final output. But it’s part of the process.
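The mechanics of the logit lens are simpler than they might sound: at each layer you take the model’s intermediate hidden state and project it through the output (unembedding) matrix, as if the model had to pick a token right there. Here is a toy sketch with a five-word vocabulary and random made-up weights – not the authors’ code or a real model, just the shape of the technique:

```python
import numpy as np

# Toy vocabulary and dimensions; real models have tens of thousands of tokens
# and hidden states with thousands of dimensions.
vocab = ["lake", "river", "eau", "water", "calme"]
d_model = 4
rng = np.random.default_rng(0)

# Unembedding matrix: maps a hidden state to one logit (score) per vocab token.
W_U = rng.normal(size=(d_model, len(vocab)))

# Pretend these are the residual-stream states after each of three layers.
hidden_states = [rng.normal(size=d_model) for _ in range(3)]

def logit_lens(h):
    """Which token would the model predict if it stopped at this layer?"""
    logits = h @ W_U
    return vocab[int(np.argmax(logits))]

for layer, h in enumerate(hidden_states):
    print(f"layer {layer}: {logit_lens(h)}")
```

Each printed row corresponds to one row in the paper’s figures: a snapshot of what the model is “leaning towards” partway through, before it has settled on its final token.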
Here is Figure 3 in their paper, which shows a similar logit lens process for a Dutch phrase. Here too there are many English words before the model ends up with the response shown in the bottom row.

Lisa Schut, Yarin Gal and Sebastian Farquhar write that the logit lens analysis shows that the model is routing lexical words like nouns and verbs through English:
In general, lexical words – nouns and verbs – are often chosen in English. These parts of speech influence the semantic meaning of the sentence. Other parts of speech, such as adpositions, determiners and compositional conjugates are infrequently routed through English in Aya-23-35B and Llama-3.1-70B.
The rest of the paper also discusses two other methods they used to test whether LLMs “think” in English: vector steering, and “causal tracing to determine whether facts in different languages are encoded in the same part of the model”. Neither of these is explained in as much depth, so I’ll leave them for now – but the authors conclude that while facts encoded in different languages do seem to share a common representation (which would suggest the model is language agnostic), they also find that “model output is most frequently in English, further underlining the English-centric bias of the latent space.”
A section of the paper offers an overview of current research on multilingual LLMs, which might be useful if you, like me, find this interesting. For us humanities people, I also found this point interesting: they write that you can research a model from an internal perspective – using things like the logit lens or other techniques for figuring out what the model is doing to generate a response – or from an external perspective, which means analysing the output. The external perspective is what media scholars, ethnographers, literary scholars and so on tend to do, and it’s what Hermann Wigers and I did in our paper analysing AI-generated stories from different countries, because we’re good at analysing texts and images. Anyway, Schut, Gal and Farquhar write:
Having a unifying theory that combines both perspectives is important – the internal perspective helps us understand the mechanisms underlying behavior, while the external perspective examines the real-world impact of that behavior.
So that’s a call for us to figure out how to work together.
References
Rettberg, Jill Walker, and Hermann Wigers. “AI-Generated Stories Favour Stability over Change: Homogeneity and Cultural Stereotyping in Narratives Generated by GPT-4o-Mini.” Open Research Europe, vol. 5, no. 202, 2025, https://doi.org/10.12688/openreseurope.20576.1.
Schut, Lisa, et al. “Do Multilingual LLMs Think In English?” arXiv:2502.15603, arXiv, 21 Feb. 2025. arXiv.org, https://doi.org/10.48550/arXiv.2502.15603.