Have you tried playing with the mini version of DALL-E yet? It’s fun! What DALL-E does is generate wonderful images from written prompts, using a neural network trained on images scraped from the internet that have English language captions attached to them.
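
If you're curious what this looks like in code, here is a minimal sketch of the prompt-to-image workflow. It assumes the `openai` Python package and an API key for OpenAI's hosted DALL-E models; DALL-E mini itself runs as a free web demo, so this only shows the shape of the interaction, not the mini model's actual interface.

```python
# A minimal sketch of the prompt-to-image workflow, assuming the
# `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.images.generate(
    model="dall-e-2",
    prompt="a professor finishing a book about machine vision",
    n=1,              # how many images to generate
    size="512x512",
)

print(response.data[0].url)  # URL of the generated image
```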

Two of the points that have interested me the most about this are 1) bias and 2) the idea that DALL-E is independently creating these images.

Bias is pretty well addressed in the blog post and paper published about DALL-E in January 2021, in the risks and limitations documentation for DALL-E 2 on GitHub, and in the “DALL-E Mini Model Card” published with the mini model that has become super-popular since being made available to the public. (A model card is a short-form documentation of a machine learning model that improves transparency about its training dataset, methods, evaluations and biases; the format was first proposed in a paper by a team of Google researchers including Timnit Gebru, who was later fired…)

DALL-E has some clear biases because it’s trained on images scraped from the internet. We also know that image searches on search engines show racial and gender bias, so it’s no great surprise that DALL-E, trained on much the same images and captions that search engines index, is biased too. Here, for instance, is DALL-E mini’s idea of “a professor finishing a book about machine vision,” a topic constantly on my mind these days, since that professor is me.

Unfortunately, DALL-E does not think a professor looks like me. I also love how the professors are all writing their books about machine vision by hand rather than on laptops. Some of the versions I tried actually showed the professors holding bizarre laptop-book hybrids. Lovely.

DALL-E’s agency, that is, its perceived ability to act independently, seems to be less critically addressed than the bias. The cover of Cosmopolitan, above, is a fabulous example. “Meet the world’s first artificially intelligent magazine cover,” it says, and in smaller writing at the bottom, “and it only took 20 seconds to make.”

“Only 20 seconds to make” is extremely misleading. Here is a TikTok video posted by the cover’s designer, Karen X. Cheng, showing just how much work it took to find a prompt that would generate exactly the right image.

[Embedded TikTok by @karenxcheng: “Creating the first ever artificially intelligent magazine cover for @Cosmopolitan! Using #dalle2 to generate the art”]

The prompt that eventually got DALL-E to produce the final image?

A wide angle shot from below of a female astronaut with an athletic feminine body walking with swagger towards camera on mars in an infinite universe, synthwave digital art

The human is very much in the loop!
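
Concretely, the workflow in Cheng’s video looks less like one magic API call and more like a loop. Here is a hypothetical sketch; `generate_image` is a placeholder for any text-to-image call (like the one above), not a real library function.

```python
# A hypothetical sketch of human-in-the-loop prompt iteration: the
# model only generates, while a person judges every image and keeps
# rewriting the prompt until the result is right.

def generate_image(prompt: str) -> str:
    """Placeholder for a real text-to-image API call."""
    return "https://example.com/generated.png"  # stand-in result

prompt = "a female astronaut on mars"
while True:
    image_url = generate_image(prompt)
    print(f"Prompt: {prompt!r} -> {image_url}")
    if input("Good enough for the cover? (y/n) ").lower().startswith("y"):
        break  # the human, not the model, decides when it's done
    prompt = input("Revise the prompt: ")
```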

I’m sure it’s not coincidental that Cosmopolitan uses an AI-generated cover image, framing it as though DALL-E did it on its own, just a couple of weeks after all the fuss about the Google engineer who was suspended after declaring that an artificial intelligence language model known as LaMDA had become sentient. I mean, honestly, it’s kind of awesome that Cosmopolitan has a special issue on AI. I wouldn’t have expected that ten years ago.

In the Database of Machine Vision in Art, Games and Narratives, we used “machine vision situations” as our unit of analysis: in each situation we identified the agents involved and applied the same analytic model to all of them, so we could analyse the big picture across many different situations. For instance, the scene in Dirty Computer where drones check the identities of Janelle Monáe and her friend is a situation. Here is our mini-analysis of it, and you can watch the video on YouTube – the scene starts about 4:30 minutes in. Our goal was to avoid the binary idea that humans simply use technology as a tool, or that technology somehow controls us, and to design a model that, while reductive (any data analysis is going to be reductive), lets us see assemblages where agency is shared between humans and nonhumans.
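
As a rough illustration of how such a model flattens very different scenes into comparable data, here is a hypothetical sketch of a situation as a small data structure. The field names and action verbs are mine, not the database’s actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of a "machine vision situation": each situation
# lists its agents, human and nonhuman alike, and the actions they
# take, so agency can be compared across many situations with one model.

@dataclass
class Agent:
    name: str            # e.g. "drone" or "Janelle Monáe"
    kind: str            # "human" or "technology"
    actions: List[str]   # verbs describing what this agent does

@dataclass
class Situation:
    work: str            # the artwork, game or narrative
    description: str     # short summary of the scene
    agents: List[Agent] = field(default_factory=list)

# The Dirty Computer scene from the post, encoded in this sketch:
dirty_computer = Situation(
    work="Dirty Computer (Janelle Monáe, 2018)",
    description="Drones check the identities of Monáe and her friend.",
    agents=[
        Agent("drone", "technology", ["scanning", "identifying"]),
        Agent("Janelle Monáe", "human", ["hiding", "complying"]),
    ],
)

# Because every situation shares this structure, agency shows up as
# distributed across the assemblage rather than belonging to one side:
for agent in dirty_computer.agents:
    print(agent.kind, agent.name, "->", ", ".join(agent.actions))
```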

In the book I’m trying to finish, I use the idea of a machine vision situation to analyse real life events as well. It’s possible to describe the situation of Cosmopolitan’s cover in many ways, but here’s one: Cosmopolitan’s designer spends hours trying out different prompts with DALL-E until she finds one she likes, which she uses for the cover. DALL-E generates images based on the words in the designer’s prompts and the connections it has made between different words and different elements in images. The editors of Cosmo present the cover image as though DALL-E is almost sentient, almost magical, and as though this is somehow liberating for the women who read the magazine. Are we supposed to imagine ourselves as astronauts with athletic bodies on Mars, or to imagine DALL-E itself as that astronaut?
