DALL-E and human-AI assemblages
Have you tried playing with the mini version of DALL-E yet? It’s fun! DALL-E generates wonderful images from written prompts, using a neural network trained on images scraped from the internet that have English-language captions attached to them.
Two of the points that have interested me the most about this are 1) bias and 2) the idea that DALL-E is independently creating these images.
Bias is pretty well addressed in the blog post and paper published about DALL-E in January 2021, in the risks and limitations documentation of DALL-E 2 on GitHub, and in the “DALL-E Mini Model Card” published with the mini model that has become super-popular since being made available to the public. (A model card is a short-form documentation of a machine learning model that improves transparency about the training dataset, methods, evaluations and biases of a model. It was first proposed in a paper by a team of Google researchers including Timnit Gebru, who was later fired…)
DALL-E has some clear biases because it’s trained on images from the internet. We also know that image searches on search engines show racial and gender bias, so it’s no great surprise that DALL-E, trained on the same kinds of captioned images that search engines index, would be biased too. Here, for instance, is DALL-E mini’s idea of “a professor finishing a book about machine vision,” which is a topic constantly on my mind these days since that professor is me.
Unfortunately DALL-E does not think a professor looks like me. I also love how the professors are all writing their books about machine vision by hand, not using laptops. Some of the versions I tried actually had these bizarre laptop-books the professors were holding. Lovely.
DALL-E’s agency, that is, its perceived ability to act independently, seems to be less critically addressed than the bias. The cover of Cosmopolitan, above, is a fabulous example. “Meet the world’s first artificially intelligent magazine cover,” it says, and in smaller writing at the bottom, “and it only took 20 seconds to make.”
“Only 20 seconds to make” is extremely misleading. Here is a TikTok video posted by the designer of the cover page, Karen X. Cheng, showing just how much work it took to find a prompt that would generate just the right image.
The prompt that eventually got DALL-E to produce the final image?
A wide angle shot from below of a female astronaut with an athletic feminine body walking with swagger towards camera on mars in an infinite universe, synthwave digital art
The human is very much in the loop!
I’m sure it’s not coincidental that Cosmopolitan uses an AI-generated cover image, framing it as though DALL-E did it on its own, just a couple of weeks after all the fuss about the Google engineer who was suspended after declaring that an artificial intelligence language model known as LaMDA had become sentient. I mean, honestly, it’s kind of awesome that Cosmopolitan has a special issue on AI. I wouldn’t have expected that ten years ago.
In the Database of Machine Vision in Art, Games and Narratives we used “machine vision situations” as a unit of analysis. We identified the agents in each situation and applied the same analytic model to all of them, so we could analyse the big picture across many different situations. For instance, the drones checking the identities of Janelle Monáe and her friend in Dirty Computer is a situation. Here is our mini-analysis of it, and you can watch the video on YouTube – the scene starts about four and a half minutes in. Our goal was to avoid the binary idea that humans simply use technology as a tool, or that technology somehow controls us, and to design a model that, while reductive (any data analysis is going to be reductive), allows us to see assemblages where agency is shared between humans and nonhumans.
In the book I’m trying to finish, I use the idea of a machine vision situation to analyse real life events as well. It’s possible to describe the situation of Cosmopolitan’s cover in many ways, but here’s one: Cosmopolitan’s designer spends hours trying out different prompts with DALL-E, until she finds one she likes, which she uses for the cover. DALL-E generates images based on the words in the designer’s prompts and the connections it has made between different words and different elements in images. The editors of Cosmo present the cover image as though DALL-E is almost sentient, almost magical, and as though this is somehow liberating the women who read the magazine. Are we supposed to imagine ourselves as astronauts with athletic bodies on Mars, or to imagine DALL-E as like that astronaut?
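For readers who think in data structures, here is a rough sketch of how a situation like the Cosmopolitan cover could be written down as agents and their actions. It is only an illustration: the class names, the "human" and "technology" labels and the action strings are assumptions made for this example, not the actual schema of the database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    kind: str  # hypothetical label: "human" or "technology"
    actions: list[str] = field(default_factory=list)


@dataclass
class Situation:
    title: str
    agents: list[Agent]

    def actions_by_kind(self) -> dict[str, list[str]]:
        """Group every agent's actions by the kind of agent performing them."""
        grouped = defaultdict(list)
        for agent in self.agents:
            grouped[agent.kind].extend(agent.actions)
        return dict(grouped)


# The cover situation as described above (action wording is illustrative).
cover = Situation(
    title="Cosmopolitan's DALL-E cover",
    agents=[
        Agent("Designer", "human", ["trying out prompts", "selecting an image"]),
        Agent("DALL-E", "technology", ["generating images"]),
        Agent("Editors", "human", ["presenting the cover"]),
    ],
)

for kind, actions in cover.actions_by_kind().items():
    print(f"{kind}: {', '.join(actions)}")
```

Even in this tiny form, the listing makes the point that the image is not the work of a single agent: most of the actions in the situation belong to humans.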