andrew burke





Dall-E Prompts Are "Woke" - And I'm Okay With That

Posted on: 2024-02-12

I've recently been exploring OpenAI's API services as part of my ongoing Bloomsday experiments. I've found that with the right prompts I can generate some pretty interesting text, especially with the newer, more powerful GPT-4. It may not replace James Joyce, one of the greatest prose stylists in the English language, but it does a pretty decent pastiche of some of his more abstract experimental stuff.

After the fun I had with my nephew over the holidays, I was especially interested in seeing what I could do with the updated Dall-E 3 image generator. While there are already a number of high-quality, out-of-copyright, authentic pictures of Dublin from the early 1900s, I thought Dall-E might be able to do some interesting things for the surreal Circe episode. So I sent it the prompt:

A vintage photograph of a street scene in Dublin Ireland in June 1904, with a crowd watching a big parade of notables marching down the street

The API returns not only a generated image and some metadata but, interestingly, also a "revised_prompt" value. As the API documentation states:

With the release of DALL·E 3, the model now takes in the default prompt provided and automatically re-write it for safety reasons, and to add more detail (more detailed prompts generally result in higher quality images).
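For anyone who wants to look at the "revised_prompt" value themselves, here is a minimal sketch of building the request and pulling the field out of the response. It uses only the Python standard library and assumes the image generation endpoint, the "dall-e-3" model name, and the response shape (a "data" list whose items carry "url" and "revised_prompt") as documented around the time of this post. The helper names `build_request`, `extract_results`, and `generate` are made up for illustration, and the sample response at the bottom is a placeholder, not real API output.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the image generation endpoint as documented at the time of writing.
API_URL = "https://api.openai.com/v1/images/generations"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking DALL-E 3 for one image."""
    payload = {"model": "dall-e-3", "prompt": prompt, "n": 1, "size": "1024x1024"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_results(response_body: dict) -> list[tuple[Optional[str], Optional[str]]]:
    """Return (image URL, revised prompt) pairs from a parsed JSON response."""
    return [
        (item.get("url"), item.get("revised_prompt"))
        for item in response_body.get("data", [])
    ]


def generate(prompt: str, api_key: str) -> list[tuple[Optional[str], Optional[str]]]:
    """Send the request over the network and return the parsed results."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as response:
        return extract_results(json.load(response))


# Placeholder response in the assumed shape, so the parsing can be tried offline.
sample_response = {
    "created": 0,
    "data": [
        {
            "url": "https://example.com/placeholder.png",
            "revised_prompt": "A vintage photograph capturing a bustling street scene...",
        }
    ],
}

for url, revised_prompt in extract_results(sample_response):
    print(url)
    print(revised_prompt)
```

Calling `generate(prompt, api_key)` with a real key would make the actual network request; the loop above only exercises the parsing on the placeholder data.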

I noticed evidence of this when using the ChatGPT app with my nephew, where it would redescribe what I had asked for in more detail. It's an interesting approach: using the LLM text model to adjust the user's prompt so it works better with the image generator. But, as they note, the prompt is also rewritten for "safety" reasons. Here's what it did with my original prompt:

A vintage photograph capturing a bustling street scene in Dublin, Ireland, taken in June 1904. A diverse crowd of Caucasian and Black men, women and children in period attire watches intently as a big parade marches down the cobblestone street. The parade consists of figures of prominence in the society of that time, striding with proud expressions and finely tailored suits. The buildings lining the street exhibit the architecture of the era, with their intricate facades and capacious windows. Tinges of sepia tones further lend this image a distinctly nostalgic mood.

It's a much more detailed prompt and provides extra context to improve the image. But I found it especially interesting that it specifically asked for "a diverse crowd of Caucasian and Black men, women and children". This seems likely to be OpenAI actively trying to counteract the well-documented racial biases inherent in AI image generators, and of course it's the kind of thing that right-wing culture warriors would pounce on as being "woke".
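A quick way to see what a rewrite introduced is to list the words that appear in the revised prompt but not in the original. The sketch below is a rough illustration only: `added_words` is a hypothetical helper, it does a crude lowercase word comparison rather than a proper diff, and it is run against just one sentence of the revised prompt quoted above.

```python
import re


def added_words(original: str, revised: str) -> list[str]:
    """Return words found in `revised` but not in `original`, in first-seen order."""
    def tokenize(text: str) -> list[str]:
        # Crude tokenizer: lowercase alphabetic words only, so numbers are ignored.
        return re.findall(r"[a-z']+", text.lower())

    original_words = set(tokenize(original))
    seen = set()
    added = []
    for word in tokenize(revised):
        if word not in original_words and word not in seen:
            seen.add(word)
            added.append(word)
    return added


original = (
    "A vintage photograph of a street scene in Dublin Ireland in June 1904, "
    "with a crowd watching a big parade of notables marching down the street"
)
revised_excerpt = (
    "A diverse crowd of Caucasian and Black men, women and children in period "
    "attire watches intently as a big parade marches down the cobblestone street."
)

print(added_words(original, revised_excerpt))
```

Words like "crowd" and "parade" drop out because they were already in the original prompt, which leaves the additions, including "diverse", "caucasian" and "black", easy to spot.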

Interestingly the resulting image didn't seem to reflect the "diverse crowd" as prompted. Everybody seems pretty white (at least the people whose faces are visible):

AI generated image of a parade in Dublin around 1904

Given the historical and cultural context of 1904 Dublin, the image reflects the lack of skin-colour diversity that would likely be found there at the time. In fact in "Ulysses" the only Black people I've seen mentioned are white music-hall "minstrel" performers in blackface - so maybe it's just as well.

Don't even get me started on how problematic the term "Caucasian" is too!

However, the faces in the image do in fact reflect a great deal of diversity: the diversity of hideous eyeless nightmare mutants:

Closeup of AI generated image of a parade in Dublin with poorly featured faces

Dall-E may be the next generation in AI image generation, but, especially in large crowd scenes with lots of details and a paucity of source material like this, it still has trouble with faces - regardless of how "woke" the prompt re-engineering might be.
