‘Portrait of a King Island emu in a garden in Paris in 1804 with Empress Josephine and Napoleon in Decolonial style’. DALL•E 2, 2023.
Equally surreal, the image of a human figure curiously merged with that of an emu, generated in response to the prompt ‘Portrait of Empress Josephine and Napoleon Bonaparte Château de Malmaison in Paris with an emu’, is associated with paintings of human forms hybridised with birds such as a duck, a flamingo and a nightingale. Other images in this search depict collectives of birds or birds and humans together, such as ducks accosting cherubic toddlers or ostriches chasing fair ladies. Images generated by Midjourney in response to the prompt ‘person crying whilst riding a giant duck’ also appear in this search.
The style of painting in this portrait and that of ‘Portrait of a King Island emu in a garden in Paris in 1804 with Empress Josephine and Napoleon in Decolonial style’ also reference portraiture in the ‘grand manner’ or ‘great style’, with classical architecture and natural elements in the backgrounds of the generated images. The latter image’s search results returned both painted and engraved illustrations of bird species, depicting birds such as ostriches, cassowaries, emus and peacocks.
This method of ‘features by proxy’ allowed us to think about how these generated images reference the features and formal compositions of other media (oil painting, photography), but also how models create images in specific ways according to how they work with image data.
Experiment 2:
Visual Content Analysis with ‘Features by proxy’
This experiment was driven by the question:
In order to make these speculative images with a generative model, what types of images and features of images in the model’s training dataset inform the generated image?
Some of the generated images seemed familiar to us, whether by their medium, visual language or visual schemes, or by a combination of these. However, since we do not know, and have no access to, DALL•E’s training datasets, we used a ‘by proxy’ approach to explore this sense of familiarity: conducting reverse image searches of the DALL•E images through the Google search platform.
Our naming of this technique as ‘features by proxy’ and of investigating models ‘by proxy’ draws upon the work of Fabian Offert (2023), who suggests two ways to study how ‘foundation’ models conceptualise history, specifically focusing on CLIP (Contrastive Language-Image Pre-Training; Radford et al. 2021). Trained on image-text pairs, CLIP can be used to associate text with images. First, Offert suggests ‘attribution by proxy’ as a way to explore CLIP by using it to search within datasets outside its own training data. Second, Offert suggests ‘generative attribution’, which relies on the use of CLIP in training models such as DALL•E 2 and Stable Diffusion. Offert states that the way DALL•E renders historical periods as specific media reveals ‘a strong default in models like DALL•E that conjoins historical periods and historical media’ (129). These approaches offer ways to investigate the patterns and biases in AI models’ behaviour even when their training data is not accessible.
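To make CLIP’s text-image association concrete, the following Python fragment is a minimal sketch (our illustration, not Offert’s code) using the openly released CLIP checkpoint via Hugging Face’s transformers library to score how strongly an image associates with candidate captions. The filename and captions are hypothetical examples.

```python
# A minimal sketch of CLIP's text-image association, using the open
# openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generated_emu_portrait.png")  # hypothetical DALL-E 2 output
captions = [
    "an oil painting of an emu in a garden",
    "an albumen photographic portrait of a woman",
    "an engraved illustration of an ostrich",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores;
# softmax turns them into probabilities over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.2f}  {caption}")
```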
Although it does not locate actual images from the model’s training dataset, our ‘features by proxy’ technique allows us to analyse a generated image through its repetition of the media, visual language and visual schemes of broader visual culture. In this repetition we encounter a ‘strong default’, as noted by Offert, to visualise historical eras as specific media. This is evident in media-specific lenses such as the warm tones of early photography or portrait painting in the ‘grand manner’, as discussed below in the critique of this method as applied to four images from the original experiment with DALL•E 2.
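As a minimal sketch of the reverse-search step itself, the following Python fragment opens a Google Lens search for a generated image. It assumes the image is already hosted at a public URL, and that Lens still accepts the undocumented uploadbyurl query parameter (an assumption; the endpoint may change, and in practice the search can equally be run through Google’s own interface).

```python
# A minimal sketch of the reverse-image-search step in 'features by proxy'.
import webbrowser
from urllib.parse import quote

# Hypothetical public URL where a DALL-E 2 output has been uploaded.
image_url = "https://example.org/emu_portrait.png"

# Google Lens' uploadbyurl endpoint is undocumented and may change;
# this simply opens the reverse image search in the default browser.
lens_url = "https://lens.google.com/uploadbyurl?url=" + quote(image_url, safe="")
webbrowser.open(lens_url)
```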
We quickly realised two things. First, this is not a ‘quick’ experiment; generating this many images across multiple models results in a huge amount of data, with a myriad of connections and relationships to think across. Second, because of the way the interfaces of the different models are set up, it is not possible to input the prompts in the same way. For example, Firefly has ‘art’ or ‘photo’ options, which led to our using ‘art’ for prompts that didn’t include ‘photographic’, although we had not specified this in earlier or other models.
A key insight from this experiment is that each model, and each version of each model, needs to be approached as its own space of image production. Across the matrix it is possible to observe how Firefly generates contemporary-looking photographic images whereas DALL•E 2 generates photographic images of earlier media forms. Therefore, it is not possible to talk about the models generally in terms of how they generate images. In an anarchival practice, considering how you are working with an archive, and how you are taking its specific materials into new meanings and spaces, is integral. With generative models, this includes the specific version of the model being used.
We also noticed that all the models draw on several other birds and their features: on the larger emu (Dromaius novaehollandiae) found on mainland Australia, but also on peacocks, swans, ducks and ostriches. This may be due to the lack of visual references to actual dwarf emus, but it may also reflect the dominance of training data from the ‘Global North’.
Is the model struggling to read human and emu as different beings here, proposing a bird head on a human-like body, or is it that the jacket cannot fit onto the bird in a logical way because jackets have arms and birds do not? Closer inspection reveals more surreal elements. In the archway in the top right corner we make out a two-headed emu-like figure, with an emu-like head hovering above it, like a form discernible in a cloud.
Other examples of similar visual indeterminacy between human and bird figures include the set of emu-as-royalty generated from the prompt ‘Oil painting of the extinct King Island emu and Empress Josephine and Napoleon Bonaparte’:
A reverse image search using the generated image ‘Photographic portrait of Empress Josephine with an emu in 1804’ (above) returns mostly studio-based portraits created with albumen emulsion or other early photographic techniques. The background of the generated image is suggestive of a natural scene painted as a studio backdrop, and, like other portraits from the reverse image search, the subject is depicted with a prop – in this case, the emu figure. Although albumen and other early photographic methods were not developed until well after 1804,2 it appears that the closest photographic medium by date has been referenced in how Josephine and the emu are depicted by DALL•E 2 as a ‘photographic portrait’.
With these four images, and the original set of 17 prompts, we devised two generative experiments to open space for thinking-with each other, which we discuss below.
Experiment 3:
Comparative Analysis:
Creating more images to understand images
This experiment responded to the question:
How do other models respond to the same prompts, and what might that reveal?
In this experiment, we took the 17 prompts used with DALL•E and prompted a range of other models with them, including Adobe Firefly, Midjourney and OpenArt.
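Of the models we used, only DALL•E exposes a public API that we can sketch here; Midjourney, Firefly and OpenArt were prompted through their own web interfaces. The following Python fragment, using the OpenAI SDK, shows how a fixed prompt list might be re-run in a reproducible way (an illustrative sketch, not our actual workflow; OPENAI_API_KEY is assumed to be set in the environment).

```python
# A minimal sketch: re-running a fixed prompt list against DALL-E 2
# via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()

prompts = [
    "Photographic portrait of Empress Josephine with an emu in 1804",
    "Oil portrait of Empress Josephine with an emu baby in 1804",
    # ... the remaining prompts from the original set of 17
]

for prompt in prompts:
    result = client.images.generate(
        model="dall-e-2",
        prompt=prompt,
        n=1,
        size="1024x1024",
    )
    print(prompt, "->", result.data[0].url)  # URL of the generated image
```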
Remediating other images as other media
Where prompts specified media – for example, ‘oil painting’ or ‘photography’ – familiar styles or even specific artworks shimmered briefly into view before accentuating what was ‘wrong’ in a particular image. We chose one ‘photographic’ and one ‘oil painting’ image that allowed us to think about this in more depth.
Selecting images to think-with
Although there are interesting features in, and discussions to be had about, each of the images generated in the first experiment described above, as a launchpad for our collaboration we limited ourselves to four images to think-with in more depth together. We selected images that best represented our most engaging conversations, as discussed below.
Visual indeterminacy
In contrast to DALL•E Mini’s aesthetic grotesque, Whyman (2022) describes DALL•E 2, the version used in the experiment described here, as a more sophisticated tool capable of producing ‘things much closer to photo-realistic representations of reality’, which he speculates will be less fun than tooling around with the uncanniness of DALL•E Mini.1 However, we found that uncanniness persists, particularly where humans and other animals fluctuate or merge in unexpected ways, like the emu-Bonaparte:
Section 3: Further Experiments to Think-With
Image created by DALL•E 2, from the prompt: ‘Photographic portrait of Empress Josephine with an emu in 1804’.
DALL•E 2-generated image, from the prompt: ‘Oil portrait of Empress Josephine with an emu baby in 1804’.
This attempt at a photographic portrait inspired reflection on a digital image trying to ‘pass’ as a historical albumen photograph, obscuring the remediation that underpins its creation, only to be let down by an inability to render ‘clear’ facial features or hands.
Prompts in which the media were clearly specified but the rest of the prompt was grammatically ambiguous produced the most unsettling images. When it is not clear whether the human or the emu is the ‘baby’, things turn nightmarish.
There were several deeply unsettling depictions of babies in this series; however, a later survey (see Experiment 2) revealed equally uncanny actual oil paintings of both women and infants, and a rabbit warren of sites dedicated to why babies in historical painting look like grumpy old men or creatures from horror films. This opened a conversation about a pre-digital history of generating inaccurate or uncanny images of humans, particularly babies, illuminating the importance of considering AI-generated images in relation to histories longer than that of digital art.
Jean-Auguste-Dominique Ingres, Comtesse d’Haussonville [mirror detail], 1845. Public domain, via Wikimedia Commons.
Paintings created between c. 1283 and 1650 with uncanny depictions of infants, anticlockwise from top left:
1. Madonna and Child with Two Angels (Crevole Madonna), Duccio di Buoninsegna, c. 1283–1284.
2. Two of the Gyllenstierna children, Johan Assman, 1650.
3. Virgin and Child Surrounded by Angels, Jean Fouquet, 1450s. Right panel of the Melun Diptych. All images in the public domain, via Wikimedia Commons.
2. It was not until the 1820s that Nicéphore Niépce was able to ‘fix’ an image to a substrate, and decades later that photography became more commonplace and used for portraiture.
1. This idea is reiterated by artists using older AI models to achieve this ‘aesthetic grotesque’, which is more difficult to achieve with newer models.
Portrait of a King Island emu in a garden in Paris in 1804 with Empress Josephine and Napoleon in Decolonial style
Portrait of a lonely King Island emu in a garden in Paris in 1804 with Empress Josephine and Napoleon
Oil painting of the extinct King Island emu with Empress Josephine and Napoleon Bonaparte in Château de Malmaison in Paris