4. Discussion and Conclusion



Prompting as thinking-with

 

How do we think-with?

To generate an image, a text-to-image model such as DALL•E first passes the user’s prompt through a text encoder, which maps the prompt into a ‘text embedding’, a way of representing the text as a vector within model space. A second encoder, the ‘prior’, maps the text embedding to possible CLIP image embeddings; as mentioned earlier, CLIP is used to model associations between text and image data. An image decoder then generates the image, conditioned on the CLIP image embedding and the text embedding (Ramesh et al. 2022).1 Anna Munster and Adrian Mackenzie (2019) describe the work of models with images as ‘invisual’. To apply this term to text-to-image models such as DALL•E: they are trained using digital images and ultimately generate an image, but as processes they are not of the visual (unlike collage). We did not expect the model to generate an accurate depiction of the dwarf emu and its experience; rather, we were curious about what absence in the archive means in visualising an extinct species, and specifically how that absence is ‘figured’ in a machine-learning-mediated approach. As Offert notes, generative image models cannot reconstruct features that are not part of their training dataset, yet they always generate an output (2021, 9–10). In our experiments, the generated dwarf emus show how the models ‘invisually’ draw upon other features that we might imagine as being close to the emu in model space (peacock, ostrich, etc.) to visualise a response to our prompts.
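The pipeline Ramesh et al. describe can be caricatured as three composed functions. The sketch below is a toy stand-in, not DALL•E itself: the function names, dimensions and random projections are our own assumptions, chosen only to show the shape of the text-embedding → prior → decoder flow, and the fact that the decoder returns an image whatever the prompt.

```python
import numpy as np

# Toy caricature of the three stages Ramesh et al. (2022) describe for
# DALL-E 2. Real models are large trained neural networks; here each stage
# is a hypothetical random projection, used only to illustrate the flow.

rng = np.random.default_rng(0)
EMBED_DIM = 512  # CLIP-style embedding size (assumption)

def text_encoder(prompt: str) -> np.ndarray:
    """Map a prompt to a 'text embedding' (toy: hash-seeded random vector)."""
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(EMBED_DIM)

def prior(text_embedding: np.ndarray) -> np.ndarray:
    """Map the text embedding to a CLIP image embedding (toy: linear map)."""
    W = rng.standard_normal((EMBED_DIM, EMBED_DIM)) / np.sqrt(EMBED_DIM)
    return W @ text_embedding

def decoder(image_embedding: np.ndarray, text_embedding: np.ndarray) -> np.ndarray:
    """Generate an 'image', conditioned on both embeddings (toy: 8x8 array)."""
    cond = image_embedding + 0.1 * text_embedding
    return cond[:64].reshape(8, 8)

# However sparse the archive behind a prompt, the pipeline still emits an image.
prompt = "King Island emu"
t = text_encoder(prompt)
img = decoder(prior(t), t)
print(img.shape)  # (8, 8)
```

The point the sketch makes concrete is the one Offert raises: nothing in this flow can refuse to answer, so absence in the training data is always papered over with whatever lies nearby in model space.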


To think-with these image-generating processes means to take an approach to prompting that goes beyond generating an outcome or an answer. Instead, we consider the generative activity of models to be a process of collectively creating boundaries, enabling some reflection on how those boundaries are being formed. Here, drawing on Haraway’s thinking, what we come to understand as the boundaries of subjects and objects, for example how the emu and its experience are visualised, are made through generative practice; they do not exist as ‘fact’. This allows us to use the process not to ‘fill in the gaps’ about the dwarf emu and its experience nor to make explainable prompt-based image generation, but to think-with this process about species loss, archival absence and image-generation processes and their outputs (and more).

 

Thinking about absence in the archive

  

One aim of the project is to visualise the endling dwarf emus in a compelling way, to draw out the story for others – lest we forget. This is a generative act of contributing to archives of loss, in ways that acknowledge the fragmentation and incompleteness of the historical account. No single image, document or narrative can achieve this. Rather than ‘filling the gap’ in the historical archive, the collection of digital, paper and DALL•E images and our initial critique of them form a new part of this historical account, sitting in constellation with existing documents, texts, artworks and works yet to come. This process provides a thicker description rather than a completion, and it acknowledges that documentation and archiving are always only a situated and partial account of history: once extinct, a species is irretrievable. For us, this opened conversations about broader ecological entanglements and what species loss means more broadly (see Rose et al. 2017; Rissanen and Sadokierski 2022).

 

A creative work that deals with similar questions is Sofia Crespo’s Critically Extant (2022). Crespo used only publicly available data to train a generative model that produces short, animated clips of critically endangered animals and plants. The resulting moving images are often only vaguely suggestive of a species, unable to give an ‘adequate’ image.2

 

Thinking about the models as image-making processes

 

To consider the potential implications of the prompt-generated images, we return to Lesueur’s tableau of the ‘Kangaroo Island emu’ family. Over time, experts have revealed it is a multispecies mashup with inaccurate representation of the species’ behaviours; as a historical record this image diminishes the ‘lifeway’ of these lost species. Therefore, it matters that increasingly we cannot ‘see’ the remediation of prompt-generated images. If such images are presented as ‘authentic’, either as a representation made at a particular time (the photographic example) or as able to fill in an archival ‘gap’, then these images become problematic.


The broad impact of images and image-making is apparent in W. J. T. Mitchell’s ‘counter-theses’, in particular that ‘visual culture is the visual construction of the social, not just the social construction of vision’ (2005, 343). In other words, the visual is not just made, it in turn also makes or conditions the social. As AI-generated images begin to proliferate in our social spheres, it is important for those with expertise in how images work to critique the practice of image-making with generative models.


Limitations imposed by the prompts

 

If ‘Kangaroo Island emu’ had been the prompt rather than ‘King Island’, would we have seen more marsupial/kangaroo hybrids? Is the ‘king’ in ‘King Island emu’ contributing to the crowns and other markers of royalty, or to the fact that in some of the images the emu is larger than the human figures? Would the results have differed significantly if we had used ‘dwarf emu’ as a prompt? The prompt-prodding could go on endlessly, but we decided to halt here to keep the scope manageable.

 

Limitations of the models used

 

We used only large-scale, publicly accessible models in these experiments, which does not account for what could be produced with a more customised dataset. This is an avenue worth pursuing in a future iteration of this project. It could be argued that this direction would also be more in line with the aims of an anarchival practice, as producing our own datasets and training our own models would open up further ways to explore how a history, particularly one with sparse documentation, might be visualised or imagined via processes of machine learning. However, for this project, our use of large-scale models aimed to explore how little-documented species could be visualised via models now commonly used by people to make images and, in some cases, to speculate or ‘imagine’ images that do not exist. We prompted the models not to create a sufficient or accurate image of the dwarf emu but to explore how large-scale models deal with absence in their datasets.


Thinking-with acknowledges thinking as a collective activity; in the context of working with AI, this involves treating it as more than an abstract phenomenon and emphasising the materials, labour and energetic resources involved, as well as its ecological impact. Generating a single image using a text-to-image model, at the image-generation moment alone, can take as much energy as charging a smartphone halfway (Luccioni, Jernite and Strubell 2024, 6). Part of our discussion not yet reported here also involved thinking about the ecological implications of working with generative models, and the complications of doing so in a project that deals with species loss in the time of the sixth extinction. For future work, thinking-with text-to-image models should involve rethinking and improving how we engage with generative AI models, in acknowledgement of and response to their material, social and environmental entanglements at a time of overlapping ecological crises.

 

 

Conclusion

 

This paper reports on a creative research collaboration in which text-to-image prompts are used as a way to think across two key ideas: the relationships between collage practices and prompt-based image generation, and taking an ‘anarchival’ approach to addressing absence in historical archives through creative practice.

 

In the first section, we describe both collage and text-to-image models as modes of image-making that create images with images. We distinguish between a formal and an anarchival approach to both modes of creative production and the critique of the images they produce. In doing so, we are not suggesting a hierarchy in which one is valued over the other; they are simply different creative and critical approaches for different communicative intents. Drawing on scholarship from Foster and others, and critiquing examples from contemporary art and design practice, we define ‘anarchival collage’ as a practice in which a collage maker takes a critical lens to archival materials to question or trouble what is contained within, and absent from, archives. We then ask how prompt-based text-to-image models are similar to or different from collage, and what they might add to an anarchival practice. Reflecting that much critique and celebration of imagery generated through these models is formalist in approach, we argue that such critique misses the more important work of critiquing how these images operate within and shape social, cultural and political spheres, and we recognise a need for creative practice research that builds case studies to think-with.

In the second section of the paper, we report on creative practice experiments that include prompting large-scale image-generation models to think-with and speculate rather than to merely generate. All the experiments involve attempts to visualise the strange story of the extinct King Island dwarf emu, a species with scarce, fragmented and sometimes mistaken historical visual documentation.

In the first experiment, the value of working with DALL•E for Sadokierski was in ‘unblocking’ her collage-making practice; the text-to-image model generated images that helped her find a way of ‘hybridising’ the figures of Josephine and the emu, to draw out the surreal aspects of the story. In working with the model, she also realised the importance of seeking out Monin’s expertise to help understand how the model was operating, in order to use it in a more thoughtful and ethical way.

 

The second experiment was devised to speculate about what types of images, and features of images, the model may have been trained on. A ‘features by proxy’ approach, using reverse image search, was conceived as a way around not having access to the training datasets of DALL•E. Analysis of the results of reverse image searches with generated images allowed us to think about how a selection of the generated images reference features of other media (oil painting, photography) but also how models have specific ways of creating images according to how they work with image data.


The third experiment was a comparative analysis across a range of other models, using the original prompts to generate sets of images we could compare. A key insight from this experiment is that each model and each version of each model needs to be approached as its own space of image production, which is particularly relevant for an anarchival practice which seeks to critique the archive, or model, it draws from.


The suite of experiments surfaces issues and opportunities afforded by these models by critiquing the processes of using them. The result of the experiments is a ‘thicker description’ of the story of the dwarf emu, as well as a thicker description of how image-making processes – both collage and text-to-image models – can function as part of anarchival practices.

At a time of both deliberate and accidental miscommunication, it is important for those with expertise in how images ‘work’ to critique and analyse image-generating tools. This case study highlights the value of actively seeking collaborations between those with expertise in the complexity of creating images that carry with them aesthetic, cultural, political and other dimensions, and that go on to do things in the world.

 

Notes

1. This is a high-level description of text-to-image generation for DALL•E 2 as described by Ramesh et al. 2022. Other text-to-image models may vary.

2. See also the work of: Alexandra Daisy Ginsberg, in particular The Substitute, www.daisyginsberg.com; Emma Lindsay’s paintings of extinct species from museum specimens, https://emmalindsayartist.wordpress.com; and Timo Rissanen and Zoë Sadokierski’s Precarious Birds project, https://precariousbirds.net

ABOVE: Paper collage from archival photographs of emu bones, a map and coloured stickers. 

Zoë Sadokierski, 2024.