Prompting as Thinking-With: Using Generative AI to Visualise an Extinct Dwarf Emu

Monica Monin

This paper discusses a creative collaboration between two design researchers using text-to-image prompts as a way to think across a range of ideas, including the relationships between collage practices and AI image generation – both modes of image-making that create images with images – and an ‘anarchival’ approach to addressing absence in historical archives. Initial experimentation with the prompt-based model DALL-E 2 involved writing multiple prompts to generate images of the extinct King Island dwarf emu; specifically, an emu taken to live on Empress Josephine’s estate outside Paris. There is little visual record of the dwarf emus, and what remains is ambiguous and factually inaccurate. The scarcity of visual reference material provides an interesting case study for how a generative image model might attempt to elaborate a new image about a historical event. The results provide material to help think about how image generation models work, and also how we might visualise the experience of an extinct species. Reflecting on the initial experiments, we began to consider prompting with large-scale image generation models as a way to think-with and speculate, rather than to merely generate. We employ two methods to critique the resulting images: visual content analysis and comparative analysis across image-generation models. We conclude that at a time of both deliberate and accidental miscommunication, it is important for those with expertise in how images ‘work’ to critique and analyse image-generating tools, and consider how working with generative AI might be included as part of an anarchival practice.


Introduction

The book’s editors write about what the collages do, from the perspective of historians:

These visual experiments are intentionally recombinatory and reliant on digital collage to pull visual sources out of place and create new meanings via their displacement. This plays on the tradition of early modern naturalists using images to take natural things out of their original places. [...] Through visual experiments we explore the human gaze as a means of alienating nature from its environment, a thing from its representation. (Cooley et al. 2023, 3)

More broadly, anarchival practices do not consider materials in archives as objective documentation; rather, they recognise the potential for archival materials to be reactivated, and through this reactivation to generate new meanings. In ‘An Archival Impulse’, Hal Foster (2004) suggests that ‘anarchival’ practices, which operate outside of the common archival imaginary of preservation, are ‘concerned less with absolute origins than with obscure traces’ (5).³ In an anarchival practice, ‘artists are often drawn to unfulfilled beginnings or incomplete projects – in art and history alike – that might offer points of departure again’ (5). Likewise, Carine Zaayman describes anarchival approaches to colonial archives as seeking ‘both to tell different stories and to tell stories differently’ (Zaayman 2023, 5–6). This notion of embracing the incomplete and unfulfilled aspects of archives to create something different makes the loose assembling of fragments inherent to collage an ideal tool for an anarchival practice.

As an activity that also works with and re-mediates vast ‘archives’, predominantly scraped without permission or acknowledgement from the internet, what, then, does artificial intelligence – specifically, in this paper, prompt-based text-to-image models – change or add when incorporated into anarchival practices? How is it similar to, and different from, collage within an anarchival practice? And further, as asked by Fabian Offert (2023), what does contemporary ‘artificial intelligence “add” to an already (re-)mediated past?’ (123).

Formalism and Aesthetics in Generative AI

Similar to collage that is driven by formal experimentation, in many practices of generative AI image-making there is also an emphasis on the formal qualities of an image: its visual appeal and ‘interestingness’.⁴ Many examples of creative AI, Joanna Zylinska (2020) states, understand art ‘in terms of structure and pattern, with subsequent diversions from the established code and canon being treated as creative interventions’ (49). As discussed above, a formalist approach elevates an image’s visual characteristics above other aspects, such as the potential social, cultural and political significance of the image. Generative AI’s attention to the visual characteristics of images can be attributed both to how generative models work and to how image-making as a qualitative process has been understood and transformed into a machine learning process (Kang 2023). Trained on mass datasets of image-text pairs, text-to-image models model the probabilistic relationships between text and visual features in a dataset and then use these learned relationships to generate new images in response to a text prompt. In this, the work of image-making by generative models focuses on the visual features of digital images, albeit through the specific perceptual topologies or ‘ways of seeing’ and image-making of machine learning models (Offert and Bell 2020, 2).⁵ Features in machine learning are distinguishing characteristics of data: for example, in digital images, features such as edges or textures can be distinguished and modelled (Wasielewski 2023, 4). This is not to say that, in the broader context of creative practice, the socio-cultural dimensions of image-making cannot be activated in a practice with generative machine learning; here we focus only on the activity of the model itself. As we discuss below in relation to how the models we engaged with visualise an ‘emu’ with features of other bird species, this specificity of how models ‘observe’ images in training and go on to generate images lends them a distinct way of dealing with absence in archives or the dataset.

Beyond distinguishing features, creating machine learning (ML) models, as outlined by Edward B. Kang (2023), is a practice of taking a qualitative ‘problem’ or process and actively deciding how it should be ‘conceptualized, translated, and formalized into the operational framework of ML as a ground truth’ (2). For prompt-based image generation, establishing a ‘ground truth’ largely involves creating datasets of image-text pairs, each composed of an image and a corresponding textual description. Text-to-image models are commonly assessed according to their ‘coherence’, that is, whether the images generated have a ‘coherent’ connection with their text prompts. However, more than coherence, large-scale image platforms, such as Midjourney and DALL-E, also aim to make images of high aesthetic appeal. These large-scale text-to-image models are not conceptualised to make just any image, i.e. badly rendered, ‘unaesthetic’ or scribbly images, but images of ‘high visual quality’ that might ‘trend on ArtStation’, to use a popular modifier in text-to-image prompts.⁶ This involves creating datasets of images considered to be of high aesthetic appeal, often web-scraped from design and illustration portfolios, ArtStation, photography platforms and other sources, in order to train models that can generate images of a similar aesthetic ‘quality’.⁷ Such qualities include the vibrant colours and striking compositions that we see in the re-generations of existing images as ‘Wes Anderson tableaus’ of sublime architecture, and in fantastical land- and cityscapes.
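
The curation step described above, in which a subset of ‘high visual quality’ images is drawn from a larger pool of image-text pairs, can be illustrated with a minimal Python sketch. It is not the pipeline of any particular platform or dataset: the record structure, the field names, the scores and the threshold are all hypothetical, and the sketch simply assumes that each pair already carries an aesthetic score predicted by some separate model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageTextPair:
    """One hypothetical dataset record: an image reference plus its caption."""
    image_url: str
    caption: str
    aesthetic_score: float  # assumed output of a separate scoring model


def filter_by_aesthetic(pairs, threshold):
    """Keep only the pairs whose predicted score meets the threshold."""
    return [pair for pair in pairs if pair.aesthetic_score >= threshold]


# Invented records for illustration only; not drawn from any real dataset.
SAMPLE_PAIRS = [
    ImageTextPair("https://example.org/a.jpg", "concept art of a floating city", 7.1),
    ImageTextPair("https://example.org/b.jpg", "blurry photo of a receipt", 3.2),
    ImageTextPair("https://example.org/c.jpg", "engraving of an emu", 5.4),
]

# The threshold value is an assumption chosen for this example.
aesthetic_subset = filter_by_aesthetic(SAMPLE_PAIRS, threshold=5.0)
for pair in aesthetic_subset:
    print(pair.caption)
```

In this sketch, the threshold alone decides what counts as ‘aesthetic’ enough to be kept: raising it to 6.0 would also drop the invented emu engraving, which suggests how a formal criterion applied at the level of the dataset could shape what a model is later able to visualise.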

Much of the critique and celebration of these examples of AI imagery is formalist in approach, focusing on what the images look like and prioritising visual aesthetics above their socio-cultural conditions and meaning. However, if we critique these images only by their formal and aesthetic qualities, we miss the more important work of critiquing how they operate within social, cultural and political spheres, and of visual production through machine learning as a shaping force within these spheres.

To think through this problem, the next section reports on an anarchival collaboration that includes both traditional collage-making and AI image generation.

1. Creating Images with Images

Collage and the Anarchive

As an image-making process, collage involves assembling new images from fragments of existing images or scraps of ephemera. Here, we consider one approach to collage-making that is driven by formal experimentation, where the collage maker often doesn’t have a preconceived idea of what the final image will be but arranges fragments until a surprising, pleasing or otherwise poetic juxtaposition of subjects, colours, textures, patterns or shapes feels complete. Whether the final image is abstract or figurative, completion of a ‘formal experiment’ is determined by a sense of compositional harmony. This exploratory and intuitive approach to image-making is driven by tacit knowledge, or expertise, developed through thoughtful practice. Even for an experienced collage artist, it often takes a while to slip into the ‘zone’ to work in this formal way, similar to working on a jigsaw puzzle, where slight variations in colour, shape and pattern become more obvious the longer you look. Although collage may appear to be a simple technique, well-crafted collages take skill and time to produce.

Collage artists build their own archives of material, amassing organised and/or ramshackle collections of scraps and pre-cut fragments to work with. For digital collage artists, this material can increasingly be taken from the archives of cultural institutions, as GLAM (galleries, libraries, archives and museums) collections are digitised and made open access with Creative Commons licences.¹ In other words, archives can be created from other archives, to create images from other images. A distinction we make below is the difference between drawing on material from cultural archives as fodder for formal compositions and using archival material to deliberately bring cultural or historical associations into the work.

An anarchival approach to collage differs from a formal approach in that, rather than being guided by serendipity and intuition to create satisfying compositions, the collage maker takes a critical lens to archival materials in order to question or trouble what is contained within, organised and omitted from archives.² In an anarchival approach, the collage artist treats the archive as a collection of materials that document past events in specific and situated ways, and further as a site that is open for reactivation and reconsideration of this culturally loaded material. We are not claiming that this approach is new; collage artists from historical art movements such as Dada and Futurism have taken a similar approach to using culturally loaded ephemera to make socio-cultural and political statements.

An anarchival collage might communicate a particular idea or position, or produce a ‘counternarrative’ that troubles the archive’s account of a particular history, and how that informs broader societal understanding of that history. This approach recognises the unattainable ideal of the archive as a complete and objective site of cultural memory. For historical archives, this often points to the colonial nature of collection materials and practices.

An example of anarchival collage is the suite of ‘illustrated plates’ created for the book Natural Things in Early Modern Worlds (Cooley et al. 2023), in which Sadokierski and collaborator Katie Dean assembled collages from visual sources (natural history illustrations, maps, photographs) supplied by the authors and editors of the book. The twelve plates are deliberately complex, ambiguous and often surprising, inviting readers to critique the way archival material is visually represented in scholarly publishing, and the inherent bias embedded in the process of creating images (Sadokierski and Dean 2023).


1. The ‘Open Culture’ section of the Creative Commons site hosts a wealth of articles on this topic:
creativecommons.org/category/open-culture

2. Carine Zaayman’s Anarchival Practices (2023) and mnemoscape’s 2014 themed issue ‘The Anarchival Impulse’ offer discussions on how the term ‘anarchival’ has emerged in creative practice research since Hal Foster’s introduction of the term, as an aside, in 2004:
www.mnemoscape.org/single-post/2014/09/14/editorial-the-anarchival-impulse

3. As noted by Wolfgang Ernst (2013, 8), there are anarchival dynamics already in archives.

4. Our references to ‘interestingness’ throughout the paper are a nod to Sianne Ngai’s Our Aesthetic Categories (2015).

5. Similar to our observation of generative models, Amanda Wasielewski (2023) discusses a return to formalism in the computational analysis of images using discriminative (classification) models.

6. ArtStation is a platform where visual artists, mostly from gaming, 3D effects and animation, share their work.

7. As an example, LAION’s ‘LAION-Aesthetics’ (2022) image-text-pair dataset is a subset of images of ‘high visual quality’ derived from LAION’s larger LAION-5B dataset (Schuhmann et al. 2022).

ABOVE: Internal spreads and front cover from the book Natural Things in Early Modern Worlds, as well as a diagram that shows the source material for one of the collages. Katie Dean and Zoë Sadokierski, 2023.
