2. Background

Ever since two neural networks were first made to compete with each other, each learning from the other's feedback, we have witnessed enormous progress in the synthetic generation of images and other media. This architecture, known as the Generative Adversarial Network (GAN) (Goodfellow and others 2014), has made it possible to solve a variety of complex problems in computer graphics.


One of the most famous GANs is StyleGAN by NVIDIA (Karras and others 2018), which is able to generate never-before-seen visual outcomes from vector input. During the training of a StyleGAN model, the neural network learns to associate specific vectors with each significant visual feature, so that the vectors determine which of the patterns recognised in the dataset appear in the output. Together, these vectors form a multi-dimensional space of visual possibilities called the ‘latent space’.

In this context, training a model means creating a closed tool that can generate new images based on what the neural network has seen in the dataset. We can control this generative process by moving the vectors, pinpointing specific locations within the latent space.
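To make this concrete, the following minimal sketch shows how latent vectors drive a generator and how interpolating between two vectors traverses the latent space. The tiny linear ‘generator’ is a stand-in assumption to keep the sketch self-contained; in practice one would load a trained StyleGAN network from a checkpoint.

```python
import torch
import torch.nn as nn

Z_DIM = 512  # StyleGAN samples 512-dimensional latent vectors

# Stand-in generator (an assumption for illustration): in practice this
# would be a trained StyleGAN network loaded from a checkpoint.
generator = nn.Sequential(nn.Linear(Z_DIM, 3 * 64 * 64), nn.Tanh())

# A random latent vector corresponds to a new synthetic image: essentially
# what This Person Does Not Exist does on every page refresh.
z = torch.randn(1, Z_DIM)
image = generator(z).reshape(1, 3, 64, 64)

# 'Pinpointing locations' in latent space: interpolating between two vectors
# produces a smooth visual morph, since nearby vectors encode similar features.
z_a, z_b = torch.randn(1, Z_DIM), torch.randn(1, Z_DIM)
frames = [generator((1 - t) * z_a + t * z_b).reshape(1, 3, 64, 64)
          for t in torch.linspace(0.0, 1.0, steps=8)]
```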


One famous application of a StyleGAN model, trained on the Flickr-Faces-HQ dataset of photographs of human faces, is the website This Person Does Not Exist, which renders the image for a randomly chosen vector (thus, a new synthetic human face) each time the page is refreshed. The publication of this StyleGAN model in 2018 is considered a breakthrough: the moment when AI-generated synthetic visuals depicting humans crossed the uncanny valley of realistic-but-not-quite-real CGI avatars.

The development of GANs clearly aims for greater realism; however, if a model is too complex and combines different classes of objects, scenes, and textures, image synthesis leads to visually indeterminate outcomes (Hertzmann 2020). Images that appear to depict real scenes but fail to render specific details were typical outcomes of multi-category models such as BigGAN (Brock and others 2019) and of text-to-image synthesis models such as AttnGAN (Xu and others 2017).


Until 2021, generative image synthesis thus offered two distinct capabilities: (1) generating photorealistic fiction, or (2) synthesising unrecognisable images that combine diverse objects, scenes, and textures. This changed in early 2021 with the progress in text-to-image synthesis built on large language models like GPT-3 (Brown and others 2020). Abandoning the GAN architecture in favour of the Transformer architecture, which allows for multimodal deep learning, solved the problem of visually indeterminate outcomes from complex models.
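As a rough illustration of this architectural shift, the sketch below shows the general scheme of autoregressive, Transformer-based text-to-image synthesis: text and image are both reduced to token sequences, and a single model predicts image tokens conditioned on the text. This is our own minimal approximation, not the code of any published model; all vocabulary sizes and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB = 1000, 512   # assumed token vocabularies
D_MODEL, SEQ_TEXT, SEQ_IMAGE = 64, 16, 64

class TinyTextToImage(nn.Module):
    def __init__(self):
        super().__init__()
        # Text tokens and discrete image tokens share one embedding space,
        # which is what makes the model 'multimodal'.
        self.text_emb = nn.Embedding(TEXT_VOCAB, D_MODEL)
        self.image_emb = nn.Embedding(IMAGE_VOCAB, D_MODEL)
        self.pos_emb = nn.Embedding(SEQ_TEXT + SEQ_IMAGE, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(D_MODEL, IMAGE_VOCAB)

    def forward(self, text_tokens, image_tokens):
        # Concatenate both modalities into a single sequence.
        x = torch.cat([self.text_emb(text_tokens),
                       self.image_emb(image_tokens)], dim=1)
        x = x + self.pos_emb(torch.arange(x.size(1)))
        # Causal mask: each position only attends to earlier positions.
        n = x.size(1)
        mask = torch.triu(torch.full((n, n), float('-inf')), diagonal=1)
        h = self.transformer(x, mask=mask)
        # Logits over the next image token at each image position
        # (in training, targets would be shifted by one position).
        return self.to_logits(h[:, SEQ_TEXT:, :])

model = TinyTextToImage()
text = torch.randint(0, TEXT_VOCAB, (1, SEQ_TEXT))
image = torch.randint(0, IMAGE_VOCAB, (1, SEQ_IMAGE))
logits = model(text, image)   # shape: (1, SEQ_IMAGE, IMAGE_VOCAB)
```

At generation time, the image tokens would be sampled one by one and decoded back into pixels; because the model attends to the text tokens throughout, the output can remain coherent even when many object classes are combined.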

Because the photorealistic quality of synthetic media often makes them indistinguishable from non-synthetic media (to the naked eye), there are, understandably, concerns about the possible misuse of this technology to manipulate the public. With or without bad intentions, photorealistic synthetic media will sooner or later turn seemingly trustworthy visual communication upside down. This is evident in the efforts of multiple tech companies that focus on offering a ‘remedy’ for deepfakes, either via detection mechanisms or via authentication protocols for (audio)visual content. On the other hand, those examples of synthetic media that are evidently ‘not real’, carrying visible signs of their synthetic origin or lacking photorealism, are deemed uninteresting or unimportant: imperfect, unfinished outcomes of tools that still need to be improved. In this exposition, we argue in favour of these visually imperfect outcomes as thought-provoking and functional visual material.

This visual indeterminacy, as Aaron Hertzmann calls it, is a major theme in contemporary art, especially in ‘GAN art’. Visually indeterminate images are those that ‘appear to depict real scenes, but on closer examination, defy coherent spatial interpretation’ (Hertzmann 2020). Artists such as Mario Klingemann, Mike Tyka, Sofia Crespo (Entangled Others), Refik Anadol, and Terence Broad, among others, intentionally work with the aesthetics of the indeterminate images GANs produce. This imperfection can be taken as a quality that offers various creative possibilities in artistic practice.

The strange visual ambiguity of GAN art is reminiscent of the abstract artistic tradition; however, the reason behind it is usually not an intentional departure from realism or a focus on the evocation of emotion. Most artists working with GANs in their practice engage in critical interventions into the AI system, often stretching its capacities or deliberately disrupting the generative process. For example, artist and researcher Terence Broad works with network bending: adding deterministic transformation layers into the computational graph of a trained generative neural network, which offers greater creative expression while also increasing the understanding of how the system works (Broad and others 2020). Similarly, Memo Akten explores performative creative expression by adding interactive instruments into deep learning models with the aim of producing ‘meaningful human control in a realtime continuous manner’ (Akten 2021). In the Learning to See series (Akten 2017), Akten lets a machine-learning algorithm with a limited understanding of the outside world interpret everyday objects. As the outcome, we are presented with two completely different versions of reality, pointing to the idea that sense-making is possible only through the filter of what we already know. This kind of tension between the limited knowledge of a custom-trained neural network and an artist challenging it with the unknown is also present in the work of the sculptor Scott Eaton, who explores the diverse shape-making abilities of limited neural networks (Eaton 2019).
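To give a sense of the mechanics, the following minimal sketch illustrates the principle of network bending as described by Broad and others (2020): a deterministic transformation layer is spliced between the layers of an already trained generator, altering its intermediate feature maps at inference time. The small convolutional generator and the particular transformation used here are illustrative assumptions, not Broad's actual implementation.

```python
import torch
import torch.nn as nn

class Bend(nn.Module):
    """Deterministic transformation spliced into the computational graph."""
    def forward(self, x):
        # Flip the feature maps horizontally and amplify the first half of
        # the channels: transformations the network never saw in training.
        x = torch.flip(x, dims=[-1])
        scale = torch.ones(1, x.size(1), 1, 1)
        scale[:, : x.size(1) // 2] = 2.0
        return x * scale

# Stand-in generator (an assumption; a trained StyleGAN-like model in practice).
generator = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
)

# Splice the bend between existing layers; no weights are retrained.
bent = nn.Sequential(generator[0], generator[1], Bend(),
                     generator[2], generator[3])

z = torch.randn(1, 64, 8, 8)
original, glitched = generator(z), bent(z)  # same weights, 'bent' output
```

Because the transformation is deterministic and placed at a chosen depth, the artist can reason about which features it disturbs: roughly speaking, bending early layers warps coarse structure, while bending late layers affects fine texture.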

Such practice can be seen as a form of glitch art, which uses methods of data manipulation, misalignment, and distortion to discover the distinct aesthetics of digital (or analogue) errors, often highlighting the imperfections and failures of our digital systems. In a similar way, GAN art exposes the inherent unpredictability and autonomy of AI systems and the impossibility of achieving full control over, or understanding of, these systems. However, the intentions behind glitching in AI art are deeply connected with the urgency of understanding this completely new way of image production, one that steps outside human control, and of contextualising synthetic images in our rapidly changing visual culture. Although a large proportion of GAN artworks focus mostly on the aestheticisation of AI glitches, the projects mentioned above intentionally employ AI tools as feral instruments of knowledge production.