06 Exhausted Data-Bodies
In this view we are not interactive at all, but merely a probed subject.
(Wilson 2003)
The rise of Google’s extractive business model initiated concern over the exploitation of data exhaust, whereby users’ behavioural data is ingested by corporate machinery and rendered for exchange in behavioural futures markets (Zuboff 2019). My project views data exhaust as a form of body text, destined to be read by machines in a new mode of textuality (CCTV + AI). Allan Sekula’s seminal essay "The Body and the Archive" highlights the historical use of photography in early criminology to document, categorise, and archive the human body (Sekula 1986). As photography sought to catalogue identity, it confronted the problem of volume: the overwhelming mass of images rendered the archive chaotic and unwieldy. Today, the electronic database solves this problem with its vast storage capacity, and over time the photographic object once relied on to help establish identity has been replaced by data.
Sekula’s observations prefigure a new form of surveillance power, in which CCTV merges with artificial intelligence. The increasing datafication of public spaces enables the emergence of dataveillance, through which vast volumes of personal and behavioural data are collected and analysed for law enforcement and other surveillance purposes. Predictive AI uses statistical analysis and machine learning (ML) to detect patterns and predict behaviours, and the performance and accuracy of these models depend heavily on the quality and volume of their training data. One of the key mechanisms by which predictive AI queries large textual and image sources is the embedding: a structured representation of information that enables the identification of relationships and similarities within the data. Learned by neural network layers, often without direct human supervision, embeddings convert ingested data into machine-readable numerical vectors and place them within a mathematical space that relates each item to every other item in the dataset, allowing a model to rapidly read all relevant data and make a prediction based on the proximity of certain features.
In the context of human action recognition (HAR) algorithms, designed to automatically analyse and interpret human actions in video sequences initially labelled by humans, it becomes crucial to consider the nature and sources of the primary training material: what the action is supposed to depict, and how it has been described. Never before have the body and its behaviour been so intensely scrutinised, to the extent that the body can now be considered an explicit interface.
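To make the embedding mechanism concrete, the sketch below is a deliberately minimal illustration rather than the pipeline of any real surveillance system: the vectors and labels are hypothetical stand-ins for what a trained network would produce, and the "prediction" is a simple nearest-neighbour lookup by cosine proximity.

```python
import numpy as np

# Hypothetical embedding vectors: each row stands in for one ingested
# item (a frame, a caption, a logged behaviour) that a neural network
# has already converted into a point in a shared mathematical space.
embeddings = np.array([
    [0.9, 0.1, 0.0],  # stand-in for "walking"
    [0.8, 0.2, 0.1],  # stand-in for "running"
    [0.1, 0.9, 0.3],  # stand-in for "waving"
])
labels = ["walking", "running", "waving"]

def cosine_similarity(a, b):
    # Proximity measure: 1.0 means the vectors point the same way.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A new observation is embedded into the same space, and the model
# "reads" the archive by ranking every stored item by proximity.
query = np.array([0.85, 0.15, 0.05])
scores = [cosine_similarity(query, e) for e in embeddings]
print(labels[int(np.argmax(scores))])  # the nearest neighbour wins
```

Even at this toy scale, the logic is the same one that allows a model to rank millions of stored bodies and behaviours by their distance from a new observation.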
AI and ML have the potential to alter, enhance, and reinvent narratives surrounding human physicality. HAR, the automatic recognition of human actions from video, has garnered significant attention from both academic researchers and industry professionals. This growing interest is driven by increasing demand for automated systems that can accurately recognise and interpret human behaviour across applications ranging from video indexing and biometric authentication to surveillance and security. Via HAR, the body is transformed into a readable entity, yet this process is fraught with challenges. One of the primary issues is the inherent ambiguity in defining the motion of body parts, particularly in complex, real-world scenarios. As Jegham et al. note, most public datasets used for HAR research are recorded in controlled environments with static cameras and uniform backgrounds, conditions far removed from the complexities of the real world (2020, 10). The scarcity of research conducted in authentic, uncontrolled conditions limits the applicability of HAR technologies, as they fail to account for the unpredictability of real-world human behaviour.
A case in point is Violent Individuals Identification (fig. 7), a project enabled by a dataset of 2,000 videos of individuals engaged in simulated violent behaviour. Because of the relative scarcity of authentic footage available online, the researchers resorted to recorded performances by 25 male volunteers who mimicked acts of violence. Figure 7 shows some of the violent activities in the dataset: (clockwise from top) (i) Strangling, (ii) Punching, (iii) Kicking, (iv) Shooting, and (v) Stabbing. There is, however, a world of difference between explosive real-life violence and the mimicry involved in producing the training dataset. Despite these obvious limitations, the researchers claimed that their technology could identify violent acts such as stabbings, shootings, and brawls. The assertion prompted concerns from civil liberties groups, who warned that such software could be error-prone, raising the prospect of mass surveillance and misidentification (Melendez 2018). The disparity between controlled training environments and unpredictable practical conditions underscores the difficulties HAR encounters in accurately interpreting human actions across varied contexts.
Figure 7. Singh, Amarjot, Devendra Patil, and S. N. Omkar. 2018. Eye in the Sky: Real-Time Drone Surveillance System (DSS) for Violent Individuals Identification Using ScatterNet Hybrid Deep Learning Network. https://arxiv.org/pdf/1806.00746, 2, fig. 1. Copyright: Singh, Amarjot, Devendra Patil, and S. N. Omkar.
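For a sense of how such classification looks in code, the following sketch is a hypothetical stand-in rather than the system described by Singh et al.: it assumes torchvision 0.13 or later and runs a pretrained 3D convolutional network, trained on the Kinetics-400 action dataset, over a random tensor standing in for a short video clip.

```python
import torch
from torchvision.models.video import r3d_18, R3D_18_Weights

# A minimal HAR pipeline: a 3D convolutional network pretrained on
# Kinetics-400 assigns one of 400 human-labelled action categories
# to a clip. The random tensor below stands in for real footage.
weights = R3D_18_Weights.KINETICS400_V1
model = r3d_18(weights=weights).eval()

clip = torch.rand(1, 3, 16, 112, 112)  # (batch, channels, frames, height, width)

with torch.no_grad():
    logits = model(clip)

# The output can only ever be one of the labels humans attached to
# the training videos; the model repeats its archive's vocabulary.
print(weights.meta["categories"][logits.argmax().item()])
```

The closing comment is the crux: whatever appears in front of the camera, the system can only answer with a label someone once attached to a training clip, which is precisely why the provenance of datasets like the one above matters.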

