02 Legacy Systems
The project employs a ResNet human-action recognition (HAR) model. Unlike state-of-the-art HAR systems that prioritise precision through multimodal feature fusion, this older model’s propensity for erroneous inferences becomes a deliberate artistic strategy. Its failure to neatly categorise ambiguous motions and its tendency to decompose activities into disjointed gestures parallel avant-garde techniques of deconstruction and defamiliarisation.
Human-action recognition (HAR) presents a complex challenge in the field of machine vision, where systems leverage sensor data in an attempt to accurately identify and interpret human actions. These systems have broad applications, ranging from augmented reality and human–computer interaction to home monitoring, sports, and security. The instance of HAR utilised by this project is a ResNet model trained on the Kinetics 400 video dataset introduced in 2017, which contains 400 human action classes, each with up to 1,150 video clips extracted from YouTube. HAR techniques have evolved significantly since the introduction of this particular model, with recent advancements focusing on deep-learning-based fusion techniques that incorporate multiple data sources to capture spatial and temporal relationships between body parts, enhancing the system’s ability to recognise complex actions.

However, in my project, the legacy HAR system is not used to enhance interactivity or to ensure the clinical precision that state-of-the-art models aim for, but rather as a tool of estrangement – a means to expose the instabilities and contradictions within computational attempts to parse the actions of the human body. Despite its technical ambition, the legacy model frequently produces erroneous inferences, misclassifying actions, fragmenting movements, or applying labels that are wildly misaligned with the source material. It is precisely this propensity for error that becomes fertile ground for artistic intervention.

In contrast to contemporary multimodal systems designed for precision and seamless recognition, this HAR model resists the closure of meaning. Its misclassifications function as poetic interruptions – flashes of misrecognition that call attention to the social and technical assumptions embedded within the dataset. When a motion is misread, or an everyday action is dismantled into its constitutive gestures (see 07 More Mimicry in Human-Action Datasets), it foregrounds the interpretive violence inherent in computational “seeing”. These breakdowns reflect avant-garde strategies of deconstruction and defamiliarisation, as seen in the disarticulated movements of Dada performance, the gestural distortions in Bauhaus dance, Hannah Höch’s fragmented photomontages, or the disruptions of Brechtian estrangement that invite critical engagement rather than passive consumption. In such works, meaning is not given but fractured and reassembled, inviting critical scrutiny of how bodies are seen, interpreted, and codified.
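The kind of classification pipeline described above can be sketched in a few lines of code. This is a minimal illustration rather than the project’s actual implementation: it assumes a PyTorch/torchvision setup with an R3D-18 ResNet variant pretrained on Kinetics-400, and "clip.mp4" is a placeholder path.

```python
import torch
from torchvision.io import read_video
from torchvision.models.video import r3d_18, R3D_18_Weights

# Load a ResNet-based video classifier pretrained on Kinetics-400.
weights = R3D_18_Weights.KINETICS400_V1
model = r3d_18(weights=weights).eval()
preprocess = weights.transforms()

# Read a clip from disk as a (T, C, H, W) tensor; "clip.mp4" is a placeholder.
frames, _, _ = read_video("clip.mp4", pts_unit="sec", output_format="TCHW")

# Resize, crop, and normalise the frames, add a batch dimension, and classify.
batch = preprocess(frames).unsqueeze(0)
with torch.no_grad():
    scores = model(batch)

# The highest-scoring of the 400 Kinetics action labels.
label = weights.meta["categories"][scores.argmax(dim=1).item()]
print(label)
```

Whatever the clip actually contains, the model can only ever answer with one of its 400 learned labels, and it is precisely this constraint that the project places under pressure.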
In the algorithmic age, such avant-garde techniques are reimagined through computational failure. Treating HAR misclassifications in this project as a productive force echoes William Kentridge’s erasure-based animations: both involve the incomplete, unstable, and iterative construction of meaning from partial cues, compelling viewers to re-evaluate meaning through fragmentation. When confronted with chaotic or unfamiliar patterns – such as the fluttering of moth wings or overlapping organic forms – HAR algorithms may attempt to map these onto known action categories like “waving”, “jumping”, or “falling”, based solely on superficial similarities to learned data. Just as Stan Brakhage’s frame-by-frame collage in Mothlight (fig. 1) creates a kinetic rhythm that refuses linear narrative, HAR systems often deconstruct continuous motion into isolated gestures, leading to fractured or incorrect interpretations. Contemporary projects like Stephanie Dinkins’ Conversations with Bina48 (which highlights racialised gaps in AI communication) or Everest Pipkin’s Image Lace (which degrades training data into abstract patterns) extend this tradition, using algorithmic error to critique the reductive taxonomies embedded in datasets like Kinetics 400.

The operational failure of the legacy model used in this project becomes an aesthetic resource; its inability to recognise a nuanced or ambiguous motion mirrors broader concerns about the reduction of human complexity to discrete, legible categories. By embracing these misfires, the project invites viewers to consider computational recognition not as a neutral process of matching input to label but as a contested zone where meaning is produced through layers of mediation, bias, and approximation.

This approach builds on the contemporary idea of algorithmic remix by extending its logic into the kinetic and performative realm. Just as GenAI systems digest and recompose visual and textual inputs, HAR systems attempt to distil bodily gestures into pre-trained taxonomies. In both cases, the act of synthesis reveals more than it resolves. The distortions introduced by these systems expose the cultural assumptions embedded in the training data and the structural limitations of classification itself. Through these misalignments, my project situates machine learning not as a neutral mirror of reality but as a generative field of rupture – where bodies, identities, and narratives can be unsettled, reconfigured, and made strange.
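One way such misfires can be surfaced, sketched below under stated assumptions rather than as the project’s actual pipeline, is to inspect the full probability distribution over action labels and to classify short overlapping windows of a continuous movement independently, so that a single gesture splinters into a run of competing labels. The label_windows helper, the window and stride values, and the reuse of frames, model, preprocess, and weights from the previous sketch are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def label_windows(frames, model, preprocess, categories,
                  window=16, stride=8, top_k=3):
    """Classify short overlapping windows of a (T, C, H, W) clip independently.

    Each window is scored on its own, so a continuous movement is decomposed
    into a sequence of possibly inconsistent action labels with confidences.
    """
    results = []
    for start in range(0, frames.shape[0] - window + 1, stride):
        clip = preprocess(frames[start:start + window]).unsqueeze(0)
        with torch.no_grad():
            probs = F.softmax(model(clip), dim=1)[0]
        top = probs.topk(top_k)
        results.append([(categories[int(i)], p.item())
                        for p, i in zip(top.values, top.indices)])
    return results

# Low or near-tied confidences mark the moments where an ambiguous motion is
# being forced into categories such as "waving", "jumping", or "falling".
for window_labels in label_windows(frames, model, preprocess,
                                   weights.meta["categories"]):
    print(window_labels)
```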
