09 Distilling Literary Corpora
Three of the practical projects that will be outlined in the following sections, Synset_Gloss, Idling-in-the-Unreal, and No_m_oN, employ a language model trained exclusively on the works of Michel Foucault as part of the ensemble.
Foucault’s seminal works Discipline and Punish (1975) and The Birth of the Clinic (1963) are two of the seventeen volumes used to train this model. In his essay “A Theory of Vibe”, Peli Grietzer (2017) explores how AI can process literary corpora within a mathematical philosophy of literature. He proposes that training an AI language model on a single author’s body of work enables the distillation of that author’s “vibe” into a mathematical abstraction, independent of any specific instance of their style. Following Grietzer, a language model based on Foucault forms a significant part of the aforementioned projects.
Foucault’s philosophy of power is crucial to understanding the evolution of what could be termed info-power: the control and influence exerted through data analytics, social media, and continuous algorithmic assessment. Since his death in 1984, this new form of power has become one of the most significant forces shaping contemporary society. Foucault explored how surveillance and governance mechanisms such as panopticism and biopolitics are central to the exercise of power in modernity, particularly through the categorisation, normalisation, and control of bodies. Biometric technologies, which quantify physical characteristics such as facial features and fingerprints for identification and surveillance, extend these disciplinary mechanisms into the digital realm. Foucault’s analysis of how bodies are subjected to systems of power and knowledge is therefore essential for critiquing contemporary surveillance technologies, which now operate at the intersection of embodied data and state or corporate control.
Figure 12. Oeuvre, 2021. A corpus of Michel Foucault visualised using Word2Vec Word Embeddings and t-SNE.
The seventeen volumes constituting the corpus used to train the language model for Synset_Gloss, Idling-in-the-Unreal, and No_m_oN have been visualised using Word2Vec word embeddings and t-SNE (fig. 12). t-SNE (t-distributed stochastic neighbour embedding) is a machine-learning algorithm for data visualisation: it reduces dimensionality by mapping high-dimensional data to two or three dimensions, producing a new representation that is human-readable while largely preserving local neighbourhood relations.3
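The projection step described above can be illustrated with a minimal sketch. It is not the code used to produce fig. 12: it assumes scikit-learn's TSNE implementation, and it substitutes synthetic vectors (three artificial clusters in fifty dimensions) for the Word2Vec embeddings of the Foucault corpus. The helper names project_embeddings and nearest_neighbour, the perplexity value, and the cluster sizes are all illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_embeddings(vectors, perplexity=5.0, seed=0):
    """Map (n_points, dim) vectors to (n_points, 2) coordinates with t-SNE."""
    tsne = TSNE(
        n_components=2,         # target dimensionality for plotting
        perplexity=perplexity,  # must be smaller than the number of points
        init="pca",             # deterministic starting layout
        random_state=seed,
    )
    return tsne.fit_transform(vectors)


def nearest_neighbour(points):
    """Return, for each row, the index of its closest other row."""
    diff = points[:, None, :] - points[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    return dist.argmin(axis=1)


# Stand-in data (assumption): three well-separated clusters of 20 points
# in 50 dimensions, playing the role of word vectors from a trained model.
rng = np.random.default_rng(0)
centres = rng.normal(scale=10.0, size=(3, 50))
labels = np.repeat(np.arange(3), 20)
vectors = centres[labels] + rng.normal(scale=0.5, size=(60, 50))

coords = project_embeddings(vectors)
print(coords.shape)  # (60, 2)

# Share of points whose nearest neighbour in the 2-D map belongs to the
# same original cluster: a rough check that neighbourhoods survive.
same_cluster = float((labels[nearest_neighbour(coords)] == labels).mean())
print(same_cluster)
```

In a pipeline like the one described, the synthetic vectors would be replaced by the embedding matrix of a trained Word2Vec model, one row per vocabulary item, and the resulting two-dimensional coordinates would be plotted with each point labelled by its word.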
