long short-term memory 


Memory and Forgetting


Memory's anachronistic quality - its bringing together of now and then, here and there - is actually the source of its powerful creativity, its ability to build new worlds out of the materials of older ones.

~ Rothberg, Michael (2009). Multidirectional Memory: Remembering the Holocaust in the Age of Decolonization

One Shot Models 




Memory Studies (various)


Aline Sierp (Maastricht University)



Journal of Memory Studies (launched in 2008)



Memory Studies Association, launched in Dec 2016 ~ inaugurating a cross-disciplinary community towards the study of memory and "those who are active in museums, memorial institutions, archives, the arts and other fields engaged in remembrance"


How does she consider inscription/automated inscription as a memory actor?



To look into...


models of artificial neurons / memory structures


sequential memory structures in circuitry & analog structures (e.g. flip-flop like positive feedback behavior)


recording / inscription methods - forms of writing


reference: the Master Algorithm


The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was first organized in 2010 as a test of the progress made in image recognition systems.

The organizers made use of Amazon Mechanical Turk, an online platform to connect workers to requesters, to catalog a large collection of images with associated lists of objects present in the image. The use of Mechanical Turk permitted the curation of a collection of data significantly larger than those gathered previously.

In 2012, the AlexNet architecture, based on a modification of the LeNet architecture (1989) run on GPUs, entered and dominated the challenge with error rates almost half that of the nearest competitors. This victory dramatically accelerated the already growing trend toward deep learning architectures in computer vision.


Since 2012, convolutional architectures have consistently won the ILSVRC challenge (along with many other computer vision challenges). Each year the contest was held, the winning architecture increased in depth and complexity. The ResNet architecture, winner of the ILSVRC 2015 challenge, was particularly notable; ResNet architectures extended up to 152 layers deep, in contrast to the 8-layer AlexNet architecture.

Very deep networks were historically challenging to train; when networks grow this deep, they run into the vanishing gradients problem. Signals are attenuated as they progress through the network, leading to diminished learning. This attenuation can be explained mathematically, but the effect is that each additional layer multiplicatively reduces the strength of the signal, leading to caps on the effective depth of networks.

The ResNet introduced an innovation that controlled this attenuation: the bypass connection. These connections allow part of the signal from deeper layers to pass through undiminished.
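The multiplicative attenuation, and the way a bypass connection counters it, can be illustrated with a toy calculation. This is a minimal sketch and not ResNet itself: it assumes every layer has the same local derivative (the hypothetical parameter `local`), and models a bypass layer as `x + f(x)`, whose derivative is `1 + local`.

```python
def plain_gradient(depth, local=0.5):
    """Gradient through `depth` stacked layers, each with derivative `local`."""
    grad = 1.0
    for _ in range(depth):
        grad *= local  # chain rule: multiply by each layer's derivative
    return grad


def bypass_gradient(depth, local=0.5):
    """Same layers, but each computes x + f(x), so its derivative is 1 + local."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + local  # the identity path contributes the constant 1
    return grad


for depth in (1, 8, 32):
    print(depth, plain_gradient(depth), bypass_gradient(depth))
```

In this toy the bypassed gradient grows rather than merely surviving; the point is only that the identity path keeps it from shrinking toward zero as depth increases.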



TensorFlow is effectively a framework for graph-based computation and statistical optimization of graph structures, built upon an efficient tensor calculus engine. It allows for building sophisticated graph-based computational models and provides optimization algorithms for the free variables of these models. These models excel at extremely non-linear classification tasks, such as image recognition based on pixel data, word embeddings, and time-series analysis using recurrent nodes.


One of the major current weaknesses of TensorFlow is that constructing a new deep learning architecture is relatively slow (on the order of multiple seconds to initialize an architecture). As a result, it’s not convenient to construct some sophisticated deep architectures which change their structure on the fly in TensorFlow... progress in the deep learning framework space is rapid, and today’s novel system can be tomorrow’s old news

~ from TensorFlow for Deep Learning (O'Reilly 2018)


Machine learning (and deep learning in particular), like much of computer science, is a very empirical discipline. It’s only really possible to understand deep learning through significant practical experience

~ from TensorFlow for Deep Learning (O'Reilly 2018)


TensorFlow (in its 1.x API) is largely declarative. Calling a TensorFlow operation adds a description of a computation to TensorFlow’s “computation graph”. To run a computation graph, you must create a tf.Session object.


* Begin by building a computational graph. The image above shows the graph of a simple linear classifier, wx + b = z.
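The build-then-run split can be mimicked in a few lines of plain Python. This is a toy sketch of the declarative idea only, not TensorFlow's API: `Node`, `constant`, `placeholder` and `run` are hypothetical names, with `run` standing in for the role a session plays in TensorFlow 1.x.

```python
class Node:
    """A node in a toy computation graph; building it computes nothing."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def constant(value):
    return Node("const", value=value)


def placeholder():
    return Node("placeholder")


def run(node, feed):
    """Evaluate `node`, looking up placeholder values in the `feed` dict."""
    if node.op == "const":
        return node.value
    if node.op == "placeholder":
        return feed[node]
    left, right = (run(i, feed) for i in node.inputs)
    return left + right if node.op == "add" else left * right


# Build phase: describe z = w*x + b without computing anything.
w, b, x = constant(2.0), constant(-1.0), placeholder()
z = w * x + b

# Run phase: only now is a number produced.
print(run(z, {x: 3.0}))  # 2.0 * 3.0 + (-1.0) = 5.0
```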








A single realisation of three-dimensional Brownian motion for times 0 ≤ t ≤ 2. Brownian motion has the Markov property of "memorylessness", as the displacement of the particle does not depend on its past displacements.
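A path like the one in the caption can be simulated by summing independent Gaussian increments. This is a minimal sketch under stated assumptions: unit diffusion (each coordinate's increment over a step `dt` has variance `dt`), and a hypothetical step count and seed chosen only for illustration.

```python
import random


def brownian_path(steps, dt, seed=0):
    """Simulate 3-D Brownian motion as a sum of independent Gaussian steps."""
    rng = random.Random(seed)
    sigma = dt ** 0.5  # each coordinate's increment has variance dt
    position = (0.0, 0.0, 0.0)
    path = [position]
    for _ in range(steps):
        # The increment is drawn without looking at any earlier position.
        position = tuple(p + rng.gauss(0.0, sigma) for p in position)
        path.append(position)
    return path


path = brownian_path(steps=2000, dt=0.001)  # covers 0 <= t <= 2
print(len(path), path[-1])
```

The memorylessness shows up in the loop body: the next position depends only on the current position and a fresh random draw, never on the rest of `path`.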




Hyperbolic tangent (tanh) and ReLU tend to have the best performance in deep learning topologies.
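For reference, the two activation functions named above are simple to write down. A minimal sketch for scalar inputs:

```python
import math


def tanh(x):
    """Hyperbolic tangent: squashes any real input into (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


for value in (-2.0, 0.0, 2.0):
    print(value, tanh(value), relu(value))
```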


Neural Network

input layers -> real valued features

essentially a multi-layer network of interconnected perceptron-like abstractions

A "deep network" implies 3 or more hidden layers

output layers -> prediction/estimate



Simple Perceptron

A perceptron can learn patterns in linearly separable datasets, usually by fitting a line (more generally, a hyperplane) that separates the classes.


Depending on the activation function, different types of classification/prediction can be achieved.


Start with a randomly initialized weight vector W;

while there exist input samples that are misclassified, do;

Let X be a misclassified input vector, with label Y (+1 or -1);

W = W + Y\*η\*X, where η is the learning rate;
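The update rule above can be sketched as runnable Python. This is a minimal sketch under stated assumptions: labels are +1 or -1, a constant 1 is appended to each input so the last weight acts as a bias, and the names `train_perceptron` and `predict`, the learning rate and the epoch cap are illustrative choices.

```python
import random


def train_perceptron(samples, eta=0.1, max_epochs=1000, seed=0):
    """samples: list of (x, y), x a tuple of numbers, y in {-1, +1}."""
    rng = random.Random(seed)
    dim = len(samples[0][0])
    # One extra weight for the constant-1 bias input.
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            xb = list(x) + [1.0]
            activation = sum(wi * xi for wi, xi in zip(w, xb))
            if y * activation <= 0:  # misclassified (or on the boundary)
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:  # every sample is on the correct side
            break
    return w


def predict(w, x):
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1


# Logical AND, which is linearly separable.
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights = train_perceptron(data)
print([predict(weights, x) for x, _ in data])
```

On data that is not linearly separable (XOR, for example) the loop never reaches zero mistakes and simply stops at `max_epochs`.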



... it seems most "artificial neural networks" can be thought of as nested linear compositions of nonlinear functions ...
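The "nested composition" reading can be made concrete with a two-layer forward pass. A minimal sketch, with hypothetical fixed weights (nothing here is learned) and tanh as the nonlinearity:

```python
import math


def affine(weights, bias, inputs):
    """Linear map plus bias: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def layer(weights, bias, inputs):
    """A nonlinearity (tanh) applied to an affine map."""
    return [math.tanh(z) for z in affine(weights, bias, inputs)]


def network(x):
    # Hypothetical fixed weights: 2 inputs -> 2 hidden units -> 1 output.
    hidden = layer([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], x)
    return affine([[1.0, 1.0]], [0.0], hidden)[0]


print(network([1.0, 1.0]))
```

Written out, `network(x)` is `affine(tanh(affine(x)))`: linear maps alternating with nonlinear functions, nested one inside the other.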



weights and biases

sum(w*x) + b = yprediction

(y - yprediction)**2

the squared term grows quadratically with the size of the error, so large errors are penalized much more heavily than small ones

* Gradient Descent
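Gradient descent on the squared error of a one-feature linear model can be sketched as follows. A minimal sketch: the function name `fit_linear`, the learning rate, the step count and the toy data are all assumptions made for illustration.

```python
def fit_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of mean((w*x + b - y)**2) with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = fit_linear(xs, ys)
print(w, b)  # should approach w = 2, b = 1
```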


A recurrent neural network (RNN). Inputs are fed into the network at the bottom, and outputs extracted at the top. W represents the learned transformation (shared at all timesteps). The network is represented conceptually on the left and is unrolled on the right to demonstrate how inputs from different timesteps are processed.

Recurrent neural network (RNN) layers assume that the input evolves over time steps following a defined update rule that can be learned/optimized.

The update rule presents a prediction of the next state in the sequence given all the states that have come previously.
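The update rule can be sketched with a scalar hidden state. This is a toy illustration, not a trained model: the weights `w_h`, `w_x` and `b` are hypothetical fixed values, and tanh is assumed as the nonlinearity.

```python
import math


def rnn_step(h, x, w_h, w_x, b):
    """One update: new state from the previous state h and current input x."""
    return math.tanh(w_h * h + w_x * x + b)


def run_rnn(inputs, w_h=0.5, w_x=1.0, b=0.0, h0=0.0):
    """Apply the same (shared) weights at every timestep; return all states."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(h, x, w_h, w_x, b)
        states.append(h)
    return states


states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The first input still influences the later states even though the later inputs are zero, but its influence shrinks at every step.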


         ^^^ convolutional layer ^^^


fully-connected layer


One of many variations on a long short-term memory (LSTM) cell, containing a set of specially designed optimizable operations that attain much of the learning power of the RNN while preserving influences from the past.

Standard RNN layers in practice struggle to retain influences from the distant past. Such distant influences are crucial for tasks such as language modeling. The long short-term memory (LSTM) is a modification to the RNN layer that allows samples deeper in the past of a sequence to make their memory felt in the present.
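One common formulation of the LSTM step (forget, input and output gates plus a candidate value) can be sketched with scalar states. This is a toy under stated assumptions: the parameter layout `p` and the hand-picked gate weights are hypothetical, chosen so the cell writes when the input is large and otherwise holds its contents.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, p):
    """One scalar LSTM step. p maps gate name -> (w_x, w_h, bias)."""
    def gate(name, squash):
        w_x, w_h, bias = p[name]
        return squash(w_x * x + w_h * h + bias)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new information to write
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # proposed new content
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


# Hypothetical hand-set weights: forget gate held near 1, input gate opens
# only for large x.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (10.0, 0.0, -5.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0] + [0.0] * 50:  # one informative input, then 50 empty steps
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the forget gate stays close to 1, the cell state written at the first step is still largely intact 50 steps later.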