* Intro

This work is based on genome-folding data.

The DNA molecule is composed of pairs of the bases adenine (A), thymine (T),
guanine (G) and cytosine (C). Genes are portions of the DNA double
helix that contain the "code" for the production of one
specific protein. The DNA carrying the whole genetic information is
tightly wrapped around proteins called histones, forming larger
structures, the chromosomes: 24 distinct ones in the human genome
and in the data we are dealing with.

The chromosomes in turn are packed together into a very small space,
forming a very complex "knot". In this arrangement, parts of a chromosome
interact with, or "touch", parts of other chromosomes or of the same
chromosome. These contacts are not only a consequence of the tight
packing; they are also functional, since they determine whether the cell
produces specific proteins.

This complex spatial arrangement is therefore of interest to genetic
researchers: understanding how it arises could explain not only how
specific interactions come about but also how other, "unwanted"
interactions could be caused, which may be the origin of some genetic
diseases.

* Data

The data we have received consist of a list of experimentally determined
interactions between genes. The list comprises 856552 such
interactions between 482471 positions across the 24 chromosomes. Thus,
a recorded position can interact with several other positions at once,
meaning that it "touches" different portions of the chromosomes at the
same time.

Each entry in the list thus consists of:
1. the two interacting loci, each given as a chromosome and a position
   on that chromosome
2. the confidence level of the recorded interaction
3. the type of the element: promoter or enhancer (ignored in the
   following)
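To make the structure of an entry concrete, here is a minimal Python sketch, assuming each entry has already been parsed into a record. The `Interaction` class, its field names and the `build_adjacency` helper are hypothetical, not the file format or code used in this work. The sketch also shows how a single position ends up with several interaction partners.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout: field names are assumptions, not the real columns.
@dataclass(frozen=True)
class Interaction:
    chrom_a: str       # chromosome of the first locus
    pos_a: int         # position of the first locus on that chromosome
    chrom_b: str       # chromosome of the second locus
    pos_b: int         # position of the second locus on that chromosome
    confidence: float  # confidence level of the recorded interaction
    # the promoter/enhancer type is left out, since it is ignored


def build_adjacency(interactions):
    """Map each (chromosome, position) locus to the set of loci it touches."""
    neighbours = defaultdict(set)
    for it in interactions:
        a = (it.chrom_a, it.pos_a)
        b = (it.chrom_b, it.pos_b)
        neighbours[a].add(b)  # interactions are symmetric:
        neighbours[b].add(a)  # record them in both directions
    return neighbours


records = [
    Interaction("chr1", 1000, "chr2", 5000, 12.0),
    Interaction("chr1", 1000, "chr3", 800, 15.0),
    Interaction("chr2", 5000, "chr2", 9000, 4.0),
]
adj = build_adjacency(records)
print(len(adj))                     # 4 distinct loci
print(sorted(adj[("chr1", 1000)]))  # [('chr2', 5000), ('chr3', 800)]
```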

* Metaphor

In this work, a physically inspired simulation visualizes the continuous knotting of the chromosome strings driven by the interactions recorded in the data.

Each chromosome is simulated as a chain of joints corresponding to
the positions in the data. Each joint is connected to the previous and
the next joint on the same chromosome by a binding force, so the
joints form a string or line. In addition, each joint is subject to a
force that pulls it toward all other positions (on other chromosomes)
with which it interacts.
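The chain-plus-interactions topology described above can be sketched as two lists of springs over globally numbered joints. This is a hypothetical illustration: the function `build_springs` and the representation of a joint as a (chromosome index, joint index) pair are assumptions.

```python
def build_springs(chromosome_lengths, interactions):
    """Build the spring topology of the simulated chromosome strings.

    chromosome_lengths: number of joints on each chromosome, in order
        (chromosomes are indexed 0, 1, 2, ...).
    interactions: pairs ((chrom, joint), (chrom, joint)) of interacting joints.
    Returns (n_joints, chain_bonds, interaction_springs), where both lists
    hold pairs of global joint indices.
    """
    # Give every joint a global index: joint j of chromosome c -> offsets[c] + j.
    offsets = []
    n_joints = 0
    for n in chromosome_lengths:
        offsets.append(n_joints)
        n_joints += n

    # Binding forces: each joint is bonded to the next joint on its chromosome,
    # so every inner joint is linked to both its previous and its next joint.
    chain_bonds = [
        (offsets[c] + j, offsets[c] + j + 1)
        for c, n in enumerate(chromosome_lengths)
        for j in range(n - 1)
    ]

    # Interaction forces: one extra spring per recorded interaction.
    interaction_springs = [
        (offsets[ca] + ja, offsets[cb] + jb)
        for (ca, ja), (cb, jb) in interactions
    ]
    return n_joints, chain_bonds, interaction_springs


n, bonds, springs = build_springs([3, 2], [((0, 1), (1, 0))])
# n == 5, bonds == [(0, 1), (1, 2), (3, 4)], springs == [(1, 3)]
```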

All the interactions are modelled as spring-like forces. The joints are
modelled as masses subject to friction (damping), which prevents
possible oscillations from growing to large amplitudes and keeps the
whole system in a fairly low-energy state, allowing it to converge to a
stable state.
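A single time step of such a damped spring system might look like the following sketch. The spring constant `k`, rest length `rest`, friction coefficient `damping`, the mass, the time step `dt` and the choice of a semi-implicit Euler integrator are all assumptions for illustration; the original simulation's parameters and integrator are not documented here.

```python
import math


def step(pos, vel, springs, k=1.0, rest=0.0, damping=0.5, mass=1.0, dt=0.05):
    """Advance the damped mass-spring system by one time step (in place).

    pos, vel: lists of [x, y, z] lists, one per joint.
    springs: list of (i, j) joint-index pairs connected by a spring.
    """
    force = [[0.0, 0.0, 0.0] for _ in pos]
    for i, j in springs:
        d = [pos[j][a] - pos[i][a] for a in range(3)]  # vector from joint i to j
        length = math.sqrt(sum(c * c for c in d))
        if length == 0.0:
            continue                                    # direction undefined
        magnitude = k * (length - rest) / length        # Hooke's law, scaled by 1/|d|
        for a in range(3):
            force[i][a] += magnitude * d[a]             # pulls i toward j when stretched
            force[j][a] -= magnitude * d[a]             # equal and opposite force on j
    for p, v, f in zip(pos, vel, force):
        for a in range(3):
            f[a] -= damping * v[a]                      # friction opposes the motion
            v[a] += dt * f[a] / mass                    # semi-implicit Euler: velocity
            p[a] += dt * v[a]                           # first, then the position


pos = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for _ in range(2000):
    step(pos, vel, [(0, 1)])
# The two masses settle together around their midpoint (1, 0, 0).
```

The friction term is what makes the oscillation of the two masses die out instead of continuing forever.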

These masses move and interact in three-dimensional space. The video
below shows only the projection of the masses' positions onto the x-y
plane.

Each time the simulation is started, the masses are placed at random
positions in space.

As the simulation runs, it eventually reaches a stable state, a "knot"
of chromosomes whose spatial structure is shaped by the whole network
of interactions in the data.
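The text does not say how the stable state is detected. One plausible approach, sketched below as an assumption, is to start from random positions and stop once the total kinetic energy stays below a small tolerance for several consecutive steps. `random_start`, `run_until_stable`, the box size, `tol` and `patience` are hypothetical, and `toy_step` is only a stand-in for the real forces.

```python
import random


def random_start(n_joints, box=1.0, seed=None):
    """Place joints uniformly at random in the cube [-box, box]^3, at rest."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-box, box) for _ in range(3)] for _ in range(n_joints)]
    vel = [[0.0, 0.0, 0.0] for _ in range(n_joints)]
    return pos, vel


def kinetic_energy(vel, mass=1.0):
    """Total kinetic energy of equal masses with the given velocities."""
    return 0.5 * mass * sum(c * c for v in vel for c in v)


def run_until_stable(pos, vel, step_fn, tol=1e-8, patience=20, max_steps=100_000):
    """Step until the kinetic energy stays below tol for `patience` steps in a row.

    Requiring several consecutive quiet steps avoids stopping at a turning point
    of an oscillation, where the velocity is momentarily close to zero.
    Returns the number of steps taken.
    """
    quiet = 0
    for n in range(1, max_steps + 1):
        step_fn(pos, vel)
        quiet = quiet + 1 if kinetic_energy(vel) < tol else 0
        if quiet >= patience:
            return n
    return max_steps


def toy_step(pos, vel, dt=0.05):
    # Stand-in for the real forces: a spring to the origin plus friction.
    for p, v in zip(pos, vel):
        for a in range(3):
            v[a] += dt * (-p[a] - 0.5 * v[a])
            p[a] += dt * v[a]


pos, vel = random_start(10, seed=0)
steps = run_until_stable(pos, vel, toy_step)
```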

* Conditioning Step

For the first incarnation of this work, it was necessary to reduce the
complexity of the data in order to run the simulation in
real time. Therefore two conditioning steps have been performed on the
data before the simulation is run:

1. The gene positions have been "undersampled": the chromosomes have
   been divided into larger segments, and all the positions in each
   segment, together with their interactions, have been condensed into
   a single joint of the simulation. Since chromosomes have different
   lengths, the longest chromosome (number 1) was first divided into a
   maximum number of pieces (32 in the video below), which fixes the
   segment length. The other chromosomes were then segmented using this
   same length, so the chromosome strings consist of different numbers
   of joints.

2. All interactions with a confidence level below a threshold (10 for
   the video below) have been ignored.
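The two conditioning steps could be sketched as follows. The input layout, rounding the segment length up with `math.ceil`, dropping interactions whose two ends fall into the same joint, and counting merged interactions with a `Counter` are assumptions made for this illustration, not documented choices of the original work; `segment_length` and `condition` are hypothetical names.

```python
import math
from collections import Counter


def segment_length(chrom_lengths, max_joints):
    """Segment length that splits the longest chromosome into max_joints pieces."""
    return math.ceil(max(chrom_lengths.values()) / max_joints)


def condition(interactions, chrom_lengths, max_joints=32, threshold=10):
    """Apply both conditioning steps to a list of raw interactions.

    interactions: iterable of (chrom_a, pos_a, chrom_b, pos_b, confidence).
    chrom_lengths: dict mapping chromosome name -> length (same units as pos).
    Returns a Counter mapping a pair of joints to the number of raw
    interactions merged into it; a joint is (chromosome, segment index).
    """
    seg = segment_length(chrom_lengths, max_joints)
    merged = Counter()
    for chrom_a, pos_a, chrom_b, pos_b, confidence in interactions:
        if confidence < threshold:   # step 2: drop low-confidence interactions
            continue
        a = (chrom_a, pos_a // seg)  # step 1: condense all positions of a
        b = (chrom_b, pos_b // seg)  # segment into one joint
        if a == b:                   # assumption: ignore contacts inside a joint
            continue
        merged[tuple(sorted((a, b)))] += 1
    return merged


chrom_lengths = {"chr1": 3200, "chr2": 1600}
data = [
    ("chr1", 150, "chr2", 420, 12),  # kept: joints (chr1, 1) and (chr2, 4)
    ("chr1", 199, "chr2", 499, 11),  # kept: falls into the same two joints
    ("chr1", 10, "chr2", 10, 5),     # dropped: confidence below 10
    ("chr1", 10, "chr1", 90, 20),    # dropped: both ends in joint (chr1, 0)
]
merged = condition(data, chrom_lengths)
# merged == Counter({(("chr1", 1), ("chr2", 4)): 2})
```

With a segment length of 100, the shorter chromosome "chr2" ends up with 16 joints instead of 32, which is why the strings have different numbers of joints.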


* Groups

For the real-time simulation, not all the interactions between the
joints are simulated at the same time. Instead, the interactions are
switched on between joints in groups of three (8 such groups are
used for rendering the video below). Each group consists of joints that
interact with each other. The joints in a group are pulled together,
since their interaction means that they should be close to each other.
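With only the active groups switched on, the attracting forces at a given moment could be computed as below. This sketch assumes zero-rest-length springs along the three edges of each group's triangle; `group_forces` and the spring constant `k` are hypothetical.

```python
def group_forces(pos, groups, k=1.0):
    """Forces switched on only between the joints of the active groups.

    pos: list of [x, y, z] joint positions.
    groups: list of (a, b, c) joint-index triples.
    Each pair in a triple is linked by a zero-rest-length spring, so the force
    is proportional to the separation and pulls the three joints together.
    """
    force = [[0.0, 0.0, 0.0] for _ in pos]
    for a, b, c in groups:
        for i, j in ((a, b), (b, c), (c, a)):  # the three edges of the triangle
            for axis in range(3):
                f = k * (pos[j][axis] - pos[i][axis])
                force[i][axis] += f
                force[j][axis] -= f
    return force


pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
F = group_forces(pos, [(0, 1, 2)])
# F[0] == [1.0, 1.0, 0.0]: joint 0 is pulled toward joints 1 and 2.
# F[3] == [0.0, 0.0, 0.0]: joint 3 is in no active group, so it feels no pull.
```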

At the beginning, the groups are chosen randomly. However, after a
certain time, or once the area of the triangle whose vertices are the
three joints of the group falls below a small value, the group is
changed using the following strategy:

1. At first the group consists of 3 joints:
   - A
   - B
   - C
2. The group is changed in 2 steps:
   + the first joint is removed, and the second and third become the
     first and second;
   + a new joint D is added to the group, chosen from the joints that
     the second joint (now C) interacts with. The new group is therefore:
     - B
     - C
     - D
3. After some time, or when the joints are close enough, the group is
   changed again: B is removed and a joint E, chosen from the joints
   that D interacts with, is inserted:
   - C
   - D
   - E
And so on.

As a result, these groups of three "crawl" through the whole network,
always pulling interacting joints toward each other in a continuous
knotting of the chromosome strings.
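The crawling strategy, together with the triangle-area trigger, could be sketched as follows. `max_age`, `min_area`, the random choice of the new joint, the preference for a joint not already in the group and the fallback for a dead end are assumptions; the text only fixes the rule "drop the first joint, then add a partner of the new second joint". The helper names `triangle_area`, `advance_group` and `update_group` are hypothetical.

```python
import random


def triangle_area(p, q, r):
    """Area of the triangle with 3D vertices p, q, r (half the cross-product norm)."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5


def advance_group(group, neighbours, rng=random):
    """(A, B, C) -> (B, C, D), with D drawn from the joints C interacts with.

    neighbours: dict joint -> list of interacting joints (assumed symmetric,
    so C always has at least B as a partner).
    """
    _, second, third = group
    candidates = [n for n in neighbours[third] if n not in (second, third)]
    if not candidates:  # dead end: fall back to any partner of C
        candidates = list(neighbours[third])
    return (second, third, rng.choice(candidates))


def update_group(group, age, pos, neighbours, max_age=200, min_area=1e-3, rng=random):
    """Advance the group after max_age steps or once its triangle is small enough."""
    a, b, c = group
    if age >= max_age or triangle_area(pos[a], pos[b], pos[c]) < min_area:
        return advance_group(group, neighbours, rng), 0
    return group, age + 1


neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
g1 = update_group((0, 1, 2), 0, pos, neighbours)    # area 0.5: ((0, 1, 2), 1)
g2 = update_group((0, 1, 2), 200, pos, neighbours)  # too old: ((1, 2, 3), 0)
```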

* Video

In the video, consecutive joints on the same chromosome are joined by a
white line, so the white strings correspond to the chromosomes.

The groups are rendered as white triangles.

The video frame is continuously adjusted to the range of the x and y
positions of the joints, so the image is always zoomed to hold all
the joints. Furthermore, as the structure becomes denser and smaller,
this dynamic zoom acts as an adaptive magnification of the forming knot.
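The projection and dynamic zoom might be implemented as below, assuming a single uniform scale factor so that the knot keeps its aspect ratio (the text only says the frame follows the x and y positions of the joints). `to_frame`, the frame size and the margin are hypothetical.

```python
def to_frame(pos, width=1920, height=1080, margin=0.05):
    """Project joints onto the x-y plane and fit them into the video frame.

    pos: list of [x, y, z] joint positions. Returns a list of (px, py) pairs.
    """
    xs = [p[0] for p in pos]  # projection: the z coordinate is dropped
    ys = [p[1] for p in pos]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    span_x = max(x_hi - x_lo, 1e-12)  # avoid dividing by zero
    span_y = max(y_hi - y_lo, 1e-12)
    # One uniform scale (an assumption) keeps the aspect ratio of the knot.
    scale = min(width * (1 - 2 * margin) / span_x,
                height * (1 - 2 * margin) / span_y)
    cx, cy = (x_lo + x_hi) / 2, (y_lo + y_hi) / 2
    return [((x - cx) * scale + width / 2, (y - cy) * scale + height / 2)
            for x, y in zip(xs, ys)]


pos = [[0.0, 0.0, 5.0], [1.0, 1.0, -5.0]]
frame = to_frame(pos, width=100, height=100, margin=0.0)
# frame == [(0.0, 0.0), (100.0, 100.0)]
```

Because the frame is refitted every time, shrinking the whole structure by a constant factor leaves the rendered picture unchanged, which is the adaptive magnification described above.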

* Exhibit 1

Knotting process simulated using a maximum of 32 joints for the chromosome strings and a confidence level threshold of 10, leaving us with a maximum of 8364 interactions per joint.

* Exhibit 2

Knotting process simulated using a maximum of 64 joints for the chromosome strings and a confidence level threshold of 10, leaving us with a maximum of 7994 interactions per joint.

* Exhibit 3

Knotting process simulated using a maximum of 128 joints for the chromosome strings and a confidence level threshold of 10, leaving us with a maximum of 7064 interactions per joint.