A Politecnico di Milano study reveals DNA "grammar"

Genome Biology publishes research on the rules governing the shape of DNA in space

The three-dimensional structure of DNA is determined by a series of spatial rules based on particular protein-binding sequences and their order. This was the finding of a study recently published in Genome Biology by Luca Nanni, one of our PhD students in Computer Science and Engineering, together with Professors Stefano Ceri of Politecnico di Milano and Colin Logie of the University of Nijmegen (Netherlands).

"Our study's greatest innovation lies in having identified precise rules for the arrangement of CTCF proteins. The beauty and simplicity of CTCF's grammar show us how nature and evolution produce regularity and incredibly ingenious and functional systems," said Luca Nanni, first author of the study.

Knowing these rules allows CTCF sequences to be engineered to obtain the desired three-dimensional DNA structure. For example, it should be possible to make two disconnected genes interact. Moulding the structure of DNA will open the door to the creation of pharmaceuticals for the treatment of diseases such as cancer.

The DNA molecule, which would be about two metres long if completely unrolled, folds to fit inside the cell's nucleus according to a complex system that keeps it accessible and correctly readable.

Crucial in the study of the three-dimensional structure of the genome are topological domains, which are thought to aggregate DNA zones with similar roles and behaviour. For example, genes with similar function are likely to reside in the same topological domain.

We focused on specific DNA sequences that are bound by the CTCF protein. This protein isolates portions of DNA, creating barriers between the various topological domains. With the help of computer simulations and a model that classifies these binding sites according to their orientation, we identified a surprising regularity in their arrangement along the DNA sequence.

The study showed that the orientation and order of these DNA sequences make it possible to reconstruct topological domains. The human genome folds according to a "grammar" made up of CTCF sequences, their orientation, and the distance between them.
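
To give a feel for what a "grammar" of orientation, order, and distance might look like in practice, here is a minimal Python sketch. It is an illustration only and not the model used in the study: the site coordinates are made up, and the `Site` type, the function names, and the pair labels ("convergent", "divergent", "tandem") are assumptions introduced for this example.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    """A hypothetical CTCF site: where it sits and which way it points."""
    position: int  # coordinate in base pairs (made-up values below)
    strand: str    # "+" for forward orientation, "-" for reverse


def classify_pair(left: Site, right: Site) -> str:
    """Label two neighbouring sites by their relative orientation."""
    if left.strand == "+" and right.strand == "-":
        return "convergent"  # the two sites point towards each other
    if left.strand == "-" and right.strand == "+":
        return "divergent"   # the two sites point away from each other
    return "tandem"          # both sites point the same way


def adjacent_patterns(sites: List[Site]) -> List[Tuple[int, int, int, str]]:
    """Return (left position, right position, distance, label) for each
    pair of neighbouring sites, after sorting the sites by position."""
    ordered = sorted(sites, key=lambda s: s.position)
    return [
        (a.position, b.position, b.position - a.position, classify_pair(a, b))
        for a, b in zip(ordered, ordered[1:])
    ]


# Made-up example data, not taken from the study.
sites = [
    Site(1000, "+"),
    Site(5000, "+"),
    Site(9000, "-"),
    Site(12000, "+"),
    Site(20000, "-"),
]

for left, right, distance, label in adjacent_patterns(sites):
    print(f"{left}-{right} ({distance} bp): {label}")
```

The sketch captures only the three ingredients named above (which sites are present, how they are oriented, and how far apart they are). Going from such labels to actual topological domains is what the published model addresses.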


More information:
Spatial patterns of CTCF sites define the anatomy of TADs and their boundaries.
By Nanni, L., Ceri, S. & Logie, C.
Genome Biol 21, 197 (2020)