Machine learning can now interpret gene regulation clearly.
Posted on 31 December 2019

A mathematical thermodynamic model for gene regulation is formulated as an artificial neural network (ANN). Large DNA datasets are fed through the new ANN. The pattern of connections is presented in a way that is easy for biologists to interpret. 
Machine learning algorithms are helping biologists make sense of the dizzying number of molecular signals that control how genes function. But as new algorithms are developed to analyze even more data, they also become more complex and more difficult to interpret. Quantitative biologists Justin B. Kinney and Ammar Tareen have a strategy to design advanced machine learning algorithms that are easier for biologists to understand.
They used ANNs to analyze data from an experimental method called a "massively parallel reporter assay" (MPRA), which measures the regulatory activity of many DNA sequences at once. Using this data, quantitative biologists can build ANNs that predict which molecules control specific genes, a process called gene regulation.
Cells don't need all proteins all the time. Instead, they rely on complex molecular mechanisms to turn the genes that produce proteins on or off, as needed. When that regulation fails, disorder and disease usually follow.
Kinney and Tareen developed a new approach that bridges the gap between computational tools and how biologists think. They created custom ANNs that mathematically reflect common concepts in biology concerning genes and the molecules that control them. In this way, the pair are essentially forcing their machine learning algorithms to process data in a way that a biologist can understand.
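To give a sense of what it can mean for a network to mirror a thermodynamic model, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the researchers' actual model: the function names, the three-base binding site, the energy values and the `t_sat` scaling factor are all hypothetical. A linear layer turns a DNA sequence into a binding energy, and a Boltzmann-weight step turns that energy into the probability that a regulatory protein is bound.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA sequence as a flat list of 0/1 features, four per position."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]

def binding_energy(seq, weights, bias):
    """Linear layer: additive binding energy (in units of kT) of a protein on seq."""
    return bias + sum(w * x for w, x in zip(weights, one_hot(seq)))

def occupancy(energy):
    """Two-state model: Boltzmann weight of the bound state against the unbound state."""
    boltzmann = math.exp(-energy)
    return boltzmann / (1.0 + boltzmann)

def predicted_expression(seq, weights, bias, t_sat=1.0):
    """Assumed output layer: expression proportional to binding-site occupancy."""
    return t_sat * occupancy(binding_energy(seq, weights, bias))

# Hypothetical energy matrix for a 3-base site whose preferred sequence is "TAG".
# Each mismatch costs 2 kT; the values are made up for illustration, not fitted.
WEIGHTS = [2.0, 2.0, 2.0, 0.0,   # position 1: A, C, G, T
           0.0, 2.0, 2.0, 2.0,   # position 2
           2.0, 2.0, 0.0, 2.0]   # position 3
BIAS = -1.0

strong_site = predicted_expression("TAG", WEIGHTS, BIAS)
weak_site = predicted_expression("CAG", WEIGHTS, BIAS)
```

Because every weight in this sketch is an energy contribution from one base at one position, a biologist can read the fitted numbers directly, which is the kind of interpretability the article describes. In practice the weights would be learned from MPRA measurements rather than written by hand.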