Progress in Determination of Protein Spatial Structure Based on Machine Learning

Bibliographic Details
Main Author: B. Biletskyy
Format: Article
Language: English
Published: V.M. Glushkov Institute of Cybernetics 2021-03-01
Series: Кібернетика та комп'ютерні технології (Cybernetics and Computer Technologies)
Subjects:
Online Access: http://cctech.org.ua/13-vertikalnoe-menyu-en/215-abstract-21-1-5-arte
Description
Summary: Introduction. Determining the spatial structure of proteins is one of the most important unsolved problems facing mankind. Life on Earth is often described as protein-based, because protein molecules drive the life processes of living organisms. Proteins make up about 80% of the dry mass of the cell and coordinate metabolic processes. The function of a protein is determined by its spatial structure. The results of recent competitions among protein structure determination methods have shown significant progress in this area. One of the research groups presented the AlphaFold 2 method, whose accuracy reached that of experimental methods. Purpose of the article. The aim of this work is to review and analyze the basic principles of the AlphaFold software package for determining the spatial structure of proteins. Results. We consider the main stages of protein structure prediction with the AlphaFold software package. The stages and their corresponding methods are: a search for homologous proteins based on multiple sequence alignment, construction of a protein-specific differentiable potential using artificial neural networks, and energy optimization of the protein structure using gradient descent with limited sampling. We discuss how a combination of bioinformatics techniques, powered by data from open sources, can lead to significant improvements in the accuracy of protein structure prediction. Special attention is paid to the use of artificial neural networks for building a smooth protein-specific potential and to the subsequent energy minimization based on that potential. Conclusions. Combining a number of methods with information from protein and genetic data banks makes significant progress possible on the extremely important task of determining the structure of a protein.
ISSN: 2707-4501, 2707-451X
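
Illustration. The last stage named in the summary, minimizing a smooth protein-specific potential by gradient descent with limited sampling, can be sketched on a toy problem. The sketch below is not AlphaFold code and makes several assumptions that are not in the summary: a simple harmonic potential over pairwise distances stands in for the neural-network potential, the target distances come from a made-up 8-point helix rather than from a network, Cartesian coordinates are optimized directly, and "limited sampling" is read as a few random restarts. All function names are hypothetical.

```python
import math
import random


def pair_distance(a, b):
    # Euclidean distance; the small constant keeps the gradient finite at d = 0.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) + 1e-12)


def potential(coords, target):
    # Toy smooth potential (assumption): E = sum over i < j of (d_ij - t_ij)^2.
    n = len(coords)
    return sum(
        (pair_distance(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def gradient(coords, target):
    # Analytic gradient of the potential with respect to every coordinate.
    n = len(coords)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = pair_distance(coords[i], coords[j])
            coef = 2.0 * (d - target[i][j]) / d
            for k in range(3):
                g = coef * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
    return grad


def minimize(target, n_restarts=3, n_steps=1000, lr=0.01, seed=0):
    # Plain gradient descent from a few random starts; keep the lowest-energy result.
    rng = random.Random(seed)
    n = len(target)
    best_coords, best_energy = None, math.inf
    for _ in range(n_restarts):
        coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
        for _ in range(n_steps):
            grad = gradient(coords, target)
            for i in range(n):
                for k in range(3):
                    coords[i][k] -= lr * grad[i][k]
        energy = potential(coords, target)
        if energy < best_energy:
            best_coords, best_energy = coords, energy
    return best_coords, best_energy


# Hypothetical target: pairwise distances of an 8-point helix in arbitrary units.
reference = [[math.cos(t), math.sin(t), 0.5 * t] for t in range(8)]
target = [[pair_distance(a, b) for b in reference] for a in reference]

coords, energy = minimize(target)
print(f"final potential: {energy:.6f}")
```

Because the potential depends only on pairwise distances, any rotation, translation, or mirror image of the reference helix is an equally good minimum. Keeping the best of several restarts is one simple way to reduce the chance of reporting a poor local minimum.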