1. Department of Mathematics and Physics, Sanjiang University; 2. Hong Kong Baptist University
This project is supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (22KJD14005) and the Early Career Scheme (No. 22302723) of the Research Grants Council of Hong Kong.
AlphaFold, developed by DeepMind, has made remarkable advances in protein structure prediction for life sciences research. Building on the vast number of structural predictions made possible by AlphaFold, a database of over 200 million protein structures has been established, covering the complete proteomes of many organisms. This review outlines recent progress in exploring protein evolution using statistical physics methods based on the AlphaFold database. Traditional research on protein evolution often concentrates on the sequences or structures of proteins within the same family, taking a narrow, microscopic approach. With the emergence of large-scale protein structure predictions by AlphaFold, however, researchers can broaden their scope to compare proteins across different species and extract statistical trends through macroscopic observation. By comparing proteins with similar chain lengths in over 40 model organisms, statistical trends in protein evolution have been discovered. In organisms of higher complexity, the constituent proteins show larger radii of gyration, higher flexibility, and stronger segregation of hydrophobic and hydrophilic residues in both space and sequence. Statistical physics analysis also confirms that higher organismal complexity correlates with higher functional specialization of the constituent proteins. These findings connect molecular evolution to organismal evolution and contribute to the understanding of the origin and evolution of life.
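As an illustration of the kind of comparison described above, the radius of gyration of a structure can be computed from residue coordinates and then averaged over proteins of similar chain length. The following is a minimal sketch, not the procedure used in the reviewed studies: it assumes one coordinate per residue (for example the Cα atom), equal masses for all residues, and a hypothetical bin width of 50 residues. The function names `radius_of_gyration` and `mean_rg_by_length` are chosen here for illustration only.

```python
import math
from collections import defaultdict


def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid.

    coords: list of (x, y, z) tuples, one per residue (assumption: equal masses).
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    # Centroid of the point set.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance from the centroid.
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def mean_rg_by_length(structures, bin_width=50):
    """Average radius of gyration per chain-length bin.

    structures: iterable of coordinate lists, one list per protein.
    bin_width: hypothetical bin size in residues, not a value from the source.
    Returns a dict mapping the lower edge of each length bin to the mean Rg.
    """
    bins = defaultdict(list)
    for coords in structures:
        bin_start = (len(coords) // bin_width) * bin_width
        bins[bin_start].append(radius_of_gyration(coords))
    return {start: sum(v) / len(v) for start, v in sorted(bins.items())}
```

Applying `mean_rg_by_length` separately to the proteome of each organism would give, under these assumptions, one curve of mean radius of gyration versus chain length per species, which is the kind of quantity that can be compared across organisms of different complexity.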