Proteins are the cornerstone of life, and the functions of an organism depend on protein structures that are both stable and flexible. The spectral response of a protein, especially its ultraviolet (UV) spectrum, serves as a fingerprint of the protein backbone. Interpreted through theoretical simulation, this optical fingerprint can reveal the precise protein structure and provide important information for life science and medical diagnosis.
However, protein structures are extremely complex and variable, so interpreting their spectra requires a large number of high-precision quantum chemistry calculations. The computational cost is so large that even the most powerful supercomputers can be overwhelmed. The theoretical interpretation of protein spectra has therefore been a long-standing challenge, limiting both the accurate analysis of spectra and the determination of protein structure.
How to interpret the optical fingerprints of the protein backbone in theoretical spectral simulations without resorting to prohibitively expensive quantum chemistry calculations is an important scientific question. In recent years, artificial intelligence (AI) has been widely used across many fields to reduce the computational cost of modeling complex systems.
Recently, Prof. JIANG Jun of the Hefei National Laboratory for Physical Sciences at the Microscale, in collaboration with Prof. LUO Yi, also of USTC, and Prof. Shaul Mukamel of the University of California, Irvine, used neural networks, a machine learning technique, to establish the structure–property relationship of protein peptide bonds, reducing the computational cost by a factor of tens of thousands. They successfully predicted the ultraviolet spectra of peptide bonds and, using the random forest method, identified chemically meaningful structural descriptors and structure–property relationships. The combination of AI and quantum chemistry provides an efficient tool for predicting the optical properties of proteins. The results were published in the Proceedings of the National Academy of Sciences (DOI: 10.1073/pnas.1821044116) under the title "A Neural Network Protocol for Electronic Excitation of N-Methylacetamide".
Image by NSRL
In recent years, Prof. JIANG's team has been developing machine learning applications for quantum chemistry, with the aim of making machine learning an important tool for solving quantum chemistry problems. In this work, the researchers first generated 50,000 configurations of a peptide-bond model molecule by molecular dynamics simulation at 300 K, followed by quantum chemistry calculations. A machine learning algorithm selected bond lengths, bond angles, dihedral angles, and charge information as descriptors. The structure–property relationship between the ground-state structure of the peptide bond and its excited-state properties was then established by training a neural network on this large data set. The trained model was used to predict the ground-state dipole moments and excited-state properties of the peptide bonds and, from these, their ultraviolet absorption spectra. To verify the robustness and transferability of the model, the team used the model trained at 300 K to predict the ultraviolet absorption spectra of peptide bonds at 200 K and 400 K. The results are in good agreement with simulations based on time-dependent density functional theory (TDDFT).
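The final step described above, turning predicted excited-state properties into an absorption spectrum, can be illustrated with a minimal sketch. It is not the team's implementation. It assumes that each sampled configuration yields a few (excitation energy, intensity) pairs, that each transition is broadened with a Gaussian line shape, and that the spectrum is the average over configurations. The function names, the broadening width `sigma`, and all numerical values are hypothetical and chosen only for illustration.

```python
import math


def gaussian(x, center, sigma):
    """Normalized Gaussian line shape centered at `center` (same units as x)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((x - center) / sigma) ** 2) / norm


def absorption_spectrum(transitions, grid, sigma=0.2):
    """Ensemble-averaged absorption spectrum evaluated on `grid` (eV).

    transitions: one list per sampled configuration, each holding
    (excitation_energy_eV, intensity) pairs, e.g. model predictions.
    sigma: assumed Gaussian broadening width in eV (hypothetical value).
    """
    n_conf = len(transitions)
    spectrum = []
    for e in grid:
        total = 0.0
        for conf in transitions:
            for energy, intensity in conf:
                total += intensity * gaussian(e, energy, sigma)
        spectrum.append(total / n_conf)  # average over configurations
    return spectrum


# Hypothetical predictions for three configurations (illustrative numbers only):
# a weak low-energy transition and a strong higher-energy one per configuration.
predicted = [
    [(5.6, 0.01), (6.9, 0.30)],
    [(5.5, 0.02), (7.0, 0.28)],
    [(5.7, 0.01), (7.1, 0.32)],
]

grid = [4.5 + 0.05 * i for i in range(81)]  # 4.5 eV to 8.5 eV in 0.05 eV steps
spectrum = absorption_spectrum(predicted, grid)
peak_energy = grid[spectrum.index(max(spectrum))]
print(f"strongest absorption near {peak_energy:.2f} eV")
```

Averaging over many thermally sampled configurations is what lets a model trained on individual structures reproduce a temperature-dependent band shape: a different temperature changes the set of sampled configurations, not the per-configuration predictor.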
This is the first time that AI has been applied to the theoretical calculation and prediction of protein spectra. Large amounts of data are generated through theoretical calculation, AI is trained on them to establish the structure–property relationship, and the final model is used for prediction, offering a new approach to simulating protein spectra. The work establishes the feasibility and advantages of using machine learning to simulate the ultraviolet absorption spectra of the protein peptide-bond backbone, and should make the interpretation of protein optical fingerprints easier and more effective.
This work was funded by the National Natural Science Foundation of China and the Pilot Project of CAS. The first authors of the paper are YE Sheng (Ph.D.), postdoctoral researcher HU Wei, and LI Xin; JIANG Jun and Shaul Mukamel are co-authors.
(Written by YANG Xinqi, edited by YE Zhenzhen, USTC News Center)