Artificial intelligence makes significant progress in predicting enzyme function

Enzymes are important biological catalysts in all living cells. They drive the chemical reactions through which every molecule an organism needs is produced from specific starting materials, called substrates. Most organisms possess thousands of different enzymes, each responsible for a very specific reaction. Together, all of these enzymes make up the metabolic system, which sustains the life and survival of the organism.

Although the genes that encode enzymes can be identified easily, the exact function of the encoded enzyme is unknown for the vast majority of them (more than 99%). This is because the experiments needed to determine that function, i.e., which substrate molecules an enzyme converts into which end products, are time-consuming.

With the rapid development of artificial intelligence (AI), researchers led by bioinformatician Professor Martin Lercher of Heinrich Heine University Düsseldorf (HHU), Germany, working with colleagues in Sweden and India, have now taken an important step forward in predicting enzyme function. They have developed an AI-based method that can predict with high accuracy whether an enzyme binds a specific substrate and catalyzes its reaction. The findings, published in the latest issue of Nature Communications, improve on previous methods and can help predict enzyme functions more specifically and accurately.

 

“What’s special about our Enzyme Substrate Prediction (ESP) model is that, unlike previous models, we are not limited to individual, specific enzymes and their close relatives,” said Prof. Lercher. “Our general model can be applied to any combination of enzymes and more than 1,000 different substrates.”

Alexander Kroll, a doctoral student and lead author of the study, developed a deep learning model in which information about enzymes and substrates is encoded as numerical vectors. The vectors for around 18,000 experimentally confirmed enzyme-substrate pairs served as training input for the model.
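To make the idea of "encoding enzymes and substrates as vectors" more concrete, here is a minimal Python sketch under loudly stated assumptions: each enzyme and each substrate is assumed to already be represented by a fixed-length vector (random numbers stand in for them here), the two vectors are simply concatenated, and a plain logistic-regression classifier trained with NumPy stands in for the deep learning model. The function names, vector sizes and toy labelling rule are all hypothetical; this is not the ESP model.

```python
import numpy as np

def pair_vector(enzyme_vec, substrate_vec):
    """Join one enzyme vector and one substrate vector into a single input vector."""
    return np.concatenate([enzyme_vec, substrate_vec])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.

    X: (n_pairs, n_features) pair vectors; y: 1 = enzyme acts on substrate, 0 = it does not.
    Hypothetical stand-in for the deep learning model, not the ESP model itself.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)      # predicted probability that the pair matches
        err = p - y                 # gradient of the log-loss with respect to the logit
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def predict(w, b, X):
    """Return 1 where the predicted match probability is at least 0.5, else 0."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Hypothetical toy data: 200 pairs of 8-dimensional enzyme and substrate vectors.
rng = np.random.default_rng(0)
enzymes = rng.normal(size=(200, 8))
substrates = rng.normal(size=(200, 8))
X = np.array([pair_vector(e, s) for e, s in zip(enzymes, substrates)])
# Made-up labelling rule standing in for experimentally confirmed pairs.
y = (enzymes[:, 0] + substrates[:, 0] > 0).astype(int)

w, b = train_logistic(X, y)
train_acc = (predict(w, b, X) == y).mean()
print(f"training accuracy: {train_acc:.2f}")
```

Concatenation is the simplest way to combine the two vectors. A linear classifier like this one cannot capture interactions between enzyme and substrate features, which is one reason more expressive nonlinear models are preferred for real data.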

After training, the researchers applied the model to an independent test dataset for which the correct answers were already known. On these independent and diverse test data, the model correctly predicted which substrates match which enzymes with an accuracy of more than 91%.
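The evaluation step described above amounts to holding back labelled pairs that the model never sees during training, then counting how often its predictions agree with the known answers. A minimal Python sketch, assuming a simple random split; the helper names `split_indices` and `accuracy` and the 20% test fraction are hypothetical choices, not details from the study.

```python
import numpy as np

def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out a fraction as an independent test set."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]   # (train indices, test indices)

def accuracy(y_true, y_pred):
    """Fraction of test pairs whose predicted label equals the known label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())

train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))          # 80 20
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75
```

A purely random split is a simplification. To check whether a model generalizes to genuinely new enzymes, evaluations typically also keep test enzymes dissimilar from those seen during training.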

This approach has a wide range of potential applications. In pharmaceutical research and biotechnology, for example, it is important to know which substances enzymes can convert. The model could help research and industry narrow a large number of possible enzyme-substrate pairs down to the most promising candidates, which could then be used for the enzymatic production of new drugs, chemicals, or even biofuels.

The method should also make it possible to build better models that simulate cellular metabolism, which in turn can help researchers understand the physiology of different organisms.