Jake Galson, PhD, Head of Technology, Alchemab Therapeutics
Dr. Galson began the presentation with an overview of his main research areas. He explained that his research aims to uncover the biomarkers that keep people healthy or make them especially resilient to disease. His team sequences the antibody repertoires of these resilient individuals and uses them in a reverse-engineering approach to help design novel antibody drugs and identify antibody drug targets. He then described how technology and machine learning are applied to this discovery process.
Alchemab Data Cube
The Alchemab Data Cube is described as the largest resource of combined antibody and antigen sequence datasets, all drawn from the company's unique set of resilient patient cohorts. The challenge is narrowing all of that data down to viable targets, which led to the development of a machine learning platform. The platform aims to identify physiologically validated therapeutics. There were significant challenges in the machine learning process, but inspiration came from existing predictive language models.
AntiBERTa: Alchemab Pretrained Model
AntiBERTa is a 12-layer transformer model that was pre-trained on 57M human antibody sequences. Pre-training lets AntiBERTa learn information relevant to antibody biology: it learns a numeric representation of each sequence from the inherent structure in the data.
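BERT-style transformers are commonly pre-trained with masked language modelling, in which some residues are hidden and the model learns to predict them from context. The summary does not state AntiBERTa's training objective, so the sketch below is an assumption about the general technique rather than a description of Alchemab's pipeline. The function name `mask_sequence`, the `<mask>` token and the 15% masking fraction are illustrative choices.

```python
import random

MASK = "<mask>"  # hypothetical mask token, chosen for illustration


def mask_sequence(sequence, mask_fraction=0.15, seed=0):
    """Hide a fraction of residues and return (masked tokens, labels).

    `labels` maps each masked position to the residue the model would be
    asked to recover. Real BERT-style training usually also replaces some
    selected positions with random or unchanged tokens; that detail is
    omitted here for brevity.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    rng = random.Random(seed)
    tokens = list(sequence)  # one token per amino acid
    n_mask = max(1, round(len(tokens) * mask_fraction))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    labels = {i: tokens[i] for i in positions}
    for i in positions:
        tokens[i] = MASK
    return tokens, labels


# A made-up 20-residue fragment, used only as example input.
tokens, labels = mask_sequence("EVQLVESGGGLVQPGGSLRL")
```

Training on millions of such masked examples is what would let a model build a numeric representation from the structure of the sequences alone, without any functional labels.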
AntiBERTa is the first step in a two-step process for antibody machine learning. Step 2 involves the ALM (Alchemab Model Library), which uses transfer learning to fine-tune the model for specific ML tasks.
Case Study of Functional Clustering
Convergence analysis is the starting point for antibody discovery. The antibody repertoire is extremely diverse, so convergence between individuals is rare. Convergence increases when individuals are exposed to a common antigen. The working hypothesis is that resilience is linked to the presence of antibodies against common autoantigens, which makes convergence a viable tool for identifying those antibodies.
The purpose of functional clustering is to detect convergence. We can increase our sensitivity to detect convergence by looking for similar sequences rather than identical ones. We can then identify clusters that contain sequences from more than one individual. Functional clustering offers the greatest sensitivity for detecting convergence.
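The idea of detecting convergence by clustering similar sequences can be sketched as follows. This is a minimal illustration, not Alchemab's method: the Hamming-distance rule, the `max_mismatches` threshold, the donor labels and the example sequences are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_sequences(records, max_mismatches=1):
    """Single-linkage clustering of (donor, sequence) records.

    Two sequences are linked when they have the same length and differ at
    no more than `max_mismatches` positions (an illustrative threshold).
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        a, b = records[i][1], records[j][1]
        if len(a) == len(b) and hamming(a, b) <= max_mismatches:
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, record in enumerate(records):
        clusters[find(i)].append(record)
    return list(clusters.values())


def convergent_clusters(clusters, min_donors=2):
    """Keep clusters whose sequences come from at least `min_donors` donors."""
    return [c for c in clusters if len({donor for donor, _ in c}) >= min_donors]


# Made-up records: (donor label, CDR3-like sequence).
records = [
    ("donor_A", "ARDYYGSSYFDY"),
    ("donor_B", "ARDYYGSSYFDV"),
    ("donor_A", "ARGGWELLRFDP"),
    ("donor_C", "AKDLGYCSGGSCY"),
]
clusters = cluster_sequences(records)
shared = convergent_clusters(clusters)
```

Here the first two sequences differ at a single position and come from different donors, so they form the only convergent cluster. Requiring exact identity instead would have missed it, which is the sensitivity gain the text describes.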
AntiBERTa-enabled Paratope Prediction for Functional Clustering
AntiBERTa is designed to take an antibody sequence and predict its paratope. In validation testing, the model consistently outperformed other paratope prediction models.
AntiBERTa analyzes the antibody sequences to predict the paratope of each antibody. Functional clustering is then achieved by grouping similar antibodies based on their paratope sequences instead of their CDR3 sequences. Fine-tuning AntiBERTa therefore plays a vital role in identifying the paratope: the pre-trained model was fine-tuned on paratope data to improve both sensitivity and precision.
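Grouping by predicted paratope rather than by CDR3 can be illustrated with a short sketch. The per-residue probabilities below are made-up placeholders standing in for a model's output, not real AntiBERTa predictions, and the 0.5 threshold and exact-match grouping are simplifying assumptions; a real pipeline would presumably group similar, not only identical, paratopes.

```python
from collections import defaultdict


def paratope_residues(sequence, probabilities, threshold=0.5):
    """Return the residues whose predicted paratope probability is high.

    `probabilities` holds one value per residue; in practice these would
    come from a paratope prediction model.
    """
    if len(sequence) != len(probabilities):
        raise ValueError("need exactly one probability per residue")
    return "".join(
        aa for aa, p in zip(sequence, probabilities) if p >= threshold
    )


def group_by_paratope(antibodies, threshold=0.5):
    """Group antibody names by their predicted paratope string."""
    groups = defaultdict(list)
    for name, sequence, probabilities in antibodies:
        key = paratope_residues(sequence, probabilities, threshold)
        groups[key].append(name)
    return dict(groups)


# Hypothetical antibodies: (name, sequence fragment, placeholder probabilities).
antibodies = [
    ("ab1", "ARDYYGSS", [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.2, 0.1]),
    ("ab2", "TKDYYAST", [0.2, 0.1, 0.8, 0.9, 0.6, 0.3, 0.1, 0.2]),
    ("ab3", "ARGGWELL", [0.1, 0.1, 0.2, 0.9, 0.9, 0.8, 0.1, 0.1]),
]
groups = group_by_paratope(antibodies)
```

In this toy input, `ab1` and `ab2` differ at four of eight positions overall yet share the same predicted binding residues, so they fall into one functional group even though a whole-sequence comparison would keep them apart.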
Applying Functional Clustering to Pancreatic Cancer Cohort
The study design featured:
Results:
Conclusion
Alchemab demonstrated that harnessing the power of a patient’s natural immunity, by mining protective antibody repertoires from resilient individuals, offers the potential to cure disease. The research shows that applying machine learning in a novel way led to AntiBERTa, which was shown to be a validated approach for re-discovering antibody variants. AntiBERTa was also shown to be a viable paratope prediction tool applicable to functional clustering.