Konrad Krawczyk, D.Phil., Chief Scientific Officer, NaturalAntibody, Germany
Konrad Krawczyk illustrates how computational learning systems can be applied to therapeutic antibody discovery. The presentation overview highlights the importance of adding a predictive layer to facilitate antibody discovery.
The primary emphasis is that antibody research is evolving rapidly and producing large amounts of data. The research community needs a more streamlined way to scale and keep up with this volume, and technology can provide solutions toward this objective.
Mapping the Elements for Computationally Focused Ab Engineering
The presentation explains how machine learning is built and trained for antibody engineering. The approach is broken down into specific data-related needs, which form the foundation that can be built upon:
Data Sources - Data sources are abundant.
Data - Data is a necessary ingredient for any task.
Models - There is a need for a powerful algorithm model and benchmarking.
Platform - A viable platform makes the data analytics applicable and hands-free.
Combined in one place, these elements can lead to biological insights. Applying them together to engineer antibodies required a reorganization process to develop a machine learning platform. The first steps centered on acquiring antibody data with modeling in mind and narrowing the industry-specific information into qualifying subsets.
Understanding How an Ab Informational Landscape Works
The idea is to combine automated and manual annotation with the intent of constructing a single point of entry to all the databases in one HUB with buildable layers.
The HUB, or machine learning database, would then allow point-and-click access to the following components:
Added service-specific metadata capable of powering the following function points:
Added layers of metadata-specific information such as:
Careful thought is applied to keeping updates simple and efficient. Upkeep is important in this arena, considering there are now 540,000 bio projects with inconsistent annotation, a volume that cannot be maintained manually.
Therefore, technology is built around the data to make it easily searchable, with trained models that contribute to the development of a modeling-specific, library-type framework.
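To make the idea of searchable, metadata-annotated records concrete, the following is a minimal sketch rather than the platform described in the talk. The `AntibodyRecord` class, the `search` function, the metadata keys (`source`, `species`), and the sequence fragments are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AntibodyRecord:
    """One entry in a hypothetical hub: a sequence plus free-form metadata."""
    record_id: str
    sequence: str
    metadata: dict = field(default_factory=dict)


def search(records, **criteria):
    """Return records whose metadata matches every given key/value pair."""
    return [
        record
        for record in records
        if all(record.metadata.get(key) == value for key, value in criteria.items())
    ]


# Placeholder records; the sequences and annotations are invented for illustration.
records = [
    AntibodyRecord("rec-1", "EVQLVESGG", {"source": "patent", "species": "human"}),
    AntibodyRecord("rec-2", "QVQLQQSGA", {"source": "literature", "species": "mouse"}),
    AntibodyRecord("rec-3", "EVQLLESGG", {"source": "patent", "species": "mouse"}),
]

hits = search(records, source="patent", species="mouse")
print([record.record_id for record in hits])
```

A linear scan like this only illustrates the filtering logic; at the scale mentioned above, an indexed database would be the more realistic choice.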
Modeling Antibodies with Therapeutics in Mind
Computational modeling works well primarily because of its speed; it was presented as much faster than any other method. The antibody-specific models are, however, customized to deliver the type of information that would be most useful. These customizations came to fruition through a set of considerations important to the user, such as choosing the right model and making it work in a way that is high-functioning and logical to the user.
Choosing the Right Model: Masked Language Learning
We learned that a large number of parameters is not necessary to achieve superior performance. We also learned that a model trained on one big dataset does not necessarily transfer to others (patents, nonhuman). The aim is for the model to perform antibody-specific tasks, such as predicting the residue at a given position in a sequence or selecting the optimal binding location.
How to address immunogenicity concerns and how to enhance the model space well enough to embed diverse antibody types remained consistent messages throughout the presentation. In addition, performance was compared against the domain-specific BERT models currently in use (AntiBERTa, AntiBERTy, AbLang).
One of the main objectives is to make the model human-friendly by extending its functionality so that it can act as an antibody search engine with both single-molecule and multiple-molecule views.
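As a rough illustration of the masked language objective named in the heading, the sketch below hides a random subset of residues in a sequence and records the hidden residues as prediction targets. It covers only the data preparation step, not a model, and it is not the speaker's implementation. The `[MASK]` token, the 15% masking rate, the `mask_sequence` function, and the example fragment are assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # assumed mask token; real tokenizers define their own


def mask_sequence(sequence, mask_rate=0.15, seed=0):
    """Hide a random subset of residues and record what was hidden."""
    rng = random.Random(seed)
    tokens = list(sequence)
    # Mask at least one position so every example carries a training signal.
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    # The labels are what a model would be trained to predict.
    labels = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK
    return tokens, labels


# Illustrative heavy-chain-like fragment, not taken from the talk.
fragment = "EVQLVESGGGLVQPGGSLRLSCAAS"
masked, labels = mask_sequence(fragment)
print(" ".join(masked))
print(labels)
```

During training, a model would see the masked tokens and be scored on how well it recovers the residues stored in `labels`.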
Summary
The machine learning-focused presentation pushed the audience to examine the possibilities. There is valuable data within our industries that can provide a wealth of information for drug discovery, but it requires technology to place it at our fingertips.
The key lies in training machine learning models more comprehensively on diverse antibody datasets. As scientists, we should understand that bigger models are not always ideal and that there are many advantages and caveats associated with model engineering.
The strongest takeaway is that there is significant justification for moving aggressively forward with computationally driven models because of the enormity of the data available. It simply cannot be handled manually in an efficient way.