Unveiling the inner workings of protein language models in scientific research
In a new study led by researchers at the Massachusetts Institute of Technology (MIT), sparse autoencoders have been used to probe what protein language models (PLMs) actually learn. This approach could revolutionise the way we understand these complex models, helping researchers choose the right model for specific tasks such as identifying new drug or vaccine targets.
Sparse autoencoders (SAEs) work by transforming dense PLM embeddings into sparse activations, making the inner workings of PLMs more transparent and human-interpretable. By expanding a protein's representation from a constrained number of neurons to a much larger number, the SAE allows individual features to "spread out", so that each one captures a more distinct, meaningful property.
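To make the idea concrete, an SAE of this kind can be written down in a few lines of PyTorch. The sketch below is illustrative rather than a reproduction of the study's architecture: the embedding width (1280, as in a mid-sized ESM2 model), the expansion factor, and the L1 penalty are assumed values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Expands a dense PLM embedding into a much wider, mostly-zero feature vector."""
    def __init__(self, d_embed: int = 1280, d_features: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_features)   # project up: few dense dims -> many sparse features
        self.decoder = nn.Linear(d_features, d_embed)   # project back down to reconstruct the embedding

    def forward(self, x: torch.Tensor):
        features = F.relu(self.encoder(x))              # non-negative activations; most end up near zero
        reconstruction = self.decoder(features)
        return features, reconstruction

def sae_loss(x, reconstruction, features, l1_coeff: float = 1e-3):
    # The reconstruction term keeps the sparse code faithful to the original embedding;
    # the L1 penalty drives most feature activations to zero, so each protein
    # "lights up" only a handful of features that can be inspected individually.
    return F.mse_loss(reconstruction, x) + l1_coeff * features.abs().mean()
```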
Researchers trained SAEs on protein-level and amino acid-level embeddings from models like ESM2. The resulting sparse neurons, or features, strongly activate on proteins sharing common biological functions or structural families. Gene Ontology (GO) enrichment analysis then links these sparse features to concrete biological functions, such as metabolic pathways, enzymatic activities, or sensory perception roles.
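As a simplified illustration of how a sparse feature might be tied to a GO term, one standard ingredient is a hypergeometric enrichment test over the feature's top-activating proteins. The helper below is a hedged sketch: the function name, the data layout, and the choice of test are assumptions for illustration, not a description of the published pipeline.

```python
from scipy.stats import hypergeom

def go_term_enrichment(top_proteins, annotations, go_term, background):
    """One-sided hypergeometric test: is `go_term` over-represented among the
    proteins that most strongly activate a given sparse feature?

    top_proteins : IDs of the feature's top-activating proteins
    annotations  : dict mapping protein ID -> set of GO terms
    go_term      : the GO term being tested (e.g. "GO:0008152", metabolic process)
    background   : IDs of all proteins in the reference set
    """
    annotated = {p for p in background if go_term in annotations.get(p, set())}
    k = len(set(top_proteins) & annotated)   # hits among the top activators
    K = len(annotated)                       # hits in the whole background
    n = len(top_proteins)                    # size of the top-activator set
    N = len(background)                      # size of the background
    # Probability of seeing at least k hits by chance alone
    return hypergeom.sf(k - 1, N, K, n)
```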
Many of these sparse features align with known protein families and biochemical functions, offering clear biological interpretation. Furthermore, automated large language model (LLM) tools, like Anthropic's Claude, aid in interpreting these sparse features by relating them to protein families and molecular roles, allowing features to be characterised at a scale that manual annotation alone could not reach.
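A typical auto-interpretation loop might summarise a feature's top-activating proteins and ask an LLM to propose a label. The snippet below is a sketch of how such a loop could look using the Anthropic Python SDK, with an API key assumed to be in the environment; the prompt wording and model name are illustrative choices, not the study's actual setup.

```python
import anthropic

def label_feature(protein_descriptions: list[str]) -> str:
    """Ask an LLM to propose a biological label for one sparse feature,
    given text descriptions of its top-activating proteins."""
    prompt = (
        "The following proteins all strongly activate the same neural-network feature.\n"
        "Suggest, in one sentence, the shared biological function or family:\n\n"
        + "\n".join(f"- {d}" for d in protein_descriptions)
    )
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",   # illustrative model choice
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```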
Beyond interpreting single layers, variants known as "transcoders" learn sparse approximations of the transformations between layers in protein models, revealing how biological information is organised hierarchically within these deep models and how protein features flow and become more abstract as processing proceeds.
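Structurally, a transcoder looks much like the autoencoder sketched earlier, except that it is trained to reproduce the output of one block from that block's input rather than to reconstruct the same embedding. A minimal PyTorch sketch, with illustrative dimensions and not the study's exact design, might look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Transcoder(nn.Module):
    """Sparse stand-in for a single MLP block: reads the block's input and
    predicts its output through a wide, sparse feature layer."""
    def __init__(self, d_model: int = 1280, d_features: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, block_input: torch.Tensor):
        features = F.relu(self.encoder(block_input))   # sparse intermediate features
        return self.decoder(features), features

# Trained with an MSE loss against the original block's output plus an L1
# penalty on `features`, so each feature captures one interpretable piece of
# the layer-to-layer transformation rather than of a single embedding.
```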
The biological insights gained from this study include the identification of protein families and enzymatic functions embedded implicitly in PLM representations. Researchers were also able to link specific neurons to molecular functions and to discover groups of functionally related proteins that may not be obvious from sequence alone, improving trust and transparency in PLMs and fostering safer, more explainable AI-driven biological research.
This study, published in the Proceedings of the National Academy of Sciences, was led by Onkar Gujral, an MIT graduate student, and was funded by the National Institutes of Health. Previous research by senior author Bonnie Berger and colleagues in 2021 used a protein language model to predict which sections of viral surface proteins are less likely to mutate in a way that enables viral escape, allowing them to identify possible targets for vaccines against influenza, HIV, and SARS-CoV-2.
Understanding which features a particular protein model encodes could help researchers choose the right model for a specific task, leading to more accurate predictions and discoveries. Making individual nodes interpretable in this way helps open up the "black box" of protein language models and reveal their inner workings.
In conclusion, the application of sparse autoencoders to protein language models has the potential to transform our understanding of these complex models, enabling researchers to map latent model features to concrete biological entities and functions. This could open new avenues for understanding both model behaviour and protein biology, ultimately leading to more accurate and explainable AI-driven biological research.