Focused learning by antibody language models using preferential masking of non-templated regions.

Patterns (N Y)

Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA.

Published: June 2025


Article Abstract

Existing antibody language models (AbLMs) are pre-trained using a masked language modeling (MLM) objective with uniform masking probabilities. While these models excel at predicting germline residues, they often struggle with mutated and non-templated residues, which concentrate in the complementarity-determining regions (CDRs) and are crucial for antigen binding specificity. Here, we demonstrate that preferential masking of the primarily non-templated CDR3 is a compute-efficient strategy to enhance model performance. We pre-trained two AbLMs using either uniform or preferential masking and observed that the latter improves residue prediction accuracy in the highly variable CDR3. Preferential masking also improves antibody classification by native chain pairing and binding specificity, suggesting improved CDR3 understanding and indicating that non-random, learnable patterns help govern antibody chain pairing. We further show that specificity classification is largely informed by residues in the CDRs, demonstrating that AbLMs learn meaningful patterns that align with immunological understanding.
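As a rough illustration of the contrast between uniform and preferential masking described above, the sketch below assigns each residue a masking probability that is higher inside the CDR3 than elsewhere. It is a minimal sketch, not the authors' implementation. The function names, the `<mask>` token string, the CDR3 span format, and the rates (0.25 in CDR3, 0.15 elsewhere) are all assumptions chosen for illustration; the abstract does not state the rates used.

```python
import random

MASK = "<mask>"  # assumed mask token string; real tokenizers may differ


def masking_probs(length, cdr3_span, p_cdr3=0.25, p_other=0.15):
    """Per-position masking probabilities.

    cdr3_span is a (start, end) pair, end-exclusive, giving the CDR3
    location. The default rates are illustrative assumptions only.
    Setting p_cdr3 == p_other recovers uniform masking.
    """
    start, end = cdr3_span
    return [p_cdr3 if start <= i < end else p_other for i in range(length)]


def mask_sequence(seq, cdr3_span, p_cdr3=0.25, p_other=0.15, rng=None):
    """Mask residues independently using position-dependent probabilities.

    Returns (tokens, labels): tokens is the masked input, and labels holds
    the original residue at masked positions and None elsewhere, so a loss
    would be computed only on masked positions.
    """
    rng = rng or random.Random()
    probs = masking_probs(len(seq), cdr3_span, p_cdr3, p_other)
    tokens, labels = [], []
    for residue, p in zip(seq, probs):
        if rng.random() < p:
            tokens.append(MASK)
            labels.append(residue)
        else:
            tokens.append(residue)
            labels.append(None)
    return tokens, labels


# Hypothetical toy sequence; positions 10-17 stand in for the CDR3.
toy_seq = "QVQLVQSGAEARDYYGMDVWGQ"
tokens, labels = mask_sequence(toy_seq, cdr3_span=(10, 18), rng=random.Random(0))
```

For simplicity the sketch always substitutes the mask token; MLM training setups often also replace some selected positions with random tokens or leave them unchanged, which is omitted here.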

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12191730
DOI: http://dx.doi.org/10.1016/j.patter.2025.101239
