Article Abstract

Background: The COVID-19 pandemic caused over 776 million infections and 7 million deaths globally between December 2019 and November 2024. Since the emergence of the original Wuhan strain, SARS-CoV-2 has evolved into multiple variants, including Alpha, Delta, and Omicron, primarily through mutations in the Spike glycoprotein. The S1 subunit, which binds the human angiotensin-converting enzyme 2 (ACE2) receptor, mutates frequently and plays a key role in infectivity and immune escape, while the more conserved S2 subunit mediates membrane fusion. Anticipating future mutations is essential for guiding vaccine design and therapeutic strategies. Generative large language models (LLMs) have shown promise in protein sequence modeling because they can produce realistic and functional synthetic sequences. Here, we introduce SARITA, a GPT-3-based LLM with up to 1.2 billion parameters that generates high-quality synthetic SARS-CoV-2 Spike S1 subunits. SARITA was obtained by continual learning from the protein language model RITA, further trained on 107,017 high-quality SARS-CoV-2 Spike sequences collected up to March 1, 2021.

Results: SARITA generates realistic, full-length synthetic S1 subunits from a 14-amino-acid prompt. When evaluated on unseen sequences collected between March 2021 and November 2023, including major Variants of Concern (VOCs) such as Delta and Omicron and Variants of Interest such as Iota, SARITA outperforms baseline and state-of-the-art LLMs in sequence quality, biological plausibility, and similarity to real-world viral evolution. SARITA generates high-quality sequences in over 97% of cases, with a markedly lower False Mutation Rate and higher similarity scores (PAM30, Levenshtein distance) than alternative approaches. It also accurately reproduces key mutations characteristic of future variants, such as L212I, R158L, T95P, and E406K, which were absent from the training data but later emerged in VOCs such as Omicron and Delta. Structure-based analysis confirms the functional plausibility of these substitutions, with ΔΔG values for ACE2 and antibody binding within experimentally supported thresholds. Furthermore, SARITA anticipates immune-evasive mutations and accurately captures the positional and statistical distribution of mutations found in variants that emerged after March 1, 2021, highlighting its potential as a predictive tool for viral evolution.
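Two of the comparisons mentioned in the Results, point-substitution notation (e.g. E406K: glutamate at position 406 replaced by lysine) and Levenshtein distance between sequences, can be illustrated with a short sketch. This is not SARITA's evaluation code; the helper names `substitutions` and `levenshtein` are hypothetical, and `substitutions` assumes the two sequences are already aligned, gap-free, and of equal length, with position 1 being residue 1 of the reference Spike. Real variants that carry insertions or deletions would need a proper alignment first.

```python
from typing import List


def substitutions(reference: str, variant: str) -> List[str]:
    """List point substitutions in conventional notation, e.g. 'E406K'.

    Hypothetical helper. Assumes both sequences are pre-aligned, gap-free,
    and of equal length, numbered from residue 1 of the reference.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_aa}{pos}{var_aa}"
        for pos, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1)
        if ref_aa != var_aa
    ]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-residue insertions, deletions, and
    substitutions needed to turn one sequence into the other.

    Classic dynamic programming, keeping only two rows of the table.
    """
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # drop a residue of a
                current[j - 1] + 1,            # add a residue of b
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

On toy sequences, `substitutions("ACDEF", "ACKEF")` gives `['D3K']` and `levenshtein("ACDEF", "ACKEF")` gives 1. A PAM30-based score could be computed in the same spirit by summing substitution-matrix entries over aligned positions; Biopython, for instance, ships the PAM30 matrix in `Bio.Align.substitution_matrices`.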

Conclusion: These results suggest that SARITA could help predict future SARS-CoV-2 S1 evolution and thereby aid the development of adaptable vaccines and treatments.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12319310
DOI: http://dx.doi.org/10.1093/bib/bbaf384

Similar Publications

New SARS-CoV-2 variants continue to emerge and may cause new waves of COVID-19. Antibody evasion is a major driver of variant emergence but variants can also exhibit altered capacity to enter lung cells and to use ACE2 species orthologues for cell entry. Here, we assessed cell line tropism, usage of ACE2 orthologues and antibody evasion of variant MC.

A bivalent SARS-CoV-2 subunit vaccine for cats neutralizes both the original ancestral strain and BA.1 Pseudovirus carrying the 453F and 501T mutation.

Vaccine

September 2025

College of Veterinary Medicine, Huazhong Agricultural University, Wuhan 430070, China; National Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China; Hubei Jiangxia Laboratory, Wuhan 430200, China. Electronic address:

The spillover and spillback of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) between humans and animals, especially companion animals, threaten global public health security. However, risk assessment of SARS-CoV-2 variants infecting companion animals and the development of corresponding prevention and control technologies are lacking. The aim of this study is to assess the potential risk of enhancement of the infectivity of SARS-CoV-2 in cats owing to mutations at key sites within the spike (S) protein receptor-binding domain (RBD) region and develop an efficient vaccine to cross-neutralize high-risk SARS-CoV-2 variants.

The COVID-19 pandemic remains a global health crisis, with successive SARS-CoV-2 variants exhibiting enhanced transmissibility and immune evasion. Notably, the Omicron variant harbors extensive mutations in the spike protein's receptor-binding domain (RBD), altering viral fitness. While temperature is a critical environmental factor modulating viral stability and transmission, its molecular-level effects on variant-specific RBD-human angiotensin-converting enzyme 2 (hACE2) interactions remain underexplored.

Objectives: This study compared the diagnostic accuracy of seven different commercial serological assays for COVID-19, using RT-PCR as the gold standard, through meta-analysis and indirect comparison.

Methods: Fifty-seven studies, published from November 2019 to June 2024, were included. The diagnostic performance of IgA, IgG, and total antibody assays for SARS-CoV-2 was assessed.

Purpose: SARS-CoV-2 infection may worsen the prognosis of COVID-19 patients by inducing syncytia formation, which is implicated in intercellular transmission and immune evasion. Hesperidin (HSD) and hesperetin (HST) are two citrus flavonoids that show potential to interfere with spike/human angiotensin-converting enzyme-2 (hACE2) binding and have an inhibitory effect in the SARS-CoV-2 pseudovirus internalization model. Here, we determined the ability of HSD and HST to inhibit syncytia formation using in vitro cell models.