Publications by authors named "Shaowei An"

Ulcerative colitis (UC) is a chronic inflammatory bowel disease that remains incurable. Although current medications can alleviate symptoms, treatment still faces major challenges such as side effects and drug resistance. Therefore, the development of new drugs is urgently needed.

Several lossy compressors have achieved superior compression rates for mass spectrometry (MS) data at the cost of storage precision. The impact of precision loss on MS data processing has not yet been thoroughly evaluated, although such an evaluation is critical for the future development of lossy compressors. We first evaluated different storage precisions (32-bit and 64-bit) in lossless mzML files.

Accurate metabolite annotation and false discovery rate (FDR) control remain challenging in large-scale metabolomics. Recent progress drawing on experience from proteomics and on ideas from other disciplines has provided valuable insights. While target-decoy strategies have been introduced, generating reliable decoy libraries is difficult due to metabolite complexity.

Background: As a gold-standard quantitative technique based on mass spectrometry, multiple reaction monitoring (MRM) has been widely used in proteomics and metabolomics. In the analysis of MRM data, as no peak picking algorithm can achieve perfect accuracy, manual inspection is necessary to correct the errors. In large cohort analysis scenarios, the time required for manual inspection is often considerable.

Background: Plate design is a necessary and time-consuming operation for GC/LC-MS-based sample preparation. The implementation of the inter-batch balancing algorithm and the intra-batch randomization algorithm can have a significant impact on the final results. For researchers without programming skills, a stable and efficient online service for plate design is necessary.

Background: Liquid chromatography-mass spectrometry is widely used in untargeted metabolomics for composition profiling. In multi-run analysis scenarios, features of each run are aligned into consensus features by feature alignment algorithms to observe the intensity variations across runs. However, most of the existing feature alignment methods focus more on accurate retention time correction, while underestimating the importance of feature matching.

With the continuous improvement of biological detection technology, the scale of biological data keeps increasing, which overloads central computing servers. The use of edge computing in 5G networks can provide higher processing performance for large-scale biological data analysis, reduce bandwidth consumption, and improve data security. An appropriate data compression and reading strategy is therefore key to implementing edge computing.

Introduction: Metabolomics analysis based on liquid chromatography-mass spectrometry (LC-MS) has become a prevalent method in metabolic research. However, accurately quantifying all the metabolites in large metabolomics sample cohorts is challenging. Analysis efficiency is restricted by the capabilities of the software available in many labs, and the lack of spectra for some metabolites also hinders metabolite identification.

Motivation: Liquid chromatography coupled with high-resolution mass spectrometry is widely used for composition profiling in untargeted metabolomics research. While retaining complete sample information, mass spectrometry (MS) data are inherently high-dimensional, highly complex, and very large in volume. None of the mainstream quantification methods can perform direct 3D analysis on lossless profile MS signals.

As the pervasive, standardized format for interchange and deposition of raw mass spectrometry (MS) proteomics and metabolomics data, text-based mzML is used inefficiently on various analysis platforms due to the sheer volume of samples and its limited read/write speed. Most research on compression algorithms does not provide a flexible random file reading scheme. Database-based solutions guarantee efficient random file reading, but their efforts in compression and third-party software support are insufficient.

Background: As the precision of mass spectrometry (MS) increases, MS file sizes grow rapidly. Beyond the widely used open format mzML, near-lossless and lossless compression algorithms and formats have emerged for scenarios with different precision requirements. The data precision required often depends on the instrument and on the subsequent processing algorithms.
