MQF and buffered MQF: quotient filters for efficient storage of k-mers with their counts and metadata.

BMC Bioinformatics

Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, Davis, CA, USA.

Published: February 2021



Article Abstract

Background: Specialized data structures are required for online algorithms to efficiently handle large sequencing datasets. The counting quotient filter (CQF), a compact hashtable, can efficiently store k-mers with a skewed distribution.

Results: Here, we present the mixed-counters quotient filter (MQF) as a new variant of the CQF with novel counting and labeling systems. The new counting system adapts to a wider range of data distributions for increased space efficiency and is faster than the CQF for insertions and queries in most of the tested scenarios. A buffered version of the MQF can offload storage to disk, trading speed of insertions and queries for a significant memory reduction. The labeling system provides a flexible framework for assigning labels to member items while maintaining good data locality and a concise memory representation. These labels serve as a minimal perfect hash function but are approximately tenfold faster than BBhash, with no need to re-analyze the original data for further insertions or deletions.

Conclusions: The MQF is a flexible and efficient data structure that extends our ability to work with high throughput sequencing data.
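
The quotient filters named in the abstract build on quotienting: a hashed fingerprint is split into a quotient, which selects a home position, and a remainder, which is the only part that has to be stored. The following is a minimal Python sketch of that idea with per-item counts. It is not the MQF or CQF implementation: the BLAKE2b hash, the parameter names `p` and `r`, the `ToyCountingFilter` class, and the use of dictionaries in place of a packed slot array with metadata bits and variable-length counters are all assumptions made for illustration.

```python
import hashlib


def fingerprint(kmer, p):
    """Hash a k-mer to a p-bit fingerprint (BLAKE2b is an arbitrary choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << p) - 1)


def split(fp, r):
    """Split a fingerprint into (quotient, remainder); the remainder is the low r bits."""
    return fp >> r, fp & ((1 << r) - 1)


def join(quotient, remainder, r):
    """Rebuild the fingerprint from its quotient and r-bit remainder."""
    return (quotient << r) | remainder


class ToyCountingFilter:
    """Hypothetical toy: buckets keyed by quotient, each mapping remainder -> count.

    A real quotient filter keeps remainders in one flat array and resolves
    collisions by shifting slots; dictionaries are used here only for clarity.
    """

    def __init__(self, p=32, r=8):
        self.p = p  # fingerprint width in bits (assumed value)
        self.r = r  # remainder width in bits (assumed value)
        self.buckets = {}

    def insert(self, kmer, count=1):
        q, rem = split(fingerprint(kmer, self.p), self.r)
        bucket = self.buckets.setdefault(q, {})
        bucket[rem] = bucket.get(rem, 0) + count

    def count(self, kmer):
        # May overestimate if two k-mers share a fingerprint, never underestimates.
        q, rem = split(fingerprint(kmer, self.p), self.r)
        return self.buckets.get(q, {}).get(rem, 0)


toy = ToyCountingFilter()
for _ in range(3):
    toy.insert("ACGT")
```

Because the quotient is implied by where an item is placed, only the remainder and a count occupy space per item, which is the source of the compactness the abstract refers to. Queries can return an inflated count when two items share a fingerprint, so widening `p` lowers that error rate at the cost of more bits per item.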


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7885209
DOI: http://dx.doi.org/10.1186/s12859-021-03996-x

