Citations

20

Article Abstract

The khmer package is a freely available software library for working efficiently with fixed-length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/.
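To make two of the ideas named in the abstract more concrete, here is a minimal sketch of probabilistic k-mer counting and digital normalization. It is not khmer's API or implementation: the class `CountMinSketch`, the functions `kmers` and `normalize`, the table sizes, and the defaults `k=4` and `cutoff=2` are all hypothetical choices made for illustration. The sketch assumes a Count-Min-style counter (several hash tables, report the minimum) and a normalization rule that drops a read once the median count of its k-mers reaches a cutoff.

```python
import hashlib
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer (fixed-length word) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class CountMinSketch:
    """Approximate k-mer counter (illustrative, not khmer's class)."""

    def __init__(self, table_sizes=(1009, 1013, 1019)):
        # One counter table per size; the sizes here are small distinct primes.
        self.tables = [[0] * size for size in table_sizes]

    def _slots(self, kmer):
        # Hash the k-mer once, then reduce modulo each table's own size.
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        return [h % len(table) for table in self.tables]

    def add(self, kmer):
        for table, slot in zip(self.tables, self._slots(kmer)):
            table[slot] += 1

    def count(self, kmer):
        # Collisions can only inflate a counter, so the minimum across
        # tables is the tightest estimate and never undercounts.
        return min(t[s] for t, s in zip(self.tables, self._slots(kmer)))


def normalize(reads, k=4, cutoff=2):
    """Keep a read only while the median count of its k-mers is below cutoff."""
    sketch = CountMinSketch()
    kept = []
    for read in reads:
        counts = [sketch.count(km) for km in kmers(read, k)]
        if counts and median(counts) < cutoff:
            kept.append(read)
            # Only kept reads contribute to the counts.
            for km in kmers(read, k):
                sketch.add(km)
    return kept
```

Calling `normalize(reads)` on a list of read strings returns the subset that still adds new k-mer coverage; redundant copies of an already well-covered read are discarded. Memory use is fixed by the table sizes rather than by the number of distinct k-mers, which is the trade-off that makes the counter probabilistic: counts may be overestimated, but never underestimated.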


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4608353
DOI: http://dx.doi.org/10.12688/f1000research.6924.1

Publication Analysis

Top Keywords

bruijn graph: 8
khmer: 4
khmer software: 4
software package: 4
package enabling: 4
enabling efficient: 4
efficient nucleotide: 4
nucleotide sequence: 4
sequence analysis: 4
analysis khmer: 4

Similar Publications

Motivation: Long-read sequencing enables complete bacterial genome assemblies, but individual assemblers are imperfect and often produce sequence-level and structural errors. Consensus assembly using Trycycler can improve accuracy, but its lack of automation limits scalability. There is a need for an automated method to generate high-quality consensus bacterial genome assemblies from long-read data.


Accurate cancer subtyping with accompanying molecular characterization is critical for precision oncology. While machine learning approaches have been applied to both digital pathology and cancer genomics, previous work has been limited in sample size and has typically aggregated granular cancer subtypes into coarse groupings, likely obfuscating informative molecular and prognostic associations and phenotypic variation of more detailed tumor subtypes. Accordingly, we collated 378,123 hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) with matched targeted DNA clinical sequencing results and OncoTree detailed cancer subtypes from a real-world cohort of 71,142 patients.


The repeat content and heterozygosity rate of a target genome are important factors in determining the feasibility of achieving a complete telomere-to-telomere assembly. The mathematical relationship between the required coverage and read length for the purpose of unique reconstruction remains unexplored for diploid genomes. We investigate the information-theoretic conditions that the given set of sequencing reads must satisfy to achieve the complete reconstruction of the true sequence of a diploid genome up to switch errors.


Metagenomic CRISPR Array Analysis Tool: a novel graph-based approach to finding CRISPR arrays in metagenomic datasets.

Microlife

July 2025

University of Stuttgart, Institute of Biomedical Genetics, Department of RNA Biology and Bioinformatics, Allmandring 31, 70569 Stuttgart, Germany.

Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated genes (CRISPR-Cas) is a bacterial immune system also famous for its use in genome editing. The diversity of known systems could be significantly increased by metagenomic data. Here we present the Metagenomic CRISPR Array Analysis Tool (MCAAT), a highly sensitive algorithm for finding CRISPR arrays in unassembled metagenomic data.


Current Progress in Phased Genome Assembly from Long-Read DNA Sequencing Data.

Methods Mol Biol

July 2025

Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia.

Genome assembly is a core task in the field of genomics. The availability of long-read sequencing technologies enabled the construction of high-quality complex genomes, including phasing of heterozygous contigs. This chapter provides an overview of the main algorithmic techniques for genome assembly.
