Citations

20

Article Abstract

Background: The volume of biological sequence data is growing exponentially worldwide. Nucleotide sequence databases play an important role in providing meaningful genomic information on a wide variety of organisms.

Results: The getSequenceInfo software tool provides access to sequence information from several public repositories (GenBank, RefSeq, and the European Nucleotide Archive). It is compatible with Linux, macOS, and Microsoft Windows, and can be used either programmatically (command line) or through a graphical user interface. getSequenceInfo (gSeqI) v1.0 helps users retrieve information on queried sequences that could be useful for specific studies (e.g. the country of origin/isolation or the release date of queried sequences). Queries can retrieve sequence data based on a given kingdom and species, or from a given date. The program separates chromosomes from plasmids (or other genetic elements/components) by placing each component in a dedicated folder. It also computes some basic statistics, such as the GC content of queried assemblies. In addition, an empirically designed nucleotide ratio is calculated from nucleotide information to tentatively provide a "NucleScore" for the studied genome assemblies. Besides the main gSeqI tool, additional tools have been developed to perform various tasks related to sequence analysis.
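The two operations mentioned above, computing GC content and separating chromosomes from plasmids, can be illustrated with a minimal Python sketch. This is not the getSequenceInfo implementation: the function names, the example records, and the rule that a FASTA header containing the word "plasmid" marks a plasmid are all assumptions made for illustration. The NucleScore is not reproduced because the abstract does not give its formula.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence):
    """Return the GC percentage over unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def component_type(header):
    """Assumed heuristic: a header mentioning 'plasmid' is a plasmid."""
    return "plasmid" if "plasmid" in header.lower() else "chromosome"


def summarize(text):
    """Group records by component type with their rounded GC content."""
    groups = defaultdict(list)
    for header, seq in parse_fasta(text):
        groups[component_type(header)].append((header, round(gc_content(seq), 2)))
    return dict(groups)


# Hypothetical records, invented for this example only.
EXAMPLE = """>seq1 Example organism chromosome, complete genome
ATGCGCGCAT
GGCC
>seq2 Example organism plasmid pEX1, complete sequence
ATATATGC
"""

if __name__ == "__main__":
    for kind, records in summarize(EXAMPLE).items():
        for header, gc in records:
            print(f"{kind}\t{gc}\t{header}")
```

Grouping by component type mirrors the idea of placing each component in its own folder; writing each group to a separate directory would be a straightforward extension. Ambiguous bases such as N are excluded from the denominator here, which is one common convention among several.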

Conclusion: The aim of this study is to democratize the programmatic use of public repositories and to facilitate sequence data analysis from a pedagogical perspective. Output results are available in FASTA, FASTQ, Excel/TSV or HTML formats. The program is freely available at: https://github.com/karubiotools/getSequenceInfo. getSequenceInfo and supplementary tools are partly available through the recently released Galaxy KaruBioNet platform (http://calamar.univ-ag.fr/c3i/galaxy_karubionet.html).

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264741
DOI: http://dx.doi.org/10.1186/s12859-022-04809-5

Publication Analysis

Top Keywords

public repositories: 12
sequence public: 8
queried sequences: 8
sequence data: 8
sequence: 6
getsequenceinfo: 4
getsequenceinfo suite: 4
suite tools: 4
tools allowing: 4
allowing genome: 4
