Publications by authors named "Armando J Pinho"

JARVIS3: an efficient encoder for genomic data.

Maria J P Sousa, Armando J Pinho, Diogo Pratas

Bioinformatics

November 2024

Motivation: Large-scale genomic projects grapple with the complex challenge of reducing medium- and long-term storage space and its associated energy consumption, monetary costs, and environmental footprint.

Results: We present JARVIS3, an advanced tool engineered for the efficient reference-free compression of genomic sequences. JARVIS3 introduces a pioneering approach, specifically through enhanced table memory models and probabilistic lookup-tables applied in repeat models.

AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data.

Jorge M Silva, Armando J Pinho, Diogo Pratas

Gigascience

January 2024

Background: Most viral genome sequences generated during the latest pandemic have presented new challenges for computational analysis. Analyzing millions of viral genomes in multi-FASTA format is computationally demanding, especially when using alignment-based methods. Most existing methods are not designed to handle such large datasets, often requiring the analysis to be divided into smaller parts to obtain results using available computational resources.

AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data.

Jorge M Silva, Weihong Qi, Armando J Pinho, Diogo Pratas

Gigascience

December 2022

Background: Low-complexity data analysis is the area that addresses the search and quantification of regions in sequences of elements that contain low-complexity or repetitive elements. For example, these can be tandem repeats, inverted repeats, homopolymer tails, GC-biased regions, similar genes, and hairpins, among many others. Identifying these regions is crucial because of their association with regulatory and structural characteristics.

Concentration of inverted repeats along human DNA.

Carlos A C Bastos, Vera Afreixo, João M O S Rodrigues, Armando J Pinho

J Integr Bioinform

June 2023

This work aims to describe the observed enrichment of inverted repeats in the human genome, and to identify and describe, with detailed length profiles, the regions with significant and relevant enriched occurrence of inverted repeats. The enrichment is assessed and tested with a recently proposed measure (a z-scores based measure). We simulate a genome using an order 7 Markov model trained with the data from the real genome.
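
As a rough illustration of the simulation step mentioned in this abstract, the sketch below trains an order-k Markov model on a sequence and samples a synthetic sequence from it. The function names, the uniform fallback for unseen contexts, and the fixed seed are assumptions made for this example; the abstract does not describe the authors' implementation.

```python
import random
from collections import Counter, defaultdict

def train_markov(seq, order=7):
    """Count which symbol follows each context of `order` symbols."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        counts[seq[i:i + order]][seq[i + order]] += 1
    return counts

def simulate(counts, length, order=7, alphabet="ACGT", seed=0):
    """Sample a sequence, drawing each symbol given the previous `order` symbols."""
    rng = random.Random(seed)
    # Start from a context that was seen during training.
    out = list(rng.choice(sorted(counts)))
    while len(out) < length:
        following = counts.get("".join(out[-order:]))
        if following:
            symbols, weights = zip(*sorted(following.items()))
            out.append(rng.choices(symbols, weights=weights)[0])
        else:
            # Assumption: unseen contexts fall back to a uniform draw.
            out.append(rng.choice(alphabet))
    return "".join(out[:length])
```

A simulated sequence of this kind preserves the short-range statistics of the training sequence, so it can serve as a baseline against which enrichment in the real sequence is compared.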

On the Impact of the Data Acquisition Protocol on ECG Biometric Identification.

Mariana S Ramos, João M Carvalho, Armando J Pinho, Susana Brás

Sensors (Basel)

July 2021

Electrocardiographic (ECG) signals have been used for clinical purposes for a long time. Notwithstanding, they may also be used as the input for a biometric identification system. Several studies, as well as some prototypes, are already based on this principle.

AC2: An Efficient Protein Sequence Compression Tool Using Artificial Neural Networks and Cache-Hash Models.

Milton Silva, Diogo Pratas, Armando J Pinho

Entropy (Basel)

April 2021

Recently, the scientific community has witnessed a substantial increase in the generation of protein sequence data, triggering emergent challenges of increasing importance, namely efficient storage and improved data analysis. For both applications, data compression is a straightforward solution. However, in the literature, the number of specific protein sequence compressors is relatively low.

Efficient DNA sequence compression with neural networks.

Milton Silva, Diogo Pratas, Armando J Pinho

Gigascience

November 2020

Background: The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of DNA sequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for DNA sequence compression.

Multimodal Emotion Evaluation: A Physiological Model for Cost-Effective Emotion Classification.

Gisela Pinto, João M Carvalho, Filipa Barros, Sandra C Soares, Armando J Pinho

Sensors (Basel)

June 2020

Emotional responses are associated with distinct body alterations and are crucial to foster adaptive responses, well-being, and survival. Emotion identification may improve people's emotion regulation strategies and interaction with multiple life contexts. Several studies have investigated emotion classification systems, but most of them are based on the analysis of only one, a few, or isolated physiological signals.

Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements.

Morteza Hosseini, Diogo Pratas, Burkhard Morgenstern, Armando J Pinho

Gigascience

May 2020

Background: The development of high-throughput sequencing technologies and the resulting production of huge volumes of genomic data have accelerated biological and medical research and discovery. The study of genomic rearrangements is crucial owing to their role in chromosomal evolution, genetic disorders, and cancer.

Results: We present Smash++, an alignment-free and memory-efficient tool to find and visualize small- and large-scale genomic rearrangements between 2 DNA sequences.

Distribution of Distances Between Symmetric Words in the Human Genome: Analysis of Regular Peaks.

Carlos A C Bastos, Vera Afreixo, João M O S Rodrigues, Armando J Pinho, Raquel M Silva

Interdiscip Sci

September 2019

Finding DNA sites with high potential for the formation of hairpin/cruciform structures is an important task. Previous works studied the distances between adjacent reversed complement words (symmetric word pairs) and also for non-adjacent words. It was observed that for some words a few distances were favoured (peaks) and that in some distributions there was strong peak regularity.

AC: A Compression Tool for Amino Acid Sequences.

Morteza Hosseini, Diogo Pratas, Armando J Pinho

Interdiscip Sci

March 2019

Advancement of protein sequencing technologies has led to the production of a huge volume of data that needs to be stored and transmitted. This challenge can be tackled by compression. In this paper, we propose AC, a state-of-the-art method for lossless compression of amino acid sequences.

Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard.

Diogo Pratas, Morteza Hosseini, Gonçalo Grilo, Armando J Pinho, Raquel M Silva

Genes (Basel)

September 2018

The sequencing of ancient DNA samples provides a novel way to find, characterize, and distinguish exogenous genomes from endogenous targets. After sequencing, computational composition analysis enables filtering of undesired sources in the focal organism, with the purpose of improving the quality of assemblies and subsequent data analysis. More importantly, such analysis allows extinct and extant species to be identified without requiring a specific or new sequencing run.

Cryfa: a secure encryption tool for genomic data.

Morteza Hosseini, Diogo Pratas, Armando J Pinho

Bioinformatics

January 2019

Summary: The ever-increasing growth of high-throughput sequencing technologies has led to a great acceleration of medical and biological research and discovery. As these platforms advance, the amount of information for diverse genomes increases at unprecedented rates. Confidentiality, integrity and authenticity of such genomic information should be ensured due to its extremely sensitive nature.

Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes.

Diogo Pratas, Raquel M Silva, Armando J Pinho

Entropy (Basel)

May 2018

An efficient DNA compressor furnishes an approximation to measure and compare information quantities present in, between and across DNA sequences, regardless of the characteristics of the sources. In this paper, we compare directly two information measures, the Normalized Compression Distance (NCD) and the Normalized Relative Compression (NRC). These measures answer different questions; the NCD measures how similar both strings are (in terms of information content) and the NRC (which, in general, is nonsymmetric) indicates the fraction of one of them that cannot be constructed using information from the other one.
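
To make the first of these measures concrete, the sketch below computes the NCD in its usual form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size. It uses zlib as a stand-in compressor, which is an assumption made for this example; the paper concerns specialized DNA compressors, and zlib will give cruder estimates. The NRC is not shown because it needs a compressor that encodes one sequence using only a model of the other.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression (stand-in for C(.))."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 indicate that one string adds little information to the other; values near 1 indicate that the strings share almost nothing the compressor can exploit.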

Biometric and Emotion Identification: An ECG Compression Based Method.

Susana Brás, Jacqueline H T Ferreira, Sandra C Soares, Armando J Pinho

Front Psychol

April 2018

We present an innovative and robust solution to both biometric and emotion identification using the electrocardiogram (ECG). The ECG represents the electrical signal that comes from the contraction of the heart muscles, indirectly representing the flow of blood inside the heart; it is known to convey a key that allows biometric identification. Moreover, due to its relationship with the nervous system, it also varies as a function of the emotional state.

DNA word analysis based on the distribution of the distances between symmetric words.

Ana H M P Tavares, Armando J Pinho, Raquel M Silva, João M O S Rodrigues, Carlos A C Bastos

Sci Rep

April 2017

We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented.
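
The notion of a symmetric word pair can be sketched as follows: for a given word, record the distance from each of its occurrences to the nearest following occurrence of its reversed complement, and build a histogram of those distances. The pairing rule and function names are assumptions made for this example; the abstract does not specify how the authors define or count distances.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reversed_complement(word):
    """Complement each base, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]

def symmetric_pair_distances(seq, word):
    """Histogram of distances from each `word` occurrence to the next
    occurrence of its reversed complement (assumed pairing rule)."""
    rc = reversed_complement(word)
    k = len(word)
    hist = Counter()
    pending = []  # start positions of `word` still waiting for a partner
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer == rc:
            for start in pending:
                hist[i - start] += 1
            pending = []
        if kmer == word:
            pending.append(i)
    return hist
```

Distances that occur far more often than expected under a background model would be the overrepresented ones the abstract refers to.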

ECG biometric identification: A compression based approach.

Susana Bras, Armando J Pinho

Annu Int Conf IEEE Eng Med Biol Soc

August 2015

Using the electrocardiogram (ECG) signal to identify and/or authenticate persons is a problem still lacking a satisfactory solution. Yet, the ECG possesses characteristics that are unique or difficult to get from other signals used in biometrics: (1) it requires contact and liveness for acquisition; (2) it changes under stress, rendering it potentially useless if acquired under threat. Our main objective is to present an innovative and robust solution to the above-mentioned problem.

Analysis-Driven Lossy Compression of DNA Microarray Images.

Miguel Hernández-Cabronero, Ian Blanes, Armando J Pinho, Michael W Marcellin, Joan Serra-Sagristà

IEEE Trans Med Imaging

February 2016

DNA microarrays are one of the fastest-growing new technologies in the field of genetic research, and DNA microarray images continue to grow in number and size. Since analysis techniques are under active and ongoing development, storage, transmission and sharing of DNA microarray images need to be addressed, with compression playing a significant role. However, existing lossless coding algorithms yield only limited compression performance (compression ratios below 2:1), whereas lossy coding methods may introduce unacceptable distortions in the analysis process.

An alignment-free method to find and visualise rearrangements between pairs of DNA sequences.

Diogo Pratas, Raquel M Silva, Armando J Pinho, Paulo J S G Ferreira

Sci Rep

May 2015

Species evolution is indirectly registered in their genomic structure. The emergence and advances in sequencing technology provided a way to access genome information, namely to identify and study evolutionary macro-events, as well as chromosome alterations for clinical purposes. This paper describes a completely alignment-free computational method, based on a blind unsupervised approach, to detect large-scale and small-scale genomic rearrangements between pairs of DNA sequences.

Three minimal sequences found in Ebola virus genomes and absent from human DNA.

Raquel M Silva, Diogo Pratas, Luísa Castro, Armando J Pinho, Paulo J S G Ferreira

Bioinformatics

August 2015

Motivation: Ebola virus causes high-mortality hemorrhagic fevers, with more than 25 000 cases and 10 000 deaths in the current outbreak. Only experimental therapies are available; thus, novel diagnostic tools and druggable targets are needed.

Results: Analysis of Ebola virus genomes from the current outbreak reveals the presence of short DNA sequences that appear nowhere in the human genome.
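
The idea of sequences present in one genome and absent from another can be illustrated with a plain k-mer set difference. This is a simplified sketch, not the authors' method: it considers a single strand and a fixed length `k`, does not test minimality, and holds all k-mers in memory, which would not scale to the human genome. The function names are hypothetical.

```python
def kmers(seq, k):
    """Set of all substrings of length `k` in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def absent_from_host(virus, host, k):
    """k-mers that occur in `virus` but never in `host` (single strand only)."""
    return kmers(virus, k) - kmers(host, k)
```

For example, `absent_from_host("ACGTAC", "ACGTT", 3)` yields the two 3-mers `GTA` and `TAC`, which occur in the first string but not in the second.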

MAFCO: a compression tool for MAF files.

Luís M O Matos, António J R Neves, Diogo Pratas, Armando J Pinho

PLoS One

February 2016

In the last decade, the cost of genomic sequencing has been decreasing so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be efficiently stored, because storage cost is not decreasing as fast as the cost of sequencing. In order to overcome this problem, the most popular general-purpose compression tool, gzip, is usually used.

XS: a FASTQ read simulator.

Diogo Pratas, Armando J Pinho, João M O S Rodrigues

BMC Res Notes

January 2014

Background: Emerging next-generation sequencing (NGS) technologies are bringing, besides naturally huge amounts of data, an avalanche of new specialized tools (for analysis, compression, and alignment, among others) and large public and private network infrastructures. Consequently, there is a growing need for specific simulation tools for testing and benchmarking, such as a flexible and portable FASTQ read simulator that does not need a reference sequence yet produces data with approximately the same characteristics as real data.

Findings: We present XS, a skilled FASTQ read simulation tool, flexible, portable (does not need a reference sequence) and tunable in terms of sequence complexity.

DNA sequences at a glance.

Armando J Pinho, Sara P Garcia, Diogo Pratas, Paulo J S G Ferreira

PLoS One

September 2014

Data summarization and triage is one of the current top challenges in visual analytics. The goal is to let users visually inspect large data sets and examine or request data with particular characteristics. The need for summarization and visual analytics is also felt when dealing with digital representations of DNA sequences.

Inter-STOP symbol distances for the identification of coding regions.

Carlos A C Bastos, Vera Afreixo, Sara P Garcia, Armando J Pinho

J Integr Bioinform

November 2013

In this study we explore the potential of inter-STOP symbol distances for finding coding regions in DNA sequences. We use the distance between STOP symbols in the DNA sequence and a chi-square statistic to evaluate the nonhomogeneity of the three possible reading frames and the occurrence of one long distance in one of the frames. The results of this exploratory study suggest that inter-STOP symbol distances have strong ability to discriminate coding regions in prokaryotes and simple eukaryotes.
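
A minimal sketch of the distance computation described here: scan each of the three reading frames of one strand and record the gaps between consecutive STOP codons. Measuring distances in codons, using the standard STOP codons TAA, TAG, and TGA, and the function name are assumptions made for this example; the chi-square evaluation mentioned in the abstract is not reproduced.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def inter_stop_distances(seq):
    """For each reading frame (0, 1, 2), list the distances, in codons,
    between consecutive STOP codons on the given strand."""
    distances = {0: [], 1: [], 2: []}
    for frame in range(3):
        last = None  # position of the previous STOP codon in this frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if last is not None:
                    distances[frame].append((i - last) // 3)
                last = i
    return distances
```

A frame that contains one unusually long gap while the other two frames show only short gaps is the kind of nonhomogeneity the study uses as a signal for a coding region.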

MFCompress: a compression tool for FASTA and multi-FASTA data.

Armando J Pinho, Diogo Pratas

Bioinformatics

January 2014

Motivation: The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general-purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce the data as much as possible, for example, for medium- and long-term storage.
