In the version of this article initially published, the statement "there are no pan-genomes for any other animal or plant species" was incorrect. The statement has been corrected to "there are no reported pan-genomes for any other animal species, to our knowledge." We thank David Edwards for bringing this error to our attention.
We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs).
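The read-collection step (keeping reads that failed to align to GRCh38) can be sketched in Python. This is a minimal sketch under stated assumptions: the aligner output is SAM text, and a read counts as unaligned when FLAG bit 0x4 ("segment unmapped") is set. The study's actual filtering criteria and tooling are not described here, production tools such as samtools offer equivalent flag-based filtering, and the helper name `unaligned_reads_to_fastq` is hypothetical. The contig-assembly step is left to a dedicated assembler and is not shown.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unaligned_reads_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for SAM records whose FLAG marks them unmapped.

    Hypothetical helper: a real pipeline would also decide how to treat
    paired-end mates and low-quality or partial alignments.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & UNMAPPED:
            # SAM writes "*" when base qualities are absent; FASTQ needs a
            # quality string, so substitute a placeholder of matching length
            if qual == "*":
                qual = "I" * len(seq)
            yield f"@{qname}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6\n",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",  # aligned: dropped
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF\n",         # unmapped: kept
    ]
    print("".join(unaligned_reads_to_fastq(sam)), end="")
```

Streaming records one at a time keeps memory use constant, which matters at the scale of more than a trillion reads.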
Bioinformatics
February 2019
Motivation: General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners.
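Because each read is analyzed independently, one common structure for such tools is sketched below in Python: worker threads pull reads from a shared input in batches, so the input lock is taken once per batch rather than once per read, and each thread writes to a private output buffer. This is an illustrative sketch, not the method proposed in the paper. The function name, the `work` callback (a stand-in for aligning one read), and the default `n_threads` and `batch_size` are assumptions. Python's global interpreter lock prevents CPU-bound threads from running in parallel, so the sketch shows the structure only, not the speedup a compiled aligner would see.

```python
import threading
from itertools import islice
from typing import Callable, Iterable, List, Tuple


def process_reads_in_batches(
    reads: Iterable[str],
    work: Callable[[str], int],
    n_threads: int = 4,
    batch_size: int = 64,
) -> List[Tuple[int, int]]:
    """Apply `work` to every read using several threads (hypothetical helper)."""
    input_lock = threading.Lock()
    numbered = enumerate(reads)  # shared input stream of (index, read) pairs
    per_thread: List[List[Tuple[int, int]]] = [[] for _ in range(n_threads)]

    def worker(tid: int) -> None:
        out = per_thread[tid]  # thread-private buffer: no lock needed to append
        while True:
            with input_lock:  # short critical section: only fetching input
                batch = list(islice(numbered, batch_size))
            if not batch:
                return  # input exhausted
            for idx, read in batch:  # per-read work runs outside the lock
                out.append((idx, work(read)))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Merge the private buffers and restore the original input order.
    return sorted(r for buf in per_thread for r in buf)


if __name__ == "__main__":
    reads = ["ACGT", "GATTACA", "TT"]
    print(process_reads_in_batches(reads, len, n_threads=2, batch_size=2))
```

Larger batches reduce contention on the input lock but can leave threads idle near the end of the input; the best value depends on the hardware and the per-read cost.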
Introduction: Although lung cancer is generally thought to be environmentally provoked, anecdotal familial clustering has been reported, suggesting that there may be genetic susceptibility factors. We systematically tested whether germline mutations in eight candidate genes may be risk factors for lung adenocarcinoma.
Methods: We studied lung adenocarcinoma cases for which germline sequence data had been generated as part of The Cancer Genome Atlas project but had not been previously analyzed.
Background: Novel fusion transcripts (FTs) caused by chromosomal rearrangement are common factors in the development of cancers. In the current study, the authors used massively parallel RNA sequencing to identify new FTs in colon cancers.
Methods: RNA sequencing (RNA-Seq) and TopHat-Fusion were used to identify new FTs in colon cancers.
The purpose of the online resource presented here, POPcorn (Project Portal for corn), is to improve the accessibility of maize genetic and genomic resources for plant biologists. Currently, many of these resources are hosted at online locations that are difficult to find, some are best searched independently, and individual project websites often degrade over time, sometimes disappearing entirely. The POPcorn site makes available (1) a centralized, web-accessible resource to search and browse descriptions of ongoing maize genomics projects, (2) a single, stand-alone tool that uses web services and minimal data warehousing to search for sequence matches in the online resources of diverse offsite projects, and (3) a set of tools that enables researchers to migrate their data to MaizeGDB, the long-term model organism database for maize genetic and genomic information.
The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these analyses. Each species-specific database is presented in a common format with its own home page, and users can search each database in several ways.
TGICL is a pipeline for analyzing large Expressed Sequence Tag (EST) and mRNA databases: the sequences are first clustered by pairwise sequence similarity, and each cluster is then assembled individually (optionally using quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures, including SMP machines and PVM (Parallel Virtual Machine) clusters.
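The clustering stage can be illustrated with a small Python sketch. It swaps in a simplified similarity test (two sequences are linked if they share at least `min_shared` distinct k-mers) and uses a union-find structure so that links are transitive, giving single-linkage clusters. This is not TGICL's own implementation, which relies on alignment-based pairwise similarity and a separate assembler; the function names and the values `k=8` and `min_shared=3` are assumptions chosen for illustration, and the per-cluster assembly step is not shown.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Set of distinct length-k substrings of `seq`."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def cluster_by_similarity(
    seqs: Dict[str, str], k: int = 8, min_shared: int = 3
) -> List[List[str]]:
    """Single-linkage clustering of named sequences (hypothetical helper)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):  # all pairwise comparisons, O(n^2)
        if len(profiles[a] & profiles[b]) >= min_shared:
            union(a, b)

    # Group sequence names by the root of their union-find tree.
    clusters: Dict[str, List[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())
```

Because linkage is transitive, two ESTs that overlap only through a third sequence still land in the same cluster, which is what lets each cluster be assembled into a longer consensus.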
Comparative genomics promises to rapidly accelerate the identification and functional classification of biologically important human genes. We developed the TIGR Orthologous Gene Alignment (TOGA;