A microdroplet co-culture system is useful for the parallel assessment of numerous possible cell-cell interactions by generating isolated subcommunities from a pool of heterogeneous cells. However, the integration of single-cell sequencing into such analysis has been limited due to the lack of effective molecular identifiers for each in-droplet subcommunity. Herein, we present a strategy for generating in-droplet subcommunity identifiers using DNA-functionalized microparticles encapsulated within microdroplets.
Advances in experimental technologies, such as DNA sequencing, have opened up new avenues for the applications of phylogenetic methods to various fields beyond their traditional application in evolutionary investigations, extending to the fields of development, differentiation, cancer genomics, and immunogenomics. Thus, the importance of phylogenetic methods is increasingly being recognized, and the development of a novel phylogenetic approach can contribute to several areas of research. Recently, the use of hyperbolic geometry has attracted attention in artificial intelligence research.
Clin Cancer Res
November 2019
Purpose: The epithelial-to-mesenchymal transition, the major process by which some cancer cells convert from an epithelial phenotype to a mesenchymal one, has been suggested to drive chemo-resistance and/or metastasis in patients with cancer. However, only a few studies have demonstrated the presence of CD45/CD326 doubly-positive cells (CD45/CD326 DPC) in cancer. We deployed a combination of cell surface markers to elucidate the phenotypic heterogeneity in non-small cell lung cancer (NSCLC) cells and identified a new subpopulation that is doubly-positive for epithelial and non-epithelial cell-surface markers in both NSCLC cells and patients' malignant pleural effusions.
Hum Genome Var
June 2019
HLA-VBSeq is an HLA calling tool developed to infer the most likely HLA types from high-throughput sequencing data. However, there is still room for improvement in specific genetic groups because of the diversity of HLA alleles in human populations. Here, we present HLA-VBSeq v2, a software application that makes use of a new Japanese HLA reference panel to enhance calling accuracy for Japanese HLA class-I genes.
In recent genome analyses, population-specific reference panels have proven important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions. Therefore, the nature of long insertions has not been well documented.
Purpose: A prospective cohort study for pregnant women, the Maternity Log study, was designed to construct a time-course high-resolution reference catalogue of bioinformatic data in pregnancy and explore the associations between genomic and environmental factors and the onset of pregnancy complications, such as hypertensive disorders of pregnancy, gestational diabetes mellitus and preterm labour, using continuous lifestyle monitoring combined with multiomics data on the genome, transcriptome, proteome, metabolome and microbiome.
Participants: Pregnant women were recruited at their first routine antenatal visit at Tohoku University Hospital, Sendai, Japan, between September 2015 and November 2016. Of the eligible women who were invited, 65.
Personalized healthcare (PHC) based on an individual's genetic make-up is one of the most advanced, yet feasible, forms of medical care. The Tohoku Medical Megabank (TMM) Project aims to combine population genomics, medical genetics and prospective cohort studies to develop a critical infrastructure for the establishment of PHC. To date, a TMM CommCohort (adult general population) and a TMM BirThree Cohort (birth+three-generation families) have conducted recruitments and baseline surveys.
Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, importance in organ and blood stem cell transplantation, and associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C, and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, Primer-Separation Assembly and Refinement Pipeline (PSARP), in which the SMRT sequencing and additional short-read data were used.
Background: Two main strategies are used to estimate repeat numbers in a short tandem repeat (STR) region from high-throughput sequencing data: counting the repeat patterns in sequence reads that span the region, and estimating the difference between the actual insert size and the insert size inferred from paired-end reads. Alignment quality is crucial, especially for the former strategy, but standard alignment methods struggle in STR regions because of the insertions and deletions caused by variation in repeat numbers.
Results: We proposed a new dynamic programming based realignment method named STR-realigner that considers repeat patterns in STR regions as prior knowledge.
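The read-counting strategy mentioned in the Background can be illustrated with a minimal sketch. This is not STR-realigner or any published tool: the function name `count_spanning_repeats`, the flank sequences, and the example read are all hypothetical, and the sketch assumes error-free reads with exact flank matches, which is the assumption that breaks down in practice and motivates realignment.

```python
import re


def count_spanning_repeats(read, left_flank, right_flank, motif):
    """Count consecutive motif copies between two flanks in one read.

    Returns None when the read does not span the whole STR region,
    i.e. when both flanks are not found around an uninterrupted repeat run.
    Hypothetical helper: exact matching only, no tolerance for read errors.
    """
    pattern = (
        re.escape(left_flank)
        + "((?:" + re.escape(motif) + ")+)"
        + re.escape(right_flank)
    )
    match = re.search(pattern, read)
    if match is None:
        return None
    # The captured group holds only whole motif copies.
    return len(match.group(1)) // len(motif)


# Made-up read: 7 copies of CAG between two made-up flanks.
spanning_read = "ACGTTGCA" + "CAG" * 7 + "TTGACCGA"
# Made-up read that lies entirely inside the repeat and cannot be counted.
internal_read = "CAG" * 10

spanning_count = count_spanning_repeats(spanning_read, "TTGCA", "TTGAC", "CAG")
internal_count = count_spanning_repeats(internal_read, "TTGCA", "TTGAC", "CAG")
print(spanning_count, internal_count)
```

Because only reads that contain both flanks yield a count, the measurable repeat length is bounded by the read length.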
Background: Genome-wide association studies have revealed associations between single-nucleotide polymorphisms (SNPs) and phenotypes such as disease symptoms and drug tolerance. To address the small sample size for rare variants, association studies tend to group gene or pathway level variants and evaluate the effect on the set of variants. One such strategy, known as the sequence kernel association test (SKAT), is a widely used collapsing method.
Background: Two types of approaches are mainly considered for the repeat number estimation in short tandem repeat (STR) regions from high-throughput sequencing data: approaches directly counting repeat patterns included in sequence reads spanning the region and approaches based on detecting the difference between the insert size inferred from aligned paired-end reads and the actual insert size. Although the accuracy of repeat numbers estimated with the former approaches is high, the size of target STR regions is limited to the length of sequence reads. On the other hand, the latter approaches can handle STR regions longer than the length of sequence reads.
Background: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because it contains only one allele on a mosaic haploid genome. Even with the information of diploid genome sequences, precise alignment of reads to the correct allele is difficult because of the high similarity between the corresponding allele sequences.
The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and construction of a Japanese population reference panel (1KJPN). Here we identify through this high-coverage sequencing (32.4× on average), 21.
The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals.
BMC Genomics
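The insert-size idea can be sketched as simple arithmetic. This is an illustrative simplification, not the method of any published tool: the function name, the parameters, and the numbers are hypothetical, and real data would need a model of insert-size variance rather than a plain mean. The assumption is that read pairs flank the STR and are mapped to a reference, so any extra repeat bases in the sample make the mapped (inferred) insert size shorter than the actual fragment length.

```python
from statistics import mean


def estimate_repeat_number(mapped_insert_sizes, library_mean_insert,
                           ref_repeats, motif_len):
    """Estimate the sample repeat number from paired-end insert sizes.

    mapped_insert_sizes: insert sizes inferred from pairs aligned across
        the STR on the reference (hypothetical input).
    library_mean_insert: actual mean fragment length of the library.
    ref_repeats: repeat number in the reference genome.
    motif_len: length of the repeat unit in bases.
    """
    # Bases present in the sample fragment but absent from the reference.
    extra_bases = library_mean_insert - mean(mapped_insert_sizes)
    return ref_repeats + extra_bases / motif_len


# Made-up numbers: a 500 bp library whose pairs map about 470 bp apart.
estimate = estimate_repeat_number([470, 468, 472, 470], 500, 20, 3)
print(estimate)
```

Since the estimate relies on pairs that flank the region rather than reads that contain it, it is not bounded by read length, though it is limited by the spread of the insert-size distribution.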
December 2015
Background: Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphisms of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data.
Results: We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data.
BMC Bioinformatics
May 2015
Background: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci.
Background: High-throughput RNA sequencing (RNA-Seq) enables quantification and identification of transcripts at single-base resolution. Recently, longer sequence reads have become available thanks to the development of new types of sequencing technologies as well as improvements in chemical reagents for the Next Generation Sequencers. Although several computational methods have been proposed for quantifying gene expression levels from RNA-Seq data, they are not sufficiently optimized for longer reads (e.
BMC Syst Biol
November 2014
Background: Structural variations (SVs), such as insertions, deletions, inversions, and duplications, are a common feature in human genomes, and a number of studies have reported that such SVs are associated with human diseases. Although the progress of next generation sequencing (NGS) technologies has led to the discovery of a large number of SVs, accurate and genome-wide detection of SVs remains challenging. Thus far, various calling algorithms based on NGS data have been proposed.
Motivation: Variant calling from genome-wide sequencing data is essential for the analysis of disease-causing mutations and elucidation of disease mechanisms. However, variant calling in low coverage regions is difficult due to sequence read errors and mapping errors. Hence, variant calling approaches that are robust to low coverage data are needed.