Publications by authors named "Ling-Hong Hung"

Background: Post-acute sequelae of SARS-CoV-2 infection (PASC) affects millions globally, yet the molecular mechanisms underlying acute COVID-19 and its chronic sequelae remain poorly understood.

Methods: We performed an integrative transcriptomic analysis of three independent RNA-seq datasets, capturing the complete COVID-19 pathophysiology from health through acute severe infection to post-acute sequelae and mortality (n=142 total samples). We implemented a containerized analytical pipeline, spanning data download, quantification, and differential gene expression analysis, to process these three RNA-seq datasets uniformly.

Spatial proteomics provides a spatially resolved view of protein expression and localization within cells and tissues by mapping the location and abundance of proteins. There is a need for fully integrated end-to-end imaging workflows for spatial proteomic analysis that are flexible, high-throughput, and support graphical and interactive visualizations. We present a modular and interactive spatial proteomic image analysis workflow with individually containerized steps that empowers biomedical researchers to reproducibly execute and customize complex analyses.

Cancer data is widely available in repositories such as the National Cancer Institute (NCI) Genomic Data Commons (GDC). These datasets could serve as controls or comparisons in compendium analyses with user data, avoiding the expense and time of generating additional datasets. However, the user must be able to process their new data in the same manner for these comparisons to be useful.

Recent advances in functional genomics and human cellular models have substantially enhanced our understanding of the structure and regulation of the human genome. However, our grasp of the molecular functions of human genes remains incomplete and biased towards specific gene classes. The Molecular Phenotypes of Null Alleles in Cells (MorPhiC) Consortium aims to address this gap by creating a comprehensive catalogue of the molecular and cellular phenotypes associated with null alleles of all human genes using in vitro multicellular systems.

We present the Biodepot Launcher, a desktop application that facilitates installation, management and deployment of bioinformatics workflows using the Biodepot-workflow-builder (Bwb). With the new app, Bwb can be started by double-clicking on an icon, eliminating the need to type cryptic startup commands into the terminal. This creates an end-to-end graphical and easy-to-use interface to manage and launch containerized workflows on the local computer or cloud instances.

Extracellular matrices direct the formation of mineral constituents into self-assembled mineralized tissues. We investigate the protein and mineral constituents to better understand the underlying mechanisms that lead to mineralized tissue formation. Specifically, we study the protein-hydroxyapatite interactions that govern the development and homeostasis of teeth and bone in the oral cavity.

Introduction: Metabolic syndrome (MetS) is a threat to the active component military as it impacts health, readiness, retention, and cost to the Military Health System. The most prevalent risk factors documented in service members' health records are high blood pressure (BP), low high-density lipoprotein cholesterol, and elevated triglycerides. Other risk factors include abdominal obesity and elevated fasting blood glucose.

Recurrent gene fusions are common drivers of disease pathophysiology in leukemias. Identifying these structural variants helps stratify disease by risk and assists with therapy choice. Precise molecular diagnosis in low- and middle-income countries (LMICs) is challenging given the complexity of assays and the limited availability of trained technical support and reliable electricity.

Background: This article presents the Container Profiler, a software tool that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of containerized tasks collecting over 60 Linux operating system metrics at the virtual machine, container, and process levels. The Container Profiler supports performing time-series profiling at a configurable sampling interval to enable continuous monitoring of the resources consumed by containerized tasks and pipelines.

Modern biomedical image analysis workflows contain multiple computational processing tasks, giving rise to problems in reproducibility. In addition, image datasets can span both spatial and temporal dimensions, with additional channels for fluorescence and other data, resulting in datasets that are too large to be processed locally on a laptop. For omics analyses, software containers have been shown to enhance reproducibility, facilitate installation and provide access to scalable computational resources on the cloud.

Background: Long-read sequencing has great promise in enabling portable, rapid molecular-assisted cancer diagnoses. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools that can efficiently process raw nanopore reads and support graphical output and interactive visualizations for interpreting results. Another obstacle is that high-performance software tools for long-read sequencing data analysis often leverage graphics processing units (GPUs), which are challenging and time-consuming to configure, especially on the cloud.

We present the BioDepot-workflow-builder (Bwb), a software tool that allows users to create and execute reproducible bioinformatics workflows using a drag-and-drop interface. Graphical widgets represent Docker containers executing a modular task. Widgets are linked graphically to build bioinformatics workflows that can be reproducibly deployed across different local and cloud platforms.
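
The idea of widgets that wrap Docker containers and are linked into a workflow can be illustrated with a small sketch. This is not how Bwb is implemented; it only assumes that each widget names a container image and a command, and that links define which widget must finish before another starts. The widget names, image names, and commands below are hypothetical placeholders, and the function only builds `docker run` command strings without executing them.

```python
from graphlib import TopologicalSorter
import shlex

# Hypothetical widgets: each one wraps a container image and a command.
WIDGETS = {
    "download": {"image": "example/download:1.0", "command": ["fetch", "/data/raw"]},
    "align": {"image": "example/align:1.0", "command": ["align", "/data/raw"]},
    "count": {"image": "example/count:1.0", "command": ["count", "/data/aligned"]},
}

# Hypothetical links: each widget maps to the widgets it depends on.
LINKS = {"align": {"download"}, "count": {"align"}}


def plan_commands(widgets, links):
    """Return (widget name, docker command string) pairs in dependency order."""
    graph = {name: links.get(name, set()) for name in widgets}
    plan = []
    # static_order() yields every widget after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        widget = widgets[name]
        argv = ["docker", "run", "--rm", widget["image"], *widget["command"]]
        plan.append((name, shlex.join(argv)))
    return plan


if __name__ == "__main__":
    for name, command in plan_commands(WIDGETS, LINKS):
        print(f"{name}: {command}")
```

Because the plan is derived only from the widget and link definitions, the same description yields the same ordered commands on any machine that can run the containers.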

Summary: For many next-generation sequencing pipelines, the most computationally intensive step is the alignment of reads to a reference sequence. As a result, alignment software such as the Burrows-Wheeler Aligner is optimized for speed and is often executed in parallel on the cloud. However, there are other less demanding steps that can also be optimized to significantly increase the speed, especially when using many threads.

Background: Using software containers has become standard practice to reproducibly deploy and execute biomedical workflows on the cloud. However, some applications that contain time-consuming initialization steps will produce unnecessary costs for repeated executions.

Findings: We demonstrate that hot-starting from containers that have been frozen after the application has already begun execution can speed up bioinformatics workflows by avoiding repetitive initialization steps.

Objective: Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server.

Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem.
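
One way to picture the mapping from transitive reduction to a shortest-path problem is the sketch below. It is an illustration under stated assumptions, not fastBMA's code: each edge is assumed to carry a posterior probability in (0, 1], its length is taken to be the negative log of that probability, and a direct edge is treated as redundant when some indirect path is strictly shorter (that is, has a higher product of probabilities). The function names and the example network are hypothetical.

```python
import heapq
import math


def shortest_indirect_cost(graph, src, dst):
    """Dijkstra from src to dst that may not use the direct src->dst edge."""
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            if node == src and nxt == dst:
                continue  # skip the direct edge under test
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return math.inf


def prune_indirect_edges(edge_probs):
    """Drop edges for which a strictly shorter indirect path exists.

    edge_probs maps (source, target) to an assumed posterior probability.
    All edges are judged against the original, unpruned graph.
    """
    graph = {}
    for (u, v), prob in edge_probs.items():
        graph.setdefault(u, {})[v] = -math.log(prob)
    return {
        (u, v): prob
        for (u, v), prob in edge_probs.items()
        if shortest_indirect_cost(graph, u, v) >= graph[u][v]
    }


if __name__ == "__main__":
    network = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5}
    print(sorted(prune_indirect_edges(network)))
```

In the example, the path A to B to C has probability product 0.81, which beats the direct A to C edge at 0.5, so the direct edge is dropped. Because the negative-log lengths are non-negative, Dijkstra's algorithm applies directly.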

Background: Software container technology such as Docker can be used to package and distribute bioinformatics workflows consisting of multiple software implementations and dependencies. However, Docker is a command line-based tool, and many bioinformatics pipelines consist of components that require a graphical user interface.

Results: We present a container tool called GUIdock-VNC that uses a graphical desktop sharing system to provide a browser-based interface for containerized software.

Reproducibility is vital in science. For complex computational methods, it is often necessary to recreate not just the code but also the software and hardware environment in order to reproduce results. Virtual machines, and container software such as Docker, make it possible to reproduce the exact environment regardless of the underlying hardware and operating system.

The 1,000 plants (1KP) project is an international multi-disciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how to visualize our gene and species trees. Users can develop computational pipelines to analyse these data, in conjunction with data of their own that they can upload.

Motivation: fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project.

Results: fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations.
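
For readers unfamiliar with the TM-score, the sketch below evaluates the standard formula for one fixed superposition. It is not taken from fast_protein_cluster: it assumes the per-residue distances between aligned pairs have already been computed, whereas the full TM-score is the maximum of this quantity over superpositions. The function names are hypothetical, and the 0.5 Å floor on d0 for short targets is a common convention assumed here.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in angstroms."""
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Assumed convention: never let d0 fall below 0.5 angstroms.
    return max(d0, 0.5)


def tm_score_fixed(distances, target_length):
    """TM-score of one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the target structure.
    Unaligned target residues contribute zero to the sum.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    print(tm_score_fixed([0.0] * 100, 100))  # identical structures
    print(tm_score_fixed([0.0] * 50, 100))   # only half the target aligned
```

Unlike RMSD, each residue pair contributes at most 1 to the sum, so a few badly placed residues cannot dominate the score, and dividing by the target length keeps values comparable across proteins of different sizes.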

Protein structure information is essential to understand protein function. Computational methods to accurately predict protein structure from the sequence have primarily been evaluated on protein sequences representing full-length native proteins. Here, we demonstrate that top-performing structure prediction methods can accurately predict the partial structures of proteins encoded by sequences that contain approximately 50% or more of the full-length protein sequence.

Motivation: Accurate comparisons of different protein structures play important roles in structural biology, structure prediction and functional annotation. The root-mean-square-deviation (RMSD) after optimal superposition is the predominant measure of similarity due to the ease and speed of computation. However, global RMSD is dependent on the length of the protein and can be dominated by divergent loops that can obscure local regions of similarity.

Background: Calculation of the root mean square deviation (RMSD) between the atomic coordinates of two optimally superposed structures is a basic component of structural comparison techniques. We describe a quaternion-based method, GPU-Q-J, that is stable with single precision calculations and suitable for graphics processing units (GPUs). The application was implemented on an ATI 4770 graphics card in C/C++ and Brook+ in Linux where it was 260 to 760 times faster than existing unoptimized CPU methods.
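
As a rough illustration of quaternion-based RMSD, the sketch below uses Horn's 4×4 key matrix, whose largest eigenvalue gives the minimum squared deviation without building the rotation explicitly. It is not the GPU-Q-J algorithm: the eigenvalue is found here by shifted power iteration, a choice made only to keep the example dependency-free, and the function names and fixed starting vector are assumptions. In rare cases the starting vector may be orthogonal to the optimal quaternion, which a production implementation would need to handle.

```python
import math


def centered(coords):
    """Translate coordinates so their centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]


def superposed_rmsd(coords_a, coords_b, iterations=500):
    """RMSD after optimal superposition of two equal-length point sets."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty sets of equal length")
    a, b = centered(coords_a), centered(coords_b)
    n = len(a)
    ga = sum(x * x for p in a for x in p)
    gb = sum(x * x for p in b for x in p)
    # Correlation matrix S[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric key matrix; its top eigenvalue is the best overlap.
    m = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Every eigenvalue lies in [-shift, shift], so m + shift*I is
    # positive semidefinite and power iteration finds the largest one.
    shift = math.sqrt(ga * gb)
    q = [1.0, 0.7, 0.4, 0.2]  # assumed generic starting quaternion
    for _ in range(iterations):
        v = [sum(m[i][j] * q[j] for j in range(4)) + shift * q[i]
             for i in range(4)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        q = [x / norm for x in v]
    qq = sum(x * x for x in q)
    lam = sum(q[i] * m[i][j] * q[j] for i in range(4) for j in range(4)) / qq
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / n))


if __name__ == "__main__":
    a = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
    # Rotate 90 degrees about z, then translate: RMSD should be near zero.
    b = [(-y + 5, x - 3, z + 2) for x, y, z in a]
    print(superposed_rmsd(a, b))
```

The inner loop consists only of small fixed-size matrix-vector products and sums over atoms, which is the kind of arithmetic that maps well to parallel hardware.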

De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures is generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures.
