Tremendous advancements have been made to broaden NGS applications from research to the clinic, especially as genomics becomes more integrated with precision medicine initiatives. In spite of this, enormous challenges for NGS still exist, including
real-time sequencing, data storage, processing, scaling, quality control management, security and compliance in the cloud, and interpretation. Track 7 presents case studies on these challenges.
Tuesday, May 15
7:00 am Workshop Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)
8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
W4. Introduction to Scalable and Reproducible RNA-Seq Data Processing, Analysis, and Result Reporting Using AWS, R, knitr, and LaTeX
12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*
W11. Data Science Driving Better Informed Decisions
* Separate registration required.
2:00 – 6:30 Main Conference Registration Open (Commonwealth Hall)
4:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)
5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)
Wednesday, May 16
7:00 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)
8:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)
9:45 Coffee Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)
10:50 Chairperson’s Remarks
Johannes Goll, Director, Bioinformatics, The Emmes Corporation
11:00 KEYNOTE PRESENTATION: RNA-Seq X: Look Back and Look Ahead
Shanrong Zhao, PhD, Director, Computational Biology and Bioinformatics, Pfizer, Inc.
Since Dr. Mortazavi published his groundbreaking research entitled “Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq” in Nature Methods in 2008, RNA-seq has evolved rapidly and revolutionized biological research, drug development
and clinical diagnostics. 2018 is the 10-year anniversary of RNA-seq, and it’s the right time to look back and look forward.
11:30 LCA: A Robust and Scalable Algorithm to Reveal Subtle Diversity in Large-Scale Single-Cell RNA Sequencing Data
Xiang Chen, PhD, Assistant Member, Department of Computational Biology, St. Jude Children’s Research Hospital
We developed Latent Cellular Analysis (LCA), a machine-learning-based single-cell RNA sequencing (scRNA-seq) analytical pipeline that combines similarity measurement by latent cellular states and a graph-based clustering algorithm featuring dual-space
model search for both the optimal number of subpopulations and the informative cellular states distinguishing them. LCA has proved to be robust, accurate and powerful by comparison to multiple state-of-the-art computational methods on large-scale
real and simulated scRNA-seq data.
12:00 pm Sponsored Presentation (Opportunity Available)
12:15 RSEQREP: An Open-Source Cloud-Enabled Framework for Reproducible RNA-Seq Data Processing, Analysis & Result Reporting
Johannes Goll, Director, Bioinformatics, The Emmes Corporation
RSEQREP (RNA-Seq Reports) is a new open-source cloud-enabled framework that allows researchers to execute start-to-end RNA-Seq analysis to characterize transcriptomics changes in human cells following treatment. It outputs dynamically generated reports
using R and LaTeX. We provide results for a published RNA-Seq study to characterize transcriptomics changes following influenza vaccination.
12:30 Session Break
12:40 Luncheon
Presentation I: Querying of 100k Genomes Using Google Cloud
Hákon Gudbjartsson, PhD, Chief Informatics Officer, WuXi NextCODE
Hákon Gudbjartsson will demonstrate the power of the GOR database in real time. GORdb is used to organize, mine and share massive genome datasets, providing a global architecture for the largest precision medicine efforts worldwide. It’s
designed to enable fast, computationally-efficient use of sequence data, and allows for the query and application of data in the context of reference sets.
1:10 Luncheon Presentation II (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own
1:40 Session Break
1:50 Chairperson’s Remarks
Leonard Lipovich, PhD, Associate Professor with Tenure, Center for Molecular Medicine and Genetics, Wayne State University
1:55 Analysis of Codon Optimized Therapeutic Proteins Using Ribosome Profiling
Chava Kimchi-Sarfaty, PhD, Research Chemist, Principal Investigator, OTAT Acting Deputy Associate Director for Research, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, FDA | CBER | OTAT
Codon optimization is a genetic engineering technique used to improve the yield of recombinant therapeutic proteins. Despite being used ubiquitously to increase protein expression, codon optimization requires widespread substitution of synonymous
codons across the native expression sequence. This degree of genetic manipulation can carry consequences, including altered conformation of the recombinant product. These unforeseen modifications can have impacts on protein function and health
outcomes, and are of high regulatory importance. To study these techniques, we have used ribosome profiling, a technique used to characterize the translation pattern of the ribosome across the mRNA transcript. In this technique, actively translating
ribosomes are cross-linked to mRNA, followed by nuclease digestion of mRNA not protected by a ribosome, generating short mRNA fragments (called “ribosome footprints”). These fragments are sequenced and aligned to generate a differential
coverage map across portions of the transcript. This technique provides insight into the relative translation efficiency in a given area of the transcript. We have analyzed the ribosome profiling data for relationships to codon usage. By identifying
regions of differential ribosome profiling patterns between wild type and codon optimized transcripts, we aim to create a method of selecting regions to leave unmodified, allowing recombinant proteins to benefit from increased expression while
maintaining the integrity and safety of the protein product. Codon optimization as a technique relies heavily on accurate codon usage statistics of the organism in question, to identify rare codons to be replaced with common codons for an increase
in translation efficiency. However, previous databases containing this information were either outdated or limited in scope. To address this gap in knowledge, we constructed a new database containing codon usage tables for all the species in GenBank
and RefSeq. We designed a program in Python to download, parse, and organize all the sequence data available in these two repositories, and in JavaScript designed an accessible web portal available to the public to query the new database. The
new HIVE‐CUTs database contains substantially more organisms and coding sequence data and is a dramatic improvement upon prior databases. This tool will aid in the effective implementation of codon optimization techniques and other areas of recombinant
protein design.
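The codon usage statistics mentioned in this abstract can be illustrated with a minimal sketch. This is not the HIVE-CUTs implementation: the function name `codon_usage` and the input format (a list of in-frame coding sequences as plain strings) are assumptions made for illustration only. It tallies codons across the sequences and reports each codon's frequency per 1,000 codons, the unit commonly used in codon usage tables.

```python
from collections import Counter


def codon_usage(cds_sequences):
    """Return codon frequencies per 1,000 codons (hypothetical helper).

    cds_sequences: iterable of in-frame coding sequences (DNA or RNA strings).
    """
    counts = Counter()
    for seq in cds_sequences:
        # Normalize case and treat RNA input as DNA.
        seq = seq.upper().replace("U", "T")
        # Skip sequences that are not a whole number of codons.
        if len(seq) % 3 != 0:
            continue
        for i in range(0, len(seq), 3):
            codon = seq[i:i + 3]
            # Ignore codons containing ambiguous bases such as N.
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}
```

Calling `codon_usage(["ATGGCCGCCTAA", "ATGGCGTAA"])` returns a dictionary keyed by codon. Codons with low values in a table built from an organism's full set of coding sequences would be the candidates for replacement that the abstract describes.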
2:25 Multidimensional Global Proteogenomics Identifies Persistent Ribosomal In-Frame Mis-Translation of Stop Codons as Amino Acids in Multiple Open Reading Frames from a Human Breast Cancer Long Non-Coding RNA
Leonard Lipovich, PhD, Associate Professor with Tenure, Center for Molecular Medicine and Genetics, Wayne State University
Two-thirds of the ~60,000 human genes (www.gencodegenes.org) do not encode known proteins, and aside from long non-coding RNA (lncRNA) genes with recently characterized functions, the possibility that these poorly understood genes’ transcripts
serve as de facto unconventional messenger RNAs has not been formally excluded. Our group was the first to use direct evidence from protein mass spectrometry, preceding efforts that employed indirect evidence from ribosome profiling, to demonstrate
that specific lncRNAs are recurrently and nonrandomly translated in human cells (Bánfai et al. 2012, Genome Research 22:1646-1657). In our current study, we integrated RNA-seq, ribosome profiling, and mass spectrometry to globally assess
lncRNA translation in human estrogen receptor alpha positive MCF7 breast cancer cells. We identified 27 peptides, mapping to multiple sense-strand open reading frames (ORFs) of the lncRNA gene MMP24-AS1, united by a novel and highly unconventional
property: the existence of these peptides can only be explained by stop-to-nonstop in-frame replacements of specific UAG and UGA (but not UAA) stop codons by amino acids. This result, validated by the absence of any genomic mutations, polymorphisms,
and RNA editing events in genomic and cDNA targeted resequencing, represents an unprecedented apparent gene-specific violation of the Genetic Code in human breast cancer cells, and hints at a new mechanism enhancing the combinatorial complexity
of the cancer proteome.
[Note 1: This work has been funded in its entirety by the NIH Director’s New Innovator Award 1DP2-CA196375 to LL.]
[Note 2: This project encompasses collaborations. A full listing of co-authors will be shown during the talk.]
2:55 CO-PRESENTATION: Workflow Optimization for NGS Discovery - How to Drive BIX Insights
Jack DiGiovanna, PhD, General Manager, NGS Applications and Services, Seven Bridges
Isaac M. Neuhaus, PhD, Director, Computational Genomics, Bristol Myers Squibb
3:25 Refreshment Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)
4:00 Variant Query Tool: Drag & Drop for a Scalable, Server-Less, Web UI to Querying Annotated Variants
William Van Etten, Senior Scientific Consultant, BioTeam
Building an environment that provides real-time querying of reads and annotated variants for genomics research is a challenge that requires significant human and computational resources. Whether for tens or thousands of genomes, the barrier to entry
can be high for biologists and geneticists, who might not also be computer scientists. BioTeam has developed a simple tool that leverages several AWS services (S3, Athena, Lambda, Cognito, IAM, CloudWatch) to enable a biologist or geneticist to drag
& drop VCF and BAM files onto an S3 bucket, then point their web browser at this bucket, to get a scalable, server-less web UI for querying the reads and annotated variants within these files. We aim to demonstrate, explain, and promote
what we’ve learned from this proof-of-concept software development in the hope that others might benefit from our experience.
4:30 Building a GXP Validated Platform for NGS Analysis Pipelines
Anthony Rowe, PhD, Business Technology Leader, R&D IT, Janssen R&D LLC
As NGS applications approach the clinic, the bioinformatics pipelines used to analyze the data have to be validated to demonstrate their correctness. This talk will present Janssen’s approach to deploying validated NGS applications, with a specific focus
on microbiome metagenomics.
5:00 LIMS or ELN, Which Do You Need?
Kevin Cramer, CEO, Sapio Sciences
Both Biotech and Pharma need Laboratory Information Management (LIMS) and Electronic Lab Notebook (ELN) capabilities. Sapio has eliminated the barriers between these two product areas by leveraging its more than a decade of unique experience offering
both LIMS and ELN solutions and combining the key features of each solution into one best-of-breed product: Exemplar ELN Pro.
5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)
7:00 – 10:00 Bio-IT World After Hours @Lawn on D
** Conference Registration Required. Please bring your conference badge, wristband, and photo ID for entry.
Thursday, May 17
7:30 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)
8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM (Amphitheater & Harborview 2)
9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced (Commonwealth Hall)
10:30 Chairperson’s Remarks
Bruce Press, Executive Vice President, Business Development & Strategy, Seven Bridges Genomics
10:40 Instantiating a Single Point of Truth for Genomic Reference Data
David Herzig, Scientist, Research Informatics, Roche Pharmaceuticals
This talk will exemplify how expression and mutation data were made actionable by consolidating a scattered landscape of genomic reference data into a real SPoT.
11:10 A Network-Based Approach to Understanding Drug Toxicity
Yue Webster, PhD, Principal Research Scientist, Informatics Capabilities, Research IT, Eli Lilly and Company
Despite investment in toxicogenomics, nonclinical safety studies are still used to predict clinical liabilities for new drug candidates. Network-based approaches for genomic analysis help overcome challenges with whole-genome transcriptional profiling
using limited numbers of treatments for phenotypes of interest. Herein, we apply co-expression network analysis to safety assessment using rat liver gene expression data to define 415 modules, exhibiting unique transcriptional control, organized
in a visual representation of the transcriptome. Compared to gene-level analysis alone, the network approach identifies significantly more phenotype-gene associations, including established and novel biomarkers of liver injury.
11:40 Advancing Clinical NGS Test Development Using Thousands of Pediatric Cancer Samples on St. Jude Cloud
Michael Rusch, Director of Bioinformatics Research Development, St. Jude Children's Research Hospital
12:10 pm Enjoy Lunch on Your Own
1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)
1:55 Chairperson’s Remarks
John Methot, Director, Health Informatics Architecture, Dana-Farber Cancer Institute
2:00 Disease Classification in the Era of Data-Intensive Medicine
Kanix Wang, PhD, Research Professional, Booth School of Business, Institute for Genomics & Systems Biology, University of Chicago
We used insurance claims for over one-third of the U.S. population to create a subset of 128,989 families (481,657 unique individuals). Using these data, we estimated the heritability and familial environmental patterns of 149 diseases. We then
computed the environmental and genetic disease classifications for a set of 29 complex diseases after inferring their pairwise genetic and environmental correlations.
2:30 Enviro-Geno-Pheno State Approach and State-Based Biomarkers for Differentiation, Prognosis, Subtypes, and Staging
Lei Xu, PhD, Director, Centre for Cognitive Machines and Computational Health; Zhiyuan Chair Professor, Department of Computer Science and Engineering, Shanghai Jiao Tong University
In the joint space of geno-measures, pheno-measures, and enviro-measures, one point represents a bio-system behavior, and a subset of adjacent points that share a common system status represents a ‘state’. The system is
characterized by such states learned from samples. This enviro-geno-pheno state is considered a biomarker, indicating ‘health/normal’ versus ‘risk/abnormal’ together with its associated enviro-geno-pheno condition.
3:00 PANEL DISCUSSION: Can We Improve Breast Cancer Patient Outcomes through Artificial Intelligence?
Maya Said, ScD, President & CEO, Outcomes4me, Inc. (Moderator)
Panelists:
Regina Barzilay, PhD, MacArthur Fellow and Delta Electronics Professor, Massachusetts Institute of Technology (MIT) Department of Electrical Engineering and Computer Science; Member, Computer Science and Artificial Intelligence
Laboratory, MIT
Kevin Hughes, MD, Co-Director, Avon Breast Evaluation Program, Massachusetts General Hospital; Associate Professor of Surgery, Harvard Medical School; Medical Director, Bermuda Cancer Genetics Risk Assessment Clinic
Osama Rahma, MD, Assistant Professor of Medicine, Center for Immuno-Oncology, Dana-Farber Cancer Institute
Newly diagnosed cancer patients attempting to understand their treatment options face the overwhelming task of filtering an information deluge, much of which is irrelevant, outdated and occasionally inaccurate. Additionally, matching their
diagnosis to best-in-class treatments or potential clinical trials, while simultaneously learning to navigate an extremely complex healthcare system is daunting, even for the most highly trained physicians. We will explore various platforms
aimed at improving patient outcomes by leveraging technology to help educate, track, and connect patients with personalized resources while simultaneously working to improve the care continuum and the development of new treatments. We
will explore the nexus of healthcare networks and their IT systems, clinical decision-making and delivery, R&D, and patients, for whom we all create our innovation solutions. Attendees will be interested to understand how various groups
are working to increase value across the entire system by bringing laboratory, clinical and pharmaceutical science, real-world evidence and patient-reported data together with technology and artificial intelligence to solve health challenges.
These approaches offer the opportunity to generate deeper insights into how therapies perform in the real world and harness that understanding to improve efficiency, effectiveness, value, and ultimately, patient care.
4:00 Conference Adjourns