Tremendous advancements have been made to broaden NGS applications from research to the clinic, especially as genomics becomes more integrated with precision medicine initiatives. In spite of this, enormous challenges for NGS still exist, including data
analysis pipelines and platforms; data integration, interpretation and visualization; application of sequencing to cancer, immunology, diagnostics, and therapeutic development; and emerging sequencing technologies. The Next-Gen Sequencing Informatics
track presents case studies on these challenges.
Final Agenda
Tuesday, April 16
7:00 am Workshop Registration Open and Morning Coffee
8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
W6. DNA Sequencing 101
12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*
W12. Data Science Driving Better Informed Decisions
* Separate registration required.
2:00 – 6:30 Main Conference Registration Open
4:00 PLENARY KEYNOTE SESSION
Amphitheater
5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing
Wednesday, April 17
7:30 am Registration Open and Morning Coffee
8:00 PLENARY KEYNOTE SESSION
Amphitheater
9:45 Coffee Break in the Exhibit Hall with Poster Viewing
10:50 Chairperson’s Remarks
David LaBrosse, Director, Genomics, Research, Life Sciences & Healthcare, NetApp
11:00 Long Read Sequencing
Justin Zook, PhD, Researcher, National Institute of Standards and Technology
11:20 NovoGraph: Loading 7 Human Genomes into Graphs
Evan Biederstedt, Computational Biologist, Memorial Sloan Kettering Cancer Center
11:40 Building a Usable Human Pangenome: A Human Pangenomics Hackathon Run by NCBI at UCSC
Ben Busby, PhD, Scientific Lead, NCBI Hackathons Group, National Center for Biotechnology Information (NCBI)
12:00 pm Co-Presentation: Faster Genomic Data
Michael Hultner, PhD, Senior Vice President, Strategy; General Manager, US Operations, PetaGene
David LaBrosse, Director, Genomics, Research, Life Sciences & Healthcare, NetApp
Genetic testing demand is driving up the volume of genomic data that must be processed, analyzed, and stored. Gigabyte-scale genome sample files and terabyte- to petabyte-scale cohort data sets must be moved from data generation to processing to analysis
sites, historically a slow, arduous process. NetApp and PetaGene will describe compression and data transfer technologies that overcome I/O bottlenecks to accelerate the movement of genomic data and reduce the time to process and analyze it.
12:30 Session Break
12:40 Luncheon Presentation I: Deep Phenotypic and Genomic Analysis of UK Biobank Data on the WuXi NextCODE Platform
Saliha Yilmaz, PhD, Research Geneticist, WuXi NextCODE
The increasing size and complexity of genetic and phenotypic data, which now include hundreds of thousands of participants, pose a significant challenge for data storage and analysis. We demonstrate use of the GOR database and query language underlying our
platform to mine UK Biobank and other datasets for efficient phenotype selection, GWAS and PheWAS, and to archive and query the results.
1:10 NEW: Luncheon Co-Presentation II: Optimizing Drug Discovery and Development with Data-Driven Insights
Christian Frech, PhD, Associate Director, Scientific Operations, Seven Bridges
Serhat Tetikol, Research & Development Engineer, Seven Bridges
1:40 Session Break
1:50 Chairperson’s Remarks
Jeffrey Rosenfeld, PhD, Manager of the Biomedical Informatics Shared Resource and Assistant Professor of Pathology, Rutgers Cancer Institute of NJ
1:55 AbbVie’s Target and Genomics Compilation (ATGC): A Target Knowledge Platform
Rishi Gupta, PhD, Senior Research Scientist, Information Research, AbbVie, Inc.
Author: Anne-Sophie Barthelet, Scientific Developer, Discngine
ATGC is a web-based platform that allows AbbVie scientists to gather relevant information to make accurate decisions on target ID, target validation, biomarker selection and drug discovery. For each target, the platform provides in-depth information
such as gene expression, RNA expression, protein expression, and mouse knockout studies. This talk focuses on key aspects of this application, including application architecture, currently available tool
sets and how various pieces of information are provided to the user.
2:25 Self Service Data Visualization and Exploration at Genentech Research
Kiran Mukhyala, Senior Software Engineer, Bioinformatics and Computational Biology, Genentech Research and Early Development
Genomic data requires specialized infrastructure to enable data exploration and analysis at scale. We built an integrated, modular, end-to-end gene expression analysis platform implementing data import, storage, processing, analysis and visualization.
The multi-layered architecture of the platform supports general, high-level applications for self-service analytics, as well as infrastructure for prototyping, incubating and integrating scientist-driven innovations. The platform coexists with
other in-house and commercial software to provide a wide range of genomic data analysis and visualization options for Research scientists.
2:55 Exploring and Visualizing Single-cell RNA Sequencing Data
Michael DeRan, PhD, Scientific Consultant, Diamond Age Data Science
Recent advances in single-cell RNA sequencing (scRNA-seq) technology have made this powerful method accessible to many researchers, but have not brought with them a clear, simple workflow for data analysis. As the number of scRNA-seq datasets has
increased, so too has the number of analysis tools available; for those looking to perform their first scRNA-seq analysis the range of options can seem daunting. In working with our clients, I have had the opportunity to apply many different tools
to scRNA-seq data from a variety of tissues and organisms. I have used this experience to select a set of tools that are flexible and suitable to many common scRNA-seq analysis tasks. In this talk I will introduce popular tools and methods for
identifying cell populations, assessing differential expression and visualizing biological processes. I will discuss common pitfalls encountered in analyzing this data and make recommendations that anyone can use in their own analysis.
3:25 Refreshment Break in the Exhibit Hall with Poster Viewing, Meet the Experts: Bio-IT World Editorial Team, and Book Signing with Joseph Kvedar, MD, Author, The Internet of Healthy Things℠ (Book will be available
for purchase onsite)
4:00 Comparison of Different Approaches for Clinical Cancer Sequencing
Jeffrey Rosenfeld, PhD, Manager of the Biomedical Informatics Shared Resource and Assistant Professor of Pathology, Rutgers Cancer Institute of NJ
The sequencing of tumors is important for guiding the treatment of cancer patients. While it is agreed that there is a need to perform sequencing of the tumor, there are a wide variety of approaches ranging from paired whole genome tumor-normal
sequencing to tumor-only small panel sequencing with many intermediate possibilities. Each of the approaches has a different cost and associated benefit. I will present a comparison of different methods and their efficacy for guiding cancer
treatment.
4:30 Integrated NGS Analysis to Accelerate Disease Understanding for Drug Discovery
Helen Li, Director- Research IT - Biologics & Informatics, Eli Lilly and Company
5:00 Identification of Cancer Biomarker Genes
Maryam Nazarieh, PhD, Postdoctoral Researcher, Center for Bioinformatics, Universität des Saarlandes, Saarbrücken, Germany
Identification of biomarker genes plays a crucial role in disease detection and treatment. Computational approaches enhance the insights derived from experiments and reduce the effort biologists and experimentalists need to identify biomarker genes
that play key roles in complex diseases. This is essentially achieved by prioritizing a set of genes with certain attributes (1). Here, I propose a set of transcription factors that form the largest strongly connected component of the
pluripotency network in embryonic stem cells as the global regulators that control the differentiation process determining cell fate. This component can be controlled by a set of master regulatory genes. The regulatory mechanisms underlying
stem cells inspired us to formulate the identification of master regulatory genes in regulatory networks as two combinatorial optimization problems, namely minimum dominating set and minimum connected dominating set in
weakly and strongly connected components. The developed methods were applied to regulatory cancer networks to identify disease-associated genes and anti-cancer drug targets in breast cancer and hepatocellular carcinoma. As not all the
nodes in the solutions are critical, a prioritization method named TopControl was developed to rank a set of candidate genes related to a certain disease based on systematic analysis of the genes that are differentially expressed between tumor
and normal conditions. For this purpose, NGS data were taken from The Cancer Genome Atlas for matched tumor and normal samples of the liver hepatocellular carcinoma (LIHC) and breast invasive carcinoma (BRCA) datasets. Moreover,
topological features were demonstrated in regulatory networks surrounding differentially expressed genes that were highly consistent across the output of several analysis tools. We present several web servers and software packages
that are publicly available at no cost. The Cytoscape plugin for minimum connected dominating set identifies a set of key regulatory genes in a user-provided regulatory network based on a heuristic approach. The ILP formulations of minimum dominating
set and minimum connected dominating set return the optimal solutions for the aforementioned problems. Our source code is publicly available. The web servers TFmiR and TFmiR2 construct disease-, tissue-, and process-specific networks for the sets
of deregulated genes and miRNAs provided by a user. They highlight topological hotspots and offer detection of three- and four-node FFL motifs as a separate web service for both mouse and human.
1) Maryam Nazarieh, Understanding regulatory mechanisms underlying stem cells helps to identify cancer biomarkers. Ph.D. thesis, Saarland University, Saarbrücken, Germany (2018).
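For readers unfamiliar with the minimum dominating set problem named in the abstract above, the following is a minimal illustrative sketch of a greedy approximation on a directed regulatory network. It is not the speaker's ILP formulation or Cytoscape plugin heuristic; the function name and the gene names (TF1, TF2, G1 to G4) are hypothetical, and a node is assumed to "dominate" itself and every target it regulates.

```python
def greedy_dominating_set(edges):
    """Greedy approximation of a minimum dominating set in a directed graph.

    edges: iterable of (regulator, target) pairs.
    Returns a list of nodes such that every node is either in the list
    or is a direct target of a node in the list.
    """
    # Map each regulator to the set of targets it regulates.
    targets = {}
    nodes = set()
    for regulator, target in edges:
        nodes.update((regulator, target))
        targets.setdefault(regulator, set()).add(target)

    def coverage(node):
        # Assumption: a node dominates itself and its direct targets.
        return {node} | targets.get(node, set())

    uncovered = set(nodes)
    dominators = []
    while uncovered:
        # Choose the node covering the most still-uncovered nodes;
        # sorting makes tie-breaking deterministic.
        best = max(sorted(nodes), key=lambda n: len(coverage(n) & uncovered))
        dominators.append(best)
        uncovered -= coverage(best)
    return dominators


# Hypothetical toy network: two transcription factors and four target genes.
toy_edges = [
    ("TF1", "G1"), ("TF1", "G2"), ("TF1", "TF2"),
    ("TF2", "G3"), ("TF2", "G4"),
]
result = greedy_dominating_set(toy_edges)
```

The greedy choice does not guarantee an optimal set in general, which is why exact approaches such as the ILP formulations mentioned in the abstract are used when optimality matters.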
5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing
Thursday, April 18
7:30 am Registration Open and Morning Coffee
8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM
Amphitheater
9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced
10:30 Chairperson’s Remarks
Konrad Karczewski, PhD, Computational Biologist, Broad Institute
10:40 Leveraging Human Genetic Electronic Medical Record-Linked Biobank Data to Guide Drug Discovery
Ron Do, PhD, Assistant Professor, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai
High failure rates of drug development in clinical trials are due in large part to inefficacy of drug therapeutics and unforeseen adverse side effects. Genetic associations from genome-wide association studies have shown potential in guiding
drug target prioritization. Electronic medical record (EMR)-linked biobank data have recently emerged as a source to conduct genome-wide association scans on a broad spectrum of medical and clinical phenotypes. My talk will evaluate the utility
of such data in the context of drug research and development. Specifically, I will present results on utilizing genetic association data from a large EMR-linked biobank to inform efficacy and side effect prediction of
drug therapeutics in clinical trials. I expect attendees to learn about the following: 1) genome-wide association studies; 2) EMR-linked biobanks; 3) how these genetic data can be used to guide drug target prioritization.
11:10 VCPA - A Cloud-Based SNP/Indel Variant Calling Pipeline and Data Management Tool Used for Analysis of WGS/WES for the Alzheimer’s Disease Sequencing Project
Yuk Yee Leung, PhD, Research Assistant Professor, Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania
The Alzheimer’s Disease Sequencing Project (ADSP), an integral component of the National Alzheimer’s Project Act toward a cure for Alzheimer’s disease (AD), will eventually analyze whole-genome sequencing (WGS) and whole-exome sequencing
(WES) data from more than 20,000 late-onset AD patients and cognitively normal elderly individuals to find new genetic variants associated with disease risk. To ensure all sequencing data are processed consistently and efficiently according to best
practices, a common workflow called “Variant Calling Pipeline and Data Management Tool” (VCPA) was developed by the Genome Center for Alzheimer’s Disease (GCAD) in collaboration with ADSP. VCPA can process any kind
of germline DNA sequencing data and is available for general use. VCPA 1) is optimized for large-scale production of WGS and WES data; 2) includes a tracking database with a web frontend for users to track the production process and review quality
metrics; 3) is implemented using the Workflow Description Language (WDL) for better deployment and maintenance; and 4) is designed for the latest human reference genome build (GRCh38/hg38, version GRCh38DH) and follows best practices for WGS analysis
with input from TOPMed (Trans-Omics for Precision Medicine) and CCDG (Centers for Common Disease Genomics).
11:40 Variation Across 141,456 Individuals Reveals the Spectrum of Loss-of-Function Intolerance of the Human Genome
Konrad Karczewski, PhD, Computational Biologist, Broad Institute
12:10 pm Session Break
12:20 Luncheon Co-Presentation: The Future State of NGS Data Analysis
Anthony Philippakis, MD, PhD, Chief Data Officer, Broad Institute of MIT and Harvard
Pankaj Srivastava, Computer Science BSc, Vice President of Software and Informatics, Bioinformatics, Illumina
Data analysis is the key to unlocking the power of the genome – turning raw sequencing information into the answers that matter most. Join Illumina and the Broad Institute for a discussion around the future state of next-generation sequencing
data analysis, and an update on the Illumina® DRAGEN™ Bio-IT Platform.
12:50 Session Break
1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing
1:55 Chairperson’s Remarks
Yuval Itan, PhD, Assistant Professor, Department of Genetics and Genomic Sciences; Member, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai
2:00 Pinpointing Transcript-Damaging Disease-Causing Variants as a Major Step towards RNA Therapeutics
Sahar Gelfman, PhD, Associate Research Scientist, Columbia University Medical Center
The difficulty in capturing pathogenic variants that indirectly damage mRNA formation results in overlooking synonymous and intronic variants when searching for disease risk in sequenced genomes. The Transcript-inferred Pathogenicity (TraP) model
was developed to identify sequence context changes that affect splicing decisions and the formation of the final transcript. A random forest model is trained on previously described pathogenic and benign synonymous mutations and identifies
damaging variants with over 97% specificity and with a sensitivity three to four times higher than other available scores. Importantly, the specific mode of action of TraP damaging variants can be rescued using carefully designed small molecules;
identifying these variants is therefore a big step towards personalized treatments for mutation carriers. Since its publication in 2017, TraP has become a major resource for genetic diagnostics that is helping to change the common conception that
pathogenic genetic variation is caused solely by coding mutations. TraP has been incorporated in diagnostic pipelines in tens of research institutes worldwide, including the NIH, Nationwide Children’s Hospital, SickKids foundation,
and Massachusetts General Hospital. TraP is also available as a website for single queries (www.trap-score.org) that is used systematically by over 1,500 users from clinics and genetic institutes
in over 40 countries worldwide, providing successful diagnosis of genetic disorders and affecting treatment decisions.
2:30 AI Assisted Rapid Clinical Whole Genome Sequencing for Clinical Care
Ray Veeraraghavan, PhD, Director of IT & Informatics, Rady Children's Institute for Genomic Medicine
3:00 Deciphering the Complex Heterogeneity of Cancer
Patrice M. Milos, PhD, Co-Founder/President and CEO, Medley Genomics, Inc.
In 2017, 1.7 million people in the US were diagnosed with cancer, and even though cancer survival rates have increased, it still accounts for 1 in 4 deaths annually. Cancer, a heterogeneous disease, has significant tumor cell variability within
individual patients, as well as across categories of patients, creating complex barriers to effective and lasting cures for patients. Understanding this heterogeneity will be required to individualize care for patients. Medley Genomics provides a
software platform that uses patent-pending algorithms and advanced data analytics to describe a patient's diverse tumor cell mixture. This enables creation of unique molecular diagnostic fingerprints for improving patient diagnosis,
monitoring and treatment of cancer, and helps to improve novel oncology therapies and therapeutic combinations including individual cancer vaccine development.
3:30 Estimating Genotypic Heterogeneity Underlying Human Disease
Yuval Itan, PhD, Assistant Professor, Department of Genetics and Genomic Sciences; Member, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai
Whole exome and whole genome sequencing provide hundreds of thousands of genetic variants per patient, of which only very few are pathogenic. Current computational methods are inefficient in differentiating pathogenic mutations from neutral genetic
variants that are predicted to be damaging, and cannot predict the functional outcome of mutations. We will present: (1) a deep learning approach to efficiently detect pathogenic mutations by utilizing extensive annotations and patients’
phenotypic data; (2) a machine learning method combined with natural language processing to estimate whether a mutation results in gain- or loss-of-function; and (3) a case-control gene burden study to detect genes and pathways enriched
with rare and high impact disease-causing mutations in exomes of over 2,000 Ashkenazi Jewish patients suffering from inflammatory bowel disorder. Finally, we will present new tools to visualize and extract useful information of human, mutations,
and DNA/protein sequences for better utilization of next-generation sequencing data and understanding of human disease genomics.
4:00 Conference Adjourns