2018 Archive | Track 2: Data Computing

There is increasing demand for computing power from life science researchers and genomics scientists tackling big data problems. Track 2 explores new techniques and methods for data transfer and workflows. Themes include, but are not limited to, workforce and equipment mobility, HPC across the enterprise vs. HPC as a service, reconfigurable hardware for HPC, and Hadoop.

Tuesday, May 15

7:00 am Workshop Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

W4. Introduction to Scalable and Reproducible RNA-Seq Data Processing, Analysis, and Result Reporting Using AWS, R, knitr, and LaTeX

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*

W8. Automating Data Analysis with Excel

* Separate registration required.

2:00 – 6:30 Main Conference Registration Open (Commonwealth Hall)

4:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

Wednesday, May 16

7:00 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)

8:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)

9:45 Coffee Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

DATA MANAGEMENT AND ANALYTICS: PLATFORMS & TOOLS
Skyline

10:50 Chairperson’s Remarks

Raj Gondhali, Vice President, Client Success, Saama Technologies

11:00 Sequence. Store. Sign Out: Implementing Hail, Kudu and Spark for a Clinical NGS Platform

Ramesh Sringeri, Senior Applications Developer, Mobile Solutions, Children’s Healthcare of Atlanta

In 2017, Children’s Healthcare of Atlanta launched Next Generation Sequencing (NGS) as a new initiative. Using open-source tools such as Hail, Apache Spark and Apache Kudu, Children’s built a robust, scalable and secure platform to support NGS in the clinical setting. The resulting infrastructure co-locates genomic and phenotypic data, enabling variant review and sign-out as well as analytics and translational medicine using familiar tools such as SQL. The platform covers the entire clinical pipeline: from raw reads to HGVS-called variants, informative QC and variant reports, and storage of the data as Hail VDSs in a Kudu storage layer on Hadoop. This data is then presented to clinicians in a user-friendly web application for streamlined variant review and sign-out. Tools built on top of this platform will be described in a separate talk by Dr. Alexis Carter, Physician Informaticist, Children’s Healthcare of Atlanta, in Track 4: Software Applications & Services.

11:30 Robust Multi-Array Average (RMA) Using Spark

Michael Neylon, Senior Analyst, Eli Lilly and Company

Robust Multi-Array Average (RMA) is a well-established method for normalizing transcriptome data that involves background correction, quantile normalization, annotation and median polish. However, this method has not previously been deployed on a scalable architecture suitable for high-throughput, large-volume sample processing. We present an implementation using Spark and Amazon Web Services (AWS) that scales linearly, is cost-effective and outperforms standard RMA deployments.
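Two of the RMA steps named above can be illustrated in a few lines. The sketch below is a minimal, single-machine, pure-Python illustration of quantile normalization across arrays and Tukey's median polish for summarizing one probe set; it is not the speaker's Spark/AWS implementation, it omits background correction and annotation, and the function names and toy data are hypothetical.

```python
from statistics import median


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list of intensities per array).

    Each value is replaced by the mean, across arrays, of the values with the same
    rank, so every array ends up with the same distribution. Ties are broken by
    position, a simplification.
    """
    n = len(columns[0])
    assert all(len(c) == n for c in columns), "arrays must have equal probe counts"
    sorted_cols = [sorted(c) for c in columns]
    # Mean of the k-th smallest value across all arrays.
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey's median polish on a probes x arrays matrix (e.g. log2 intensities).

    Maintains matrix[i][j] == overall + row_eff[i] + col_eff[j] + resid[i][j].
    In RMA, the per-array expression summary is overall + col_eff[j].
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep column medians out of the residuals into the column effects.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [resid[i][j] - col_med[j] for j in range(n_cols)]
        col_eff = [col_eff[j] + col_med[j] for j in range(n_cols)]
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop once a full sweep moves nothing noticeable.
        if max(abs(x) for x in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid
```

In a full RMA pipeline, the background-corrected, quantile-normalized, log2-transformed intensities for the probes of one probe set would form the matrix passed to `median_polish`. Because each probe set is summarized independently, this step parallelizes naturally, which is one plausible reason it suits a framework like Spark.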

12:00 pm Genomics Analysis - Improving Scale, Speed and Ease of Deployment

Lee Lichtenstein, Associate Director, Somatic Computational Methods, Data Sciences Platform, Broad Institute

Intel has worked with the industry leader, the Broad Institute of MIT and Harvard, to optimize and provide a deployment recipe for the infrastructure required to analyze rapidly growing volumes of sequencing data. Hear which optimizations have been made and which technology is used, and learn about the tools created as part of the solution.

12:30 Session Break

12:40 Luncheon Presentation I: High Performance Data & AI for Genomics

Frank Lee, PhD, Global Industry Leader for Healthcare & Life Sciences; Chief Architect, IBM Reference Architecture for Genomics; Member of IBM Storage CTO Office, IBM Systems Group

Through real use cases and a live demo, we will illustrate the architecture and solution for high performance data and AI platforms, deployed for cloud-scale data management, multi-cloud workload orchestration and converged HPC with deep learning. We will also demonstrate the four key values of HPDA with client case studies: high performance, low cost, ease of use and collaboration.

1:10 Session Break

DATA MANAGEMENT AND ANALYTICS: ENABLING DATA SCIENCE AND MACHINE LEARNING
Waterfront 2

1:50 Chairperson’s Remarks
Sanjay Joshi, Chief of Technology, Healthcare and Life Sciences, H2O.ai

1:55 On the Journey towards a New Assay Data Analysis Landscape

Joerg Degen, PhD, Project Leader Research Informatics, Roche

In pharmaceutical research, drug project teams depend on the availability, reliability, and interpretability of all relevant experimental results, which form the basis for many everyday decisions to design, create and progress compounds through the pipeline. Applications and workflows for capturing these diverse experimental results are tailored to the needs of the laboratories that run the experiments, leading to a dynamic and complex system landscape. Some of the biggest challenges in this context relate to data integration and quality, as well as comprehensive access to the data. While these circumstances are not new and may at first look straightforward to solve, they become more apparent and problematic as the variety of data and the complexity of research questions keep increasing. Another important factor is the steadily growing volume of results produced outside the organization, i.e., in collaborations with academia or other industry partners. Over the past two years we have fundamentally modernized our assay data analysis landscape to help our scientists access and interpret all relevant drug project data more efficiently and to collaborate more effectively. As part of this, we have refactored some of our foundational systems and significantly increased our capabilities in data analysis and the application of predictive models, e.g. using in-house machine learning approaches. The data and capabilities are now integrated and made available to all researchers through a single, comprehensive platform. Recently we started a programme to extend this platform beyond small molecules and peptides to therapeutic proteins and oligonucleotides.

2:25 AI and Health: A Data and Process Approach

Sanjay Joshi, Chief of Technology, Healthcare and Life Sciences, H2O.ai

There has been considerable recent hype about AI in health and life sciences. Sanjay will introduce a use-case-based approach, covering the broad categories of financial, operational and clinical use cases. The learning objectives are practical, business-oriented machine learning and deep learning, with a specific focus on the data and processes involved. Summary results from two specific clinical use cases will be presented.

2:55 Project Asaka: Harnessing Underutilized Computing Resources for Deep Learning

Jack Harwood, Distinguished Engineer, Office of the CTO, Dell EMC

Maximizing application performance with GPUs and FPGAs (collectively, ‘accelerators’) is not new for genomics workflows. Recent advances in machine learning and deep learning methods leverage these accelerators to operate on larger data sets in ever-shrinking time spans.

3:10 A Modern Approach to Data Storage for Next Generation Sequencing & Medical Imaging

Peter Godman, Co-founder & CTO, Qumulo

File storage is a critical component of the life sciences research workflow. For researchers to do their work, their storage must scale to billions of files and handle them efficiently. Researchers must also be able to access their data from anywhere in the world. Learn how universal-scale file storage allows research organizations to manage massive, globally distributed file sets with ease.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

WORKFLOW MANAGEMENT: AUTOMATION AND PROVENANCE
Waterfront 2

4:00 Chairperson’s Remarks
Brian Bissett, IT Specialist, IEEE USA

4:00 Informatics at a Digital Biotech

Dave Johnson, PhD, Director, Informatics, Moderna Therapeutics

Moderna Therapeutics is pioneering mRNA medicines, a ‘software-like’ approach in which we direct the body’s own cells to create proteins that fight and prevent disease. To maximize the enormous potential this new drug approach may have for addressing unmet needs across a broad spectrum of diseases, we are working to be a truly digital biotech. We’ve built a novel integrated informatics platform to capture data, drive automation, and deliver insight to researchers. The platform is built with modern technologies such as AWS, Docker, and microservices, and is delivered rapidly using agile practices. In this talk, we’ll share our platform architecture, design considerations, and deployment philosophy. We’ll also share some success stories, including the use of AWS autoscaling servers to handle spikes in computational workflows. Computational tools built on top of this platform will be described in a separate talk by Iain McFadyen, Senior Director, Computational Sciences, Moderna Therapeutics, in Track 6: Bioinformatics.

4:30 Reproducible Bioinformatics Software Management with GNU Guix

Ricardo Wurmus, System Administrator/Software Developer, Bioinformatics Platform, Max Delbrück Center for Molecular Medicine, Berlin

We will introduce functional package management with GNU Guix, demonstrate some of the benefits it enables for research, such as reproducible software deployment, workflow-specific profiles, and user-managed environments, and share our experiences using GNU Guix for bioinformatics research at the Max Delbrück Center. Functional package management differs from other software management approaches in that reproducibility is a primary goal. We will also compare the properties and guarantees of functional package management with those of other application deployment tools such as Docker and Conda.
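The central idea behind functional package management is that each package is installed under a store directory whose name is derived from a hash of everything that went into building it, including the store paths of its dependencies. Changing any input, however deep, therefore yields a new path instead of silently altering an existing install. The toy Python sketch below illustrates only that idea; the `Package` fields, version strings and the SHA-256 hex scheme are hypothetical simplifications, not Guix's actual derivation format or hash encoding.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical, simplified package recipe."""
    name: str
    version: str
    source_hash: str          # stand-in for a checksum of the source tarball
    inputs: tuple = ()        # dependencies, themselves Package objects
    build_flags: tuple = ()   # stand-in for build options


def store_path(pkg: Package) -> str:
    """Toy store path: hash the recipe plus the store paths of all inputs.

    Because dependency paths are hashed recursively, any change anywhere in the
    dependency graph produces a different path for every dependent package.
    """
    h = hashlib.sha256()
    h.update(f"{pkg.name}|{pkg.version}|{pkg.source_hash}|{pkg.build_flags}".encode())
    for dep in pkg.inputs:
        h.update(store_path(dep).encode())
    return f"/gnu/store/{h.hexdigest()[:32]}-{pkg.name}-{pkg.version}"


if __name__ == "__main__":
    zlib_old = Package("zlib", "1.0", "src-a")   # hypothetical versions
    zlib_new = Package("zlib", "1.1", "src-b")
    tool_old = Package("htslib", "2.0", "src-h", inputs=(zlib_old,))
    tool_new = Package("htslib", "2.0", "src-h", inputs=(zlib_new,))
    # Same htslib version, different dependency: two distinct store paths,
    # so both builds can coexist and each profile keeps the exact one it used.
    print(store_path(tool_old))
    print(store_path(tool_new))
```

The design choice this illustrates is that identical inputs always map to the same path, so an environment can be reproduced exactly, while any difference in inputs is made visible rather than overwritten.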

5:00 SELECTED POSTER PRESENTATION: In Vivo PK Workflow
Andrew Lemon, CEO, The Edge Software Consultancy Ltd.

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

7:00 – 10:00 Bio-IT World After Hours @Lawn on D
**Conference Registration Required. Please bring your conference badge, wristband, and photo ID for entry.

Thursday, May 17

7:30 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)

8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM (Amphitheater & Harborview 2)

9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced (Commonwealth Hall)

WORKFLOW
Waterfront 1

10:30 Chairperson’s Remarks
Susan Roberts, Senior Director, Scientific Computing, Vertex Pharmaceuticals

10:40 A New qPCR Workflow: How Much Information Have We Missed Out?

Matteo Cassotti, PhD, Data and Workflow Scientist, Roche

This talk will discuss overcoming the hurdles affecting the qPCR workflow for LNA-containing oligonucleotides in Roche’s recently established RNA Therapeutics Research Unit.

11:10 Architecting and Delivering a Comprehensive Biologics Data System to Enable Data-Driven Molecule Design and Discovery

Jayanthi Subramani, Senior Manager, Research and Development Informatics (R&DI), Amgen

Amgen has developed a comprehensive data system that enables the execution of "high throughput biologics". The platform is flexible enough to support the larger sample numbers, biological complexity and molecular diversity required for large-molecule discovery. Its accessible data architecture supports advanced self-service analytics and drives decision-making across the organization.

11:40 Using Agile Techniques in Wet Labs to Speed the Creation of Even More Big Data
Bruce Kozuma, Principal System Analyst, Broad Information Technology Services, The Broad Institute of MIT and Harvard

12:10 pm Enjoy Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

FEATURED SESSION: BIOTEAM TOWN HALL
Amphitheater

1:55 Sponsor Introduction

Scott Jeschonek, Director, Cloud Services, Avere Systems

2:05 – 4:00 Panel Session: BioTeam Town Hall: 2018 Bio-IT Trends

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant (Moderator)

Ari Berman, PhD, Vice President and General Manager of Consulting Services, BioTeam, Inc.

Tanya Cashorali, Founder, TCB Analytics

Kristen Cleveland, PMP, Director of Operations, BioTeam, Inc.

Chris Dagdigian, Co-Founder and Senior Director, Infrastructure, BioTeam, Inc.

Karl Gutwin, PhD, Senior Scientific Consultant, BioTeam, Inc.

Adam Kraut, Director of Infrastructure and Cloud Architecture, BioTeam, Inc.

Since 2010, the “Trends in the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions of the Bio-IT World program. The talk delivers a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences, recapping the prior year by discussing what has changed (or not) in infrastructure, storage, computing, and networks. The presentation has helped scientists, leadership, and IT professionals understand the basic topics involved in supporting data-intensive science. In 2017, “Trends in the Trenches” evolved and expanded from 60 to 120 minutes, with more content, more speakers, and more interactive discussion. We will continue this format in 2018, featuring short, focused podium talks on current trends in computing, storage and data transfer, networks, cloud, and managing successful IT projects, followed by a moderated, interactive Q&A discussion with the audience. Come prepared with your questions and commentary for this informative and lively session.

4:00 Conference Adjourns
