There is increasing demand for computing power from life science researchers and genomics scientists tackling big data issues. Track 2 explores techniques and new methods for data transfer and workflows. Themes covered include, but aren’t limited
to, workforce and equipment mobility, HPC across the enterprise vs. HPC as a service, reconfigurable hardware for HPC, and Hadoop.
Tuesday, May 15
7:00 am Workshop Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)
8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
W4. Introduction to Scalable and Reproducible RNA-Seq Data Processing, Analysis, and Result Reporting Using AWS, R, knitr, and LaTeX
12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*
W8. Automating Data Analysis with Excel
* Separate registration required.
2:00 – 6:30 Main Conference Registration Open (Commonwealth Hall)
4:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)
5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)
Wednesday, May 16
7:00 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)
8:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)
9:45 Coffee Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)
10:50 Chairperson’s Remarks
Raj Gondhali, Vice President, Client Success, Saama Technologies
11:00 Sequence. Store. Sign Out: Implementing Hail, Kudu and Spark for a Clinical NGS Platform
Ramesh Sringeri, Senior Applications Developer, Mobile Solutions, Children’s Healthcare of Atlanta
In 2017, Children’s Healthcare of Atlanta undertook Next Generation Sequencing (NGS) as a new initiative. Using open-source tools such as Hail, Apache Spark and Apache Kudu, Children’s built a robust, scalable and secure platform to support
NGS in the clinical setting. The resulting infrastructure, which co-locates genomic and phenotypic data, enables variant review and sign out as well as analytics and translational medicine using familiar tools like SQL. The platform comprises
the entire clinical pipeline from raw reads to HGVS-called variants, informative QC and variant reports and data storage in Hail VDSs in a Kudu storage layer in Hadoop. The upstream data is then presented to the clinician in a friendly
web application for streamlined variant review and sign out. Tools built on top of this platform will be described in a separate talk by Dr. Alexis Carter, Physician Informaticist, Children’s Healthcare of Atlanta, in
Track 4 Software Applications & Services.
11:30 Robust Multi-Array Average (RMA) Using Spark
Michael Neylon, Senior Analyst, Eli Lilly and Company
Robust Multi-Array Average (RMA) is a well-established method for normalizing transcriptome data involving background correction, quantile normalization, annotation and median polish. However, this method has never been deployed to a scalable architecture
suitable for high throughput, large volume processing of samples. We show an implementation using Spark and Amazon Web Services (AWS) that is linearly scalable, cost effective and outperforms standard RMA deployments.
12:00 pm Genomics Analysis - Improving Scale, Speed and Ease of Deployment
Lee Lichtenstein, Associate Director, Somatic Computational Methods, Data Sciences Platform, Broad Institute
Intel has worked with the industry leader, the Broad Institute of MIT and Harvard, to optimize genomics analysis and provide a deployment recipe for the infrastructure required to analyze rapidly increasing volumes of sequencing data. Hear what optimizations have been
done, the technology utilized, and the tools that have been created for the solution.
12:30 Session Break
12:40 Luncheon Presentation I: High Performance Data & AI for Genomics
Frank Lee, PhD, Global Industry Leader for Healthcare & Life Sciences; Chief Architect, IBM Reference Architecture for Genomics; Member of IBM Storage CTO Office, IBM Systems Group
Through real use cases and a live demo, we will illustrate the architecture and solution for high performance data and AI (HPDA) platforms, deployed for cloud-scale data management, multi-cloud workload orchestration and converged HPC with deep learning. We
will also demonstrate the four key values of HPDA with client case studies: high performance, low cost, ease of use and collaboration.
1:10 Session Break
1:50 Chairperson’s Remarks
Sanjay Joshi, Chief of Technology, Healthcare and Life Sciences, H2O.ai
1:55 On the Journey towards a New Assay Data Analysis Landscape
Joerg Degen, PhD, Project Leader Research Informatics, Roche
In pharmaceutical research, drug project teams depend on the availability, reliability, and interpretability of all relevant experimental results, as they form the basis for many everyday decisions to design, create and progress compounds through
the pipeline. Applications and workflows for capturing the diverse experimental results are tailored to the needs of the laboratories that run the experiments, leading to a dynamic and complex system landscape. Some of the biggest challenges that
arise in this context are linked to data integration and quality as well as comprehensive access to the data. While these circumstances are not new and may look straightforward to solve at first, they become more apparent and problematic as data
variety and complexity of questions in research are ever increasing. Another important factor is the steadily growing amount of results that are produced outside of the organization, i.e. in the context of collaborations with academia or other
industry partners. Over the past two years we have fundamentally modernized our landscape for assay data analysis in order to strengthen the ability of our scientists to access and interpret all relevant drug project data more efficiently, and
to collaborate more effectively. In this context, we have refactored some of our foundational systems and significantly increased our capabilities in the areas of data analysis and application of predictive models, e.g. using in-house machine
learning approaches. The data and capabilities are now integrated and made available to all researchers via a single, comprehensive platform. Recently we have started a programme to leverage this platform not just for small molecules and peptides
but also for therapeutic proteins and oligonucleotides.
2:25 AI and Health: A Data and Process Approach
Sanjay Joshi, Chief of Technology, Healthcare and Life Sciences, H2O.ai
There has been a lot of hype recently regarding AI in Health and Life Sciences. Sanjay will present an introduction to a use-case based approach -- he will cover the broad categories of financial, operational and clinical use-cases. The learning objectives
are practical, business-oriented Machine Learning and Deep Learning with specific focus on the data and processes involved. Summary results from two specific clinical use-cases will be presented.
2:55 Project Asaka, Harnessing Underutilized Computing Resources for Deep Learning
Jack Harwood, Distinguished Engineer, Office of the CTO, Dell EMC
Maximizing application performance through GPUs and FPGAs (aka ‘accelerators’) is not new for genomics workflows. Recent advances in machine and deep learning methods leverage the accelerators to operate on larger data sets in an
ever-shrinking time span.
3:10 A Modern Approach to Data Storage for Next Generation Sequencing & Medical Imaging
Peter Godman, Co-founder & CTO, Qumulo
File storage is a critical component of the life sciences research workflow. For researchers to be able to do their work, their storage must be able to scale to and handle billions of files efficiently. They must also be able to access their research
data from anywhere in the world. Learn how universal-scale file storage allows research organizations to manage massive, globally distributed file sets with ease.
3:25 Refreshment Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)
4:00 Chairperson
Brian Bissett, IT Specialist, IEEE USA
4:00 Informatics at a Digital Biotech
Dave Johnson, PhD, Director, Informatics, Moderna Therapeutics
Moderna Therapeutics is pioneering mRNA medicines, a ‘software-like’ approach whereby we direct the body’s own cells to create proteins to fight and prevent disease. To maximize the enormous potential this new drug approach may have
in addressing unmet needs across a broad spectrum of diseases, we are working to be a truly digital biotech. We’ve built a novel integrated informatics platform to capture data, drive automation, and deliver insight to researchers. The platform
is built using modern web development technologies like AWS, Docker, and microservices with rapid, agile delivery. In this talk, we’ll share our platform architecture, design considerations, and deployment philosophy. We’ll also share
some success stories, including the use of AWS autoscaling servers to process spikes in computational workflows. Computational tools built on top of this platform will be described in a separate talk by Iain McFadyen, Senior Director, Computational Sciences, Moderna Therapeutics,
in Track 6 Bioinformatics.
4:30 Reproducible Bioinformatics Software Management with GNU Guix
Ricardo Wurmus, System Administrator/Software Developer, Bioinformatics Platform, Max Delbrück Center for Molecular Medicine, Berlin
We will introduce functional package management with GNU Guix, demonstrate some of the benefits it enables for research, such as reproducible software deployment, workflow-specific profiles, and user-managed environments, and share our experiences
with using GNU Guix for bioinformatics research at the Max Delbrück Center. Functional package management differs from other software management methodologies in that reproducibility is a primary goal. We will also compare the properties
and guarantees of functional package management with the properties of other application deployment tools such as Docker or Conda.
5:00 SELECTED POSTER PRESENTATION: In Vivo PK Workflow
Andrew Lemon, CEO, The Edge Software Consultancy Ltd.
5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)
7:00 – 10:00 Bio-IT World After Hours @Lawn on D
** Conference registration required. Please bring your conference badge, wristband, and photo ID for entry.
Thursday, May 17
7:30 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)
8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM (Amphitheater & Harborview 2)
9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced (Commonwealth Hall)
10:30 Chairperson’s Remarks
Susan Roberts, Senior Director, Scientific Computing, Vertex Pharmaceuticals
10:40 A New qPCR Workflow: How Much Information Have We Missed Out?
Matteo Cassotti, PhD, Data and Workflow Scientist, Roche
This talk will discuss overcoming hurdles afflicting the qPCR workflow for LNA-containing oligonucleotides in the recently established RNA Therapeutics Research Unit at Roche.
11:10 Architecting and Delivering a Comprehensive Biologics Data System to Enable Data-Driven Molecule Design and Discovery
Jayanthi Subramani, Senior Manager, Research and Development Informatics (R&DI), Amgen
Amgen has developed a comprehensive data system that enables execution of "high throughput biologics". The platform is flexible enough to support the greater sample numbers, biological complexity and molecular diversity required for large molecule discovery. The accessible data architecture supports advanced self-service analytics capabilities and drives broad organizational decision processes.
11:40 Using Agile Techniques in Wet Labs to Speed the Creation of Even More Big Data
Bruce Kozuma, Principal System Analyst, Broad Information Technology Services, The Broad Institute of MIT and Harvard
12:10 pm Enjoy Lunch on Your Own
1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing
1:55 Sponsor Introduction
Scott Jeschonek, Director, Cloud Services, Avere Systems
2:05 – 4:00 Panel Session: BioTeam Town Hall: 2018 Bio-IT Trends
Chris Dwan, Senior Technologist and Independent Life Sciences Consultant (Moderator)
Ari Berman, PhD, Vice President and General Manager of Consulting Services, BioTeam, Inc.
Tanya Cashorali, Founder, TCB Analytics
Kristen Cleveland, PMP, Director of Operations, BioTeam, Inc.
Chris Dagdigian, Co-Founder and Senior Director, Infrastructure, BioTeam, Inc.
Karl Gutwin, PhD, Senior Scientific Consultant, BioTeam, Inc.
Adam Kraut, Director of Infrastructure and Cloud Architecture, BioTeam, Inc.
Since 2010, the “Trends in the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk was to deliver a candid (and occasionally blunt)
assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation tried to recap the prior year by discussing what has changed (or not) around infrastructure, storage,
computing, and networks. This presentation has helped scientists, leadership, and IT professionals understand the basic topics involved in supporting data intensive science. In 2017, the “Trends in the Trenches” presentation
evolved and expanded from 60 minutes to 120 minutes and featured more content, speakers, and interactive discussion. We will continue this format for 2018, featuring short focused podium talks on current trends related to computing,
storage/data transfer, networks, cloud, and managing successful IT projects. A moderated, interactive Q&A discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session.
4:00 Conference Adjourns