The unprecedented growth of data generation and research storage isn’t slowing down anytime soon. As such, storage is becoming a major cost element in the genomic IT world where organizations are spending millions on systems and platforms. The role
of data engineering is critical in orchestrating, configuring, managing, and monitoring solutions to manage the data bloat problem. Track 1 assembles thought leaders and organizations from data centers and “centers of excellence” who have
pioneered advances in large-scale data management, predictive analytics, and workflow automation. Presentations will focus on people, process and technology issues related to storage platforms, integration and migration plans, architectures, governance,
and scalability.
Tuesday, May 23
7:00 am Workshop Registration and Morning Coffee
8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
(W7) Introduction to Hadoop for Bioinformatics
12:30 – 4:00 pm Afternoon Pre-Conference Workshops*
* Separate registration required.
2:00 – 6:00 Main Conference Registration Open
4:00 PLENARY KEYNOTE SESSION
5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing
Wednesday, May 24
7:00 am Registration Open and Morning Coffee
8:00 PLENARY KEYNOTE SESSION
9:50 Coffee Break in the Exhibit Hall with Poster Viewing
10:50 Chairperson’s Remarks
Chris Dwan, Senior Technologist and Independent Consultant
11:00 IT Design Patterns to Support Genomic Science in the Age of the Cloud: Challenges and Possibilities
Chris Dwan, Senior Technologist and Independent Consultant
11:30 Design Considerations for Genomic Data Archival
Saira Kazmi, Ph.D., Scientific Data Architect, Research Information Technology, The Jackson Laboratory
This talk focuses on issues related to managing and tiering storage for genomic sequence data. Architectural considerations for designing a solution for scalability, governance, and discoverability will be presented. The presentation will discuss some
of the current hardware and software technologies and a solution using metadata indexing will be presented. The presentation will conclude with lessons learned and next steps.
12:00 pm Life Sciences at EXAScale: Applying a Novel IO System to Critical Workflows
James Coomer, Technical Director, UK-Pre-Sales, Engineering, DDN Storage
Today, we generally assume that an IO interface and the filesystem choice are related, and indeed this is usually true. We present an IO system based upon Flash which embeds within a filesystem, replacing the IO interface with one that removes common constraints, particularly for IO patterns seen in Life Sciences. DDN’s Infinite Memory Engine radically changes the way IO is handled, providing new opportunities for complex life science workflows.
12:15 Novel Systems and Approaches for the Next Generation of Genomic Analysis and Data Management
Christopher Davidson, Life Science Solutions Manager, HPE
The pace and scale of genomics research is now less defined by the science itself than by the compute and storage architectures used to extract insight from the genomic data generated. This session focuses on how to enable genomic workflow acceleration and the democratization of data through flexible and scalable systems.
12:30 Session Break
12:40 Luncheon Presentation I: Broad Institute & Intel GATK 4.0 Optimization Overview
Eric Banks, Director, Data Science and Data Engineering Group, Broad Institute
Geraldine Van der Auwera, Associate Director, Outreach and Communications, GATK, Broad Institute
Mark Bagley, Director, Center for Genomic Data Engineering, Intel
Paolo Narvaez, Senior Director, Engineering, Intel
Genomics research leader the Broad Institute of MIT and Harvard joins Intel to describe their collaboration to enhance the GATK environment and scale researchers’ ability to analyze massive amounts of genomic data from diverse sources worldwide.
Topics include performance best practices and the latest on GenomicsDB and FireCloud.
1:10 Luncheon Presentation II: The New Era of Integrated Data Storage
Mark Pastor, Director, Archive & Technical Workflow Solutions, Marketing, Quantum Corporation
Clinicians and researchers need high-performance, easy access to data, even as data repositories reach petabyte scale. Designing a future-proof, cost-effective storage infrastructure that delivers high-performance access and protection for
all your data, and that can act as a gateway to all types of storage media, including public cloud, is now easier than ever before.
1:40 Session Break
1:50 Chairperson’s Remarks
Asya Shklyar, Senior Scientific Consultant Infrastructure, BioTeam
1:55 Tools and Techniques for Making Data Less Scary and More Visible
Asya Shklyar, Senior Scientific Consultant Infrastructure, BioTeam
Data management has long been a rapidly escalating problem in the research and commercial worlds. The talk summarizes the tools that can be leveraged to wrangle data across multiple heterogeneous sources and make it more quantifiable,
searchable, parsable, and actionable, including software and hardware, both open source and commercial.
2:25 Reducing the Size and Cost of NGS Data Storage and Transfer
Dan Greenfield, Ph.D., CEO, PetaGene Ltd.*
PetaGene provides a software suite which allows up to 5x compression of NGS data and better utilizes your compute nodes. PetaGene's file system interface allows existing tools and pipelines to be used without modification, and this is provided as
a free download so everyone everywhere can use the compressed files. * Bio-IT World 2016 Best of Show winner
2:55 Pushing the Limits of Discovery with Internet2
Dan Taylor, Director, Business Development, Network Services, Internet2
3:10 Time to Results Matters: The Case for Performance Scale-Out NAS
David Sallak, Vice President, Product Management & Industry Marketing, Panasas, Inc.
Acceleration. Decoding human genomes has gone from decades to mere days and hours. Your research infrastructure must keep ahead of the demands placed on it by the best scientists. Building the right storage foundation positions your organization for
groundbreaking research insights. Learn how the Panasas accelerated scale-out NAS solution helped the Garvan Institute drive innovation faster and simplified their workflows.
3:25 Refreshment Break in the Exhibit Hall with Poster Viewing
4:00 Evaluating Full Flash Scale-Out NAS Technologies for Some Bioinformatics Workloads
Youssef Ghorbal, Design and Technical Solutions Group Manager, Institut Pasteur
Scale-out NAS technology is an effective backend for bioinformatics workloads at Institut Pasteur. In our presentation, we will go through the shortcomings of the current setup for some identified use cases. We will also be giving feedback on how
newly tested flash array technologies may overcome those limitations.
4:30 Integrating Data, Tools and Infrastructure to Enable Efficient Collaboration and Management in Large Scale Biomedical Studies
Sven Nahnsen, Ph.D., Head, Quantitative Biology Center (QBiC), University of Tübingen
We established an infrastructure that builds on multi-layer omics (genomics, transcriptomics and metabolomics), as well as imaging data from mice and from human material that is gained from clinical oncology studies. Furthermore, we developed a data
and project management facility that facilitates the modeling of the experimental design, the interplay with the data acquisition facilities and the bioinformatics analysis.
5:00 Extreme Durability for Your Bioinformatics Data
Kent Ritchie, Solutions Architect, HGST, a Western Digital brand
HGST, a Western Digital brand, one of the largest storage companies in the world, can help with every stage of your workflow. Come visit us to hear about extreme durability for your bioinformatics data, focusing on high performance computing to archiving
and analytics for your long-term discoveries. We are here to help you deliver possibilities at every stage.
5:30 – 6:30 15th Anniversary Celebration in the Exhibit Hall with Poster Viewing and Best of Show Awards
Thursday, May 25
7:00 am Registration Open
8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM
8:05 Benjamin Franklin Awards and Laureate Presentation
8:35 Best Practices Awards Program
8:50 Plenary Keynote
9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced
10:30 Chairperson’s Remarks
Martin R. Gollery, CEO, Tahoe Informatics
10:40 Real World Data Platform and Analytics
Minnie Chou, Director, Information Systems, Amgen
The Real World Data (RWD) Platform is a game changer in Amgen's pursuit of serving patients by delivering innovative human therapeutic products faster. It provides a common high-performance analytics ecosystem hosting large volumes of real-world patient
claims data and electronic medical records, enabling epidemiologists, analysts and scientists to deliver insights in a timely and cost-effective manner. We used an Agile approach to implement an enterprise RWD workbench on top of a Hadoop-based enterprise
data lake. The workbench harmonizes real-world patient data assets and cohorts of patients with diseases and/or receiving Amgen or competitor therapies, so that questions can be addressed consistently across the drug commercialization lifecycle.
11:10 Healthcare: Foundational Building Blocks: The Establishment of a Healthcare Data Ecosystem in a Hadoop Environment
Amy M. Andrade, MS, PMP, Assistant Vice President of Research, Meharry Medical College
Charles Boicey, MS, RN-BC, Associate Clinical Professor, Stony Brook Medicine
This talk presents insight into, and a working framework for, how data and storage management and clinical informatics in a Hadoop environment can be established in months instead of years. A Data Science Center at a community-based academic health center focused
on serving underserved and minority populations has implemented a low-cost, HIPAA-compliant cloud approach to “Big Data”. Utilizing technologies new to healthcare, data from both within and outside of the healthcare environment
was processed.
11:40 Kickstarting Breakthroughs in Life Sciences with Intelligent, Next-Generation Scale-Out Storage
Peter Godman, CTO & Co-Founder, Qumulo
Unprecedented storage and data management challenges resulting from advances in genomic IT are plaguing life sciences companies. How can companies stay competitive and handle the challenge of managing billions of small and large files? Discover how
intelligent scale-out storage systems are providing enterprises with real-time answers into their data footprints at scale, providing breakthrough performance while balancing capacity and cost.
11:55 IBM Cloud Object Storage Solutions Enabling Better Patient Outcomes
Piers Nash, Ph.D., Global Solutions Consultant, Genomics & Healthcare, IBM Cloud Object Storage, IBM
IBM and University of Chicago’s Center for Data Intensive Science (CDIS) are accelerating medical discoveries. Utilizing IBM Cloud Object Storage, CDIS centrally stores and manages vast amounts of genomic and clinical data at web-scale. Discover how IBM Watson for Genomics can help researchers to collaborate via shared access to harmonized data sets, speeding discovery and enabling precision medicine.
12:10 pm Session Break
12:20 Luncheon Presentation I: AI + RWD – Next Steps in your Big Data Journey
Arun Ghosh, Principal Data & Analytics, KPMG LLP
Organizations' data repositories have evolved to accommodate real-world data. This moves the industry incrementally closer to the automated data management research scientists and epidemiologists need to accelerate improved patient outcomes. This
presentation will illuminate how machine learning (as a service) and artificial intelligence can use data to accelerate value in small measures today while laying the foundation for what's next.
12:50 Luncheon Presentation II: Optimized Scaling for NGS: Transfer, Storage and Archiving
Rafael Feitelberg, CEO, Geneformics Inc.
As the volume of NGS data is increasing, so are the challenges of IT costs and infrastructure. In this session, we will cover solutions implemented by leading global organizations to reduce NGS footprint by up to 90%, with scalable Enterprise-Grade architectures that are lossless and transparent to bioinformatics applications.
1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing
1:55 Chairperson’s Remarks
Chris Dwan, Senior Technologist and Independent Consultant
2:00 BioTeam Micro-Symposium: 2017 Bio-IT Trends
Chris Dwan, Senior Technologist and Independent Consultant (Moderator)
Ari E. Berman, Ph.D., Vice President and General Manager of Consulting Services, BioTeam, Inc.
Chris Dagdigian, Founding Partner & Director, Technology, BioTeam, Inc.
Aaron Gardner, Senior Scientific Consultant, BioTeam, Inc.
Adam Kraut, Director of Infrastructure and Cloud Architecture, BioTeam, Inc.
Asya Shklyar, Senior Scientific Consultant, Infrastructure, BioTeam, Inc.
Since 2010, the “Trends in the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk was to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation tried to recap the prior year by discussing what has changed (or not) around infrastructure, storage, computing, and networks. This presentation has helped scientists, leadership, and IT professionals understand the basic topics involved in supporting data-intensive science. For 2017, the “Trends in the Trenches” presentation will evolve and expand from 60 minutes to 120 minutes and feature more content, speakers, and interactive discussion. Short, focused podium talks on current trends related to computing, storage/data transfer, networks, and cloud will be presented. A moderated Q&A discussion follows. Come prepared with your questions and commentary for this informative and lively session.
4:00 Conference Adjourns