Track 1: Data Storage & Management

2019 Archived Content

Is the burden of managing your data growing larger every day? Do you have a scalable and robust data management infrastructure in place to process, analyze, and store vast quantities of data according to your organization's policies? Is your organization using new tools and analytical processes, such as AI and deep learning, that stress your supporting IT infrastructure beyond the expectations of system designers? Managing data has become a prevalent issue in the life sciences industry. Organizations are spending millions on systems and platforms to manage and store many types of data (e.g., experimental, operational, clinical) from many disparate sources. Data engineering plays a critical role in orchestrating, configuring, managing, and scaling solutions to the data bloat problem. The Data Storage & Management track presents in-depth case studies from leading life sciences organizations that are implementing solutions to address these data issues. Presentations will focus on people, process, and technology issues related to storage platforms, architectures, integration and migration plans, governance, collaboration, scalability, and cost efficiencies.

Final Agenda

Tuesday, April 16

7:00 am Workshop Registration Open and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

W5. Managing Sensitive and HIPAA-Controlled Data with Globus

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*

W12. Data Science Driving Better Informed Decisions

* Separate registration required.

2:00 – 6:30 Main Conference Registration Open

4:00 PLENARY KEYNOTE SESSION
Amphitheater

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing and Meet the Experts: Plenary Keynote Speaker

Wednesday, April 17

7:30 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION
Cambridge Complex

9:45 Coffee Break in the Exhibit Hall with Poster Viewing

10:50 Chairperson's Remarks

Hongliang Tang, Senior Director and Chief Architect, Huawei American Storage Research Lab, Futurewei Technologies, Inc.

11:00 DB4Sci, Open Source Database as a Service (DBaaS) for On-Prem and Cloud

John Dey, HPC Systems Engineer, Scientific Computing, Fred Hutchinson Cancer Research Center

Cloud-based database-as-a-service (DBaaS) offerings have greatly simplified database management. We can create database instances using best-practice configurations, including backup and DR plans, with a single push of a button. However, databases are sensitive to latency, and cloud-based databases cannot be used effectively from on-prem. Supporting Postgres, MongoDB, MariaDB/MySQL, and the Neo4j graph database, DB4Sci is the ideal DBaaS solution for on-premises and multi-cloud deployments and supports high-performance backup to cloud storage. The audience will learn how to deploy a robust and fast database service with a simple architecture. At its core, DB4Sci is a rather simple Python-Flask app that uses docker commands to manage database instances in containers. The architecture is intentionally simple and is not designed around enterprise features such as high availability (HA) and business continuity. Instead, we focus on our ability to recover from disasters (DR). Data is backed up to cloud storage at regular intervals and can be restored by an administrator or by the end user, for example on a server in a different cloud. We can demonstrate that users can be back in business within a few minutes after a major failure.

11:30 Data Centralization for Any Lab, Any Equipment, Any Software

Charles Fracchia, Founder and CEO, BioBright
Jarrod Medeiros, Director of Informatics and IT, Casma Therapeutics

It’s all too easy to end up with cloud infrastructures that mirror the shortcomings of local data management. In this talk, we will present how carefully designed software can make data available seamlessly, removing the need for scientists to dig through disparate systems to find what they need and analyze it. We will present a new model that allows data centralization and cloud-based data analysis while minimizing the burden on the scientist. We will share concrete use cases showing how to migrate effectively to a data-centric workflow that takes advantage of cloud storage and analytics. Attendees will leave with five steps to help plan and evaluate their approach to cloud data management: 1) examining the current flow of data, 2) finding out the scientific needs for data, 3) calculating storage needs for seamless scale-up, 4) connecting the dots between storage and analysis, and 5) designing for future integrations and growth.

12:00 pm Architecting for Success with Machine Learning Data Platforms for Image Analysis and Precision Medicine

William Beaudin, Director of Solutions Engineering, DDN Storage

Aspects of precision medicine, including automated image analysis and mining patient data to better target therapies, leverage AI and deep learning. While early training data fits in-node, successful approaches attract more data. Forward-thinking organizations adopt scalable architectures; the unprepared fall behind. We review key considerations for machine-learning platforms that ensure effortless scaling, deeper insights, and a shorter path to value.

12:15 Internet2: Leveraging Distributed Resources to Speed Discovery

Dan Taylor, Director, Business Development, Internet2

Few life sciences organizations take advantage of the vast resources available to R&D organizations for continuous innovation and keeping pace with big data. This session will discuss the infrastructure underlying collaborations that use private, academic, and public resources – including commercial cloud and supercomputing center storage and processing – to maximize options and speed discovery.

12:30 Session Break

12:40 NEW: Luncheon Co-Presentation I: Accelerating Life Sciences Workflows Using Software Defined Storage

David Hiatt, Director, Product Marketing and Business Development, WekaIO

Chris Dagdigian, Co-Founder and Senior Director, Infrastructure, BioTeam, Inc.

In this presentation we will compare the results of Cryo-EM and genomic pipelines run on a traditional storage architecture to those run on a modern scale-out storage system. See how the modern scale-out system can meet the mixed workload challenges of life sciences and outperform the storage system for the largest supercomputer in the world.

1:10 Luncheon Presentation II: Accelerate Precision Medicine with High Performance Data and AI

Frank Lee, PhD, Global Industry Leader for Healthcare and Life Sciences - IBM Systems

Get your data and apps ready for precision medicine and research in the multicloud era, to derive faster insights with high performance data and AI architecture. Join Frank Lee, PhD, Global Industry Leader for Healthcare and Life Sciences, as he presents real-life use cases and best practices for high performance genomics and imaging with deep learning that will help you deliver new records for speed and scale, cost efficiencies, collaboration and ease of use.

1:40 Session Break

1:50 Chairperson’s Remarks

Brigitte Raumann, Product Manager, Globus, University of Chicago

1:55 Achieving Compliant Collaboration: Securely Managing Protected Data to Accelerate Discovery

Brigitte Raumann, Product Manager, Globus, University of Chicago

Researchers working with protected data, such as HIPAA-regulated data and controlled unclassified information, face many challenges in managing this data and sharing it with colleagues. Meeting compliance requirements is complicated, and investigators must often either slow their process to address this burden or resort to using distilled, de-identified data instead. With higher assurance levels provided by Globus, the leading research data management service, users can optimize their protected data environments by integrating secure, scalable data management capabilities into existing workflows and applications. Attendees will learn about features and enhancements to the Globus service that make it possible to manage protected data in a compliant manner, and will gain an understanding of how their organization can benefit from these features.

2:25 Research, Privacy and Security

Kris Torgerson, Chief Information and Privacy Officer, Oak Ridge National Laboratory

Research, Privacy, and Security… can they coexist? How to enable research, influence outcomes, and protect the mission responsibly. In a world where well-funded bad actors are actively working to own your data, what are strategies to minimize risk?

2:55 Solving Genomic Data Privacy in the Age of AI

Esteban Rubens, Global Principal, Enterprise Imaging Healthcare, Pure Storage

Health data protection is of paramount importance, with all stakeholders in the healthcare industry looking to adopt AI to improve patient care. We will provide examples of an API-driven Data Hub solution that enables life sciences and healthcare organizations to leverage the advancements of AI to help improve diagnoses, find better treatments, and discover new drugs while protecting confidential patient information.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing, Meet the Experts: Bio-IT World Editorial Team, and Book Signing with Joseph Kvedar, MD, Author, The Internet of Healthy Things℠ (Book will be available for purchase onsite)

4:00 Infrastructure Automation: Real Examples

Karl Gutwin, Senior Scientific Consultant, BioTeam, Inc.

A wide variety of infrastructure automation tooling has existed in many forms for many years; however, it has yet to achieve consistent presence within our day-to-day systems and processes. Automation, when successful, has the potential to measurably improve clarity, reliability and capacity for engineering and operations teams. This talk will walk through the most prevalent automation tools that I have seen in practice, and give real, working examples that you can take back to your office and try yourself. We’ll cover the why, the how and what could possibly go wrong when using automation for your IT infrastructure.

4:30 A Novel Psychiatric Registry System and Its Utilization for Clinical and Pharmaceutical Research

András London, PhD, Assistant Professor, University of Szeged

The development of medical IT systems has opened new opportunities and challenges in many fields of healthcare. One goal is to create a "learning health system" that incorporates data from patients, clinicians, laboratories, and many other information sources to translate information into knowledge. There has been a continuously growing demand to create patient registries where the collected data is readily applicable for statistical analysis using both standard and advanced methods, such as machine learning. Despite the wide applicability of registry databases, their development and adoption remain highly limited because of the significant extra effort they require on top of daily patient care and other administrative obligations. A possible solution to the problem is to integrate patient registries with standard EHR patient administration systems. In this talk we present our experiences from the development of a psychiatric registry, its integration with patient administration systems, and data mining to investigate the effects of negative symptoms of schizophrenia.

5:00 Managing Genomic Data with Regional Encryption for Efficient Storage, Regulated Access and Proven Compliance

Dan Greenfield, PhD, Co-founder & CEO, PetaGene

PetaGene has added encryption and data management to its award-winning compression. This enables organizations to manage access to their genomic data by internal and external teams, secured with fine-grained regional encryption and deep auditing of data usage. Moreover, this is done in a manner that is transparent to existing tools and pipelines and that integrates with existing on-premises and cloud storage infrastructure.

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

Thursday, April 18

7:30 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM
Amphitheater

9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced

10:30 Chairperson’s Remarks

Bill Fox, Vice President, Vertical Strategy & Chief Strategist, Healthcare & Life Sciences, MarkLogic

10:40 The Evolution of DNA Encoded Library Data Management: Lessons Learned Along the Way

Neil Carlson, Investigator, Medicinal Science & Technology, GSK Cambridge R&D

GSK’s DNA Encoded Library Technology (ELT) generates hundreds of millions of sequences each week as the primary readout for analysis. We have developed a robust data management, tracking and delivery platform to meet the often-changing needs of our diverse user base. This talk will review the 12-year evolution of our informatics platform, focusing on the specific challenges we've faced and how we've chosen to address them. Attendees will benefit from the lessons we've learned and will be able to apply these learnings when designing their own storage and delivery solutions.

11:10 The Usage of DNA Encoded Libraries to Predict Target Tractability: Application of the Informatics Platform

Ken Lind, PhD, Computational Chemist, GSK Cambridge R&D

Recent advances in both genome-wide screening and genome-wide analyses have enabled the identification of numerous putative therapeutically relevant targets for hit identification programs. Pursuing all of these targets in small molecule hit ID programs is neither feasible nor warranted. We have developed and deployed Encoded Library Technology (ELT) protocols that rapidly predict the small molecule tractability of novel targets. Attendees will learn how we leverage our analysis platform to quickly prioritize novel therapeutically relevant targets and focus small molecule hit identification efforts on those that are most likely to succeed.

11:40 "Data Wars": What R&D Organizations Need to Do in Order to Survive the Near Future

John F. Conway, Global Head of R&D&C IT, Science and Enabling Units IT, AstraZeneca

R&D organizations, from startup to mature, need to quickly build a culture that treats data, information, and knowledge as an asset, and to emulate a data company. R&D organizations need improved stringency from data capture to contextualization to reuse. The FAIR principles are criteria to measure success in the journey, but it starts with a written scientific data strategy that outlines the what, the who, and the how from a change management and cadence perspective. Simply put, we have to stop treating our data like trash and instead treat it as another form of currency that has immense value.

12:10 pm Session Break

12:20 Luncheon Co-Presentation: Building a Modern Research Data Hub

Bill Fox, Vice President, Vertical Strategy & Chief Strategist, Healthcare & Life Sciences, MarkLogic

Imran Chaudhri, Chief Architect, Healthcare & Life Sciences, MarkLogic

One of the primary challenges in transforming real world data into valuable real world evidence lies in its diverse format and structure. The need to extract greater value from this multi-structured data is compelling pharmaceutical companies to move away from outdated and siloed IT infrastructures in favor of more agile, modern data management solutions. In this discussion, you will learn how the MarkLogic data hub framework is empowering many of the top global pharmaceutical companies to quickly break through data silos and build innovative applications in less time and at lower cost. To illustrate, the talk will also include a demo of a pharmacovigilance application built for a Top 10 pharma by just two people in four weeks.

12:50 Session Break

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

1:55 Chairperson’s Remarks

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

2:00 PANEL DISCUSSION: High Performance Consultancies

Moderator:

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

Panelists:

Tanya Cashorali, CEO, Founder, TCB Analytics

Aaron Gardner, Director of Technology, BioTeam, Inc.

Eleanor Howe, PhD, Founder and CEO, Diamond Age Data Science

An organization must learn and understand the value of why, when and how to use a consultancy. Highly trained and skilled professional experts gather to discuss their role in leading and managing projects for organizations to help them achieve goals. They will discuss a variety of themes including the best kinds of projects to hire a consultancy for, the timeline of when an organization should hire a consultant vs. full time staff, and big challenges on the horizon. The session will feature short podium presentations, followed by a moderated Q&A panel with attendees. The topic of hiring a consulting company came up in the data science plenary keynote at Bio-IT 2018. We want to spend time at Bio-IT 2019 exploring this topic in finer detail.

3:20 KEYNOTE PRESENTATION: Trends from the Trenches 2019

Chris Dagdigian, Co-Founder and Senior Director, Infrastructure, BioTeam, Inc.

“Trends from the Trenches” returns to Bio-IT in its original “state of the state address” format! Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, and cloud that are involved in supporting data-intensive science.

4:00 Conference Adjourns