Hackathon at Bio-IT World Expo and Conference

Stay Tuned for 2026 Project Information!

Tuesday, April 1 – Wednesday, April 2, 2025

The Bio-IT World Hackathon is a cornerstone of the Bio-IT World Conference & Expo, bringing together data scientists, developers, and life science professionals to tackle real-world data challenges. Focused on Open Source and FAIR Data (Findable, Accessible, Interoperable, Reusable) principles, this two-day event fosters innovation and collaboration to deliver practical solutions.

2025 Projects to Date:

Project 1: GlycoEnzyme Expression Atlas: Linking Differential Expression to Pathway Dysregulation
Institution: CFDE/GlyGen
Team Lead: Vlado Dancik, PhD, Computational Chemical Biologist, Broad Institute
About the Project: The GlycoEnzyme Expression Atlas project aims to establish connections between glycoenzyme expression patterns and pathway dysregulation across various disease states. This bioinformatics initiative will integrate multiple data types and analytical approaches: RNA-seq data preprocessing using DESeq2/edgeR for differential expression; mapping of glycoenzyme genes using the CAZy and GlyGen databases; integration with KEGG/Reactome pathway annotations; and network analysis via Cytoscape/STRING for interaction mapping.
Why is this Project Applicable to Others in the Community? The GlycoEnzyme Expression Atlas project aims to establish connections between glycoenzyme expression patterns and pathway dysregulation across various disease states. This bioinformatics initiative will integrate multiple data types and analytical approaches: RNA-seq data preprocessing using DESeq2/edgeR for differential expression; mapping of glycoenzyme genes using the CAZy and GlyGen databases; integration with KEGG/Reactome pathway annotations; and network analysis via Cytoscape/STRING for interaction mapping.
How is the Project Open Source and/or FAIR? The GlycoEnzyme Expression Atlas project follows FAIR (Findable, Accessible, Interoperable, and Reusable) principles and is designed as an open-source initiative to enhance glycoenzyme-related research. By integrating glycosyltransferase (GT) and glycohydrolase (GH) gene lists, differential expression data, and pathway analysis, the project ensures that data is findable through public repositories like GlyGen, GEO, and the EMBL-EBI Expression Atlas, using standardized metadata and persistent identifiers. To ensure accessibility, all datasets, analysis pipelines, and visualization tools will be freely available on platforms like GitHub and Zenodo, following open-access policies. The project prioritizes interoperability by adhering to standardized ontologies such as EDAM, OBI, and Human Disease Ontology, and using widely accepted data formats like FASTA, CSV, and JSON to facilitate integration with existing bioinformatics tools and databases. By maintaining well-documented pipelines, standardized methodologies, and open-source licensing (e.g., MIT, Apache 2.0, or Creative Commons CC-BY-4.0), the project guarantees reusability.
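As a rough illustration of the analysis described for this project, the sketch below filters differential-expression results down to significantly changed glycoenzyme genes and groups them by pathway. It is a minimal sketch under stated assumptions, not the team's pipeline: the gene identifiers, pathway names, numbers, cutoffs, and function names are all invented placeholders, and in practice the inputs would come from DESeq2/edgeR output and from CAZy, GlyGen, and KEGG/Reactome annotations.

```python
from collections import defaultdict

# Toy differential-expression results: (gene, log2 fold change, adjusted p-value).
# All identifiers and numbers are invented placeholders, not real data.
de_results = [
    ("GT_1", 2.1, 0.001),
    ("GT_2", -1.6, 0.020),
    ("GH_1", 0.3, 0.400),
    ("OTHER_1", 3.0, 0.0001),
]

# Hypothetical glycoenzyme gene list (stand-in for a CAZy/GlyGen-derived list).
glycoenzymes = {"GT_1", "GT_2", "GH_1"}

# Hypothetical gene-to-pathway annotations (stand-in for KEGG/Reactome).
gene_to_pathways = {
    "GT_1": ["pathway_A", "pathway_B"],
    "GT_2": ["pathway_A"],
    "GH_1": ["pathway_B"],
    "OTHER_1": ["pathway_C"],
}


def dysregulated_glycoenzymes(results, gene_set, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Keep glycoenzyme genes that pass both significance and effect-size cutoffs."""
    return {
        gene: lfc
        for gene, lfc, padj in results
        if gene in gene_set and padj < padj_cutoff and abs(lfc) >= lfc_cutoff
    }


def pathways_hit(hits, mapping):
    """Group the dysregulated genes by the pathways they are annotated to."""
    by_pathway = defaultdict(list)
    for gene in sorted(hits):
        for pathway in mapping.get(gene, []):
            by_pathway[pathway].append(gene)
    return dict(by_pathway)


hits = dysregulated_glycoenzymes(de_results, glycoenzymes)
summary = pathways_hit(hits, gene_to_pathways)
for pathway, genes in sorted(summary.items()):
    print(pathway, genes)
```

The cutoffs (adjusted p-value below 0.05, absolute log2 fold change of at least 1) are common defaults chosen for illustration only.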

Project 2: DrugCentral Based Review and Profiles of Targets for Approved Drugs
Institution: CFDE/IDG
Team Lead: Ben Busby, PhD, Senior Alliances Manager, Genomics, NVIDIA
About the Project: Dive into the world of pharmacology with a dynamic project centered on DrugCentral’s comprehensive database of approved drugs and their molecular targets. This project offers multiple pathways to innovation, catering to all levels of expertise—whether you prefer an intuitive, no-code experience or want to dive into advanced programming. Utilize DrugCentral’s interactive web interface to explore drug-target profiles through seamless hyperlinks to interoperable databases like Pharos, or get hands-on with powerful data analysis tools. Harness SQL with PostgreSQL or leverage Python, potentially with Jupyter notebooks, to generate insightful, visually engaging statistics. Focus your efforts on broad descriptive statistics for the entire pharmacopeia, or zoom in on specific drug classes, target families, or disease areas. DrugCentral’s curated target associations—complete with PubMed references—provide a rich, evidence-based foundation for deep exploration and storytelling. Whether you’re driven by data, inspired by disease pathways, or motivated by meaningful drug discovery insights, this project invites you to craft impactful narratives in drug-target research. Jump in and see where your curiosity takes you!
Why is this Project Applicable to Others in the Community? DrugCentral is an online compendium of drug information focused on approved drugs, created and maintained by the University of New Mexico and the IDG program. DrugCentral can be accessed via the web UI, as a PostgreSQL database (cloud or local instance), or via a Python API. One of the critical areas of pharmaceutical discovery and development is the identification and validation of biomolecular targets, and novel targets in particular, which can facilitate new and improved therapies; this has been the overarching goal of the IDG program. DrugCentral can assist in this research area by representing the high-confidence known targets for approved drugs.
How is the Project Open Source and/or FAIR? DrugCentral is fully public and open access via several methods and channels. Entities including chemicals, diseases, genes, and proteins, are identified via community standard vocabularies and IDs for semantically rigorous interoperability.
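For participants taking the SQL or Python route, the kind of descriptive statistics mentioned above might look like the following sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere. The table name, column names, and rows are hypothetical and do not reflect DrugCentral's actual schema or contents.

```python
import sqlite3

# In-memory SQLite database standing in for a PostgreSQL instance.
# The drug_target table and its columns are hypothetical, not DrugCentral's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drug_target (drug TEXT, target TEXT, target_class TEXT)")
conn.executemany(
    "INSERT INTO drug_target VALUES (?, ?, ?)",
    [
        ("drug_A", "target_1", "GPCR"),
        ("drug_A", "target_2", "Kinase"),
        ("drug_B", "target_1", "GPCR"),
        ("drug_C", "target_3", "Ion channel"),
    ],
)

# How many distinct targets does each drug have?
targets_per_drug = conn.execute(
    "SELECT drug, COUNT(DISTINCT target) AS n_targets "
    "FROM drug_target GROUP BY drug ORDER BY n_targets DESC, drug"
).fetchall()

# How many distinct drugs act on each target class?
drugs_per_class = conn.execute(
    "SELECT target_class, COUNT(DISTINCT drug) AS n_drugs "
    "FROM drug_target GROUP BY target_class ORDER BY n_drugs DESC, target_class"
).fetchall()
conn.close()

print(targets_per_drug)
print(drugs_per_class)
```

The same GROUP BY queries should carry over to PostgreSQL largely unchanged once the real table and column names are substituted.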

Project 3: Mapping Disease at the Cellular Level with HuBMAP
Institution: CFDE/HuBMAP
Team Leads: Nicholas Lucarelli, PhD; Sumanth Devarasetty; Suhas Katari Chaluva Kumar, University of Florida
About the Project: Join us in developing interactive Jupyter notebooks using HuBMAP Workspaces to compare healthy tissue single-cell reference data with diseased tissue data from other consortia. This hackathon will empower researchers to characterize disease states at the cellular level by integrating data from programs like CFDE, KPMP, and beyond. Help bridge the gap between healthy and diseased tissues to drive new insights in biomedical research!
Why is this Project Applicable to Others in the Community? Jupyter notebooks developed by this project can be easily shared, re-used, and improved by members of the broader community through the HuBMAP Portal Workspaces. These notebooks can also be turned into templates that make it easy for others to tackle similar projects.
How is the Project Open Source and/or FAIR? As mentioned above, the results will be openly shared on HuBMAP Portal Workspaces.

Project 4: Unraveling Exercise Resilience: Multi-Omics Meets Machine Learning
Institution: CFDE/MoTrPAC
Team Lead: Mihir Samdarshi, Software/Bioinformatics Engineer II, Stanford University School of Medicine
About the Project: Join us in leveraging the unique multi-omics dataset from MoTrPAC to explore how endurance exercise training affects various biological processes in rats, from transcripts and proteins to metabolites, lipids, and more. The MoTrPAC dataset captures the long-term adaptive changes in rats undergoing endurance exercise training, monitored at one, two, four, and eight weeks. We have isolated the long-term effects of exercise, providing a "healthy" reference dataset. The benefits of exercise are well known, and this data can be compared and contrasted with various disease models in rats and integrated with other Common Fund Data Ecosystem projects like LINCS or Illuminating the Druggable Genome (IDG). Our project involves adapting existing R-based analyses and datasets to a Python framework, focusing on machine learning and data integration approaches. We envision participants creating time-series models, performing unsupervised learning (e.g., clustering and dimensionality reduction), and, most importantly, linking the dataset to publicly available data where rats were used as the model organism. The goal is to identify protective or high-risk molecular signatures. By leveraging Python's strong machine-learning ecosystem, we can foster new analytical approaches, facilitate broader collaboration, and uncover fresh insights into how exercise confers resilience against disease.
Why is this Project Applicable to Others in the Community? Exercise is known to protect against numerous diseases, yet the underlying molecular mechanisms are still being explored. Creating a Python-based workflow for the MoTrPAC multi-omics dataset empowers researchers who prefer Python tools to engage with this rich resource. This helps unify a broader scientific community, bridging the gap between R-based and Python-based analyses. Beyond exercise research, the project highlights a universal challenge: integrating diverse omics datasets to reveal biological insights. The methods developed, such as clustering, dimensionality reduction, and time-series modeling, can be applied to other large-scale, multi-omics studies. Scientists investigating cardiovascular disease, metabolic disorders, or even basic biology can adapt these approaches to their own datasets, expanding the project’s relevance. Additionally, we plan to integrate data from publicly available rat disease studies, which helps researchers from various fields discover protective or risk-associated signatures. This collaboration will foster knowledge sharing, reproducible methodologies, and community-driven improvements. By producing reusable, open-source code, the project lowers barriers to entry for future analyses, ensuring that a wider audience can benefit from, and build upon, our findings.
How is the Project Open Source and/or FAIR? The project adopts FAIR principles and open-source practices. MoTrPAC data are publicly available through multiple platforms, ensuring the dataset is findable and accessible to all participants. In transitioning from R to Python, we will document every step (data conversion, code structure, and analysis workflows) so the methods remain transparent and interoperable across multiple platforms. We plan to share our code in a public repository (e.g., GitHub), complete with detailed instructions, environment files, and licensing statements, ensuring that others can easily reuse and adapt our work. Open-source development fosters community feedback and encourages contributors to expand or refine the workflows, further enhancing reusability. Moreover, by demonstrating how to link MoTrPAC data with other public rat datasets, we show how scientific data can be combined, creating new opportunities for discovery. This interoperability underscores the value of FAIR data practices and positions the project as a resource that can continually evolve with community input and collaboration.
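To give a flavor of the time-series work described for this project, the sketch below groups features by the shape of their trajectory across the one-, two-, four-, and eight-week time points. It is a deliberately simple, rule-based stand-in for the clustering the project envisions: each feature is labeled by its Pearson correlation with time. The feature names, values, threshold, and function names are invented for illustration and are not MoTrPAC data or code.

```python
from statistics import mean, pstdev

# Training time points named in the project description (weeks).
weeks = [1, 2, 4, 8]

# Invented example trajectories (e.g., log fold changes versus sedentary controls).
trajectories = {
    "feature_1": [0.1, 0.4, 0.9, 1.5],
    "feature_2": [0.0, -0.3, -0.8, -1.2],
    "feature_3": [0.2, 0.5, 1.1, 1.4],
    "feature_4": [0.5, 0.5, 0.5, 0.5],
}


def zscore(values):
    """Standardize a series; return None when it has no variance."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return None
    return [(v - mu) / sd for v in values]


def pearson(a, b):
    """Pearson correlation computed from population z-scores."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def label_trend(values, times, threshold=0.8):
    """Label a trajectory by how strongly it correlates with time."""
    if zscore(values) is None:
        return "flat"
    r = pearson(values, times)
    if r >= threshold:
        return "increasing"
    if r <= -threshold:
        return "decreasing"
    return "mixed"


# Group features that share the same trend label.
clusters = {}
for name, values in trajectories.items():
    clusters.setdefault(label_trend(values, weeks), []).append(name)

print(clusters)
```

A real analysis would more likely use a library such as scikit-learn for k-means or PCA. The standard-library version here only shows the idea of grouping features with similar temporal behavior.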

Project 5: Interactive Analysis with Biological Pathways
Institution: Broad Institute
Team Lead: Eric Weitz, Principal Software Engineer, Data Sciences Platform, Broad Institute of MIT and Harvard
About the Project: We will use Jupyter notebooks and RStudio on Terra, along with WikiPathways diagrams, to interactively analyze data from various modalities, e.g. variation, expression, and proteomic datasets. Public data will be exported from the AnVIL Data Explorer. Underlying content will be retrieved via DRS URIs, a GA4GH standard that enables interoperable data access regardless of underlying cloud infrastructure. Genes output by our analyses -- e.g. differentially expressed genes, or genes containing pathogenic variants -- will be queried against the WikiPathways corpus to return biological pathways enriched in those genes. These pathways will be rendered and made interactive with JavaScript / TypeScript, to e.g. color genes by size and significance of DE, and display gene metadata upon hovering over nodes.
Why is this Project Applicable to Others in the Community? Pathway diagrams from WikiPathways give uniquely rich causal insight for disease mechanisms and gene regulatory networks. This is due to the diagrams' graphics and programmability. WikiPathways diagrams use a close interpretation of source graphics from biomedical literature, preserving the visual arrangement of cell membranes, organelles, and relative positioning of gene product nodes among these compartments in a manner that makes them easy to interpret at a glance. The SVG and GPML artifacts for WikiPathways diagrams are also highly inspectable and manipulable in both server (e.g. R, Python) and client (JS / TS) programming languages. Providing integrated, interactive pathway diagrams as exploratory tools for biological data visualization will accelerate basic and translational research in biomedicine.
How is the Project Open Source and/or FAIR? All code will be freely licensed and available on Terra and GitHub. Pathway diagrams and other content from WikiPathways are in the public domain (CC0). The genomic data we will use is public and openly available through the NHGRI Genomic Analysis, Visualization and Informatics Lab-space (AnVIL) Data Explorer. DRS URIs are a GA4GH standard for interoperable data across cloud providers. This hackathon project will combine these FAIR components in a new, useful product for biomedical research across various domains of genomics.
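The pathway enrichment step described for this project can be sketched as an over-representation test. The project description does not say which statistic will be used; the example below assumes a one-sided hypergeometric test, a common choice for this task. The gene identifiers, set sizes, and function names are invented placeholders.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n marked items, N draws)."""
    upper = min(n, N)
    favorable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favorable / comb(M, N)


def enrichment_p(query_genes, pathway_genes, background_genes):
    """One-sided over-representation p-value for a single pathway."""
    query = set(query_genes) & set(background_genes)
    pathway = set(pathway_genes) & set(background_genes)
    overlap = len(query & pathway)
    return hypergeom_sf(overlap, len(background_genes), len(pathway), len(query))


# Invented toy data: 20 background genes, a 5-gene pathway, and a 4-gene query
# list (e.g., differentially expressed genes), 3 of which fall in the pathway.
background = {f"gene_{i}" for i in range(20)}
pathway = {f"gene_{i}" for i in range(5)}
query = {"gene_0", "gene_1", "gene_2", "gene_10"}

p_value = enrichment_p(query, pathway, background)
print(f"{p_value:.4f}")
```

When testing many pathways, the resulting p-values would normally be corrected for multiple testing before ranking pathways.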

Project 6: FAIR Maturity Matrix Assessment: The DATA Dimension
Institution: Pistoia Alliance
Team Lead: Giovanni Nisato, PhD, Consultant, Project Manager FAIR implementation, Pistoia Alliance
About the Project: The Pistoia Alliance produced the FAIR Maturity Matrix (FAIRMM) (CC BY 4.0), a framework to evaluate the maturity of organizations and guide their FAIR implementation journeys.
- Day 1 AM: Hackathon participants are encouraged to “bring their own FAIR case” (a specific group or department they know) and apply the FAIRMM framework to assess its maturity level according to the 7 dimensions of the model. This activity will interactively introduce the FAIRMM framework by deploying it in real cases.
- Day 1 PM and Day 2: “If you want to know how FAIR your data really is, ask a machine”. The goal of the hackathon is to work on methods and tools to assess the state of FAIR data sets (cf. the “FAIR data” dimension). Such a tool or method could itself be integrated into a future version of the FAIRMM as part of the “FAIR tools and infrastructure” dimension. Several FAIR data assessment tools exist, as summarized, for example, in "The Road to FAIRness: An Evaluation of FAIR Data Assessment Tools" (The Hyve). The goal is to adapt, create, or improve one reference FAIR data assessment instrument and to provide an expert community recommendation as to which tool to use.
Why is this Project Applicable to Others in the Community? Although several FAIR data assessment instruments exist, they produce different outcomes for the same data sets. This creates confusion and may lead to contradictory assessments of data set “FAIRness”. FAIR data is the tip of the iceberg, and it is important that different experts performing a FAIRMM assessment reach similar conclusions, especially for the “FAIR DATA” dimension.
How is the Project Open Source and/or FAIR? FAIR data assessment methodologies need to rely on instruments that themselves abide, as far as possible, by all of the FAIR data principles. A FAIR data assessment instrument can be built on the Pistoia Alliance FAIR community of experts (public) GitHub. Teams are encouraged to “bring their own data” (or better: UIDs to accessible “FAIR” data) to test. Alternatively, code can be contributed to other existing GitHub repositories.
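As a very rough illustration of what “asking a machine” could mean, the sketch below runs a handful of automated checks over a dataset's metadata record and reports a score. The checks, field names, and scoring are hypothetical simplifications made up for this example. They are not the FAIRMM, nor the method of any existing FAIR assessment tool.

```python
# Invented metadata record; the field names are assumptions for illustration.
record = {
    "identifier": "doi:10.0000/example",
    "title": "Example dataset",
    "license": "CC-BY-4.0",
    "access_url": "https://example.org/data.csv",
    "format": "csv",
}

# One simplistic, hypothetical check per FAIR letter.
CHECKS = {
    "findable: has a DOI-style identifier":
        lambda r: str(r.get("identifier", "")).startswith("doi:"),
    "accessible: has an https access URL":
        lambda r: str(r.get("access_url", "")).startswith("https://"),
    "interoperable: uses an open tabular or structured format":
        lambda r: str(r.get("format", "")).lower() in {"csv", "tsv", "json"},
    "reusable: states a license":
        lambda r: bool(r.get("license")),
}


def assess(metadata):
    """Run every check and return (per-check results, fraction passed)."""
    results = {name: check(metadata) for name, check in CHECKS.items()}
    score = sum(results.values()) / len(results)
    return results, score


results, score = assess(record)
for name, passed in results.items():
    print("PASS" if passed else "FAIL", name)
print(f"score: {score:.2f}")
```

Existing assessment tools differ precisely in which checks they run and how they weight them, which is the inconsistency this project sets out to address.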

Project 7: Mapping Immune States in SLE: A FAIR Pipeline for Integrating Spatial, Single-Cell, and Flow Cytometry Data
Institution: Science and Technology Consulting, LLC
Team Lead: Anne Deslattes Mays, PhD, Principal Consultant, Science and Technology Consulting LLC
About the Project: Can we build a scalable, reproducible pipeline to integrate spatial and single-cell RNA sequencing data with flow cytometry to define immune states in Systemic Lupus Erythematosus (SLE) patients across flare, managed, and treated conditions?
Why is this Project Applicable to Others in the Community? Scientific rationale: Some existing SLE single-cell RNA-seq studies lack a spatial context for their cells. Other studies provide flow cytometry-based immune profiling, characterizing shifts in cell populations using specific markers, but also lack spatial context regarding the tissue origins of these cells. Newer spatial single-cell RNA-seq platforms (e.g., Visium, Xenium, CosMx, MERSCOPE) can reveal immune cell localization and states in affected tissues (e.g., kidney, synovium, and skin in SLE). By integrating all three modalities, we could reconstruct a multi-scale immune map of SLE.
How is the Project Open Source and/or FAIR? This project is fully open and FAIR by design, ensuring transparency, reproducibility, and accessibility at every level. The GitHub repository will host all code, including the code for container construction and the workflows and notebooks used for batch and interactive processing. Lifebit's newly available free resource, NF-Copilot, will orchestrate workflow execution. This resource enables researchers to run Nextflow workflows, pulling code directly from GitHub, ensuring reproducibility. All data is open data from cellxgene and ImmPort. Derivative data products will be deposited on Zenodo with DOI assignments for proper citation and long-term accessibility. Workflow development will follow the approach highlighted in Dr. Deslattes Mays's forthcoming book, "Elements of Style in Creating Workflows for Biomedical Research," to be published by Springer Verlag. AWS will provide the credits for this work. By prioritizing accessibility, openness, and computational reproducibility, this initiative will accelerate SLE research and establish best practices for FAIR biomedical workflow development.

What to Expect in 2025:

The 2025 hackathon will continue to unite life science and IT professionals to address pressing data challenges using Open Source and FAIR Data approaches. Facilitated by leaders from the NIH Common Fund Data Ecosystem (CFDE), this year’s event will emphasize projects leveraging omics data and integrating CFDE tools, improving interoperability across datasets to accelerate discoveries.

The CFDE ensures Common Fund data is accessible and reusable, providing researchers with a centralized online platform for integrating multiple resources seamlessly—enabling new insights and scalable solutions.

Why Participate?

Solve Real-World Challenges – Address critical data problems using Open Source and FAIR Data principles.
Collaborate with Experts – Partner with peers to develop workflows, datasets, and tools that advance biomedical discovery.
Gain Hands-On Experience – Work with cutting-edge technologies in bioinformatics, AI, and cloud-based data analysis.

The Hackathon is free and in-person only. Discounted registration rates are available for access to Bio-IT World’s conference tracks, keynote sessions, and exhibit hall.

How to Get Involved:

Have an idea?
Submit your project proposal for review.
Deadline for Submission: February 28.

Want to Join a Team?
Complete this form to tell us a little bit about yourself, and we will follow up with you regarding the status.

For more details on the Hackathon, please contact:

Cindy Crowninshield, Executive Event Director
(781) 247-6258
ccrowninshield@healthtech.com

For partnering and sponsorship information, please contact:

Companies A-F
Rod Eymael, Mgr., Business Development
(781) 247-6286
reymael@healthtech.com

Companies G-Z
Aimee Croke, Business Development Manager
(781) 292-0777
acroke@cambridgeinnovationinstitute.com

Project Highlights from Past Hackathons:

Gene Trends (Broad Institute): Tools to track gene popularity in biomedical research
SRA Workflow Integration (NIH/NCBI): Cloud-based genomic analysis
MYC Amplification Research (NIH): Pediatric disease workflows
Knowledge Graphs for Disease Subtyping (DNAnexus): Personalized medicine tools
kidSIDES (Regeneron): Pediatric drug safety database
Iterative Cluster Analysis Using Multi-Omics Modalities (NIH): Multi-omics clustering for oncology and immune response research
Creating Computable Knowledge (NVIDIA): NLP pipelines for biomedical data supporting drug discovery
Visualization of NCBI ALFA Variants (NIH): Tools for navigating allele frequency data
BLAST, Pipelines, and FAIR (NIH): Workflow enhancements for FAIR bioinformatics pipelines
FAIR Beyond Data (Jackson Laboratory): Platform-agnostic FAIR-compliant applications
Integrating Globus into Galaxy (University of Chicago): Improved FAIRifying workflows
Single-Cell RNA-Seq Cancer Data (Broad Institute): FAIR genomics for cancer research
Generating a Fungal Index (Find Bioscience): Web-based fungal data indexing
DOE JGI Genomics Data Set (U.S. Department of Energy Joint Genome Institute): Assessed the FAIRness of environmental genomics data systems, linking to community efforts
BioAssay Express: Applying FAIR Principles to Bioassay Protocols (Collaborative Drug Discovery): Developed annotation templates for experimental assay protocols to improve FAIR methodology reporting for qPCR, microarray, and other bioassays
NCATS Biomedical Data Translator (NIH/NCATS): Tested the interoperability of federated tools integrating biomedical knowledge across domains
