
Hackathon Sponsor - ORAU

Tuesday, April 1 – Wednesday, April 2, 2025

The Bio-IT World Hackathon is a cornerstone of the Bio-IT World Conference & Expo, bringing together data scientists, developers, and life science professionals to tackle real-world data challenges. Focused on Open Source and FAIR Data (Findable, Accessible, Interoperable, Reusable) principles, this two-day event fosters innovation and collaboration to deliver practical solutions.

2025 Projects to Date (we expect 8-10 projects):

Project 1: GlycoEnzyme Expression Atlas: Linking Differential Expression to Pathway Dysregulation
Institution: CFDE
About the Project: The GlycoEnzyme Expression Atlas project aims to establish connections between glycoenzyme expression patterns and pathway dysregulation across various disease states. This bioinformatics initiative will integrate multiple data types and analytical approaches:
- RNA-seq data preprocessing, using DESeq2/edgeR for differential expression
- Mapping of glycoenzyme genes using the CAZy and GlyGen databases
- Integration with KEGG/Reactome pathway annotations
- Network analysis via Cytoscape/STRING for interaction mapping
Why is this Project Applicable to Others in the Community? The GlycoEnzyme Expression Atlas project aims to establish connections between glycoenzyme expression patterns and pathway dysregulation across various disease states. It integrates multiple data types and analytical approaches that other groups can reuse: RNA-seq data preprocessing using DESeq2/edgeR for differential expression, mapping of glycoenzyme genes using the CAZy and GlyGen databases, integration with KEGG/Reactome pathway annotations, and network analysis via Cytoscape/STRING for interaction mapping.
How is the Project Open Source and/or FAIR? 
The GlycoEnzyme Expression Atlas project follows FAIR (Findable, Accessible, Interoperable, and Reusable) principles and is designed as an open-source initiative to enhance glycoenzyme-related research. By integrating glycosyltransferase (GT) and glycohydrolase (GH) gene lists, differential expression data, and pathway analysis, the project ensures that data is findable through public repositories like GlyGen, GEO, and the EMBL-EBI Expression Atlas, using standardized metadata and persistent identifiers. To ensure accessibility, all datasets, analysis pipelines, and visualization tools will be freely available on platforms like GitHub and Zenodo, following open-access policies. The project prioritizes interoperability by adhering to standardized ontologies such as EDAM, OBI, and Human Disease Ontology, and using widely accepted data formats like FASTA, CSV, and JSON to facilitate integration with existing bioinformatics tools and databases. By maintaining well-documented pipelines, standardized methodologies, and open-source licensing (e.g., MIT, Apache 2.0, or Creative Commons CC-BY-4.0), the project guarantees reusability.
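The analysis steps described for this project (differential expression, glycoenzyme mapping, pathway annotation) can be sketched in a few lines of Python. This is only an illustrative outline, not the project's pipeline: the fold changes, p-values, gene list, pathway annotations, cutoffs, and function names below are all hypothetical placeholders. In practice the results table would come from DESeq2 or edgeR, and the gene and pathway lists from resources such as CAZy, GlyGen, KEGG, or Reactome.

```python
from collections import defaultdict

# Hypothetical differential-expression results: gene -> (log2 fold change, adjusted p-value).
# Real values would come from a DESeq2 or edgeR results table.
de_results = {
    "B4GALT1": (2.1, 0.001),
    "MGAT5": (1.6, 0.02),
    "NEU1": (-1.8, 0.004),
    "FUT8": (0.3, 0.60),      # a glycoenzyme, but not significantly changed
    "TP53": (-2.5, 0.0001),   # significant, but not in the glycoenzyme list
}

# Hypothetical glycoenzyme gene list (illustrative, not a curated export).
glycoenzymes = {"B4GALT1", "MGAT5", "NEU1", "FUT8"}

# Hypothetical gene -> pathway annotations (illustrative, not a curated export).
pathways = {
    "B4GALT1": ["N-glycan biosynthesis"],
    "MGAT5": ["N-glycan biosynthesis"],
    "FUT8": ["N-glycan biosynthesis"],
    "NEU1": ["Sphingolipid metabolism"],
}


def significant_glycoenzymes(results, gene_set, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Keep genes in gene_set that pass both the fold-change and p-value cutoffs."""
    return {
        gene: lfc
        for gene, (lfc, padj) in results.items()
        if gene in gene_set and abs(lfc) >= lfc_cutoff and padj < padj_cutoff
    }


def pathway_summary(hits, annotations):
    """Group significant genes by pathway and direction of change."""
    summary = defaultdict(lambda: {"up": [], "down": []})
    for gene, lfc in sorted(hits.items()):
        direction = "up" if lfc > 0 else "down"
        for pathway in annotations.get(gene, []):
            summary[pathway][direction].append(gene)
    return dict(summary)


hits = significant_glycoenzymes(de_results, glycoenzymes)
summary = pathway_summary(hits, pathways)
for pathway, genes in sorted(summary.items()):
    print(pathway, genes)
```

Grouping by pathway and direction gives a first view of which pathways carry several co-regulated glycoenzymes. A network analysis step (for example with Cytoscape or STRING) would build on the same filtered gene list.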


Project 2: DrugCentral Based Review and Profiles of Targets for Approved Drugs
Institution: CFDE
About the Project: Dive into the world of pharmacology with a dynamic project centered on DrugCentral’s comprehensive database of approved drugs and their molecular targets. This project offers multiple pathways to innovation, catering to all levels of expertise—whether you prefer an intuitive, no-code experience or want to dive into advanced programming. Utilize DrugCentral’s interactive web interface to explore drug-target profiles through seamless hyperlinks to interoperable databases like Pharos, or get hands-on with powerful data analysis tools. Harness SQL with PostgreSQL or leverage Python, potentially with Jupyter notebooks, to generate insightful, visually engaging statistics. Focus your efforts on broad descriptive statistics for the entire pharmacopeia, or zoom in on specific drug classes, target families, or disease areas. DrugCentral’s curated target associations—complete with PubMed references—provide a rich, evidence-based foundation for deep exploration and storytelling. Whether you’re driven by data, inspired by disease pathways, or motivated by meaningful drug discovery insights, this project invites you to craft impactful narratives in drug-target research. Jump in and see where your curiosity takes you!
Why is this Project Applicable to Others in the Community? DrugCentral is an online compendium of drug information focused on approved drugs, created and maintained by the University of New Mexico and the Illuminating the Druggable Genome (IDG) program. DrugCentral can be accessed via the web UI, as a PostgreSQL database (cloud or local instance), or via a Python API. A critical area of pharmaceutical discovery and development is the identification and validation of biomolecular targets, novel targets in particular, which can facilitate new and improved therapies. This has been the overarching goal of the IDG program. DrugCentral can assist in this research area by representing the high-confidence known targets for approved drugs.
How is the Project Open Source and/or FAIR? 
DrugCentral is fully public and open access via several methods and channels. Entities, including chemicals, diseases, genes, and proteins, are identified via community-standard vocabularies and IDs for semantically rigorous interoperability.
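As a rough illustration of the descriptive statistics this project has in mind, the sketch below runs two aggregate SQL queries from Python. It makes several assumptions: it uses an in-memory SQLite database from the Python standard library as a stand-in for PostgreSQL, and the table name, column names, and rows are hypothetical toy data, not DrugCentral's actual schema or content.

```python
import sqlite3

# In-memory SQLite stands in for a PostgreSQL instance in this sketch.
conn = sqlite3.connect(":memory:")

# Hypothetical table and toy rows; NOT the real DrugCentral schema.
conn.executescript(
    """
    CREATE TABLE drug_target (
        drug TEXT,
        target_gene TEXT,
        target_class TEXT
    );
    INSERT INTO drug_target VALUES
        ('imatinib',   'ABL1',    'Kinase'),
        ('imatinib',   'KIT',     'Kinase'),
        ('imatinib',   'PDGFRB',  'Kinase'),
        ('loratadine', 'HRH1',    'GPCR'),
        ('cetirizine', 'HRH1',    'GPCR'),
        ('amlodipine', 'CACNA1C', 'Ion channel');
    """
)

# Statistic 1: how many distinct targets are recorded for each drug?
targets_per_drug = conn.execute(
    """
    SELECT drug, COUNT(DISTINCT target_gene) AS n_targets
    FROM drug_target
    GROUP BY drug
    ORDER BY drug
    """
).fetchall()

# Statistic 2: how many distinct drugs act on each target class?
drugs_per_class = conn.execute(
    """
    SELECT target_class, COUNT(DISTINCT drug) AS n_drugs
    FROM drug_target
    GROUP BY target_class
    ORDER BY target_class
    """
).fetchall()

conn.close()

print("Targets per drug:", targets_per_drug)
print("Drugs per target class:", drugs_per_class)
```

The same GROUP BY pattern extends to other cuts of the data, such as specific drug classes or disease areas, once the queries are pointed at the real tables.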


Project 3: Interactive Analysis with Biological Pathways
Institution: Broad Institute
About the Project: We will use Jupyter notebooks on Terra and WikiPathways diagrams to interactively analyze data from various modalities, e.g., variation, expression, and proteomic datasets.
Why is this Project Applicable to Others in the Community? Pathway diagrams give causal insight into disease mechanisms and gene regulatory networks. Integrating them as exploratory tools for biological data visualization will accelerate basic and translational research.
How is the Project Open Source and/or FAIR? 
All code will be freely licensed and available.


Project 4: FAIR Maturity Matrix Assessment: The DATA Dimension
Institution: Pistoia Alliance
About the Project: The Pistoia Alliance produced the FAIR Maturity Matrix (FAIRMM) (CC BY 4.0), a framework to evaluate the maturity of organizations and guide their FAIR implementation journeys.
- Day 1 AM: Hackathon participants are encouraged to “bring their own FAIR case” (a specific group or department they know) and apply the FAIRMM framework to assess its maturity level across the 7 dimensions of the model. This activity introduces the FAIRMM framework interactively by applying it to real cases.
- Day 1 PM and Day 2: “If you want to know how FAIR your data really is, ask a machine.” The goal of the hackathon is to work on methods and tools to assess the state of FAIR data sets (cf. the “FAIR data” dimension). Such a tool or method could itself be integrated into a future version of the FAIRMM as part of the “FAIR tools and infrastructure” dimension. Several FAIR data assessment tools exist, as summarized, for example, in "The Road to FAIRness: An Evaluation of FAIR Data Assessment Tools" (The Hyve). The goal is to adapt, create, or improve one reference FAIR data assessment instrument and to provide an expert community recommendation as to which tool to use.
Why is this Project Applicable to Others in the Community? Several FAIR data assessment instruments exist, and they produce different outcomes for the same data sets. This creates confusion and may lead to contradictory assessments of data set “FAIRness”. FAIR data is the tip of the iceberg, and it is important that different experts performing a FAIRMM assessment reach similar conclusions, especially for the “FAIR data” dimension.
How is the Project Open Source and/or FAIR? 
FAIR data assessment methodology needs to rely on instruments that themselves abide, as much as possible, by the FAIR data principles (all of them). A FAIR data assessment instrument can be built on the Pistoia Alliance FAIR community of experts (public) GitHub. Teams are encouraged to “bring their own data” (or better: UIDs to accessible “FAIR” data) for testing. Alternatively, code can be contributed to other existing GitHub repositories.
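To make the "ask a machine" idea concrete, here is a minimal sketch of an automated check over a data set's metadata record. It is an assumption-laden toy, not the FAIRMM and not any existing assessment tool: the metadata field names, the five checks, the list of open formats, and the equal weighting are all hypothetical choices made for illustration, and the example identifier and URL are placeholders.

```python
# Hypothetical set of formats treated as "open" for this sketch.
OPEN_FORMATS = {"csv", "json", "fasta"}


def assess_metadata(record):
    """Run simple pass/fail checks on a metadata record and return (checks, score).

    The field names ("identifier", "license", ...) are assumptions made for
    this example, not a standard metadata schema.
    """
    identifier = str(record.get("identifier") or "")
    data_format = str(record.get("format") or "").lower()
    checks = {
        # Findable: a DOI-style persistent identifier is present.
        "has_persistent_identifier": identifier.startswith(("doi:", "https://doi.org/")),
        # Findable: a human-readable description is present.
        "has_description": bool(record.get("description")),
        # Accessible: a retrieval URL is given.
        "has_access_url": bool(record.get("access_url")),
        # Interoperable: the data is in a widely used open format.
        "uses_open_format": data_format in OPEN_FORMATS,
        # Reusable: a license is stated.
        "has_license": bool(record.get("license")),
    }
    # Equal weighting: fraction of checks passed.
    score = sum(checks.values()) / len(checks)
    return checks, score


# Placeholder record; the DOI and URL are not real.
example_record = {
    "identifier": "doi:10.0000/example",
    "description": "",
    "access_url": "https://example.org/data.csv",
    "format": "CSV",
    "license": "CC-BY-4.0",
}

checks, score = assess_metadata(example_record)
for name, passed in checks.items():
    print(f"{name}: {'pass' if passed else 'fail'}")
print(f"score: {score:.2f}")
```

Because every check is an explicit, named rule, two assessors running the same script on the same record get the same result, which is the consistency problem this project description raises. Real instruments evaluate far more than five fields.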

 


What to Expect in 2025:

The 2025 hackathon will continue to unite life science and IT professionals to address pressing data challenges using Open Source and FAIR Data approaches. Facilitated by leaders from the NIH Common Fund Data Ecosystem (CFDE), this year’s event will emphasize projects leveraging omics data and integrating CFDE tools, improving interoperability across datasets to accelerate discoveries.

The CFDE ensures Common Fund data is accessible and reusable, providing researchers with a centralized online platform for integrating multiple resources seamlessly—enabling new insights and scalable solutions.



Why Participate?

  • Solve Real-World Challenges – Address critical data problems using Open Source and FAIR Data principles.
  • Collaborate with Experts – Partner with peers to develop workflows, datasets, and tools that advance biomedical discovery.
  • Gain Hands-On Experience – Work with cutting-edge technologies in bioinformatics, AI, and cloud-based data analysis.

The Hackathon is free and in-person only. Discounted registration rates are available for access to Bio-IT World’s conference tracks, keynote sessions, and exhibit hall.


How to Get Involved:

Have an idea?
Submit your project proposal for review.
Deadline for Submission: February 28.

Submit Proposal

Want to Join a Team?
Complete this form to tell us a little bit about yourself, and we will follow up with you regarding the status.

Complete Form



For more details on the Hackathon, please contact:

Cindy Crowninshield, Executive Event Director
(781) 247-6258
ccrowninshield@healthtech.com

For partnering and sponsorship information, please contact:

Companies A-F
Rod Eymael, Mgr., Business Development
(781) 247-6286
reymael@healthtech.com

Companies G-Z
Aimee Croke, Business Development Manager
(781) 292-0777
acroke@cambridgeinnovationinstitute.com

 

Project Highlights from Past Hackathons:

  • Gene Trends (Broad Institute): Tools to track gene popularity in biomedical research
  • SRA Workflow Integration (NIH/NCBI): Cloud-based genomic analysis
  • MYC Amplification Research (NIH): Pediatric disease workflows
  • Knowledge Graphs for Disease Subtyping (DNAnexus): Personalized medicine tools
  • kidSIDES (Regeneron): Pediatric drug safety database
  • Iterative Cluster Analysis Using Multi-Omics Modalities (NIH): Multi-omics clustering for oncology and immune response research
  • Creating Computable Knowledge (NVIDIA): NLP pipelines for biomedical data supporting drug discovery
  • Visualization of NCBI ALFA Variants (NIH): Tools for navigating allele frequency data
  • BLAST, Pipelines, and FAIR (NIH): Workflow enhancements for FAIR bioinformatics pipelines
  • FAIR Beyond Data (Jackson Laboratory): Platform-agnostic FAIR-compliant applications
  • Integrating Globus into Galaxy (University of Chicago): Improved FAIRifying workflows
  • Single-Cell RNA-Seq Cancer Data (Broad Institute): FAIR genomics for cancer research
  • Generating a Fungal Index (Find Bioscience): Web-based fungal data indexing
  • DOE JGI Genomics Data Set (U.S. Department of Energy Joint Genome Institute): Assessed the FAIRness of environmental genomics data systems, linking to community efforts
  • BioAssay Express: Applying FAIR Principles to Bioassay Protocols (Collaborative Drug Discovery): Developed annotation templates for experimental assay protocols to improve FAIR methodology reporting for qPCR, microarray, and other bioassays
  • NCATS Biomedical Data Translator (NIH/NCATS): Tested the interoperability of federated tools integrating biomedical knowledge across domains

 

