man4ish/omnibioai-workflow-bundles


OmniBioAI Workflow Bundles

Overview

omnibioai-workflow-bundles is the canonical repository for versioned bioinformatics workflow bundles used by the OmniBioAI Workflow Registry Service. The bundle specification is engine-agnostic; each individual bundle targets a single workflow engine.

This repository is used for authoring, versioning, and testing workflows, and is not required in deployed OmniBioAI runtime environments.

All workflows in this repository are:

  • Authored and version-controlled in Git
  • Packaged as immutable workflow bundles
  • Uploaded via CLI into OmniBioAI
  • Stored as objects in OmniObjectService
  • Indexed in the Workflow Registry Service

Important: This repository is not accessed at runtime by OmniBioAI services or plugins. Runtime execution always resolves workflows via the Workflow Registry + Object Storage layer, never directly from Git.


What Is a Workflow Bundle?

A workflow bundle is a versioned, self-contained artifact that defines everything required to execute a computational pipeline using a specific workflow engine.

Each bundle may include:

  • Workflow definition files:
    • WDL (Cromwell)
    • Nextflow
    • Snakemake
    • CWL
  • Engine-specific configuration files
  • Container definitions (Docker / Conda / Apptainer)
  • Reference datasets or helper scripts (optional)
  • A strict manifest.json describing metadata and entrypoints

Key Principle: Versioned Immutability

Bundles in Git are mutable during development, but every upload to OmniBioAI produces an immutable runtime artifact.

Once registered:

  • Bundles are never modified
  • Each update creates a new version
  • Each version receives a unique object_id

Supported Workflow Engines

This repository supports multiple workflow engines:

  • WDL (Cromwell-compatible)
  • Nextflow
  • Snakemake
  • CWL

Important Rule

Each workflow bundle targets exactly one engine.

Equivalent workflows implemented in different engines are stored as separate bundles.


Repository Structure

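This repository is organized by biological domain. Each domain directory contains one or more workflow bundles; most currently hold a single bundle, while a few (such as clinical/ and multimodal/) hold more than a hundred.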

omnibioai-workflow-bundles/
├── admet_prediction/        (1 bundle)
├── ancient_dna/             (1 bundle)
├── atacseq/                 (1 bundle)
├── bio_kg/                  (1 bundle)
├── cellranger/              (1 bundle)
├── chipseq/                 (1 bundle)
├── circrna/                 (1 bundle)
├── cite_seq/                (1 bundle)
├── clinical/                (115 bundles)
├── clinical_trial_matching/ (1 bundle)
├── crispr/                  (1 bundle)
├── crispr_ml/               (1 bundle)
├── ctdna_analysis/          (1 bundle)
├── drug_response_prediction/(1 bundle)
├── drug_synergy/            (1 bundle)
├── epigenomics/             (49 bundles)
├── exome_clinical/          (1 bundle)
├── foundation_model/        (1 bundle)
├── funcgen/                 (94 bundles)
├── genome_assembly/         (1 bundle)
├── hic_analysis/            (1 bundle)
├── hla_typing/              (1 bundle)
├── immune_deconvolution/    (1 bundle)
├── lncrna_analysis/         (1 bundle)
├── long_read_rna/           (1 bundle)
├── longread/                (34 bundles)
├── metagenomics/            (1 bundle)
├── methylation/             (1 bundle)
├── microbiome/              (23 bundles)
├── mirna_seq/               (1 bundle)
├── multimodal/              (109 bundles)
├── multimodal_integration/  (1 bundle)
├── multiomics/              (1 bundle)
├── nanopore_methylation/    (1 bundle)
├── obsolete/                (3 bundles)
├── pangenome/               (1 bundle)
├── pharmacogenomics/        (1 bundle)
├── polygenic_risk_score/    (1 bundle)
├── population_genetics/     (1 bundle)
├── proteomics/              (28 bundles)
├── proteomics_ms/           (1 bundle)
├── riboseq/                 (1 bundle)
├── rna_editing/             (1 bundle)
├── rnaseq/                  (48 bundles)
├── rnaseq_v1/               (1 bundle)
├── scatac_seq/              (1 bundle)
├── single_cell_multiome/    (1 bundle)
├── single_cell_multiomics/  (1 bundle)
├── single_cell_vdj/         (1 bundle)
├── singlecell/              (36 bundles)
├── spatial/                 (14 bundles)
├── spatial_hd/              (1 bundle)
├── spatial_multiomics/      (1 bundle)
├── spatial_proteomics/      (1 bundle)
├── sv/                      (1 bundle)
├── target_identification/   (1 bundle)
├── variant_ml/              (1 bundle)
├── wes/                     (1 bundle)
└── wgs/                     (1 bundle)

Key Rules

  • Each domain directory (e.g., wes/, rnaseq/) contains one or more workflow bundles
  • Each bundle is self-contained and versioned
  • Directory names exist for human readability only
  • Canonical identity is defined by manifest.json, not filesystem paths

Bundle Status

Auto-generated | Last updated: 2026-06-10

| Metric | Count |
| --- | --- |
| Total bundles | 601 |
| Nextflow | 593 |
| WDL | 5 |
| Snakemake | 2 |
| CWL | 1 |
| Docker enabled | 600 |
| Has nf-tests | 597 |
| Fully production-ready | 597 |
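
Counts like these could in principle be derived from the manifests themselves. The sketch below is illustrative only: it assumes each bundle's manifest.json has already been loaded into a dict, and it relies on the `engine`, `container_support.docker`, and `has_tests` fields from the Manifest Contract section. The function name `summarize` and the sample data are hypothetical; this is not the repository's actual report generator.

```python
from collections import Counter


def summarize(manifests):
    """Tally bundles by engine, Docker support, and declared test presence."""
    engines = Counter(m["engine"] for m in manifests)
    docker = sum(1 for m in manifests if m.get("container_support", {}).get("docker"))
    tested = sum(1 for m in manifests if m.get("has_tests", False))
    return {
        "total": len(manifests),
        "by_engine": dict(engines),
        "docker_enabled": docker,
        "has_tests": tested,
    }


# Toy input for illustration, not real repository data.
sample = [
    {"engine": "nextflow", "container_support": {"docker": True}, "has_tests": True},
    {"engine": "nextflow", "container_support": {"docker": True}, "has_tests": False},
    {"engine": "wdl", "container_support": {"docker": False}},
]
stats = summarize(sample)
print(stats)
```

Because the tally reads only manifest fields, it stays consistent with the rule that identity and metadata come from manifest.json rather than from directory names.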

By Domain

| Domain | Bundles | Docker | Has Tests |
| --- | --- | --- | --- |
| admet_prediction | 1 | 1 | 1 |
| ancient_dna | 1 | 1 | 1 |
| atacseq | 1 | 1 | 1 |
| bio_kg | 1 | 1 | 1 |
| cellranger | 1 | 1 | 1 |
| chipseq | 1 | 1 | 1 |
| circrna | 1 | 1 | 1 |
| cite_seq | 1 | 1 | 1 |
| clinical (adapter_trimming, adaptive_designs, batch_processing, bayesian_trials, bowtie2_alignment +110 more) | 115 | 115 | 115 |
| clinical_trial_matching | 1 | 1 | 1 |
| crispr | 1 | 1 | 1 |
| crispr_ml | 1 | 1 | 1 |
| ctdna_analysis | 1 | 1 | 1 |
| drug_response_prediction | 1 | 1 | 1 |
| drug_synergy | 1 | 1 | 1 |
| epigenomics (epigenomics, ago_clip_mirna_targets, allele_specific_binding, binding_site_annotation, bismark_alignment +44 more) | 49 | 49 | 49 |
| exome_clinical | 1 | 1 | 1 |
| foundation_model | 1 | 1 | 1 |
| funcgen (ancestral_reconstruction, association_testing, base_editing_design, bed_file_basics, bedgraph_handling +89 more) | 94 | 94 | 94 |
| genome_assembly | 1 | 1 | 1 |
| hic_analysis | 1 | 1 | 1 |
| hla_typing | 1 | 1 | 1 |
| immune_deconvolution | 1 | 1 | 1 |
| lncrna_analysis | 1 | 1 | 1 |
| long_read_rna | 1 | 1 | 1 |
| longread (assembly_polishing, assembly_qc, basecalling, clair3_variants, contamination_detection +29 more) | 34 | 34 | 34 |
| metagenomics | 1 | 1 | 1 |
| methylation | 1 | 1 | 1 |
| microbiome (abundance_estimation, amplicon_16s, amplicon_processing, amr_detection, amr_surveillance +18 more) | 23 | 23 | 23 |
| mirna_seq | 1 | 1 | 1 |
| multimodal (alphafold2, alphafold_predictions, atlas_mapping, bio_nlp_transformers, biomarker_discovery +104 more) | 109 | 109 | 109 |
| multimodal_integration | 1 | 1 | 1 |
| multiomics | 1 | 1 | 1 |
| nanopore_methylation | 1 | 1 | 1 |
| obsolete (chipseq_wdl_v1, omnibioai_wes_snakemake_v1, spatial_single_cell_alevin_fry_wdl_v1) | 3 | 2 | 0 |
| pangenome | 1 | 1 | 1 |
| pharmacogenomics | 1 | 1 | 1 |
| polygenic_risk_score | 1 | 1 | 1 |
| population_genetics | 1 | 1 | 1 |
| proteomics (alphafold2_structure_prediction, alphafold_structure, bindcraft_design, boltz_generative, boltz_structure +23 more) | 28 | 28 | 28 |
| proteomics_ms | 1 | 1 | 1 |
| riboseq | 1 | 1 | 1 |
| rna_editing | 1 | 1 | 1 |
| rnaseq (alignment_free_quant, alt_splicing_quantification, alt_splicing_rmats, bulk_rnaseq_deseq2, count_matrix_qc +43 more) | 48 | 48 | 47 |
| rnaseq_v1 | 1 | 1 | 1 |
| scatac_seq | 1 | 1 | 1 |
| single_cell_multiome | 1 | 1 | 1 |
| single_cell_multiomics | 1 | 1 | 1 |
| single_cell_vdj | 1 | 1 | 1 |
| singlecell (anndata_io, anndata_sc_io, archr_scatac, cell_communication, cellbender_ambient +31 more) | 36 | 36 | 36 |
| spatial (imc_annotation, imc_differential, imc_phenotyping, imc_preprocessing, imc_qc +9 more) | 14 | 14 | 14 |
| spatial_hd | 1 | 1 | 1 |
| spatial_multiomics | 1 | 1 | 1 |
| spatial_proteomics | 1 | 1 | 1 |
| sv | 1 | 1 | 1 |
| target_identification | 1 | 1 | 1 |
| variant_ml | 1 | 1 | 1 |
| wes | 1 | 1 | 1 |
| wgs | 1 | 1 | 1 |

Bundle Identity and Versioning

Each workflow bundle is uniquely identified by:


(category, engine, name, version)

Example


category: wes
engine: snakemake
name: omnibioai_wes_snakemake_v1
version: 1.0.0
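
As a rough illustration of this identity tuple, the sketch below builds it from manifest fields only, never from a filesystem path. The names `BundleKey` and `key_from_manifest` are hypothetical and do not come from the OmniBioAI codebase.

```python
from typing import NamedTuple


class BundleKey(NamedTuple):
    """The (category, engine, name, version) identity of a bundle."""
    category: str
    engine: str
    name: str
    version: str


def key_from_manifest(manifest: dict) -> BundleKey:
    """Build the identity tuple from manifest fields, ignoring where the bundle lives on disk."""
    return BundleKey(
        manifest["category"],
        manifest["engine"],
        manifest["name"],
        manifest["version"],
    )


manifest = {
    "category": "wes",
    "engine": "snakemake",
    "name": "omnibioai_wes_snakemake_v1",
    "version": "1.0.0",
}
key = key_from_manifest(manifest)
newer = key._replace(version="1.1.0")  # a new version is a distinct identity
print(key)
print(key == newer)
```

A tuple is hashable, so a key like this can index a dictionary or a database table directly.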

Versioning Rules

When a new version is uploaded:

  • A new immutable bundle is created
  • A new object_id is generated
  • A new registry entry is inserted
  • Previous versions remain fully accessible and unchanged
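
One way to picture these rules is an append-only mapping from bundle identity to object_id. The sketch below is a minimal in-memory model, not the Workflow Registry Service itself. The class `AppendOnlyRegistry` is hypothetical, a random UUID stands in for whatever ID OmniObjectService really issues, and rejecting a re-upload of an existing version is an assumption about how immutability might be enforced.

```python
import uuid


class AppendOnlyRegistry:
    """Toy registry: entries can be added but never changed or removed."""

    def __init__(self):
        self._entries = {}

    def register(self, category, engine, name, version):
        key = (category, engine, name, version)
        if key in self._entries:
            # ASSUMPTION: re-registering an existing version is refused.
            raise ValueError(f"{key} is already registered and cannot be modified")
        object_id = uuid.uuid4().hex  # stand-in for an ID issued by the object store
        self._entries[key] = object_id
        return object_id

    def lookup(self, category, engine, name, version):
        return self._entries[(category, engine, name, version)]

    def versions(self, category, engine, name):
        """Return registered versions in insertion order."""
        return [k[3] for k in self._entries if k[:3] == (category, engine, name)]


registry = AppendOnlyRegistry()
first_id = registry.register("wes", "snakemake", "omnibioai_wes_snakemake_v1", "1.0.0")
second_id = registry.register("wes", "snakemake", "omnibioai_wes_snakemake_v1", "1.1.0")

try:
    registry.register("wes", "snakemake", "omnibioai_wes_snakemake_v1", "1.0.0")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(registry.versions("wes", "snakemake", "omnibioai_wes_snakemake_v1"))
```

After the second upload, version 1.0.0 still resolves to its original object_id, which is the property the rules above describe.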

Manifest Contract (manifest.json)

Each bundle MUST include a manifest.json file describing its canonical metadata.

Example

{
  "name": "omnibioai_wes_snakemake_v1",
  "display_name": "Whole Exome Sequencing Pipeline (Snakemake)",
  "category": "wes",
  "engine": "snakemake",
  "version": "1.0.0",
  "entrypoint": "workflow/Snakefile",
  "configs": ["config/inputs.json"],
  "description": "End-to-end WES pipeline: QC, trimming, alignment, variant calling",
  "container_support": {
    "docker": true,
    "conda": false,
    "apptainer": false
  },
  "container_image": "ghcr.io/man4ish/omnibioai-wes:1.0.0",
  "tools": ["trimmomatic", "bwa", "samtools", "gatk"],
  "has_tests": true
}

Contract Rules

Required (9): name, display_name, category, engine, version, entrypoint, configs, description, container_support

Optional: container_image, tools, has_tests, inputs_schema, outputs

  • Manifest is the single source of truth
  • Registry never infers metadata from file paths
  • Entry points must be explicit
  • Tool dependencies should be declared
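
The contract above lends itself to a small validation routine. The following is a sketch under stated assumptions rather than the validator used by the CLI. The function name `validate_manifest` is hypothetical. Only the engine string "snakemake" appears in the example manifest, so the other engine spellings are guesses.

```python
REQUIRED_FIELDS = (
    "name", "display_name", "category", "engine", "version",
    "entrypoint", "configs", "description", "container_support",
)

# ASSUMPTION: lowercase engine identifiers; only "snakemake" is confirmed by the example.
SUPPORTED_ENGINES = {"wdl", "nextflow", "snakemake", "cwl"}


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means the manifest passes."""
    errors = [f"missing required field: {field}"
              for field in REQUIRED_FIELDS if field not in manifest]

    engine = manifest.get("engine")
    if engine is not None and engine not in SUPPORTED_ENGINES:
        errors.append(f"unsupported engine: {engine}")

    # Entry points must be explicit, so an empty value is rejected.
    if "entrypoint" in manifest and not manifest["entrypoint"]:
        errors.append("entrypoint must be a non-empty path")

    if "configs" in manifest and not isinstance(manifest["configs"], list):
        errors.append("configs must be a list of paths")

    return errors


good = {
    "name": "omnibioai_wes_snakemake_v1",
    "display_name": "Whole Exome Sequencing Pipeline (Snakemake)",
    "category": "wes",
    "engine": "snakemake",
    "version": "1.0.0",
    "entrypoint": "workflow/Snakefile",
    "configs": ["config/inputs.json"],
    "description": "End-to-end WES pipeline",
    "container_support": {"docker": True, "conda": False, "apptainer": False},
}
bad = {k: v for k, v in good.items() if k != "entrypoint"}
bad["engine"] = "galaxy"

good_errors = validate_manifest(good)
bad_errors = validate_manifest(bad)
print(good_errors)
print(bad_errors)
```

Returning every problem at once, instead of stopping at the first, lets a bundle author fix a manifest in a single pass.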

Dependency & Execution Model

Primary Execution Mode: Docker via Nextflow

600 of 601 bundles use Docker. The standard pattern declares it in nextflow.config:

docker {
    enabled = true
}
process {
    container = 'ghcr.io/man4ish/omnibioai-<domain>:<version>'
}

Other Execution Modes

  • Apptainer/Singularity: for HPC environments (1 bundle)
  • Physical Dockerfile: present in bundles that build custom images (76 bundles)
  • Conda: supported where declared via the conda process directive (0 bundles currently)

Key Requirement

Each process must declare its runtime environment explicitly. Workflows must NOT assume globally installed bioinformatics tools.


Relationship to the Workflow Registry

The Workflow Registry Service is the authoritative metadata index for all OmniBioAI workflows.

Separation of Responsibilities

| Component | Responsibility |
| --- | --- |
| Workflow Bundles Repo | Authoring & version control |
| CLI Upload Tool | Validation & packaging |
| Workflow Registry | Metadata indexing & discovery |
| OmniObjectService | Immutable bundle storage |
| Execution Engine | Workflow materialization & runtime |

Registry = metadata; Object Store = immutable artifacts.

The registry does not store files or paths — only object_id references.


Workflow Ingestion (CLI-first)

Bundles are uploaded via the OmniBioAI CLI.

Example Staging Structure

input/
  omnibioai_wes_snakemake_v1/
    manifest.json
    workflow/
    config/
    envs/

Upload Command

python manage.py workflow_upload \
  --bundle input/omnibioai_wes_snakemake_v1 \
  --created-by manish \
  --enable

Ingestion Process

  1. Validate bundle structure
  2. Parse manifest.json
  3. Validate entrypoint & configs
  4. Package bundle into archive
  5. Upload to OmniObjectService
  6. Receive object_id
  7. Create immutable registry entry
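
The seven steps can be sketched end to end against in-memory stand-ins. This is not the `workflow_upload` command's implementation. The function `ingest` is hypothetical, plain dicts replace OmniObjectService and the registry, a zip archive is an arbitrary choice of format, and a SHA-256 content hash stands in for the real object_id.

```python
import hashlib
import io
import json
import tempfile
import zipfile
from pathlib import Path


def ingest(bundle_dir, object_store, registry):
    """Run the seven ingestion steps against in-memory stand-ins."""
    bundle_dir = Path(bundle_dir)

    # 1. Validate bundle structure (here: only that manifest.json exists).
    manifest_path = bundle_dir / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(f"manifest.json not found in {bundle_dir}")

    # 2. Parse manifest.json.
    manifest = json.loads(manifest_path.read_text())

    # 3. Validate that the entrypoint and every config file exist.
    for rel_path in [manifest["entrypoint"], *manifest["configs"]]:
        if not (bundle_dir / rel_path).is_file():
            raise FileNotFoundError(f"declared file is missing: {rel_path}")

    # 4. Package the bundle into an archive (zip is an arbitrary choice).
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(bundle_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(bundle_dir).as_posix())
    payload = buffer.getvalue()

    # 5-6. "Upload" and receive an object_id. ASSUMPTION: a content hash
    # stands in for whatever ID OmniObjectService actually returns.
    object_id = hashlib.sha256(payload).hexdigest()
    object_store[object_id] = payload

    # 7. Create an immutable registry entry: existing keys are never replaced.
    key = (manifest["category"], manifest["engine"],
           manifest["name"], manifest["version"])
    if key in registry:
        raise ValueError(f"{key} is already registered")
    registry[key] = object_id
    return object_id


# Demo on a throwaway bundle that mirrors the staging layout above.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "omnibioai_wes_snakemake_v1"
    (bundle / "workflow").mkdir(parents=True)
    (bundle / "config").mkdir()
    (bundle / "workflow" / "Snakefile").write_text("# placeholder\n")
    (bundle / "config" / "inputs.json").write_text("{}")
    (bundle / "manifest.json").write_text(json.dumps({
        "name": "omnibioai_wes_snakemake_v1", "category": "wes",
        "engine": "snakemake", "version": "1.0.0",
        "entrypoint": "workflow/Snakefile", "configs": ["config/inputs.json"],
    }))
    object_store, registry = {}, {}
    object_id = ingest(bundle, object_store, registry)

print(object_id in object_store, len(registry))
```

Note that the registry dict ends up holding only the identity tuple and the object_id, never a path, matching the separation of responsibilities described earlier.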

Testing Workflow Bundles

All Nextflow bundles use nf-test for unit and integration testing.

Bundle Test Structure

<bundle>/
├── nf-test.config
└── tests/
    ├── main.nf.test           # pipeline-level test
    └── <PROCESS_NAME>.nf.test # per-process tests (if ≤ 10 processes)

Running Tests

# Syntax check (no execution)
cd <bundle_dir> && nf-test list

# Run all tests for one bundle
cd <bundle_dir> && nf-test test

# Run across all bundles in a domain
for bundle in atacseq/*/; do
  echo "=== $bundle ===" && (cd "$bundle" && nf-test test 2>&1 | tail -3)
done

Writing New Tests

  1. Read the workflow entrypoint — extract process names and params block
  2. Create nf-test.config (copy from any existing bundle)
  3. Write tests/main.nf.test using real process/param names from step 1
  4. Run nf-test list to validate syntax
  5. Set "has_tests": true in manifest.json
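
Step 5 is easy to forget, so a consistency check between the manifest flag and the test files can be useful. The sketch below is a hypothetical helper, not a script shipped in this repository. It assumes manifest.json sits at the bundle root and that a tested bundle has tests/main.nf.test, as in the structure shown above.

```python
import json
import tempfile
from pathlib import Path


def find_test_flag_mismatches(root):
    """Return (bundle_name, declared, present) wherever has_tests disagrees with the files on disk."""
    mismatches = []
    for manifest_path in sorted(Path(root).rglob("manifest.json")):
        bundle = manifest_path.parent
        manifest = json.loads(manifest_path.read_text())
        declared = bool(manifest.get("has_tests", False))
        present = (bundle / "tests" / "main.nf.test").is_file()
        if declared != present:
            mismatches.append((bundle.name, declared, present))
    return mismatches


# Demo on a throwaway tree: one consistent bundle, one with a stale flag.
with tempfile.TemporaryDirectory() as root:
    ok = Path(root) / "domain_a" / "bundle_ok"
    (ok / "tests").mkdir(parents=True)
    (ok / "tests" / "main.nf.test").write_text("// placeholder\n")
    (ok / "manifest.json").write_text(json.dumps({"has_tests": True}))

    stale = Path(root) / "domain_a" / "bundle_stale"
    stale.mkdir(parents=True)
    (stale / "manifest.json").write_text(json.dumps({"has_tests": True}))

    mismatches = find_test_flag_mismatches(root)

print(mismatches)
```

The check only inspects file presence; whether the tests actually pass still has to be established by running nf-test test.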

Test coverage is tracked in reports/tests_written.json.


How Workflows Are Used at Runtime

OmniBioAI plugins do not access this repository directly.

Instead:

  1. Plugin queries the Workflow Registry Service

  2. User selects workflow + version

  3. Execution request is submitted

  4. Runtime system:

    • Resolves registry entry
    • Fetches bundle via object_id
    • Materializes workflow in execution environment
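
The three runtime sub-steps can be sketched as a lookup, a fetch, and an extraction. This is a simplified model, not the OmniBioAI runtime. The function `materialize` is hypothetical, plain dicts stand in for the registry and OmniObjectService, and a zip archive is an assumed bundle format.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def materialize(key, registry, object_store, workdir):
    """Resolve a bundle identity to its archive and unpack it into workdir."""
    object_id = registry[key]            # resolve registry entry
    payload = object_store[object_id]    # fetch bundle via object_id
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(workdir)      # materialize in the execution environment
    return Path(workdir)


# Build a tiny in-memory bundle archive for the demo.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("manifest.json", "{}")
    archive.writestr("workflow/Snakefile", "# placeholder\n")

key = ("wes", "snakemake", "omnibioai_wes_snakemake_v1", "1.0.0")
object_store = {"obj-001": buffer.getvalue()}  # "obj-001" is a made-up object_id
registry = {key: "obj-001"}

with tempfile.TemporaryDirectory() as workdir:
    root = materialize(key, registry, object_store, workdir)
    entrypoint_exists = (root / "workflow" / "Snakefile").is_file()

print(entrypoint_exists)
```

Nothing in this path touches Git or scans a directory tree for workflows, which is the point of the guarantees listed next.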

Guarantees

  • No Git dependency at runtime
  • No filesystem-based discovery
  • Fully reproducible execution
  • Complete audit trail per object_id

Design Principles

  • ID-first architecture

  • Immutable workflow artifacts

  • Engine-agnostic bundle specification

  • Metadata-driven discovery

  • CLI-first ingestion workflow

  • Strict separation of:

    • Authoring
    • Registry
    • Storage
    • Execution

Intended Audience

This repository is intended for:

  • Workflow engineers
  • Bioinformatics developers
  • OmniBioAI platform maintainers

End users interact only through the OmniBioAI UI and APIs, not directly with this repository.


What This Repository Is NOT

  • ❌ A runtime execution environment
  • ❌ A plugin system
  • ❌ A database or registry
  • ❌ A production workflow scheduler
  • ❌ A mutable shared execution workspace

Bundle Index

admet_prediction

ancient_dna

atacseq

bio_kg

cellranger

chipseq

circrna

cite_seq

clinical

clinical_trial_matching

crispr

crispr_ml

ctdna_analysis

drug_response_prediction

drug_synergy

epigenomics

exome_clinical

foundation_model

funcgen

genome_assembly

hic_analysis

hla_typing

immune_deconvolution

lncrna_analysis

long_read_rna

longread

metagenomics

methylation

  • methylation (README missing)

microbiome

mirna_seq

multimodal

multimodal_integration

multiomics

nanopore_methylation

obsolete

pangenome

pharmacogenomics

polygenic_risk_score

population_genetics

proteomics

proteomics_ms

riboseq

rna_editing

rnaseq

rnaseq_v1

  • rnaseq_v1 (README missing)

scatac_seq

single_cell_multiome

single_cell_multiomics

single_cell_vdj

singlecell

spatial

spatial_hd

spatial_multiomics

spatial_proteomics

sv

target_identification

variant_ml

wes

wgs


One-Sentence Summary

omnibioai-workflow-bundles is a version-controlled authoring repository for engine-specific bioinformatics workflows that are packaged into immutable artifacts and registered in the OmniBioAI Workflow Registry for reproducible, metadata-driven execution across multiple workflow engines.

About

Engine-agnostic, versioned bioinformatics workflow bundles for OmniBioAI, supporting WDL, Nextflow, Snakemake, and CWL. Designed for reproducible, registry-driven pipeline plugins across genomics and multi-omics.
