CellPathway: Cell type-specific enhancer informs pathway-based approaches for association analysis in whole-genome sequencing studies
CellPathway is a framework that directly tests associations between noncoding variants and cell type-specific pathways defined by enhancer activity.
git clone https://github.com/WGLab/CellPathway.git
cd CellPathway

conda create -n cellpathway python=3.10 -y
conda activate cellpathway
conda install -c bioconda bedtools pybigwig -y
pip install -r requirements.txt

python cellpathway_enrich.py \
--enhancer-dir data/Atlas \
--dnm-file example/autism_dnm.txt \
--output-dir example \
--cadd-threshold 10

After enrichment, run TAD annotation for a specific cell type using the overlap BED file generated in the previous step:
python cellpathway_tad.py \
--overlap-bed example/dnm_enhc_overlap_cadd_10/Fetal_brain_dnm.bed \
--tad-file data/tad_w_boundary_08.bed \
--elements-bb data/genes_w_noncoding.bb \
--gene-list example/SFARI_Gene.csv \
--output example/tad_Fetal_brain_autism.csv

The `--overlap-bed` path comes from the enrichment output. Replace `Fetal_brain` with any cell type from your results.
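For intuition, the TAD annotation step can be pictured as an interval lookup: each overlapping variant is assigned to the TAD that contains its position. The sketch below is not the code in `cellpathway_tad.py`. It assumes non-overlapping TADs with half-open BED coordinates, and the function names `build_tad_index` and `find_tad`, the TAD labels, and the coordinates are all hypothetical.

```python
from bisect import bisect_right


def build_tad_index(tads):
    """Group (chrom, start, end, name) records by chromosome, sorted by start."""
    by_chrom = {}
    for chrom, start, end, name in tads:
        by_chrom.setdefault(chrom, []).append((start, end, name))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [iv[0] for iv in intervals]
        index[chrom] = (starts, intervals)
    return index


def find_tad(index, chrom, pos):
    """Return the name of the TAD containing pos, or None if there is none.

    Assumes TADs on a chromosome do not overlap and that intervals are
    half-open [start, end), as in BED files.
    """
    if chrom not in index:
        return None
    starts, intervals = index[chrom]
    i = bisect_right(starts, pos) - 1  # last TAD starting at or before pos
    if i >= 0:
        start, end, name = intervals[i]
        if start <= pos < end:
            return name
    return None


# Made-up TADs for illustration only
tad_index = build_tad_index([
    ("chr1", 0, 100, "TAD1"),
    ("chr1", 100, 250, "TAD2"),
    ("chr2", 50, 80, "TAD3"),
])
print(find_tad(tad_index, "chr1", 120))
```

Sorting the start positions once and using binary search keeps each lookup logarithmic in the number of TADs per chromosome, which matters when annotating many variants.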
| File | Description |
|---|---|
| `example/autism_dnm.txt` | De novo mutations with CADD scores (tab-delimited: Chr, Start, End, CADD_PHRED) |
| `data/Atlas/` | Cell type-specific enhancer BED files (51 tissues, `*.hg38.bed`) |
| `data/Single_cell/` | Single-cell enhancer BED files (169 cell types, `*.hg38.bed`) |
| `data/tad_w_boundary_08.bed` | TAD regions and boundaries (hg38) |
| `data/genes_w_noncoding.bb` | Gene annotations in bigBed format |
| `example/SFARI_Gene.csv` | SFARI autism-associated gene list |
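To check a DNM file before running the enrichment, a short script like the one below can read the four tab-delimited columns and keep variants at or above a CADD threshold. This is only a sketch: `read_dnms` is a hypothetical helper, the sample rows are made up, and it assumes that a header line may be present and that the threshold is inclusive, neither of which is stated for `cellpathway_enrich.py`.

```python
import csv
import io


def read_dnms(handle, cadd_threshold=10.0):
    """Yield (chrom, start, end, cadd) for rows with CADD_PHRED >= threshold."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or malformed lines
        try:
            start, end, cadd = int(row[1]), int(row[2]), float(row[3])
        except ValueError:
            continue  # assumed to be a header or otherwise non-numeric row
        if cadd >= cadd_threshold:  # inclusive cutoff is an assumption
            yield row[0], start, end, cadd


# Made-up rows for illustration only
sample = (
    "Chr\tStart\tEnd\tCADD_PHRED\n"
    "chr1\t1000\t1001\t12.5\n"
    "chr2\t2000\t2001\t3.2\n"
    "chrX\t3000\t3001\t10\n"
)
kept = list(read_dnms(io.StringIO(sample), cadd_threshold=10))
print(kept)
```

For a real file, pass an open file handle (for example `open("example/autism_dnm.txt")`) in place of the `io.StringIO` object.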
- Enrichment table (`enrichment_FC_<cadd>.csv`): cell type, enhancer bp, DNM overlap count, fold enrichment, p-value, FDR-adjusted p-value
- TAD annotation (`tad_<cell>_autism.csv`): enhancer-to-TAD mapping with gene names and known disease gene overlaps
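The enrichment table reports an FDR-adjusted p-value for each cell type. The README does not say which correction is used; the sketch below shows the Benjamini-Hochberg procedure, a common choice, purely as an assumption. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, one per hypothetical cell type
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
```

Each p-value is scaled by the number of tests divided by its rank, and the backward pass ensures that a smaller raw p-value never receives a larger adjusted value than a bigger one.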
Before running CellPathway, noncoding de novo mutations must be prepared for all samples. We recommend using ANNOVAR to identify and filter rare noncoding variants and to annotate variant pathogenicity using Combined Annotation Dependent Depletion (CADD) scores.
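One possible way to turn an annotated variant table into the four-column DNM input is sketched below. It is a rough illustration, not part of CellPathway. It assumes a tab-delimited table with a header, a `Func.refGene` column as produced by ANNOVAR gene-based annotation, and a CADD column whose name (`CADD_PHRED` here) depends on the annotation database you used. It also assumes rarity filtering has already been done, that the output should carry a header line, and that coordinates can be passed through unchanged. The helper `annovar_to_dnm` and the sample rows are hypothetical.

```python
import csv
import io

# ANNOVAR Func.refGene categories treated as noncoding here (adjust as needed)
NONCODING_FUNCS = {
    "intergenic", "intronic", "upstream", "downstream",
    "UTR5", "UTR3", "ncRNA_intronic", "ncRNA_exonic",
}


def annovar_to_dnm(in_handle, out_handle,
                   func_col="Func.refGene", cadd_col="CADD_PHRED"):
    """Write Chr, Start, End, CADD_PHRED for noncoding variants; return count."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Chr", "Start", "End", "CADD_PHRED"])  # header is assumed
    written = 0
    for row in reader:
        # Values such as "upstream;downstream" list several categories
        funcs = set(row[func_col].split(";"))
        if not funcs <= NONCODING_FUNCS:
            continue  # at least one coding or splicing category
        cadd = row[cadd_col]
        if cadd in ("", ".", "NA"):
            continue  # no CADD score available
        writer.writerow([row["Chr"], row["Start"], row["End"], cadd])
        written += 1
    return written


# Made-up annotated rows for illustration only
annotated = (
    "Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tCADD_PHRED\n"
    "chr1\t1000\t1000\tA\tG\tintergenic\t12.5\n"
    "chr1\t2000\t2000\tC\tT\texonic\t25.1\n"
    "chr2\t3000\t3000\tG\tA\tupstream;downstream\t8.0\n"
    "chr3\t4000\t4000\tT\tC\tintronic\t.\n"
)
out = io.StringIO()
n_written = annovar_to_dnm(io.StringIO(annotated), out)
print(n_written)
print(out.getvalue())
```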