
ATSEmapper

ATSEmapper takes per-sample or per-cell splice junction files (from regtools) and identifies alternative splicing events (ATSEs) by building splice graphs across all samples and annotating junctions against a reference annotation and genome sequence. It is designed for single-cell and bulk RNA-seq data.

Overview

ATSEmapper contains two modules:

  1. ATSEmapper — maps ATSEs from splice junction files; outputs a compressed TSV of splicing events with PSI-based usage scores
  2. ATSEviz — Python plotting library for visualizing splicing events and isoforms in Jupyter notebooks

ATSEmapper

ATSEmapper efficiently identifies alternative splicing events from RNA-seq splice junction data generated by regtools.
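
One way to picture the splice-graph step is to treat each splice site as a node and each junction as an edge, then group junctions that are connected through shared sites. The sketch below does this with a small union-find in plain Python. The function name `cluster_junctions` and the tuple layout are illustrative assumptions, not ATSEmapper's internal API (the package itself lists networkx as a dependency).

```python
from collections import defaultdict


def cluster_junctions(junctions):
    """Group junctions that are linked through shared splice sites.

    `junctions` is an iterable of (chrom, strand, start, end) tuples.
    Hypothetical helper for illustration; not part of ATSEmapper.
    """
    junctions = list(junctions)
    parent = {}

    def find(node):
        # Path-halving find for the union-find structure.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for chrom, strand, start, end in junctions:
        # Nodes are splice sites; each junction is an edge between its two ends.
        union((chrom, strand, start), (chrom, strand, end))

    clusters = defaultdict(list)
    for junction in junctions:
        chrom, strand, start, _ = junction
        clusters[find((chrom, strand, start))].append(junction)

    # Sort so the output is deterministic.
    return sorted(sorted(group) for group in clusters.values())
```

Under this picture, a cluster containing a single junction has no alternative and would not count as an alternative splicing event.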

Features

  • Batch processing of multiple splice junction files
  • Integration with genome annotations (GTF/GFF3 via gffutils)
  • Canonical splice site verification against genome sequence
  • Junction annotation and filtering
  • PSI-based splice site usage quantification

Input

ATSEmapper processes splice junction files generated by regtools:

regtools junctions extract -s 0 sample.bam -o sample.junctions.bed

  • Each file corresponds to one sample (bulk RNA-seq or single-cell)
  • Optimized for bulk RNA-seq and plate-based single-cell data
  • Input can be a directory of .bed/.junc files or a .txt file listing paths (one per line)
  • 10X Genomics (multi-cell BAM) support is under development
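
For readers unfamiliar with the junction file layout, the following sketch parses one line of `regtools junctions extract` output. It assumes the standard BED12 layout, in which the start and end columns include the read anchors, so the intron boundaries are recovered from the block sizes and the score column holds the read count. The `Junction` type and function names are illustrative, not ATSEmapper's API.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int   # 1-based first intronic base
    end: int     # 1-based last intronic base
    strand: str
    reads: int


def parse_junction_line(line):
    """Parse one BED12 line (assumed regtools layout) into a Junction."""
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start, chrom_end = fields[0], int(fields[1]), int(fields[2])
    reads, strand = int(fields[4]), fields[5]
    block_sizes = [int(size) for size in fields[10].rstrip(",").split(",")]
    # Trim the left and right anchors to get the intron itself.
    intron_start = chrom_start + block_sizes[0] + 1
    intron_end = chrom_end - block_sizes[-1]
    return Junction(chrom, intron_start, intron_end, strand, reads)


def read_junctions(path):
    """Read all junctions from a file, skipping blank and header lines."""
    with open(path) as handle:
        return [
            parse_junction_line(line)
            for line in handle
            if line.strip() and not line.startswith(("#", "track"))
        ]


example = "chr1\t100\t300\tJUNC1\t7\t+\t100\t300\t255,0,0\t2\t20,30\t0,170"
print(parse_junction_line(example))
```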

Installation

From GitHub (first time):

git clone https://github.com/daklab/ATSEmapper.git
pip install -e ATSEmapper

If you already have the repo locally:

pip install -e /path/to/ATSEmapper

The -e flag installs in editable mode — atsemapper becomes importable anywhere in your environment and any code changes are immediately live without reinstalling.

This installs one CLI command: atsemapper.

Dependencies

  • Python ≥ 3.10
  • pandas
  • numpy
  • networkx
  • matplotlib
  • seaborn
  • pyfaidx (genome sequence access)
  • gffutils (annotation parsing)
  • tqdm

Usage

Command-line

atsemapper \
  --input path/to/junction/files \
  --annotation path/to/annotation.gtf \
  --genome path/to/genome.fa

Python API

from atsemapper.atsemapper.main import run_atsemapper

class Args:
    input = "path/to/junction/files"
    annotation = "path/to/annotation.gtf"
    genome = "path/to/genome.fa"
    output = None          # auto-generated if None
    db_path = None         # annotation.db path; created if missing
    min_intron = 50
    max_intron = 500000
    min_reads = 100
    min_cells = 2
    batch_size = 10
    num_workers = 4
    sequencing_type = "single_cell"
    annotation_status = "both"
    only_canonical = False
    tolerance = 1
    min_splice_site_usage = 0.01
    sample_size = None
    log_file = None
    verbose = False

output_file = run_atsemapper(Args())

See examples/atsemapper_demo.py for a full working example.
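
If you call the API repeatedly with different settings, a small factory can save retyping the whole `Args` class. The sketch below builds an options object from the defaults shown above using `types.SimpleNamespace`. `make_args` is a hypothetical helper, and it assumes `run_atsemapper` only reads plain attributes from the object it receives.

```python
from types import SimpleNamespace

# Defaults copied from the Args example above.
DEFAULTS = dict(
    output=None,
    db_path=None,
    min_intron=50,
    max_intron=500000,
    min_reads=100,
    min_cells=2,
    batch_size=10,
    num_workers=4,
    sequencing_type="single_cell",
    annotation_status="both",
    only_canonical=False,
    tolerance=1,
    min_splice_site_usage=0.01,
    sample_size=None,
    log_file=None,
    verbose=False,
)


def make_args(input_path, annotation, genome, **overrides):
    """Build an options namespace, rejecting misspelled option names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    options = {**DEFAULTS, **overrides}
    return SimpleNamespace(
        input=input_path, annotation=annotation, genome=genome, **options
    )


args = make_args(
    "path/to/junction/files",
    "path/to/annotation.gtf",
    "path/to/genome.fa",
    sequencing_type="bulk",
    min_reads=5,
)
# output_file = run_atsemapper(args)  # assumes attribute access is all that is needed
```

Rejecting unknown keys catches typos such as `min_read=5`, which would otherwise be ignored silently.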

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| --input | Directory of .bed/.junc files, or a .txt file listing paths, or a single file | required |
| --annotation | GTF/GFF3 genome annotation file | required |
| --genome | FASTA genome sequence file | required |
| --output | Output directory (auto-generated with timestamp if omitted) | LeafletFA_ATSE_mapper_output_<DATE> |
| --db_path | Path to existing gffutils SQLite annotation database | {output}/annotation.db |
| --min-intron | Minimum intron length (bp) | 50 |
| --max-intron | Maximum intron length (bp) | 500000 |
| --min-reads | Minimum total reads across all cells for a junction | 100 |
| --min-cells | Minimum number of cells/samples with a junction | 2 |
| --batch-size | Number of files to process per batch | 10 |
| --num-workers | Number of parallel worker threads | 4 |
| --sequencing-type | single_cell or bulk | single_cell |
| --annotation-status | Junction annotation filter: both, either, unanno_also | both |
| --only-canonical | Keep only canonical splice sites (GT-AG, GC-AG, AT-AC) | False |
| --tolerance | Tolerance (bp) for matching splice sites to annotated exons | 1 |
| --min-splice-site-usage | Minimum proportion of reads a junction must contribute at a splice site | 0.01 |
| --sample-size | Randomly sample N junction files (useful for testing) | all files |
| --log-file | Path to log file | {output}/atsemapper_<timestamp>.log |
| --verbose | Enable debug-level logging | False |
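
To show how the numeric thresholds above might combine, here is a rough sketch of a junction filter. It is an interpretation, not the tool's actual code: the order of the steps is assumed, and a junction is required to reach `min_splice_site_usage` at both of its splice sites, which the parameter description does not specify. The input layout and the name `filter_junctions` are hypothetical.

```python
from collections import defaultdict


def filter_junctions(counts, min_intron=50, max_intron=500_000,
                     min_reads=100, min_cells=2, min_splice_site_usage=0.01):
    """Apply length, read, cell, and usage thresholds to junction counts.

    `counts` maps (chrom, start, end, strand) -> {cell_id: reads}.
    """
    kept = {}
    for key, per_cell in counts.items():
        _, start, end, _ = key
        length = end - start + 1
        total = sum(per_cell.values())
        n_cells = sum(1 for reads in per_cell.values() if reads > 0)
        if (min_intron <= length <= max_intron
                and total >= min_reads and n_cells >= min_cells):
            kept[key] = total

    # Total reads at each splice site among the junctions kept so far.
    site_totals = defaultdict(int)
    for (chrom, start, end, strand), total in kept.items():
        site_totals[(chrom, strand, "start", start)] += total
        site_totals[(chrom, strand, "end", end)] += total

    result = {}
    for key, total in kept.items():
        chrom, start, end, strand = key
        # Assumption: use the smaller of the two per-site proportions.
        usage = min(
            total / site_totals[(chrom, strand, "start", start)],
            total / site_totals[(chrom, strand, "end", end)],
        )
        if usage >= min_splice_site_usage:
            result[key] = {"reads": total, "usage": usage}
    return result
```

With these definitions, a junction carrying 20 of the 200 reads at a shared start site has a usage of 0.1 and passes the default 0.01 threshold.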

Output

ATSEmapper writes a compressed TSV to the output directory:

{output}/atse_events_<timestamp>.tsv.gz

Each row is an alternative splicing event with junction coordinates, gene annotation, and splice site usage.
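
A quick way to inspect the result from Python is sketched below, using only the standard library. It locates the most recently modified `atse_events_*.tsv.gz` file in an output directory and reads it into a list of dictionaries. The column names are not documented here, so the code makes no assumptions about them; both function names are hypothetical. `pandas.read_csv(path, sep="\t")` should work equally well if you prefer a DataFrame.

```python
import csv
import gzip
from pathlib import Path


def latest_events_file(output_dir):
    """Return the most recently modified events file in output_dir."""
    candidates = list(Path(output_dir).glob("atse_events_*.tsv.gz"))
    if not candidates:
        raise FileNotFoundError(f"no atse_events_*.tsv.gz in {output_dir}")
    return max(candidates, key=lambda path: path.stat().st_mtime)


def read_events(path):
    """Read a gzip-compressed TSV into a list of row dictionaries."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `rows = read_events(latest_events_file("path/to/output"))` followed by `rows[0].keys()` lists the available columns.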

Example

atsemapper \
  --input examples/junctions/ \
  --annotation /path/to/gencode.v38.annotation.gtf \
  --genome /path/to/GRCh38.primary_assembly.genome.fa \
  --sequencing-type bulk \
  --min-reads 5 \
  --only-canonical

ATSEviz

ATSEviz is a Python plotting library for visualizing splicing events and isoforms. It is designed for use in Jupyter notebooks — there is no CLI.

See examples/visualization_examples/visualize_atses.ipynb for a full walkthrough.

Features

  • Sashimi-style splice junction plots with usage-ratio coloring
  • Isoform plots with exon/CDS/intron structure
  • Intron compression for compact visualization
  • PDF export
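
The idea behind intron compression can be sketched as a coordinate transform: exons keep their true width, while any gap between exons longer than a cap is drawn at the cap's width. The code below is a generic illustration of that idea under these assumptions, not ATSEviz's implementation; `build_compressor` and `max_gap` are hypothetical names.

```python
def build_compressor(exons, max_gap=100):
    """Return a function mapping genomic positions to plot positions.

    `exons` is a list of (start, end) pairs; gaps between exons that are
    longer than `max_gap` are shrunk to `max_gap` plot units.
    """
    # Merge overlapping exons so that gaps are well defined.
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Each segment is (genomic_start, genomic_end, plot_start, scale).
    segments = []
    plot_pos = 0.0
    prev_end = None
    for start, end in merged:
        if prev_end is not None:
            gap = start - prev_end
            drawn = min(gap, max_gap)
            segments.append((prev_end, start, plot_pos, drawn / gap))
            plot_pos += drawn
        segments.append((start, end, plot_pos, 1.0))
        plot_pos += end - start
        prev_end = end

    def compress(pos):
        for g_start, g_end, p_start, scale in segments:
            if g_start <= pos <= g_end:
                return p_start + (pos - g_start) * scale
        raise ValueError(f"position {pos} is outside the plotted region")

    return compress


compress = build_compressor([(100, 200), (1200, 1300)], max_gap=100)
print(compress(200), compress(700), compress(1300))
```

Junction arcs and exon boxes can then be drawn at `compress(x)` instead of `x`, so long introns no longer dominate the figure.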

Usage

import gffutils
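from atsemapper.atseviz.main import (
    plot_exons_and_junctions,
    plot_isoforms,
    fetch_transcripts_and_annotations,
    fetch_transcripts_for_gene,
    # used below; assumed to live in the same module as the other helpers
    determine_region_boundaries_from_transcripts,
)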

db = gffutils.FeatureDB("path/to/annotation.db", keep_order=True)

transcripts = fetch_transcripts_for_gene(db, "Ptbp1")
transcript_data = fetch_transcripts_and_annotations(db, transcripts)
region_start, region_end = determine_region_boundaries_from_transcripts(transcript_data)

plot_isoforms(db, transcript_data, region_start, region_end, transcript_order=transcripts)

Citing ATSEmapper

If you use ATSEmapper in your research, please cite:

Isaev et al. (2025). LeafletFA: A comprehensive framework for alternative splicing analysis
from single cell RNA-seq data. Journal Name. DOI: 10.xxxx/xxxxx

Contributing

Contributions are welcome. Please open a Pull Request:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Commit your changes
  4. Push and open a Pull Request

License

MIT License — see the LICENSE file for details.

Contact

Questions or bug reports: open a GitHub issue or email karin.isaev@gmail.com.
