Phase 14: STARsolo single-cell support (MVP + CellRanger matching) #90

Open
iandriver wants to merge 1 commit into scverse:main from iandriver:phase14-starsolo

What & why

Implements STARsolo Phases 14.1–14.4 (the 10x Chromium Gene-count MVP) plus the
CellRanger 4.x/5.x-matching flag set, ported faithfully from STAR's source and
verified byte-identical against real STARsolo (3/3 deterministic) via a
Linux-container differential harness.

Changes

  • src/solo/ (new):
    • mod.rs — soloType/params plumbing, barcode-read input (SoloReadReader),
      SoloContext, per-read processing, CellRanger4 adapter clip
    • whitelist.rs — 2-bit barcode packing, sorted-whitelist load, read-stage CB
      matching (exact/1MM/1MM_multi) + UMI checks (STAR SoloReadBarcode_getCBandUMI.cpp)
    • gene.rs — per-read gene assignment (--soloStrand), reuses quant::overlapping_genes
    • count.rs — UMI dedup (Exact/NoDedup/1MM_All/1MM_Directional/1MM_CR),
      MultiGeneUMI(_CR) filtering, 1MM_multi_Nbase_pseudocounts CB posterior,
      raw matrix.mtx/barcodes.tsv/features.tsv writer
  • src/lib.rs: new align_reads_solo loop (reads cDNA + barcode in lockstep,
    aligns cDNA, quantifies per cell); src/params/mod.rs: --solo* params + validation
  • CellRanger-matching flags: --clipAdapterType CellRanger4,
    --outFilterScoreMin, --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts,
    --soloUMIfiltering MultiGeneUMI_CR, --soloUMIdedup 1MM_CR
  • Tests / harnesses: tests/alignment_features.rs
    (+test_starsolo_gene_matrix, +test_starsolo_cellranger_style_matrix),
    test/solo_cellranger_diff.py + Dockerfile.solodiff + solo_diff_docker.sh
    (live STAR diff), test/solo_bench.py + Dockerfile.bench
    (CellRanger vs STARsolo vs rustar runtime/stats benchmark scaffold)
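
For reviewers who want a feel for the whitelist.rs approach, here is a minimal sketch of 2-bit barcode packing with exact and 1MM lookup against a sorted whitelist. It is illustrative only and is not the code in this PR: the names `pack_barcode`, `exact_match`, and `one_mismatch_matches` are hypothetical, and the A=0, C=1, G=2, T=3 encoding is an assumption.

```rust
/// Pack an A/C/G/T barcode into a u64, 2 bits per base.
/// Assumed encoding: A=0, C=1, G=2, T=3 (first base ends up in the highest bits).
/// Returns None for any other character (e.g. N) or for barcodes longer than 32 bases.
fn pack_barcode(seq: &str) -> Option<u64> {
    if seq.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for b in seq.bytes() {
        let code: u64 = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        packed = (packed << 2) | code;
    }
    Some(packed)
}

/// Exact match: binary search in a whitelist that is sorted ascending.
fn exact_match(whitelist: &[u64], cb: u64) -> Option<usize> {
    whitelist.binary_search(&cb).ok()
}

/// 1MM match: try every single-base substitution of a barcode of `len` bases
/// and collect the whitelist indices that are hit, in ascending order.
fn one_mismatch_matches(whitelist: &[u64], cb: u64, len: usize) -> Vec<usize> {
    let mut hits = Vec::new();
    for pos in 0..len {
        let shift = 2 * pos;
        let original = (cb >> shift) & 3;
        for base in 0..4u64 {
            if base == original {
                continue;
            }
            // Clear the two bits at this position, then set the substitute base.
            let candidate = (cb & !(3u64 << shift)) | (base << shift);
            if let Ok(idx) = whitelist.binary_search(&candidate) {
                hits.push(idx);
            }
        }
    }
    hits.sort_unstable();
    hits
}
```

In this sketch a read barcode with exactly one 1MM hit would be assigned to that whitelist entry, while multiple hits correspond to the case the 1MM_multi modes have to resolve.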

Notes for reviewers

  • 479 lib + 11 integration tests, 0 clippy warnings, cargo fmt clean.
  • The live STAR diff runs in a Linux container because STAR 2.7.11b reads 0 reads
    on Apple-Silicon macOS (a known STAR/macOS bug).
  • Design notes and sub-phase tracking in docs-old/phase14_starsolo.md + ROADMAP.md.
  • The 1MM_multi posterior cbMinP threshold and 1MM_Directional greedy variant
    are documented as approximations to revisit (default 1MM_All path is exact).
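
To make the 1MM_All behaviour concrete, the sketch below counts UMIs for one cell/gene pair by collapsing every group of UMIs connected through single-mismatch links (connected components via union-find). It is a simplified illustration under that assumption, not the implementation in count.rs; `count_umis_1mm_all` and its helpers are hypothetical names.

```rust
use std::collections::HashSet;

/// Number of positions at which two equal-length byte strings differ.
fn hamming(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).filter(|(x, y)| x != y).count()
}

/// Union-find root lookup with path halving.
fn find(parent: &mut [usize], mut i: usize) -> usize {
    while parent[i] != i {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    i
}

/// Count UMIs after collapsing all UMIs linked by chains of 1-mismatch pairs.
fn count_umis_1mm_all(umis: &[&str]) -> usize {
    // Start from the distinct UMI sequences (this is the "Exact" count).
    let mut unique: Vec<&str> = umis
        .iter()
        .copied()
        .collect::<HashSet<_>>()
        .into_iter()
        .collect();
    unique.sort_unstable();

    let n = unique.len();
    let mut parent: Vec<usize> = (0..n).collect();

    // Quadratic pairwise comparison: fine for a sketch, since the UMI set
    // for a single cell/gene pair is usually small.
    for i in 0..n {
        for j in (i + 1)..n {
            let (a, b) = (unique[i].as_bytes(), unique[j].as_bytes());
            if a.len() == b.len() && hamming(a, b) == 1 {
                let ri = find(&mut parent, i);
                let rj = find(&mut parent, j);
                if ri != rj {
                    parent[ri] = rj;
                }
            }
        }
    }

    // Each remaining root is one collapsed UMI.
    let mut components = 0;
    for i in 0..n {
        if find(&mut parent, i) == i {
            components += 1;
        }
    }
    components
}
```

For example, AAAA, AAAT and AATT collapse into a single UMI here because they are chained by one-mismatch steps, even though AAAA and AATT differ at two positions.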

🤖 Generated with Claude Code

Implements STARsolo Phases 14.1–14.4 (the 10x Chromium Gene-count MVP) plus
the CellRanger 4.x/5.x-matching flag set, all ported faithfully from STAR's
source and verified byte-identical against real STARsolo.

New `src/solo/`:
- mod.rs    — SoloType/params plumbing, barcode-read input (SoloReadReader),
              SoloContext, per-read processing, CellRanger4 adapter clip
- whitelist.rs — 2-bit barcode packing + sorted-whitelist load + read-stage
              CB matching (exact/1MM/1MM_multi) + UMI checks (STAR
              SoloReadBarcode_getCBandUMI.cpp)
- gene.rs   — per-read gene assignment (--soloStrand), reuses
              quant::overlapping_genes
- count.rs  — UMI dedup (Exact/NoDedup/1MM_All/1MM_Directional/1MM_CR),
              MultiGeneUMI(_CR) filtering, 1MM_multi_Nbase_pseudocounts CB
              posterior, raw matrix.mtx/barcodes.tsv/features.tsv writer
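
As a rough illustration of the raw matrix output, the sketch below renders sparse counts in Matrix Market coordinate format. It assumes features are rows and barcodes are columns and that entries arrive already sorted; `render_mtx` is a hypothetical name, and the exact header and comment lines needed for byte-identity with STARsolo are not reproduced here.

```rust
use std::fmt::Write;

/// Render (feature_index, barcode_index, count) triples, given with 0-based
/// indices, as a Matrix Market coordinate file. Matrix Market indices are 1-based.
fn render_mtx(n_features: usize, n_barcodes: usize, entries: &[(usize, usize, u32)]) -> String {
    let mut out = String::new();
    // Banner line: sparse (coordinate) integer matrix with no symmetry.
    out.push_str("%%MatrixMarket matrix coordinate integer general\n");
    // Size line: rows, columns, number of non-zero entries.
    writeln!(out, "{} {} {}", n_features, n_barcodes, entries.len()).unwrap();
    for &(feature, barcode, count) in entries {
        writeln!(out, "{} {} {}", feature + 1, barcode + 1, count).unwrap();
    }
    out
}
```

Writing into a `String` keeps the sketch easy to compare in a test; a real writer would more likely stream to a buffered file.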

Driver: new align_reads_solo loop in lib.rs (reads cDNA + barcode in lockstep,
aligns cDNA, quantifies per cell); solo params + validation in params/mod.rs.

CellRanger-matching flags (--clipAdapterType CellRanger4, --outFilterScoreMin,
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts, --soloUMIfiltering
MultiGeneUMI_CR, --soloUMIdedup 1MM_CR) produce a matrix byte-identical to real
STARsolo (3/3 deterministic) — verified via a Linux-container differential
harness (test/solo_cellranger_diff.py + Dockerfile.solodiff + solo_diff_docker.sh),
since STAR 2.7.11b reads 0 reads on Apple-Silicon macOS.

Also adds a CellRanger-vs-STARsolo-vs-rustar runtime/stats benchmark scaffold
(test/solo_bench.py + Dockerfile.bench).

Tests: 479 lib + 11 integration (incl. test_starsolo_cellranger_style_matrix),
0 clippy warnings. Docs in docs-old/phase14_starsolo.md + ROADMAP.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>