Phase 14: STARsolo single-cell support (MVP + CellRanger matching)#90
Open
iandriver wants to merge 1 commit into
Implements STARsolo Phases 14.1–14.4 (the 10x Chromium Gene-count MVP) plus
the CellRanger 4.x/5.x-matching flag set, all ported faithfully from STAR's
source and verified byte-identical against real STARsolo.
New `src/solo/`:
- mod.rs — SoloType/params plumbing, barcode-read input (SoloReadReader),
SoloContext, per-read processing, CellRanger4 adapter clip
- whitelist.rs — 2-bit barcode packing + sorted-whitelist load + read-stage
CB matching (exact/1MM/1MM_multi) + UMI checks (STAR
SoloReadBarcode_getCBandUMI.cpp)
- gene.rs — per-read gene assignment (--soloStrand), reuses
quant::overlapping_genes
- count.rs — UMI dedup (Exact/NoDedup/1MM_All/1MM_Directional/1MM_CR),
MultiGeneUMI(_CR) filtering, 1MM_multi_Nbase_pseudocounts CB
posterior, raw matrix.mtx/barcodes.tsv/features.tsv writer
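The 2-bit packing and whitelist matching described for whitelist.rs can be sketched as follows. This is a minimal illustration, not the PR's code: `pack_2bit`, `match_cb`, and `CbMatch` are hypothetical names, and the mismatch rule shown (accept a single-mismatch hit only when exactly one whitelist barcode is one base away) is an assumed simplification of the 1MM mode. The 1MM_multi variants, which weigh several candidates, are not covered.

```rust
/// Pack an ACGT barcode into a u64, two bits per base (A=0, C=1, G=2, T=3).
/// Returns None for any other character (e.g. N) or for more than 32 bases.
fn pack_2bit(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for &base in seq {
        let code = match base {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        packed = (packed << 2) | code;
    }
    Some(packed)
}

/// Outcome of matching one cell barcode against the whitelist.
#[derive(Debug, PartialEq)]
enum CbMatch {
    Exact(usize),       // index of the identical whitelist entry
    OneMismatch(usize), // index of the only entry one base away
    Ambiguous,          // more than one entry is one base away
    NoMatch,
}

/// `whitelist` must be sorted and deduplicated; `len` is the barcode length in bases.
fn match_cb(whitelist: &[u64], cb: u64, len: usize) -> CbMatch {
    if let Ok(i) = whitelist.binary_search(&cb) {
        return CbMatch::Exact(i);
    }
    let mut hit: Option<usize> = None;
    for pos in 0..len {
        let shift = 2 * pos;
        let original = (cb >> shift) & 3;
        // Try the three alternative bases at this position.
        for alt in (0..4u64).filter(|&a| a != original) {
            let candidate = (cb & !(3u64 << shift)) | (alt << shift);
            if whitelist.binary_search(&candidate).is_ok() && hit.is_some() {
                return CbMatch::Ambiguous;
            }
            if let Ok(i) = whitelist.binary_search(&candidate) {
                hit = Some(i);
            }
        }
    }
    hit.map_or(CbMatch::NoMatch, CbMatch::OneMismatch)
}
```

Packing keeps each whitelist entry in one machine word, so a sorted `Vec<u64>` supports exact lookup by binary search and a 16-base barcode needs only 48 extra probes for the single-mismatch scan.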
Driver: new align_reads_solo loop in lib.rs (reads cDNA + barcode in lockstep,
aligns cDNA, quantifies per cell); solo params + validation in params/mod.rs.
CellRanger-matching flags (--clipAdapterType CellRanger4, --outFilterScoreMin,
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts, --soloUMIfiltering
MultiGeneUMI_CR, --soloUMIdedup 1MM_CR) produce a matrix byte-identical to real
STARsolo (3/3 deterministic) — verified via a Linux-container differential
harness (test/solo_cellranger_diff.py + Dockerfile.solodiff + solo_diff_docker.sh),
since STAR 2.7.11b reads 0 reads on Apple-Silicon macOS.
Also adds a CellRanger-vs-STARsolo-vs-rustar runtime/stats benchmark scaffold
(test/solo_bench.py + Dockerfile.bench).
Tests: 479 lib + 11 integration (incl. test_starsolo_cellranger_style_matrix),
0 clippy warnings. Docs in docs-old/phase14_starsolo.md + ROADMAP.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What & why
Implements STARsolo Phases 14.1–14.4 (the 10x Chromium Gene-count MVP) plus the
CellRanger 4.x/5.x-matching flag set, ported faithfully from STAR's source and
verified byte-identical against real STARsolo (3/3 deterministic) via a
Linux-container differential harness.
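A byte-for-byte comparison only works if the matrix writer is fully deterministic. The sketch below shows one way to get that for a Matrix Market coordinate file. It is an illustration under assumptions, not the PR's writer: `write_mtx` is a hypothetical name, the entry order (by barcode, then by feature) is assumed, and any extra header comment lines that STAR emits are not reproduced.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

/// Write a sparse count matrix in Matrix Market coordinate format.
/// `counts` maps (barcode_index, feature_index), both 0-based, to a UMI count.
/// Rows are features and columns are barcodes; indices are written 1-based.
fn write_mtx<W: Write>(
    out: &mut W,
    n_features: usize,
    n_barcodes: usize,
    counts: &BTreeMap<(usize, usize), u32>,
) -> io::Result<()> {
    writeln!(out, "%%MatrixMarket matrix coordinate integer general")?;
    writeln!(out, "{} {} {}", n_features, n_barcodes, counts.len())?;
    // BTreeMap iterates in key order, so the output does not depend on
    // insertion order or hashing.
    for (&(barcode, feature), &n) in counts {
        writeln!(out, "{} {} {}", feature + 1, barcode + 1, n)?;
    }
    Ok(())
}
```

Using an ordered map rather than a hash map is the design choice that matters here: repeated runs on the same input produce the same bytes.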
Changes
- src/solo/ (new):
  - mod.rs: --soloType/params plumbing, barcode-read input (SoloReadReader), SoloContext, per-read processing, CellRanger4 adapter clip
  - whitelist.rs: 2-bit barcode packing, sorted-whitelist load, read-stage CB matching (exact/1MM/1MM_multi) + UMI checks (STAR SoloReadBarcode_getCBandUMI.cpp)
  - gene.rs: per-read gene assignment (--soloStrand), reuses quant::overlapping_genes
  - count.rs: UMI dedup (Exact/NoDedup/1MM_All/1MM_Directional/1MM_CR), MultiGeneUMI(_CR) filtering, 1MM_multi_Nbase_pseudocounts CB posterior, raw matrix.mtx/barcodes.tsv/features.tsv writer
- src/lib.rs: new align_reads_solo loop (reads cDNA + barcode in lockstep, aligns cDNA, quantifies per cell)
- src/params/mod.rs: --solo* params + validation
- CellRanger-matching flags: --clipAdapterType CellRanger4, --outFilterScoreMin, --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts, --soloUMIfiltering MultiGeneUMI_CR, --soloUMIdedup 1MM_CR
- tests/alignment_features.rs (+ test_starsolo_gene_matrix, + test_starsolo_cellranger_style_matrix)
- test/solo_cellranger_diff.py + Dockerfile.solodiff + solo_diff_docker.sh (live STAR diff)
- test/solo_bench.py + Dockerfile.bench (CellRanger vs STARsolo vs rustar runtime/stats benchmark scaffold)
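For readers unfamiliar with the UMI dedup modes listed for count.rs, here is a minimal sketch of the idea behind 1MM_All, assuming it collapses all UMIs that are connected through chains of single-base mismatches and counts each connected group once. `count_umis_1mm_all` and its helpers are hypothetical names, the all-pairs comparison is chosen for clarity rather than speed, and this is not the code in the PR.

```rust
use std::collections::HashSet;

/// True when two equal-length UMIs differ at no more than one position.
fn hamming_le1(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).filter(|(x, y)| x != y).count() <= 1
}

/// Union-find root lookup with path halving.
fn find(parent: &mut [usize], mut x: usize) -> usize {
    while parent[x] != x {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    x
}

/// Count molecules for one (cell, gene) pair: distinct UMIs are nodes,
/// an edge joins UMIs one mismatch apart, and each component counts once.
fn count_umis_1mm_all(umis: &[&str]) -> usize {
    let unique: Vec<&str> = umis
        .iter()
        .copied()
        .collect::<HashSet<_>>()
        .into_iter()
        .collect();
    let mut parent: Vec<usize> = (0..unique.len()).collect();
    for i in 0..unique.len() {
        for j in (i + 1)..unique.len() {
            if hamming_le1(unique[i].as_bytes(), unique[j].as_bytes()) {
                let (ri, rj) = (find(&mut parent, i), find(&mut parent, j));
                if ri != rj {
                    parent[ri] = rj;
                }
            }
        }
    }
    // Each root represents one collapsed group of UMIs.
    (0..unique.len())
        .filter(|&i| find(&mut parent, i) == i)
        .count()
}
```

With the UMIs AAAA, AAAT, AATT, and GGGG, the first three collapse into one group even though AAAA and AATT differ by two bases, because AAAT links them; exact dedup would count four molecules where this counts two.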
Notes for reviewers
- cargo fmt clean.
- The differential harness runs in a Linux container because STAR 2.7.11b reads 0 reads on Apple-Silicon macOS (a known STAR/macOS bug).
- Docs in docs-old/phase14_starsolo.md + ROADMAP.md.
- The cbMinP threshold and the 1MM_Directional greedy variant are documented as approximations to revisit (the default 1MM_All path is exact).

🤖 Generated with Claude Code