MR for all Elan work done since July 2025 by elanfisher · Pull Request #16 · COMBINE-lab/seqproc

elanfisher · 2026-01-30T01:41:15Z

No description provided.

…nchor appears in the right place for sci-RNA-seq3 data. Added tests which check error handling for anchor mismatches, exact geometry for no-tolerance checks, and thresholded Hamming-distance checks. Lastly, added multi-thread consistency checks (1 vs 4 threads) for 10x and sci-RNA-seq3.

- Add search_whitelist(interval, file, dist, max_pos?) function for whitelist-based barcode search
- Implement mini-backtracking in interpreter for anchor-based parsing
- Support position-bounded search with optional max_pos parameter
- Add ExactBoundedMatch and HammingBoundedMatch for bounded search operations
- Rename test files from .fgdl to .geom extension
- Add comprehensive test cases for search_whitelist, search_anchor, and search_hamming
- Add r_umi_bc_anchor test for UMI extraction before barcode anchor
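A minimal sketch of what a position-bounded, Hamming-tolerant whitelist search could look like. This is not the seqproc or ANTISEQUENCE implementation: `hamming` and `search_whitelist_sketch` are hypothetical names, the whitelist is assumed to hold equal-length barcodes already loaded into memory, and `max_pos` is assumed to mean the last start offset that is tried.

```rust
/// Hamming distance between two equal-length byte strings; `None` if the lengths differ.
fn hamming(a: &[u8], b: &[u8]) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).filter(|(x, y)| x != y).count())
}

/// Hypothetical helper (not the seqproc API): scan `read` left to right for the
/// first window within `dist` mismatches of any whitelist entry.
/// `max_pos`, when given, is the last start offset that is tried.
/// Returns (start offset in the read, index of the matching whitelist entry).
fn search_whitelist_sketch(
    read: &[u8],
    whitelist: &[&[u8]],
    dist: usize,
    max_pos: Option<usize>,
) -> Option<(usize, usize)> {
    // Assumption: every barcode has the same length as the first one.
    let bc_len = whitelist.first()?.len();
    if bc_len == 0 || read.len() < bc_len {
        return None;
    }
    let last_start = read.len() - bc_len;
    let limit = max_pos.map_or(last_start, |m| m.min(last_start));
    for start in 0..=limit {
        let window = &read[start..start + bc_len];
        for (i, bc) in whitelist.iter().enumerate() {
            if matches!(hamming(window, bc), Some(d) if d <= dist) {
                return Some((start, i));
            }
        }
    }
    None
}
```

Bounding the scan with `max_pos` keeps a late, spurious match from being accepted when the barcode is expected near the start of the read.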

- Add anchor_relative() function for searching anchor and extracting preceding elements relative to found position
- Add search() function for forcing global search
- Add search_whitelist() function for barcode whitelist search
- Implement mini-backtracking in interpreter for UnboundedLen handling
- Remove experimental edit_distance feature (marginal improvement, significant complexity)
- Clean up code after removal of EditDistance/EditDistanceSearch

- Import fmt_expr for dynamic file path expressions
- Implement per-sample file routing based on sample attribute
- Create output directory automatically
- Skip output validation when demux is enabled
- Route unassigned reads to unassigned_{R1,R2}.fastq files

Tested on SPLiT-seq 2018 dataset (100 reads):
- 44 samples correctly demultiplexed
- Barcode-to-sample mapping validated
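The per-sample routing described above can be pictured with a small sketch. `route_read` is a hypothetical helper, not part of the demux module, and the `{sample}_R{mate}.fastq` naming for assigned reads is an assumption; only the `unassigned_{R1,R2}.fastq` names come from the commit message.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical helper: choose the output file for one mate of a read pair.
/// `mate` is 1 or 2. Barcodes with no sample mapping fall through to
/// `unassigned_R{mate}.fastq`.
fn route_read(
    out_dir: &Path,
    barcode_to_sample: &HashMap<String, String>,
    barcode: &str,
    mate: u8,
) -> PathBuf {
    match barcode_to_sample.get(barcode) {
        // Assumed naming scheme for assigned reads.
        Some(sample) => out_dir.join(format!("{sample}_R{mate}.fastq")),
        None => out_dir.join(format!("unassigned_R{mate}.fastq")),
    }
}
```

A caller would typically run `std::fs::create_dir_all(out_dir)` once before writing, which matches the "create output directory automatically" item.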

…tion
- Update SearchWhitelist from tuple to struct for cleaner API
- Add followed_by parameter for chained barcode+linker validation
- Implement linker validation in interpreter after barcode match
- Add search_whitelist_followed_by test with expected output
- All existing tests continue to pass

The anchor_relative code was adding a SetOp to redirect seq2.* to seq2._r for subsequent geometry processing. However, read.set() modifies the underlying string buffer and adjusts all intersecting mappings, which corrupted the umi/bc3 mappings that were created at correct positions.

The fix removes this SetOp since the label vector update (label.push(_r)) at the end of each iteration already handles redirection for subsequent pieces without modifying the shared string buffer.

This fixes the anchor_relative extraction bug where UMI and BC3 were being extracted from wrong positions (after-anchor region instead of before-anchor).

Results:
- Before: 110K reads matched
- After: 361K reads matched (matching splitcode)

…ntation

New Features:
- anchor_relative(): Search for anchor from current position and extract preceding elements relative to anchor position. Handles variable-length regions (indels).
- --unassigned1/--unassigned2 CLI flags: Output reads that fail geometry matching to separate files (uses TryOp routing).
- filter_within_dist(): Whitelist filtering with hamming distance tolerance.

Documentation:
- docs/anchor_relative.md: Full documentation for anchor_relative function
- docs/unassigned_output.md: Documentation for --unassigned CLI flags
- docs/GEOMETRY_EXTENSIONS.md: Updated with all new geometry functions

Tests:
- Added 3 tests for anchor_relative in compile_tests.rs
- All existing tests pass

Bug Fixes:
- Fixed SetOp corruption in anchor_relative that was modifying shared string buffer
- Proper 3-way split for anchor matching (before, anchor, after)

Performance validated on SPLiT-seq 2018 500k subset:
- 2.4x faster than splitcode (1.5s vs 1.7s)
- 3.7x less memory than splitcode (36MB vs 141MB)
- 100% barcode validity with whitelist filtering
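To illustrate the 3-way split and the "extract preceding elements relative to the anchor" idea, here is a rough sketch. It is not the anchor_relative implementation: both function names are hypothetical, matching is exact only, and the assumed layout is a variable-length prefix followed by UMI, then barcode, then anchor.

```rust
/// Split `read` around the first exact occurrence of `anchor` at or after `from`.
/// Returns (before, anchor, after), or `None` if the anchor is not found.
fn split_at_anchor<'a>(
    read: &'a [u8],
    anchor: &[u8],
    from: usize,
) -> Option<(&'a [u8], &'a [u8], &'a [u8])> {
    if anchor.is_empty() || from > read.len() {
        return None;
    }
    let pos = read[from..]
        .windows(anchor.len())
        .position(|w| w == anchor)?
        + from;
    let end = pos + anchor.len();
    Some((&read[..pos], &read[pos..end], &read[end..]))
}

/// Hypothetical layout: [variable prefix][UMI][barcode][anchor][rest].
/// Counting backwards from the anchor keeps the UMI and barcode correct
/// even when an indel changes the prefix length.
fn extract_before_anchor<'a>(
    read: &'a [u8],
    anchor: &[u8],
    umi_len: usize,
    bc_len: usize,
) -> Option<(&'a [u8], &'a [u8])> {
    let (before, _anchor, _after) = split_at_anchor(read, anchor, 0)?;
    let need = umi_len + bc_len;
    if before.len() < need {
        return None;
    }
    let tail = &before[before.len() - need..];
    Some(tail.split_at(umi_len))
}
```

Because the slices borrow from the read and nothing is rewritten in place, this sketch also avoids the kind of shared-buffer mutation described in the SetOp fix.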

- Updated chumsky from 0.10.1 to 0.12.0
- Updated lexer to use new map_with API while preserving comment support
- Updated parser with macro-based function definitions
- Preserved dev features: Search, Anchor, SearchWhitelist variants
- Fixed lexer_tests to use chumsky 0.12 API (parse().into_output_errors())
- All tests passing

- Made --file2 optional in CLI args
- Updated execution logic to dynamically construct input graphs for 1 or 2 files
- Relaxed geometry parser to accept single-read definitions
- Verified performance with new single-end vs paired-end benchmark (~2x speedup for SE)
- Added regression tests for single-end processing

noahcape and others added 30 commits July 22, 2024 17:09

updating antisequence api

547b998

move all but filter and map

d8ce86a

clone wars v2

cf98811

directly add to graph

67cac9d

filter and reserve prefixed underscore

ac6de6a

port simpleaf methods

a7528f9

write to named pipes

8301cc8

add filter fn + clippy/fmt

1cb426b

compile all transformations into antisequence Ops

f7f6f3c

small changes and fmt

0bb34b2

better errors

6dacafc

allow nesting inside map

e4ac02a

changed bin argument documentation, made some types more descriptive

e7c3ba8

use new bounded match

c1f5eb1

update chumsky and error handling

11e624b

Started benchmark scaffold. Criterion is now saving off benchmark results per run.

296db34

Committing my initial tests which test 10x and SciSeq3 FGDL examples from the docs. Currently these only test length correctness; in my next push I'll check content and some error-handling paths. Lastly, I added a benchmark so that we can compare runtime optimizations via Criterion.

5605205

Added benchmark scripts to make benchmarking easy.

3fd5cb3

Added 1m benchmark and set it.

b0c745a

Working my way through the flow of the code to get a better understanding of the inner workings before I make my next optimization commit.

aa4dfa1

Committing entire dev state with the data and notebooks from the most recent successful ANTISEQUENCE batch optimization.

ecba7e0

Added my notebook where I do my analysis.

79227d4

saving off my data before a new run.

6b53ed6

Committing the state of things after I ran the anchor tests to force alignments.

1ed1da4

Add statistics collection to seqproc and support for SPLiT-seq v1 benchmark

2d10a2c

Add SPLiT-seq V1 geometry and update benchmark script (NO stats code)

e6182b9

Merge branch 'ejfisher-test-harness' into ejfisher-test-harness-stats

4d609a9

Fixed bug where hamming could not be added to linkers, i.e. fixed-length intervals.

997362c

Added hamming regression test.

565a2bf

elanfisher and others added 19 commits December 3, 2025 18:20

feat: Add demux module with DemuxConfig and sample mapping support

ec8c867

feat: Add CLI integration for demultiplexing options

7011e98

Add demux regression tests and documentation tutorial

d997d50

Add module-level documentation comments to lib.rs

4de6afe

Add # comment support to geometry language

c7ede66

- Comments start with # and extend to end of line
- Comments can appear on their own line or after tokens
- Added lexer tests for comment parsing
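The comment rule can be sketched without the chumsky lexer as a plain line filter. This is only an illustration of the rule, not the lexer change itself: `strip_comment` and `strip_comments` are hypothetical names, and the sketch assumes `#` never appears inside a token.

```rust
/// Remove a trailing `#` comment (and trailing whitespace) from one line.
/// Assumption: `#` never occurs inside a token, so the first `#` starts the comment.
fn strip_comment(line: &str) -> &str {
    match line.find('#') {
        Some(idx) => line[..idx].trim_end(),
        None => line.trim_end(),
    }
}

/// Strip comments from every line and drop lines that end up empty,
/// covering both full-line comments and comments after tokens.
fn strip_comments(src: &str) -> Vec<&str> {
    src.lines()
        .map(strip_comment)
        .filter(|l| !l.trim().is_empty())
        .collect()
}
```

Doing this inside the lexer instead, as the commit does, keeps source spans intact for error reporting, which a pre-pass like this one would lose.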

updated parser

016005d

docs: Add geometry extensions documentation for anchor_relative, search, search_whitelist

f1fd41d

fix: Correct CompiledFunction variant name from AnchorRelative to Anchor

20a436b

Complete chumsky 0.12 merge with search_whitelist_followed_by fix

48107c7

Fix geometry parsing, compilation, and processor logic

87432a6

Removed accumulated dev scrap from repo root.

9d414b8

elanfisher requested a review from noahcape January 30, 2026 01:41

elanfisher wants to merge 49 commits into main from dev

2 participants