Open
Conversation
…from the docs. Currently these only test length correctness and in my next push ill check content and some error handling paths. Lastly I added a benchmark so that we can compare runtime opts via criterion.
…nchor appears in the right place for sci-RNA-seq3 data. Added test which check error handling for anchor mismatches, exact geometry for no tollerence checks, and thresholded via hamming distance checks. Lastly added multi-thread consistency checks for 1v4 threads for 10x and sci-rna-seq3.
…ding of the inner workings before I make my next optimization commit.
… recent sucessful ANTISEQUENCE batch optimization.
- Add search_whitelist(interval, file, dist, max_pos?) function for whitelist-based barcode search - Implement mini-backtracking in interpreter for anchor-based parsing - Support position-bounded search with optional max_pos parameter - Add ExactBoundedMatch and HammingBoundedMatch for bounded search operations - Rename test files from .fgdl to .geom extension - Add comprehensive test cases for search_whitelist, search_anchor, and search_hamming - Add r_umi_bc_anchor test for UMI extraction before barcode anchor
- Add anchor_relative() function for searching anchor and extracting preceding elements relative to found position - Add search() function for forcing global search - Add search_whitelist() function for barcode whitelist search - Implement mini-backtracking in interpreter for UnboundedLen handling - Remove experimental edit_distance feature (marginal improvement, significant complexity) - Clean up code after removal of EditDistance/EditDistanceSearch
- Import fmt_expr for dynamic file path expressions
- Implement per-sample file routing based on sample attribute
- Create output directory automatically
- Skip output validation when demux is enabled
- Route unassigned reads to unassigned_{R1,R2}.fastq files
Tested on SPLiT-seq 2018 dataset (100 reads):
- 44 samples correctly demultiplexed
- Barcode-to-sample mapping validated
- Comments start with # and extend to end of line - Comments can appear on their own line or after tokens - Added lexer tests for comment parsing
…tion - Update SearchWhitelist from tuple to struct for cleaner API - Add followed_by parameter for chained barcode+linker validation - Implement linker validation in interpreter after barcode match - Add search_whitelist_followed_by test with expected output - All existing tests continue to pass
The anchor_relative code was adding a SetOp to redirect seq2.* to seq2._r for subsequent geometry processing. However, read.set() modifies the underlying string buffer and adjusts all intersecting mappings, which corrupted the umi/bc3 mappings that were created at correct positions. The fix removes this SetOp since the label vector update (label.push(_r)) at the end of each iteration already handles redirection for subsequent pieces without modifying the shared string buffer. This fixes the anchor_relative extraction bug where UMI and BC3 were being extracted from wrong positions (after-anchor region instead of before-anchor). Results: - Before: 110K reads matched - After: 361K reads matched (matching splitcode)
…ch, search_whitelist
…ntation New Features: - anchor_relative(): Search for anchor from current position and extract preceding elements relative to anchor position. Handles variable-length regions (indels). - --unassigned1/--unassigned2 CLI flags: Output reads that fail geometry matching to separate files (uses TryOp routing). - filter_within_dist(): Whitelist filtering with hamming distance tolerance. Documentation: - docs/anchor_relative.md: Full documentation for anchor_relative function - docs/unassigned_output.md: Documentation for --unassigned CLI flags - docs/GEOMETRY_EXTENSIONS.md: Updated with all new geometry functions Tests: - Added 3 tests for anchor_relative in compile_tests.rs - All existing tests pass Bug Fixes: - Fixed SetOp corruption in anchor_relative that was modifying shared string buffer - Proper 3-way split for anchor matching (before, anchor, after) Performance validated on SPLiT-seq 2018 500k subset: - 2.4x faster than splitcode (1.5s vs 1.7s) - 3.7x less memory than splitcode (36MB vs 141MB) - 100% barcode validity with whitelist filtering
- Updated chumsky from 0.10.1 to 0.12.0 - Updated lexer to use new map_with API while preserving comment support - Updated parser with macro-based function definitions - Preserved dev features: Search, Anchor, SearchWhitelist variants - Fixed lexer_tests to use chumsky 0.12 API (parse().into_output_errors()) - All tests passing
- Made --file2 optional in CLI args - Updated execution logic to dynamically construct input graphs for 1 or 2 files - Relaxed geometry parser to accept single-read definitions - Verified performance with new single-end vs paired-end benchmark (~2x speedup for SE) - Added regression tests for single-end processing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.