CigarModifier: emit a valid CIGAR when matching-base scan outruns the trailing M#402
Open
behrj wants to merge 1 commit intoAstraZeneca-NGS:masterfrom
Open
Conversation
romaingroux
approved these changes
Apr 28, 2026
behrj
commented
Apr 28, 2026
Author
There was a problem hiding this comment.
This is a defensive change that prevents similar issues to have an operational impact, but it has the risk of hiding issues in future. Happy to drop this from the PR. Let me know what you think.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The merge functions
threeDeletions(),twoDeletionsInsertionToComplex(), andthreeIndels()inCigarModifiercount matching bases (rn) immediately afterthe indel cluster. When a read aligns inside a homopolymer that extends past the
cluster,
rnkeeps growing past the interiorMbases (mid) and starts eatinginto the trailing
M(rm). Thetslen <= 0branch then doesrm += tslen, which can drivermnegative — and the function emits a CIGARfragment with a non-positive operator length.
For the production read
duplex_GTGTC-GTCAC_140434267-140434623_817012 @ 7:140434471 (hg19)
input CIGAR: 104M1I2M2D1M1D5M1D2M9D27M
threeDeletionsproduces the intermediateRDOFF M dlen D rm M = 11M 4D -1M,a subsequent
combineToCloseToOnecollapses1I 11M 4Dinto15D 12I, andthe final cigar is
104M15D12I-1M9D27M. That string is then fed tocaptureMisSoftly3Mismatches, which dereferencesquerySequence.charAt(143)on a 142-nt read and throws
StringIndexOutOfBoundsException: Index 143 out of bounds for length 142
at VariationUtils.isHasAndNotEquals(...:498)
at CigarModifier.captureMisSoftly3Mismatches(...:376)
After 12 such failures
VarDictLauncheraborts the whole region with"Critical exception occurs on region 7:140434293-140434869, program will be stopped."Fix
(tslen <= 0, dlen > 0, rm < 0), absorbthe negative
rmback intoRDOFFand emitRDOFF M + dlen D(mirroringthe existing
rm < 0branch underdlen < 0inthreeIndels). This keepsthe merged CIGAR valid while preserving the total reference and read
consumption of the original complex.
IllegalStateExceptionguards at every newRDOFF = RDOFF + rmsite, so any future regression that violatesRDOFF > 0fails loudly via the existingprintExceptionAndContinuepath instead of silently emitting a malformed CIGAR.
index2inVariationUtils.isHasAndEquals/isHasAndNotEquals. The previouscommented-out
if (index2 < 0) ...line suggests this was alreadyconsidered; we now return
falsefor any out-of-range index instead ofthrowing.
For the read above, the new output is
104M14D11I9D27M(reference span 154,read span 142 — both match the input).
Tests
CigarModifierTest.modifyCigarOnHomopolymerThreeDeletionsDoesNotProduceMalformedCigaruses the production read verbatim and asserts that
modifyCigar()returns a CIGARhtsjdk.samtools.TextCigarCodec.decodeacceptsand that all operator lengths are positive. Without the fix it fails with the
exact production exception; with the fix it passes alongside the existing 21
CigarModifierTestcases.