Release 1.7.0: Unified output format, streaming writes, summary file
Major changes:
- Unified format for .hors.tsv and .monomers.tsv (same 16 columns; .monomers.tsv adds parent_idx)
- Added .summary.tsv with per-array statistics and consensus for both HOR and monomer levels
- Streaming writes: constant memory usage regardless of input size
- External sorting with type priority (pred_array → flank → monomer → array → consensus)
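Streaming writes followed by an external sort can be sketched as below. This is a minimal Python sketch, not the project's implementation. Only the type priority order comes from the commit message. The column layout assumed in `sort_key` (sequence name, start, row type in the first three columns) is hypothetical, and so are the helper names `sort_key`, `_write_run` and `external_sort`. Rows are sorted in fixed-size chunks, spilled to temporary run files, and then merged with `heapq.merge`. Memory is therefore bounded by the chunk size plus one pending line per run.

```python
import heapq
import os
import tempfile
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Row-type priority from the 1.7.0 commit message; lower rank sorts first.
TYPE_PRIORITY = {t: rank for rank, t in enumerate(
    ["pred_array", "flank", "monomer", "array", "consensus"])}


def sort_key(line: str) -> Tuple[str, int, int]:
    # ASSUMPTION: column 0 = sequence name, column 1 = start, column 2 = row type.
    # The actual column order of the TSV files may differ.
    seq, start, row_type = line.rstrip("\n").split("\t")[:3]
    # Unknown row types sort after all known ones.
    return seq, int(start), TYPE_PRIORITY.get(row_type, len(TYPE_PRIORITY))


def _write_run(lines: List[str], tmpdir: str) -> str:
    """Sort one in-memory chunk and spill it to a temporary run file."""
    lines.sort(key=sort_key)
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".run")
    with os.fdopen(fd, "w") as fh:
        fh.writelines(lines)
    return path


def external_sort(lines: Iterable[str], chunk_size: int = 100_000) -> Iterator[str]:
    """Sort TSV lines, holding at most chunk_size lines in memory while building runs."""
    with tempfile.TemporaryDirectory() as tmpdir:
        it = iter(lines)
        run_paths = []
        while True:
            # Normalize newlines so lines from different runs never fuse together.
            chunk = [ln if ln.endswith("\n") else ln + "\n"
                     for ln in islice(it, chunk_size)]
            if not chunk:
                break
            run_paths.append(_write_run(chunk, tmpdir))
        handles = [open(p) for p in run_paths]
        try:
            # k-way merge: keeps only one pending line per run file in memory.
            yield from heapq.merge(*handles, key=sort_key)
        finally:
            for fh in handles:
                fh.close()
```

Because both `list.sort` and `heapq.merge` are stable, rows with identical keys keep their original write order.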
New features:
- Consensus sequences (with IUPAC and quality) in summary file
- Edit distance metrics for base monomers
- Global indexing for base monomers with parent HOR linkage
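The following rough sketch shows one way to compute a consensus with IUPAC codes and per-monomer edit distances. It assumes the monomers are already aligned to equal length and contain only `A`, `C`, `G` and `T`. The inclusion threshold `min_frac`, the alphabetical tie-breaking, and the use of the majority consensus as the reference for edit distance are illustrative choices, not details taken from the release. Quality values are not modeled.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Standard IUPAC nucleotide ambiguity codes.
IUPAC: Dict[FrozenSet[str], str] = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(aligned: List[str], min_frac: float = 0.25) -> Tuple[str, str]:
    """Return (majority consensus, IUPAC consensus) for equal-length ACGT sequences."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    majority, iupac = [], []
    for column in zip(*aligned):
        counts = Counter(column)
        # Majority base; ties broken alphabetically so the result is deterministic.
        best = min(counts, key=lambda b: (-counts[b], b))
        majority.append(best)
        # ASSUMPTION: a base enters the IUPAC code if its frequency reaches min_frac.
        kept = {b for b, c in counts.items() if c / len(column) >= min_frac} | {best}
        iupac.append(IUPAC[frozenset(kept)])
    return "".join(majority), "".join(iupac)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitution, insertion and deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution or match
        prev = cur
    return prev[-1]
```

For example, `consensus(["ACGT", "ACGA", "ACTT"])` yields the majority string `ACGT` and the IUPAC string `ACKW`. The edit distances of the three monomers to `ACGT` are 0, 1 and 1.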
Performance:
- Memory: substantially reduced, since results are no longer accumulated in memory
- Sorting: now performed as an external sort after the streaming writes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|`cut_sequence`| Anchor sequence used for splitting |
|`orientation`|`fwd` or `rev` (reverse complemented) |
-|`sequence`| Actual DNA sequence |
+|`sequence`| Actual DNA sequence (or `-` for pred_array/array rows) |

### Monomers TSV Columns (`.monomers.tsv`)

-Contains base-level monomers after recursive decomposition of HORs. Each HOR is recursively decomposed until no further periodicity is detected (autocorrelation ≤ 0.5) or minimum length (5bp) is reached.
+Contains base-level monomers after recursive HOR decomposition. **Unified format** matching `.hors.tsv` plus `parent_idx`.
+
+Each HOR is recursively decomposed until:
+- No further periodicity detected (autocorrelation ≤ 0.5)
+- Minimum length (5bp) reached
+
+**Row types** (in order):
+1. `pred_array` - Array-level summary row
+2. `base_monomer` - Base-level monomers from recursive decomposition
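The stopping rules in the README diff can be illustrated with a small recursive sketch. It rests on several assumptions. "Autocorrelation" is taken to mean the fraction of positions where `seq[i] == seq[i + lag]`. Candidate periods run from 5 bp up to half the sequence length. Each unit is cut into period-length pieces before recursing. The names `BaseMonomer`, `decompose` and `base_monomers` are hypothetical. The last function shows one possible way to assign global indices and a `parent_idx` link back to the source HOR.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

AUTOCORR_THRESHOLD = 0.5  # README: stop when autocorrelation <= 0.5
MIN_LEN = 5               # README: minimum length 5 bp


@dataclass
class BaseMonomer:
    idx: int         # global index over all base monomers (hypothetical field)
    parent_idx: int  # index of the HOR the monomer was cut from
    start: int       # 0-based offset inside the parent HOR
    seq: str


def autocorrelation(seq: str, lag: int) -> float:
    # ASSUMPTION: identity autocorrelation, i.e. the fraction of positions
    # i where seq[i] == seq[i + lag]; the tool's actual metric may differ.
    n = len(seq) - lag
    return sum(seq[i] == seq[i + lag] for i in range(n)) / n


def best_period(seq: str) -> Optional[Tuple[int, float]]:
    """Shortest lag in [MIN_LEN, len/2] with the highest autocorrelation."""
    best = None
    for lag in range(MIN_LEN, len(seq) // 2 + 1):
        score = autocorrelation(seq, lag)
        if best is None or score > best[1]:  # strict '>' keeps the shortest tie
            best = (lag, score)
    return best


def decompose(seq: str, offset: int = 0) -> List[Tuple[int, str]]:
    """Recursively split seq into (offset, unit) pairs until no periodicity remains."""
    found = best_period(seq)
    if found is None or found[1] <= AUTOCORR_THRESHOLD:
        return [(offset, seq)]  # base level reached
    lag = found[0]
    units: List[Tuple[int, str]] = []
    # lag <= len(seq) // 2, so every piece is strictly shorter and recursion ends.
    for start in range(0, len(seq), lag):
        units.extend(decompose(seq[start:start + lag], offset + start))
    return units


def base_monomers(hors: List[str]) -> List[BaseMonomer]:
    """Decompose every HOR and assign global indices with parent linkage."""
    out: List[BaseMonomer] = []
    for parent_idx, hor in enumerate(hors):
        for start, unit in decompose(hor):
            out.append(BaseMonomer(len(out), parent_idx, start, unit))
    return out
```

For example, `decompose(("AAAAACCCCC" + "AAAAACCCCG") * 2)` first splits at the 20 bp period (autocorrelation 1.0). It then splits each 20 bp piece at 10 bp (autocorrelation 0.9). The result is four 10 bp monomers at offsets 0, 10, 20 and 30.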