Part of Santh - open source Rust security and infrastructure tooling. Follow @SanthProject on X.
A SIMD-accelerated multi-pattern pre-filtering engine with zero-allocation streaming iteration. Scans byte haystacks for up to 16 patterns simultaneously using the widest SIMD instructions available.
simdsieve is a first-pass candidate filter for substring search:
- Prefix Extraction: Takes the first 1-4 bytes of each pattern
- SIMD Filtering: Scans haystack in 32-128 byte blocks, finding positions where prefixes match using vectorized comparison
- Verification: Confirms full pattern matches before yielding
- Zero False Positives: Every yielded offset is a verified match start
Use this when you need to quickly filter large data streams for multiple patterns before applying more expensive analysis.
| Backend | Instruction Set | Block Size | Throughput* | Registers |
|---|---|---|---|---|
| AVX-512 | AVX-512F + AVX-512BW | 128 bytes | >50 GB/s | 512-bit |
| AVX2 | AVX2 | 64 bytes | >25 GB/s | 256-bit |
| NEON | AArch64 NEON | 64 bytes | >15 GB/s | 128-bit |
| Scalar | Portable Rust | 64 bytes | >2 GB/s | 64-bit |
* Single-byte pattern on modern x86_64. Multi-byte patterns scale linearly with prefix length.
The optimal backend is auto-selected at construction time:
- AVX-512: If
avx512fandavx512bwCPU features are detected - AVX2: If
avx2is detected - NEON: On
aarch64targets (guaranteed by the architecture) - Scalar: Portable fallback for all other platforms
AVX-512, AVX2, and NEON use "dual-pump" processing where each logical block is split into two halves processed together. This hides load latency by interleaving independent operations across both load ports.
- 16 patterns maximum: Unrolled register layout saturates execution ports
- 4-byte prefix: First 4 bytes used for SIMD filtering; full pattern for verification
- ASCII case-insensitive: Only
a-zfolded; non-ASCII compared verbatim
Add to Cargo.toml:
[dependencies]
simdsieve = "0.1"Basic exact matching:
use simdsieve::SimdSieve;
let haystack = b"The quick brown fox jumps over the lazy dog";
let patterns = &[b"fox"[..], b"dog"[..]];
let sieve = SimdSieve::new(haystack, patterns).unwrap();
let matches: Vec<usize> = sieve.collect();
assert_eq!(matches, vec![16, 40]); // Positions of "fox" and "dog"Case-insensitive matching:
use simdsieve::SimdSieve;
let haystack = b"Hello World HELLO";
let patterns = &[b"hello"[..]];
let sieve = SimdSieve::new_case_insensitive(haystack, patterns).unwrap();
let matches: Vec<usize> = sieve.collect();
assert_eq!(matches, vec![0, 12]); // Matches "Hello" and "HELLO"Density estimation (without full verification):
use simdsieve::SimdSieve;
let haystack = b"aaaaaa";
let patterns = &[b"a"[..]];
let count = SimdSieve::estimate_match_count(haystack, patterns, false);
assert_eq!(count, 6); // One prefix hit at each positionSimdSieve implements Iterator<Item = usize> and FusedIterator:
- Yields match offsets in ascending order
- After returning
None, all subsequent calls returnNone - Zero heap allocation during iteration
- Thread-safe:
SendandSync(immutable iterator)
Construction can fail with SimdSieveError:
EmptyPatternSet: No patterns providedPatternLimitExceeded(usize): More than 16 patterns provided
use simdsieve::{SimdSieve, SimdSieveError};
let result = SimdSieve::new(b"haystack", &[]);
assert!(matches!(result, Err(SimdSieveError::EmptyPatternSet)));| Platform | Backend | Notes |
|---|---|---|
| x86_64 | AVX-512, AVX2, Scalar | Runtime feature detection |
| aarch64 | NEON, Scalar | NEON guaranteed on AArch64 |
| wasm32 | Scalar | No SIMD backend yet |
| other | Scalar | Portable fallback |
simdsieve contains a small amount of unsafe code confined to
platform-specific backend modules. Every unsafe block has been audited for
soundness:
- SIMD loads use unaligned load intrinsics (
loadu), safe for any valid pointer alignment. - Block bounds are verified before each SIMD call; multi-byte prefixes
require
block_len >= block_size + max_prefix_len - 1trailing bytes. - Target features are detected at construction;
#[target_feature]gates prevent executing unsupported instructions. - No uninitialized memory is read; SIMD struct arrays are zero-initialized.
unsafe_op_in_unsafe_fnis forbidden at the crate level.
✅ Good for:
- Filtering large streams (>1MB) for multiple patterns
- Finding candidate positions for regex engines
- Log analysis with multiple fixed-string patterns
- Network packet inspection with signature matching
❌ Not ideal for:
- Single small haystacks (<1KB)
- Patterns longer than 4 bytes that differ only after position 4
- Pattern sets that share a long common prefix against uniform haystacks (this shifts the bottleneck from SIMD filtering to byte-by-byte verification, resulting in O(n × pattern_count) work)
- When you need capture groups or regex features
The engine automatically issues prefetch hints 512 bytes ahead. This value (8 cache lines) balances:
- Enough lookahead to hide memory latency on modern CPUs
- Not so much that we evict useful data from cache
For best performance:
- Use patterns with distinct first bytes when possible
- Place more frequent patterns earlier in the pattern list
- For case-insensitive mode, use lowercase in patterns
Run benchmarks with:
cargo benchExample throughput on AMD EPYC 9R14 (AVX-512):
| Haystack | Patterns | Throughput |
|---|---|---|
| 1 MB | 1 | ~55 GB/s |
| 1 MB | 4 | ~45 GB/s |
| 1 MB | 16 | ~30 GB/s |
Licensed under the MIT License. See LICENSE for details.
Contributions welcome! Areas of interest:
- WebAssembly SIMD backend
- Additional benchmarks and fuzz targets
- New verification strategies
Please ensure:
cargo testpassescargo clippy -- -D warningsis cleancargo fmt --checkpasses- New code has module-level and item-level documentation