BBOT could use a YARA-based prefilter to skip the per-pattern Python regex loop on the ~95% of hosts that can't possibly be a storage bucket. To build the YARA rule in the current prototype, we take each provider.regexes["STORAGE_BUCKET_HOSTNAME"] Python pattern and string-replace the (?P<name> / (?P<region> named-group syntax, because YARA's regex flavor doesn't support named groups. It works, but it's a string-munging hack on the regex source.
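For illustration, the workaround looks roughly like the sketch below. The pattern is a made-up stand-in for one provider.regexes["STORAGE_BUCKET_HOSTNAME"] entry, strip_named_groups is a hypothetical helper name, and replacing the named-group opener with a plain "(" is an assumption about what the prototype substitutes.

```python
import re

# Hypothetical stand-in for one provider.regexes["STORAGE_BUCKET_HOSTNAME"] entry.
PY_PATTERN = (
    r"(?P<name>[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9])"
    r"\.s3-(?P<region>[a-z]{2}-[a-z]+-\d+)\.amazonaws\.com"
)


def strip_named_groups(pattern: str) -> str:
    """Rewrite the two known named-group openers as plain capture groups.

    Assumption: a bare "(" is an acceptable replacement for the target flavor.
    """
    for opener in ("(?P<name>", "(?P<region>"):
        pattern = pattern.replace(opener, "(")
    return pattern


yara_ready = strip_named_groups(PY_PATTERN)
print(yara_ready)
```

The fragility is visible here: the helper only knows the two group names it was written for, and it edits regex source text without understanding it, so any new group name or flavor-specific construct passes through unchanged.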
It would be cleaner if cloudcheck exposed the pattern parts as structured data, something like:
provider.bucket_patterns = [
    {
        "name_charset": r"[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9]",
        "suffix": r"\.s3\.amazonaws\.com",
    },
    {
        "name_charset": r"[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9]",
        "infix": r"\.s3-",
        "region_charset": r"[a-z]{2}-[a-z]+-\d+",
        "suffix": r"\.amazonaws\.com",
    },
    # ...
]
This would let consumers:
- Build YARA / Aho-Corasick / Hyperscan prefilters from the suffix anchors without parsing regex source
- Match each part independently for cheap suffix-based prefiltering
- Generate non-Python regex flavors (PCRE, RE2, JS, Go) without flavor translation
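As a rough sketch of the consumer side, assuming the structured shape proposed above: the code below derives literal suffixes for a cheap str.endswith prefilter and reassembles a Python regex with named groups from the parts. BUCKET_PATTERNS, literal, assemble, and match_bucket are hypothetical names, not part of cloudcheck, and the unescaping step assumes suffixes only ever escape dots.

```python
import re

# Hypothetical data in the proposed shape; values copied from the example above.
BUCKET_PATTERNS = [
    {
        "name_charset": r"[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9]",
        "suffix": r"\.s3\.amazonaws\.com",
    },
    {
        "name_charset": r"[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9]",
        "infix": r"\.s3-",
        "region_charset": r"[a-z]{2}-[a-z]+-\d+",
        "suffix": r"\.amazonaws\.com",
    },
]


def literal(escaped: str) -> str:
    """Turn r"\.amazonaws\.com" into ".amazonaws.com" (assumes only dots are escaped)."""
    return escaped.replace(r"\.", ".")


# Literal suffix anchors: usable with str.endswith here, or fed to YARA / Aho-Corasick.
SUFFIXES = tuple(sorted({literal(p["suffix"]) for p in BUCKET_PATTERNS}))


def assemble(parts: dict) -> re.Pattern:
    """Build a Python-flavored regex with named groups from the structured parts."""
    src = f"(?P<name>{parts['name_charset']})"
    src += parts.get("infix", "")
    if "region_charset" in parts:
        src += f"(?P<region>{parts['region_charset']})"
    src += parts["suffix"]
    return re.compile(src)


COMPILED = [assemble(p) for p in BUCKET_PATTERNS]


def match_bucket(host: str):
    """Return the captured groups for a bucket hostname, or None."""
    if not host.endswith(SUFFIXES):  # cheap exit before any regex runs
        return None
    for rx in COMPILED:
        m = rx.fullmatch(host)
        if m:
            return m.groupdict()
    return None
```

A consumer targeting another flavor would swap only assemble, for example emitting plain groups for YARA or a different named-group syntax elsewhere, while the suffix prefilter stays the same.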