
Feature: expose storage-bucket pattern parts as structured metadata #264

@liquidsec


BBOT's usage could be optimized with a YARA-based prefilter that skips the per-pattern Python regex loop for the ~95% of hostnames that can't possibly be storage buckets. To build the YARA rule in the current prototype, we take each provider.regexes["STORAGE_BUCKET_HOSTNAME"] Python pattern and string-replace the (?P<name> / (?P<region> named-group syntax, because YARA's regex flavor doesn't support named groups. It works, but it's a string-munging hack on the regex source.
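For illustration, the string-munging step amounts to something like the minimal sketch below. It is not the prototype's actual code: PYTHON_PATTERN is a hypothetical stand-in assembled from the AWS parts listed in this issue (the real cloudcheck regex may differ), and strip_named_groups is an illustrative name.

```python
import re

# Hypothetical stand-in for one provider.regexes["STORAGE_BUCKET_HOSTNAME"]
# entry, built from the AWS parts listed in this issue; the real cloudcheck
# pattern may differ.
PYTHON_PATTERN = (
    r"(?P<name>[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9])\.s3-"
    r"(?P<region>[a-z]{2}-[a-z]+-\d+)\.amazonaws\.com"
)

# Matches the opening of a Python named group, e.g. "(?P<name>".
NAMED_GROUP_OPEN = re.compile(r"\(\?P<[A-Za-z_]\w*>")


def strip_named_groups(pattern: str) -> str:
    """Rewrite each "(?P<label>" as a plain capturing "(" group.

    This edits regex source text only: it cannot tell whether "(?P<" sits
    inside an escaped literal or a character class, which is what makes it
    a hack rather than a real translation between regex flavors.
    """
    return NAMED_GROUP_OPEN.sub("(", pattern)


yara_pattern = strip_named_groups(PYTHON_PATTERN)
print(yara_pattern)
# ([a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9])\.s3-([a-z]{2}-[a-z]+-\d+)\.amazonaws\.com
```

Every consumer that targets another regex engine ends up re-implementing a rewrite like this, and each one inherits the same blind spots.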

It would be cleaner if cloudcheck exposed the pattern parts as structured data, something like:

provider.bucket_patterns = [
    {
        "name_charset": r"[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9]",
        "suffix": r"\.s3\.amazonaws\.com",
    },
    {
        "name_charset": r"[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9]",
        "infix": r"\.s3-",
        "region_charset": r"[a-z]{2}-[a-z]+-\d+",
        "suffix": r"\.amazonaws\.com",
    },
    # ...
]
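As a minimal sketch of how a consumer might rebuild flavor-specific regexes from this structure, the code below assumes the parts are concatenated in the order name, infix, region, suffix. That ordering is an assumption, since the proposal doesn't pin it down. assemble() is a hypothetical helper, not an existing cloudcheck API.

```python
import re

# The two entries from the proposed provider.bucket_patterns example above.
BUCKET_PATTERNS = [
    {
        "name_charset": r"[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9]",
        "suffix": r"\.s3\.amazonaws\.com",
    },
    {
        "name_charset": r"[a-z0-9_][a-z0-9-\.]{1,61}[a-z0-9]",
        "infix": r"\.s3-",
        "region_charset": r"[a-z]{2}-[a-z]+-\d+",
        "suffix": r"\.amazonaws\.com",
    },
]


def assemble(parts: dict, named_groups: bool = True) -> str:
    """Join parts as name + optional infix + optional region + suffix.

    named_groups=True emits Python's (?P<label>...) syntax; False emits plain
    (...) groups for flavors without named groups, such as YARA's.
    """

    def group(label: str, body: str) -> str:
        return f"(?P<{label}>{body})" if named_groups else f"({body})"

    regex = group("name", parts["name_charset"])
    regex += parts.get("infix", "")
    if "region_charset" in parts:
        regex += group("region", parts["region_charset"])
    return regex + parts["suffix"]


compiled = [re.compile(assemble(p)) for p in BUCKET_PATTERNS]
m = compiled[1].fullmatch("my-bucket.s3-us-west-2.amazonaws.com")
print(m.group("name"), m.group("region"))  # my-bucket us-west-2
```

Because named groups are only added at assembly time, the same parts yield both the Python pattern and a YARA-compatible one without touching regex source text.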

This would let consumers:

  • Build YARA / Aho-Corasick / Hyperscan prefilters from the suffix anchors without parsing regex source
  • Match each part independently for cheap suffix-based prefiltering
  • Generate non-Python regex flavors (PCRE, RE2, JS, Go) without flavor translation
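A rough sketch of the first two uses follows. It assumes that suffix fields are pure literals with backslash-escaped metacharacters. The example above fits that shape, but the proposal doesn't guarantee it. literal_suffix(), might_be_bucket() and yara_rule() are hypothetical helpers. str.endswith stands in for an Aho-Corasick or Hyperscan matcher, and only hostnames that pass would go on to the full per-pattern regexes.

```python
# Suffix fields taken from the proposed bucket_patterns example above.
SUFFIXES = [r"\.s3\.amazonaws\.com", r"\.amazonaws\.com"]

# Regex metacharacters that must appear escaped in a literal suffix.
_META = set(".^$*+?{}[]|()")


def literal_suffix(regex_suffix: str) -> str:
    """Convert a suffix regex fragment to a plain string.

    Assumes (hypothetically) that suffixes are literals whose metacharacters
    are backslash-escaped; anything else raises ValueError.
    """
    out = []
    i = 0
    while i < len(regex_suffix):
        ch = regex_suffix[i]
        if ch == "\\":
            nxt = regex_suffix[i + 1 : i + 2]
            if not nxt or nxt.isalnum():
                # e.g. \d or \w is a character class, not a literal
                raise ValueError(f"not a literal suffix: {regex_suffix!r}")
            out.append(nxt)
            i += 2
        elif ch in _META:
            raise ValueError(f"unescaped metacharacter in {regex_suffix!r}")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Deduplicated literal anchors; str.endswith accepts a tuple of suffixes.
ANCHORS = tuple(sorted({literal_suffix(s) for s in SUFFIXES}))


def might_be_bucket(hostname: str) -> bool:
    """Cheap prefilter: only hostnames ending in an anchor need the full regexes."""
    return hostname.lower().endswith(ANCHORS)


def yara_rule(anchors: tuple) -> str:
    """Render the anchors as a minimal YARA rule using text strings."""
    lines = ["rule storage_bucket_hostname_prefilter", "{", "    strings:"]
    lines += [f'        $s{i} = "{a}"' for i, a in enumerate(anchors)]
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


print(might_be_bucket("my-bucket.s3-us-west-2.amazonaws.com"))  # True
print(might_be_bucket("www.example.com"))  # False
print(yara_rule(ANCHORS))
```

A prefilter only has to accept a superset of real matches. .amazonaws.com alone already covers both AWS entries, and YARA text strings match anywhere in the scanned data rather than only at the end, so false positives simply fall through to the full regexes. The YARA output also assumes anchors contain no double quotes or backslashes, which would need escaping.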
