hypatia: detect brittle regex matching of doc headings in inline-python workflow scripts (from #333 cohort, pattern 1) #360

@hyperpolymath

Description


Detector spec (from hypatia#333 Pattern 1)

Pattern 1 — Brittle regex matching of documentation headings inside inline-python workflow scripts

Severity: high (estate-wide silent gate failure)

Detection (Hypatia.Rules.WorkflowAudit):

  • Parse run: | blocks in workflow YAML.
  • Inside python3 << 'PYEOF' … PYEOF heredocs, find re.search(r'...') and re.match(r'...') calls whose pattern contains a literal token sequence that looks like the name of a markdown section heading (e.g. one matching [A-Z][a-z]+ [A-Z][a-z]+).
  • Flag any such regex that is not anchored to ^# or ^#{1,4}\s+. Without heading anchoring, the regex also matches prose mentions of the same phrase, which silently swaps which table the parser walks.
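The detection steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Hypatia.Rules.WorkflowAudit implementation: the function name `find_brittle_heading_regexes` and the sample workflow are hypothetical, the workflow is scanned as plain text rather than parsed as YAML, and case classes such as `[Ee]` are collapsed to one letter before the heading-like heuristic runs, since otherwise the literal `[A-Z][a-z]+ [A-Z][a-z]+` heuristic would not match the `TypeScript [Ee]xemptions` pattern from the worked example.

```python
import re

# Body of a python3 << 'PYEOF' ... PYEOF heredoc (text-level scan, no YAML parsing).
HEREDOC = re.compile(r"python3\s*<<\s*'PYEOF'[^\n]*\n(.*?)\n[ \t]*PYEOF\b", re.S)
# re.search(r'...') / re.match(r'...') calls; group 2 is the raw pattern text.
REGEX_CALL = re.compile(r"""re\.(?:search|match)\(\s*r(['"])(.*?)\1""")
# Case classes such as [Ee]; collapsed so "[Ee]xemptions" reads as "Exemptions".
CASE_CLASS = re.compile(r"\[([A-Za-z])[A-Za-z]*\]")
# Heuristic from the spec: two capitalised words separated by a space.
HEADING_LIKE = re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+")


def find_brittle_heading_regexes(workflow_text):
    """Return heading-like regex patterns that are not anchored to ^#."""
    findings = []
    for heredoc in HEREDOC.finditer(workflow_text):
        for call in REGEX_CALL.finditer(heredoc.group(1)):
            pattern = call.group(2)
            normalized = CASE_CLASS.sub(lambda m: m.group(1).upper(), pattern)
            if HEADING_LIKE.search(normalized) and not pattern.startswith("^#"):
                findings.append(pattern)
    return findings


# Hypothetical workflow snippet modelled on the worked example.
SAMPLE = """\
jobs:
  check:
    steps:
      - run: |
          python3 << 'PYEOF'
          import re
          for line in open('README.md'):
              if re.search(r'TypeScript [Ee]xemptions', line):
                  break
          PYEOF
"""

print(find_brittle_heading_regexes(SAMPLE))  # ['TypeScript [Ee]xemptions']
```

A production rule would likely parse the YAML first and report file and line positions; the text-level scan here only keeps the sketch dependency-free.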

Worked example (this session):

  • standards/.github/workflows/governance-reusable.yml:179: re.search(r'TypeScript [Ee]xemptions', line) was unanchored. In repos whose heading is a variant such as ### TypeScript / JavaScript Exemptions (Approved) (affinescript), the heading line didn't match, so the parser fell through to a prose mention of "the TypeScript exemptions above" inside a different section and parsed the wrong table. Every PR in affinescript failed the governance / Language / package anti-pattern policy check for weeks before this session caught it.
  • Fix: standards#183 anchored the regex as ^#{1,4}\s+.*TypeScript.*[Ee]xemption.
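The failure mode can be reproduced on a small document. In the sketch below, only the two regexes come from this issue; the document lines and the helper `first_match` are invented for illustration.

```python
import re

# Hypothetical document: a heading variant, its table, then a prose mention
# of the same phrase inside a different section.
DOC_LINES = [
    "### TypeScript / JavaScript Exemptions (Approved)",
    "| Path | Reason |",
    "## Other Policy",
    "See the TypeScript exemptions above for approved paths.",
]

UNANCHORED = re.compile(r"TypeScript [Ee]xemptions")
# Intended heading shape: "#".."####", whitespace, then a title that
# mentions TypeScript and Exemption(s).
ANCHORED = re.compile(r"^#{1,4}\s+.*TypeScript.*[Ee]xemption")


def first_match(regex, lines):
    """Return the 1-based number of the first line the regex matches, or None."""
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            return number
    return None


print(first_match(UNANCHORED, DOC_LINES))  # 4: the prose mention, not the heading
print(first_match(ANCHORED, DOC_LINES))    # 1: the heading line
```

The unanchored pattern skips line 1 because `TypeScript ` is followed by `/` rather than `Exemptions`, then matches the prose on line 4, which is the silent table swap described above.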

Remediation guidance to emit:

Anchor heading-detection regexes to ^#{1,4}\s+ so prose mentions don't trigger table parsing. Add an inline comment naming the heading shape you intend to match.

Implementation pointers

  • Detection algorithm: Parse run: | blocks; inside python3 << 'PYEOF' heredocs, find re.search(r'...') / re.match(r'...') whose pattern matches a markdown-heading-like token sequence; flag if not anchored to ^# or ^#{1,4}\s+.
  • Real-world example: standards/.github/workflows/governance-reusable.yml:179: re.search(r'TypeScript [Ee]xemptions', line) was unanchored and broke governance / Language / package anti-pattern policy on affinescript for weeks.
  • Landed fix (reference): standards#183 (anchored regex to ^#{1,4}\s+.*TypeScript.*[Ee]xemption).
  • Rule statement: Anchor heading-detection regexes to ^#{1,4}\s+ so prose mentions don't trigger table parsing. Add an inline comment naming the heading shape you intend to match.

Acceptance

  • Rule encoded in hypatia (file path follows existing rule naming convention — lib/rules/<name>.ex if Elixir, or matching the repo's rule DSL)
  • Test fixture exercising the positive case + at least one negative case
  • Smoke test passes against the cited landed-fix repo

Source cohort: hypatia#333.

Metadata

Labels: cicd (CI/CD pipeline, GitHub Actions, workflows, rulesets, releases), enhancement (New feature or request)
