Skip to content

Sanitize bare angle brackets and quotes in markup stripping#34

Open
assisted-by-ai wants to merge 1 commit intoKicksecure:masterfrom
assisted-by-ai:claude/audit-strip-markup-security-I6mmm
Open

Sanitize bare angle brackets and quotes in markup stripping#34
assisted-by-ai wants to merge 1 commit intoKicksecure:masterfrom
assisted-by-ai:claude/audit-strip-markup-security-I6mmm

Conversation

@assisted-by-ai
Copy link
Copy Markdown

Summary

Enhanced the markup stripping and sanitization logic to handle additional security edge cases by sanitizing bare angle brackets (< and >) that are not part of valid HTML tags, and quote characters in malicious input patterns.

Key Changes

  • Expanded character sanitization in strip_markup_lib.py:

    • Modified the malicious input detection branch to sanitize quote characters (" and ') in addition to <, >, and & to prevent attribute injection attacks
    • Added a second sanitization pass after both strip attempts to catch bare < and > characters that survived the HTML parser (e.g., comparison operators like 2 < 3)
    • Restructured the conditional logic for clarity and to apply both sanitization strategies
  • Added comprehensive test coverage:

    • New _test_bare_angle_bracket_strings() test helper to verify sanitization of bare angle brackets in mathematical/comparison expressions
    • New _test_malicious_markup_quote_strings() test helper to verify quote character sanitization in malicious input patterns
    • Added corresponding test methods in both TestStripMarkup and TestSanitizeString test classes
    • Updated existing test expectations in sanitize_string tests to reflect the new quote sanitization behavior

Implementation Details

The fix addresses two distinct security concerns:

  1. Bare angle brackets: Characters like < and > in expressions such as 2 < 3 are not recognized as HTML tags by Python's HTMLParser, but could be misinterpreted if the output is later placed in an HTML context
  2. Quote injection: Malicious inputs with quotes (e.g., <<b>b "onmouseover="alert(1)<</b>/b>) are now fully sanitized to prevent attribute injection attacks

Both sanitization strategies work together: the malicious input branch handles quotes, while the final pass handles any remaining bare angle brackets.

https://claude.ai/code/session_01Wjn2KfitiA5iTADcLLfdbR

…utput

strip_markup had two security issues:
1. The malicious-input sanitization branch only replaced <, >, & with
   underscores but not " and ', which could allow HTML attribute
   injection if output is placed in an attribute context.
2. Bare < and > characters that HTMLParser doesn't treat as tags
   (e.g., "2 < 3") passed through both strip passes unchanged and
   were returned unsanitized, potentially dangerous in HTML contexts.

Now quotes are sanitized in the malicious branch, and remaining < / >
are always sanitized in the final output.

https://claude.ai/code/session_01Wjn2KfitiA5iTADcLLfdbR
@ArrayBolt3
Copy link
Copy Markdown
Contributor

Rejected, these changes are out-of-scope. strip-markup is intended to ensure that a string won't be interpreted as HTML markup in isolation. It isn't intended to prevent multiple adjacent strings from forming something weird, and it isn't intended to prepare text for use in an HTML attribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants