
Improve stdin/stdout encoding configuration for robust text handling #35

Open
assisted-by-ai wants to merge 1 commit into Kicksecure:master from
assisted-by-ai:claude/fix-stdisplay-unicode-bypass-fjqN1


@assisted-by-ai

Summary

Enhanced the stdin and stdout stream configuration to explicitly specify encoding, error handling, and newline behavior, improving robustness when processing untrusted input and ensuring consistent output formatting.

Key Changes

  • stdin reconfiguration: Updated to explicitly set UTF-8 encoding with "replace" error handling strategy and Unix-style newlines, replacing the previous "ignore" error handling approach
  • stdout reconfiguration: Added an explicit configuration that uses ASCII encoding with "replace" error handling and Unix-style newlines before sanitized output is written
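A minimal sketch of the two reconfiguration calls, assuming the standard `io.TextIOWrapper.reconfigure()` API (Python 3.7+). The helper name `configure_streams` is hypothetical and this is not the actual diff from the commit. In-memory streams stand in for `sys.stdin` and `sys.stdout` so the effect is visible.

```python
import io


def configure_streams(stdin: io.TextIOWrapper, stdout: io.TextIOWrapper) -> None:
    # Hypothetical helper, not taken from the PR.
    # Input: decode as UTF-8; undecodable bytes become U+FFFD instead of
    # being dropped. newline="\n" turns off universal-newline translation.
    stdin.reconfigure(encoding="utf-8", errors="replace", newline="\n")
    # Output: encode as ASCII; any non-ASCII character that reaches the
    # stream is written as "?" rather than raising UnicodeEncodeError.
    stdout.reconfigure(encoding="ascii", errors="replace", newline="\n")


# Demonstration on in-memory streams. A real tool would pass sys.stdin and
# sys.stdout, before anything has been read from stdin.
fake_stdin = io.TextIOWrapper(io.BytesIO(b"ok\xff\n"))
out_bytes = io.BytesIO()
fake_stdout = io.TextIOWrapper(out_bytes)

configure_streams(fake_stdin, fake_stdout)

received = fake_stdin.read()        # "ok\ufffd\n"
fake_stdout.write("caf\u00e9\n")
fake_stdout.flush()
written = out_bytes.getvalue()      # b"caf?\n"
```

Ordering matters here. The Python documentation states that the encoding or newline of a text stream cannot be changed once data has been read from it, so stdin must be reconfigured before the first read.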

Implementation Details

These changes ensure:

  • Invalid UTF-8 sequences in input are replaced with the Unicode replacement character rather than silently ignored
  • Output is constrained to ASCII: any non-ASCII character is written as "?" instead of raising an encoding error when sanitized strings are written
  • Consistent newline handling across platforms (Unix-style \n)
  • More predictable behavior when processing untrusted input with potential encoding issues
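The difference between the old and new error handlers can be checked directly with Python's built-in codecs. The sample strings below are illustrative only.

```python
# Untrusted input: valid ASCII around one byte that is not valid UTF-8.
raw = b"user\xffname"

# Old behaviour (errors="ignore"): the bad byte silently disappears.
ignored = raw.decode("utf-8", errors="ignore")    # == "username"

# New behaviour (errors="replace"): the bad byte becomes U+FFFD.
replaced = raw.decode("utf-8", errors="replace")  # == "user\ufffdname"

# Output side: encoding to ASCII with errors="replace" substitutes "?" for
# each character outside ASCII instead of raising UnicodeEncodeError.
written = "snowman \u2603".encode("ascii", errors="replace")  # == b"snowman ?"
```

With "ignore", the decoded text is indistinguishable from input that never contained the bad byte. With "replace", a marker remains for later stages to act on.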

https://claude.ai/code/session_01CKA1A2XSprq49uNtGiMBxW

…epth

sanitize_string.py was missing two defense-in-depth measures that all
st* tools (stcat, stcatn, stecho, stprint, stsponge, sttee) already had:

1. stdin used errors="ignore" without explicit encoding, silently
   dropping invalid bytes. Changed to encoding="utf-8", errors="replace"
   to match st* tools — invalid bytes become U+FFFD then get replaced
   with '_' by stdisplay, making tampering visible.

2. stdout was never reconfigured to ASCII encoding. Added
   encoding="ascii", errors="replace" to match st* tools — provides a
   safety net if a non-ASCII character ever survives the sanitization
   pipeline.
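The "tampering becomes visible" argument can be sketched end to end. `toy_sanitize` below is a hypothetical stand-in for stdisplay: it keeps printable ASCII and newlines and maps everything else to "_". It is not stdisplay's real algorithm, and `pipeline` is likewise an illustrative name.

```python
def toy_sanitize(text: str) -> str:
    # Hypothetical stand-in for stdisplay: keep printable ASCII (0x20-0x7E)
    # and newline, replace every other character with "_".
    return "".join(ch if (" " <= ch <= "~" or ch == "\n") else "_" for ch in text)


def pipeline(raw: bytes, input_errors: str) -> bytes:
    # Decode untrusted bytes with the chosen error handler, sanitize, then
    # encode as ASCII with errors="replace" as the final safety net.
    decoded = raw.decode("utf-8", errors=input_errors)
    cleaned = toy_sanitize(decoded)
    return cleaned.encode("ascii", errors="replace")


tampered = b"pay alice\xff\n"

before_fix = pipeline(tampered, "ignore")   # == b"pay alice\n"
after_fix = pipeline(tampered, "replace")   # == b"pay alice_\n"
```

Under "ignore", the stray byte leaves no trace in the output. Under "replace", U+FFFD reaches the sanitizer and shows up as "_". If a sanitizer bug ever let a non-ASCII character through, the ASCII encoder would still emit "?" rather than the raw character.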

https://claude.ai/code/session_01CKA1A2XSprq49uNtGiMBxW
@ArrayBolt3
Contributor

Researched, decided the changes were good, accepted in ArrayBolt3@47c6252.
