Skip to content

refactor(utils): char-safe string truncation across tool_notifier, rag, query_rewriter#30

Merged
chinkan merged 2 commits intoclaude/chat-history-rag-telegram-T4Jmofrom
copilot/sub-pr-29
Mar 25, 2026
Merged

refactor(utils): char-safe string truncation across tool_notifier, rag, query_rewriter#30
chinkan merged 2 commits intoclaude/chat-history-rag-telegram-T4Jmofrom
copilot/sub-pr-29

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 23, 2026

Byte-indexed slicing (&s[..N]) panics when the byte boundary falls inside a multi-byte UTF-8 character. Three call sites in the codebase used this pattern with fixed offsets (60, 200, 300), making them crash-prone on any Chinese/Japanese/emoji input.

Changes

  • src/utils/strings.rs (new) — shared truncate_chars(s, max_chars) helper that walks .chars() to find the safe byte boundary before slicing; named strings (not str) to avoid shadowing the Rust primitive
  • src/platform/tool_notifier.rs, src/memory/rag.rs, src/memory/query_rewriter.rs — replaced all three &s[..N] truncation sites with truncate_chars
  • rag.rs test — fixed assertion from snippet.len() == 303 to snippet.chars().count() == 303; the byte-length equality only holds for ASCII
  • Tests — regression tests co-located with each crash site reproducing the exact Chinese-character inputs that triggered the panic; max_chars=0 edge case documented in utils/strings
// before — panics when byte 60 is mid-char
format!("{}...", &args_json[..60])

// after
crate::utils::strings::truncate_chars(args_json, 60)

📱 Kick off Copilot coding agent tasks wherever you are with GitHub Mobile, available on iOS and Android.

…ng Rust primitive; fix rag test assertion to use chars().count(); add zero-max edge case test

Co-authored-by: chinkan <16433287+chinkan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/chinkan/RustFox/sessions/df38c695-4336-437e-8324-5b6db9d2c378
Copilot AI changed the title [WIP] Fix tool notifier to use char-safe truncation for UTF-8 refactor(utils): char-safe string truncation across tool_notifier, rag, query_rewriter Mar 23, 2026
Copilot AI requested a review from chinkan March 23, 2026 15:42
@chinkan chinkan marked this pull request as ready for review March 24, 2026 01:22
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 24, 2026

Claude finished @chinkan's task in 1m 14s —— View job


Code Review

  • Gathering context
  • Reading changed files
  • Analyzing code quality and correctness
  • Posting review feedback

Overall: This is a well-scoped, correct fix. The implementation is sound and the test coverage is thorough. A few minor observations below.


src/utils/strings.rs

Implementation is correct. The byte-tracking approach avoids the overhead of char_indices() while remaining safe:

pub fn truncate_chars(s: &str, max_chars: usize) -> String {
    let mut byte_end = 0usize;
    for (char_count, ch) in s.chars().enumerate() {
        if char_count == max_chars {
            return format!("{}...", &s[..byte_end]);
        }
        byte_end += ch.len_utf8();
    }
    s.to_string()
}

Minor: Always allocates. The non-truncated path calls .to_string() unconditionally. For the call sites here (all formatting), this is fine. Just worth noting if this helper is ever put on a hot path.

Minor: max_chars=0 asymmetry. truncate_chars("", 0)"" but truncate_chars("a", 0)"...". The empty-string case is correct (no chars exceed the limit), but the test comment could be clearer — "loop body never entered" is correct for "", but for "a" the loop is entered and immediately returns. The current comment only describes the "" case so it reads a bit misleading in context.

Test coverage is solid — ASCII short/exact/truncated, empty, Chinese, 300-char boundary, and zero-max are all covered.


src/memory/rag.rs

Test assertion fix is correct. snippet.len() == 303 only holds for ASCII. snippet.chars().count() == 303 is semantically right (300 chars + 3 for "...").

Pre-existing issue (not this PR's fault): Line 18 uses query.trim().len() < 5 which is byte length, not char count. A 2-character Chinese query ("搜尋") is 6 bytes and would pass the filter, so practically this isn't a bug — but it's inconsistent with the char-safe approach adopted here. Fix this →


src/platform/tool_notifier.rs

The format!("\"{}\"", truncated) wrapping at line 33 correctly handles the case where truncated ends in "..." — it renders as "value..." which is reasonable UX.


Module rename (utils::strutils::strings)

Good call. utils::str shadows std::str (Rust's built-in string primitive module) which can cause confusing resolution in code that uses both. The rename is propagated consistently across all three call sites.


Summary: This PR correctly fixes a latent panic-on-multibyte-input bug with a clean, well-tested helper. No blocking issues. The one pre-existing byte-vs-char inconsistency in rag.rs:18 is optional to address.

@chinkan chinkan merged commit 4229788 into claude/chat-history-rag-telegram-T4Jmo Mar 25, 2026
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants