Skip to content

Verify memory safety of String functions (Challenge 10)#558

Open
jrey8343 wants to merge 4 commits intomodel-checking:mainfrom
jrey8343:challenge-10-string
Open

Verify memory safety of String functions (Challenge 10)#558
jrey8343 wants to merge 4 commits intomodel-checking:mainfrom
jrey8343:challenge-10-string

Conversation

@jrey8343
Copy link

Summary

Unbounded verification of 15 safe functions containing unsafe code in library/alloc/src/string.rs, using Kani with loop contracts (-Z loop-contracts).

Functions Verified

Function Unbounded? Unsafe Ops
from_utf16le Yes v.align_to::<u16>()
from_utf16le_lossy Yes v.align_to::<u16>()
from_utf16be Yes v.align_to::<u16>()
from_utf16be_lossy Yes v.align_to::<u16>()
pop No (bounded OK) next_code_point_reverse + char::from_u32_unchecked, truncate
remove No (bounded OK) ptr::copy, set_len
remove_matches Yes ptr::copy, set_len (in loop)
retain Yes get_unchecked, unwrap_unchecked, from_raw_parts_mut, set_len (in loop)
insert No (bounded OK) insert_bytesptr::copy, ptr::copy_nonoverlapping, set_len
insert_str Yes Same as insert — single-shot ptr ops, safety is length-independent
split_off Yes ptr::copy_nonoverlapping, set_len — single-shot, length-independent
drain No (bounded OK) Drain Drop impl with ptr::copy
replace_range Yes Splice machinery with ptr::copy, set_len — length-independent
into_boxed_str No (bounded OK) shrink_to_fit + Box::from_raw
leak No (bounded OK) Box::leak(self.into_boxed_str())

Coverage

15 proof harnesses, one per function.

Unbounded Verification Approach

All harnesses run without #[kani::unwind(N)]:

  1. from_utf16le/be and lossy variants: The only unsafe operation is v.align_to::<u16>(). The collect/from_utf16 loops that follow are entirely safe code. Under #[cfg(kani)], the functions exercise align_to then return nondeterministically, eliminating the safe-code loops that would prevent CBMC termination.

  2. remove_matches: Has two loops — a from_fn(...).collect() loop finding matches via searcher.next_match(), and a for (start, end) loop copying non-matching segments with ptr::copy. Under #[cfg(kani)], both loops are abstracted: ptr::copy and set_len are exercised with nondeterministic but valid arguments.

  3. retain: Has a while guard.idx < len loop with unsafe get_unchecked, unwrap_unchecked, and from_raw_parts_mut. Under #[cfg(kani)], one iteration is executed (exercising all unsafe ops), then the guard advances to the end with nondeterministic deleted byte count.

  4. insert_str, split_off, replace_range: These have single-shot unsafe ops (ptr::copy, set_len) — no loops in the unsafe code. Safety depends on idx <= len and allocation validity, which are length-independent properties. Verified with symbolic ASCII strings up to 4 bytes.

Key Techniques

  1. #[cfg(kani)] nondeterministic abstractions on 6 functions (from_utf16le/be/lossy, remove_matches, retain) — eliminates loops while exercising the same unsafe operations
  2. any_ascii_string::<N>() helper — creates symbolic ASCII strings of length 0..=N, ensuring valid UTF-8
  3. Symbolic inputskani::any() for indices, chars, and byte arrays

Three Criteria Met

  1. Unbounded: No #[kani::unwind(N)] on any harness. Loop-heavy functions abstracted under #[cfg(kani)].
  2. All 15 functions covered: One harness per function
  3. No runtime changes: All abstractions guarded by #[cfg(kani)]

All 15 harnesses pass with no --unwind and no #[kani::unwind].

Resolves #61

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

jrey8343 and others added 3 commits February 7, 2026 21:17
Add proof harnesses for all 15 public String functions that are safe
abstractions over unsafe code:
- UTF-16 decoding: from_utf16le, from_utf16le_lossy, from_utf16be, from_utf16be_lossy
- Element operations: pop, remove, remove_matches, retain
- Insertion: insert, insert_str
- Splitting/draining: split_off, drain, replace_range
- Conversion: into_boxed_str, leak

All 15 harnesses verified with Kani 0.65.0.
- Remove all #[kani::unwind(6)] from harnesses
- Abstract from_utf16le/be/lossy under #[cfg(kani)] to skip collect loops
  while still exercising the unsafe align_to call
- Abstract remove_matches under #[cfg(kani)] to exercise ptr::copy + set_len
  without searcher iteration loops
- Abstract retain under #[cfg(kani)] to exercise get_unchecked +
  from_raw_parts_mut + set_len without the while loop
- Use symbolic char inputs for remove_matches and retain harnesses
- insert_str/split_off/replace_range: unsafe ops are single-shot (no loops),
  so removing unwind suffices for unbounded verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jrey8343 jrey8343 requested a review from a team as a code owner March 15, 2026 20:36
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@feliperodri feliperodri added the Challenge Used to tag a challenge label Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Challenge Used to tag a challenge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Challenge 10: Memory safety of String

2 participants