Skip to content

Verify safety of str iter functions (Challenge 22)#557

Open
jrey8343 wants to merge 4 commits intomodel-checking:mainfrom
jrey8343:challenge-22-str-iter
Open

Verify safety of str iter functions (Challenge 22)#557
jrey8343 wants to merge 4 commits intomodel-checking:mainfrom
jrey8343:challenge-22-str-iter

Conversation

@jrey8343
Copy link

Summary

Unbounded verification of 16 safe functions containing unsafe code in library/core/src/str/iter.rs, plus a safety contract for Bytes::__iterator_get_unchecked, using Kani with loop contracts (-Z loop-contracts).

Functions Verified

Function Impl for Unsafe Ops
next Chars next_code_point + char::from_u32_unchecked
advance_by Chars self.iter.advance_by(N).unwrap_unchecked() (3 call sites)
next_back Chars next_code_point_reverse + char::from_u32_unchecked
as_str Chars from_utf8_unchecked(self.iter.as_slice())
get_end SplitInternal get_unchecked(self.start..self.end)
next SplitInternal get_unchecked(self.start..a)
next_inclusive SplitInternal get_unchecked(self.start..b)
next_back SplitInternal get_unchecked(b..self.end), get_unchecked(self.start..self.end)
next_back_inclusive SplitInternal get_unchecked(b..self.end), get_unchecked(self.start..self.end)
remainder SplitInternal get_unchecked(self.start..self.end)
next MatchIndicesInternal get_unchecked(start..end)
next_back MatchIndicesInternal get_unchecked(start..end)
next MatchesInternal get_unchecked(a..b)
next_back MatchesInternal get_unchecked(a..b)
remainder SplitAsciiWhitespace from_utf8_unchecked(&self.inner.iter.iter.v)
__iterator_get_unchecked Bytes Contract: #[requires(idx < self.0.len())]

Coverage

18 proof harnesses covering all 16 functions plus the __iterator_get_unchecked contract.

Unbounded Verification Approach

All harnesses run without #[kani::unwind(N)] — verification is unbounded:

  1. Chars methods: next_code_point and next_code_point_reverse are branch-based (no loops), so no unwind bounds needed. advance_by is abstracted under #[cfg(kani)] to eliminate its 3 while loops while exercising the same unwrap_unchecked unsafe operations with nondeterministic but valid byte counts.

  2. SplitInternal / MatchIndicesInternal / MatchesInternal: These call CharSearcher::next_match() / next_match_back() which contain loops over the haystack. These searcher methods are abstracted under #[cfg(kani)] in pattern.rs as nondeterministic overapproximations — they return valid (start, end) pairs within haystack bounds without looping. The get_unchecked safety depends only on indices being in bounds, which the abstraction guarantees for any string length.

  3. SplitAsciiWhitespace::remainder: Verified on a fresh iterator (no next() call needed) since the unsafe from_utf8_unchecked safety depends only on the invariant that self.inner.iter.iter.v is a valid UTF-8 subslice, maintained from the original &str input.

Key Techniques

  1. #[cfg(kani)] nondeterministic abstractions in pattern.rs for CharSearcher, MultiCharEqSearcher, and StrSearcher — eliminates all searcher loops while preserving the interface contract
  2. #[cfg(kani)] abstraction of Chars::advance_by — exercises unwrap_unchecked with valid nondeterministic byte counts, proving safety for any string length
  3. Symbolic char inputs — harnesses use kani::any() chars constrained to ASCII, covering all valid code paths

Assumptions Used

Per challenge spec:

  1. Safety and functional correctness of all functions in slice module
  2. Safety and functional correctness of all functions in pattern.rs
  3. Functional correctness of validations.rs functions per UTF-8 spec
  4. All iterators derive from valid UTF-8 strings

All 18 harnesses pass with no --unwind and no #[kani::unwind].

Resolves #279

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

jrey8343 and others added 3 commits February 7, 2026 14:17
Add 17 Kani verification harnesses for all unsafe operations in
library/core/src/str/iter.rs:

Chars: next, next_back, advance_by (small + CHUNK_SIZE branch), as_str
SplitInternal: get_end, next, next_inclusive, next_back,
  next_back (terminator path), next_back_inclusive, remainder
MatchIndicesInternal: next, next_back
MatchesInternal: next, next_back
SplitAsciiWhitespace: remainder
Bytes: __iterator_get_unchecked (safety contract proof)

Techniques:
- Symbolic char via kani::any::<char>() with encode_utf8 for full
  Unicode scalar value coverage (Chars harnesses)
- Symbolic ASCII char patterns with 2-byte haystack for Split/Match
  harnesses covering match and no-match paths
- Concrete 33-byte string for advance_by CHUNK_SIZE=32 branch
- Remove all #[kani::unwind(N)] from harnesses
- Abstract Chars::advance_by under #[cfg(kani)] to eliminate loops
- Bring CharSearcher/MultiCharEqSearcher/StrSearcher nondeterministic
  abstractions from Ch21 pattern.rs for next_match/next_match_back
- Use symbolic char inputs instead of literal strings
- Simplify SplitAsciiWhitespace harness to avoid slice iteration loops
- All harnesses now use nondeterministic overapproximation instead of
  bounded loop unwinding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jrey8343 jrey8343 requested a review from a team as a code owner March 15, 2026 20:36
1. Fix indentation in #[cfg(not(kani))] advance_by block to pass rustfmt
2. Remove incorrect assertions in check_split_internal_get_end harness -
   the nondeterministic next_match abstraction overapproximates, so we
   only verify safety of get_unchecked, not functional correctness

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@feliperodri feliperodri added the Challenge Used to tag a challenge label Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Challenge Used to tag a challenge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Challenge 22: Verify the safety of str iter functions

2 participants