Add string type support to SMT back-end#8838
Add string type support to SMT back-end#8838tautschnig wants to merge 4 commits intodiffblue:developfrom
string type support to SMT back-end#8838Conversation
We can safely thread through string operations to the back-end as SMT solvers now have support for the theory of strings.
There was a problem hiding this comment.
Pull request overview
This pull request adds basic string type support to the SMT2 backend solver, enabling SMT solvers to handle string operations using the SMT-LIB theory of strings.
Changes:
- Added support for converting string types to SMT2 "String" type
- Implemented string constant conversion with escaping for double quotes and backslashes
- Added special handling for Str.Concat function application to convert to SMT-LIB str.++ operator
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/solvers/smt2/smt2_conv.cpp
Outdated
| if(fn_name == "Str.Concat" && | ||
| function_application_expr.arguments().size() == 2) | ||
| { | ||
| out << "(str.++ "; | ||
| convert_expr(function_application_expr.arguments()[0]); | ||
| out << ' '; | ||
| convert_expr(function_application_expr.arguments()[1]); |
There was a problem hiding this comment.
The str.++ (string concatenation) operation in SMT-LIB 2 can take any number of arguments (including 0, 1, or more than 2), but this implementation only handles exactly 2 arguments. When there are != 2 arguments, it falls through to the else case which will output "Str.Concat" instead of "str.++". Consider removing the size check or handling variable numbers of arguments to ensure all Str.Concat calls are properly converted to str.++.
| if(fn_name == "Str.Concat" && | |
| function_application_expr.arguments().size() == 2) | |
| { | |
| out << "(str.++ "; | |
| convert_expr(function_application_expr.arguments()[0]); | |
| out << ' '; | |
| convert_expr(function_application_expr.arguments()[1]); | |
| if(fn_name == "Str.Concat") | |
| { | |
| out << "(str.++"; | |
| for(const auto &op : function_application_expr.arguments()) | |
| { | |
| out << ' '; | |
| convert_expr(op); | |
| } |
| UNEXPECTEDCASE("unsuppored range type"); | ||
| out << "(_ BitVec " << address_bits(size) << ")"; | ||
| } | ||
| else if(type.id()==ID_string) |
There was a problem hiding this comment.
Inconsistent spacing around the == operator. The codebase convention in this file is to use no spaces (e.g., type.id()==ID_string on line 6025, expr_type.id()==ID_string on line 3782). This line has spaces around == which is inconsistent with the rest of the file.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #8838 +/- ##
===========================================
- Coverage 80.00% 79.99% -0.02%
===========================================
Files 1700 1700
Lines 188252 188349 +97
Branches 73 73
===========================================
+ Hits 150613 150668 +55
- Misses 37639 37681 +42 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…_index bounds_check_index crashes when processing arrays indexed by mathematical types (ID_integer, ID_natural) or arrays with no known size (nil size expression). This occurs because object_descriptor_exprt::build computes byte offsets via size_of_expr, which fails for unsized arrays, producing an ID_unknown offset. Subsequent calls to from_integer on the resulting type then hit a precondition violation. These array types arise in verification tools that model heaps as unbounded mathematical arrays (e.g., array<array<struct>> indexed by mathematical integers). The fix adds an early return in bounds_check_index that: - Treats ID_natural the same as ID_unsignedbv (always non-negative) - For ID_integer or ID_signedbv indices on symbol or nested-index array expressions, emits a simple index >= 0 lower-bound check - Skips the object_descriptor_exprt byte-offset path entirely, avoiding the crash in the simplifier when it encounters types without a defined size
Add regex type support (irep ID and RegLan SMT output) and map all Strata function application names to their SMT-LIB equivalents: - Str.Length -> str.len, Str.Concat -> str.++, Str.Substr -> str.substr - Str.ToRegEx -> str.to_re, Str.InRegEx -> str.in_re - Re.AllChar -> re.allchar, Re.All -> re.all, Re.None -> re.none - Re.Range -> re.range, Re.Concat -> re.concat, Re.Star -> re.* - Re.Plus -> re.+, Re.Loop -> re.loop, Re.Union -> re.union - Re.Inter -> re.inter, Re.Comp -> re.comp These function_application expressions are emitted by Strata's GOTO backend as mathematical_function calls. The smt2_conv now recognizes them and emits proper SMT-LIB syntax for solvers with string/regex theory support (e.g., cvc5, Z3).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
84732a2 to
ab97a57
Compare
We can safely thread through string operations to the back-end as SMT solvers now have support for the theory of strings.