perf: Cumulative startup and runtime optimizations (#3)
Draft
`black` is only used in `create_context_prompt()` and `format_code()` -- both cold paths. Moving the import inside the functions avoids loading `black` and its transitive deps (`pathspec`, `black.nodes`, etc.) on every `import typeagent`.
- Combine 16 separate `cursor.execute()` calls in `init_db_schema` into a single `db.executescript()` call, reducing SQLite round-trips during database initialization.
- Pre-compile the whitespace regex in `_prepare_term` to avoid re-compiling on every call (552 calls during indexing).
This reverts commit d4bc744.
Add `add_terms_batch` / `add_properties_batch` to the index interfaces with `executemany`-based SQLite implementations. Restructure `add_metadata_to_index_from_list` and `add_to_property_index` to collect all items first, then batch-insert via `extend()` and the new batch methods. Eliminates ~1000 individual INSERT round-trips during indexing.
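The collect-then-batch pattern might look roughly like this; the method names follow the commit message, but the class, table layout, and record shape are assumptions for illustration:

```python
from collections.abc import Sequence
import sqlite3

class TermToSemanticRefIndex:
    """Simplified stand-in for the SQLite-backed term index."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        db.execute("CREATE TABLE IF NOT EXISTS Terms (term TEXT, semref_id INTEGER)")

    def add_terms_batch(self, items: Sequence[tuple[str, int]]) -> None:
        # One executemany() round-trip instead of one execute() per row.
        self.db.executemany(
            "INSERT INTO Terms (term, semref_id) VALUES (?, ?)", items
        )

def add_metadata_to_index_from_list(
    index: TermToSemanticRefIndex, records: Sequence[tuple[int, list[str]]]
) -> None:
    # Collect everything first via extend(), then issue a single batch insert.
    batch: list[tuple[str, int]] = []
    for semref_id, terms in records:
        batch.extend((term, semref_id) for term in terms)
    index.add_terms_batch(batch)
```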
Replace hand-rolled time.perf_counter() loop with the pedantic fixture from pytest-async-benchmark. Setup (DB/storage/transcript creation) and teardown (close/delete) are now properly excluded from timing via the framework instead of inline timing code.
Move repeated setup/teardown/target pattern into run_indexing_benchmark() helper. Each test now delegates with just messages and message_type.
Rename `_collect_{facet,entity,action}_{terms,properties}` to drop the leading underscore in propindex.py and semrefindex.py.
Install from fork with pedantic mode support for benchmark tests.
Change list to Sequence in add_terms_batch and add_properties_batch interfaces and implementations to satisfy covariance. Add missing add_terms_batch to FakeTermIndex in conftest.py.
Replace Python-level list comprehension + sort with numpy operations:
- No-predicate path: `np.flatnonzero` for score filtering, `np.argpartition` for O(n) top-k selection — avoids building `ScoredInt` for every vector
- Predicate path: numpy pre-filters by score, applies predicate only to candidates above threshold
- Subset lookup: numpy fancy indexing computes dot products only for subset indices instead of delegating to full-vector scan with predicate
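The no-predicate path could be sketched like this; the function name and score layout are assumptions, only the `np.flatnonzero` / `np.argpartition` technique comes from the description:

```python
import numpy as np

def top_k_above_threshold(
    scores: np.ndarray, min_score: float, k: int
) -> list[tuple[int, float]]:
    # Filter by score in numpy instead of building a Python object per vector.
    candidates = np.flatnonzero(scores >= min_score)
    if len(candidates) > k:
        # argpartition is O(n): the last k positions hold the k largest
        # scores, in arbitrary order.
        cand_scores = scores[candidates]
        top = np.argpartition(cand_scores, -k)[-k:]
        candidates = candidates[top]
    # Final descending sort over at most k items is cheap.
    order = np.argsort(-scores[candidates])
    candidates = candidates[order]
    return [(int(i), float(scores[i])) for i in candidates]
```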
`lookup_term_filtered` called `get_item()` per scored ref — one SELECT and full deserialization per match. The filter only needs `knowledge_type` (a plain column) and `range` (`json.loads` of `range_json`), never the expensive `knowledge_json` deserialization (64% of per-row cost).

Add `get_metadata_multiple` to `ISemanticRefCollection` that fetches only `semref_id`, `range_json`, `knowledge_type` in a single batch query. Replace the N+1 loop in `lookup_term_filtered` with one `get_metadata_multiple` call.

Benchmark (200 matches, 200 rounds): 4.38ms → 1.32ms (3.3x speedup).
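A hedged sketch of what such a batch metadata query might look like; the column names follow the commit message, while the table name and function shape are assumptions:

```python
import json
import sqlite3

def get_metadata_multiple(
    db: sqlite3.Connection, semref_ids: list[int]
) -> list[tuple[int, dict, str]]:
    # Fetch only the cheap columns in one query, skipping knowledge_json
    # (the expensive deserialization) entirely.
    if not semref_ids:
        return []
    placeholders = ",".join("?" * len(semref_ids))
    rows = db.execute(
        f"SELECT semref_id, range_json, knowledge_type FROM SemanticRefs "
        f"WHERE semref_id IN ({placeholders})",
        semref_ids,
    ).fetchall()
    # range_json is small; json.loads here is the only per-row parse.
    return [(sid, json.loads(range_json), ktype) for sid, range_json, ktype in rows]
```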
Apply the same `get_metadata_multiple` pattern from `lookup_term_filtered` to four more sites that called `get_item()` in a loop:
- `propindex.lookup_property_in_property_index`: filter by `.range`
- `SemanticRefAccumulator.group_matches_by_type`: group by `.knowledge_type`
- `SemanticRefAccumulator.get_matches_in_scope`: filter by `.range`
- `answers.get_scored_semantic_refs_from_ordinals_iter`: two-phase metadata filter then batch `get_multiple` for matching full objects

All sites now use a single batch query instead of N individual SELECTs, skipping `knowledge_json` deserialization where only `range` or `knowledge_type` is needed.
`parse_azure_endpoint` returned the raw URL including `?api-version=...`, which `AsyncAzureOpenAI` then mangled into invalid paths like `...?api-version=2024-06-01/openai/`. Strip the query string before returning — `api_version` is already returned as a separate value and passed to the SDK independently.
Speed up scope-filtering: bisect + inline tuple comparisons
- Use `bisect_right` with `key=start` in `TextRangeCollection.contains_range` to skip O(n) linear scan (O(log n) for non-overlapping point ranges)
- Replace `TextLocation` allocations in `TextRange` `__eq__`/`__lt__`/`__contains__` with a shared `_effective_end` returning tuples
- Skip pydantic validation in `get_metadata_multiple` by constructing `TextLocation`/`TextRange` directly from JSON
`black` is only used at runtime in two cold formatting paths:
- `create_context_prompt()` in answers.py (LLM debug context)
- `format_code()`/`pretty_print()` in utils.py (developer terminal output)

Both format Python data structures, which is exactly what `pprint` does. Replace `black.format_str` with `pprint.pformat` + `ast.literal_eval`, eliminating the runtime dependency entirely. Move `black` from dependencies to the dev dependency-group — it remains available for `make format`/`check` but is no longer required by library consumers.
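The replacement pattern, roughly; the function name `format_repr` is hypothetical (the real call sites are `create_context_prompt()` and `format_code()`/`pretty_print()`):

```python
import ast
import pprint

def format_repr(source: str, width: int = 80) -> str:
    # Parse the repr of a Python data structure with ast.literal_eval
    # (safe: accepts literals only, never executes code), then re-emit
    # it wrapped to the given width. This covers what black.format_str
    # was doing for these data-only debug/display paths.
    data = ast.literal_eval(source)
    return pprint.pformat(data, width=width)
```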
`answers`, `search_query_schema`, `searchlang`, and `answer_response_schema` are only used in the `query()` method. Move their imports from module level into `query()` and use `TYPE_CHECKING` + `from __future__ import annotations` for the type hints. These modules pull in search, query, and schema initialization that isn't needed when creating or indexing conversations.
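The deferral pattern in general form, with stdlib `decimal` standing in for the heavy modules; a sketch, not the project's actual code:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by the type checker, never executed at runtime.
    # (With `from __future__ import annotations` the string quotes on
    # the hint below would be unnecessary.)
    from decimal import Decimal

def query(text: str) -> "Decimal":
    # Deferred import: the heavy dependency loads on first call, not at
    # module import time, keeping `import typeagent`-style startup cheap.
    from decimal import Decimal
    return Decimal(text)
```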
Cumulative Benchmarks
Azure Standard_D2s_v5 (2 vCPU, 8 GB RAM, non-burstable), Python 3.13, Ubuntu 24.04
Startup
hyperfine (warmup 5, min-runs 30)
- `import typeagent`

Runtime (indexing pipeline)
pytest-async-benchmark pedantic mode, 20 rounds, 3 warmup — only hot path timed (setup/teardown excluded)
- `add_messages_with_indexing` (200 msgs)
- `add_messages_with_indexing` (50 msgs)

Query
pytest-async-benchmark pedantic mode, 200 rounds, 20 warmup
- `lookup_term_filtered` (200 matches)
- `group_matches_by_type` (200 matches)
- `get_scored_semantic_refs_from_ordinals_iter` (200 matches)
- `lookup_property_in_property_index` (200 matches)
- `get_matches_in_scope` (200 matches)

Vector Search
pytest-async-benchmark pedantic mode, 200 rounds, 20 warmup, 384-dim embeddings
- `fuzzy_lookup_embedding` (1K vecs)
- `fuzzy_lookup_embedding` (10K vecs)
- `fuzzy_lookup_embedding` (10K + predicate)
- `fuzzy_lookup_embedding_in_subset` (1K of 10K)

Optimizations (cumulative)
1. Defer `black` import to first use (ecbf6f5)
   - `black` was imported at module level but only used in two cold-path functions
   - `import black` inside `create_context_prompt()` and `format_code()`
2. Batch SQLite INSERTs for indexing pipeline (bc9f2df)
   - Add `add_terms_batch` and `add_properties_batch` to the `ITermToSemanticRefIndex` and `IPropertyToSemanticRefIndex` interfaces
   - `executemany` instead of individual `cursor.execute()` calls
   - Restructure `add_metadata_to_index_from_list` and `add_to_property_index` to collect all data first, then batch-insert
3. Numpy vectorized fuzzy lookup (bc5b319)
   - `np.flatnonzero` + `np.argpartition` for O(n) top-k
   - `fuzzy_lookup_embedding_in_subset` now uses fancy indexing to compute dot products only for subset indices
4. Batch metadata query across 5 N+1 call sites
   - `get_item()` per scored ref — N+1 pattern with full `knowledge_json` deserialization
   - Add `get_metadata_multiple` to `ISemanticRefCollection` — fetches only `semref_id, range_json, knowledge_type` in one batch query
   - Skips `json.loads(knowledge_json)` and `deserialize_knowledge()` entirely (64% of per-row cost)
5. Speed up scope-filtering: bisect + inline tuple comparisons
   - `TextRangeCollection.contains_range` — replaced O(n) linear scan with `bisect_right` keyed on `start`
   - `TextRange.__eq__/__lt__/__contains__` — replaced `TextLocation` allocations with `_effective_end` tuples
   - `get_metadata_multiple` — construct `TextLocation`/`TextRange` directly from JSON
6. Bugfix: `parse_azure_endpoint`
   - `parse_azure_endpoint` returned the full URL with `?api-version=...`, which `AsyncAzureOpenAI` mangled into a double-path

This PR accumulates all optimizations on the `optimization` branch. Benchmarks are re-run after every push. Individual PRs are opened separately for review.