perf: optimize claude-compat latency and reliability by daixudk96-max · Pull Request #348 · su-kaka/gcli2api

daixudk96-max · 2026-03-02T17:25:41Z

Optimization Plan Execution: `optimize-claude-compat-latency`

This PR completes the 3-track optimization plan for the Claude-compatible API layer, aimed at substantially reducing latency without compromising quality or backward compatibility.

Changes Included:

Retry Mechanism Cleanup (Task 1): Eliminated duplicate sleep loops between caller and utility layers. Added fail-fast logic for stream endpoints after the first chunk is emitted.
HTTP/2 Connection Pooling (Task 2): Switched to pooled httpx.AsyncClient reuse with HTTP/2 enabled, effectively eliminating TCP/TLS/ALPN handshake latency on hot paths.
Anthropic Converter Fast-Path (Task 3): Bypassed schema sanitization and recursive mapping for simple text-only prompts/responses, avoiding CPU bottleneck in the stream loop.
Preview Credential Scheduler (Task 4): Upgraded from purely random picking to a health-aware score (combining in-flight pressure, 429 cooling status, and recent errors) specifically for the gemini-3-pro-preview model.
Rollout & Verification Gates (Task 5): Added 4 new Feature Flags connected to the Control Panel, and built a CI-friendly perf testing suite (scripts/perf/bench.py, assert_latency.py, rollout_guard.py).

Performance Results (Real-World Test):

Against gemini-3-pro-preview-high:

TTFB (First Byte): ~7.5ms (Streaming)
First Token P95: 18.14s (55.8% improvement vs 41.05s baseline)
Full Latency P95: 18.14s (55.8% improvement)
Throughput: ~98% increase in Requests/sec
Quality Regression: None (100% success rate on test dataset)

Resolves plan optimize-claude-compat-latency.

Fallback to ASCII-escaped JSON encoding when payload contains lone surrogate code units so streaming/non-stream requests no longer crash before dispatch. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

…a_for_claude unsupported_keys Google's Gemini API uses a protobuf-based schema parser that only accepts a whitelist of JSON Schema fields. When the model name contains 'claude', tools are cleaned via _clean_schema_for_claude() instead of _clean_schema_for_gemini(). However, _clean_schema_for_claude() was missing patternProperties, dependencies, and propertyNames from its unsupported_keys set, causing Google API to reject requests with: Invalid JSON payload received. Unknown name "patternProperties" This affects any OpenAI-compatible client (e.g. OpenClaw) that includes patternProperties in tool parameter schemas when routed through the Claude/Antigravity path. Ref: https://cloud.google.com/vertex-ai/generative-ai/docs/reference/rest/v1/Schema

…backoff, raise retry interval default - Enable HTTP/2 in httpx client to match Google cloudcode-pa endpoint expectations (fixes intermittent 'All connection attempts failed' errors) - Detect MODEL_CAPACITY_EXHAUSTED errors in 429 responses and apply exponential backoff with jitter (capped at 12s) instead of flat retry interval - When capacity is exhausted, rotate to next credential instead of hammering the same one (prevents thundering herd amplification) - Raise default RETRY_429_INTERVAL from 0.1s to 1.0s - Improve exception handler logging with type/repr for better diagnostics - Fix CredentialManager type hints in utils.py (Any instead of concrete class)

- Normalize multiple project_id response shapes from loadCodeAssist/onboardUser - Support extracting project id from resource names like projects/*/locations/* - Extend onboardUser polling window from 10s to 30s for slow activation cases

…ompat latency - **Preview Scheduler (Task 4)**: Added health-aware credential scoring (in-flight pressure, 429 signal, usage count) via `sqlite_manager.py` and `credential_manager.py`, keeping strict preview model boundaries. Includes assert script `assert_preview_pool.py`. - **Release Automation (Task 5)**: Wired 4 core feature flags (`ff_retry_policy_v2`, `ff_http2_pool_tuning`, `ff_converter_fast_path`, `ff_preview_credential_scheduler_v2`) into runtime and Control Panel. Created `rollout_guard.py` to compute automated rollout/rollback decisions based on relative latency/throughput/quality thresholds. - **Verification**: Real-world load against `gemini-3-pro-preview-high` confirms TTFB ~7.5ms (via HTTP/2 pooling), P95 Latency 18.14s (-55.8% vs 41.05s baseline), and 100% success rate with no quality regression. All targeted gates passed. Fixes #plan-optimize-claude-compat-latency

… stream errors

…tp2-retry

CI User and others added 30 commits February 23, 2026 23:57

反重力取消附加默认系统提示词

2882d14

chore: update version.txt [skip ci]

86ca194

优化重试机制

f61a0e4

chore: update version.txt [skip ci]

83bffcf

优化性能

1cc231d

chore: update version.txt [skip ci]

7c0b378

优化mongodb查询

461218e

chore: update version.txt [skip ci]

c4f907b

更新redis缓存

4839cac

chore: update version.txt [skip ci]

857c6b8

Update mongodb_manager.py

07f8250

chore: update version.txt [skip ci]

b951437

Update pyproject.toml

facccca

chore: update version.txt [skip ci]

3db1545

Update .env.example

0f3a4a2

Update mongodb_manager.py

0595667

chore: update version.txt [skip ci]

b68da13

chore: update version.txt [skip ci]

de10836

fix: replace 2 bare excepts with except Exception

8bc8334

chore: update version.txt [skip ci]

4247d02

Update anti_truncation.py

7671533

chore: update version.txt [skip ci]

a526efa

Update gemini_fix.py

21e9448

chore: update version.txt [skip ci]

77bbd21

配置项：重试时是否切换凭证

ceefc00

chore: update version.txt [skip ci]

274ec71

1

2e38216

render保活

fe0e7d8

chore: update version.txt [skip ci]

53ff29f

github-actions bot and others added 14 commits February 28, 2026 17:48

chore: update version.txt [skip ci]

fedf2cd

移除重试时不切换凭证

dd5b630

chore: update version.txt [skip ci]

bc4dbd5

优化流式传输时的报错显示

fe10819

chore: update version.txt [skip ci]

3af729e

chore: update version.txt [skip ci]

85a0188

merge: sync upstream origin/master into feature branch

8ccb16b

chore: backup pre-optimization code changes

f33ab48

fix: retry thinking-only stream interruptions and normalize anthropic…

de98f11

… stream errors

Merge remote-tracking branch 'origin/master' into fix/capacity-429-ht…

e04c03c

…tp2-retry

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: optimize claude-compat latency and reliability#348

perf: optimize claude-compat latency and reliability#348
daixudk96-max wants to merge 44 commits intosu-kaka:masterfrom
daixudk96-max:fix/capacity-429-http2-retry

daixudk96-max commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

daixudk96-max commented Mar 2, 2026

Optimization Plan Execution: optimize-claude-compat-latency

Changes Included:

Performance Results (Real-World Test):

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Optimization Plan Execution: `optimize-claude-compat-latency`