Commit ab4ebd4
docs: add multimodal OOM fix design spec
Design spec for eliminating the multimodal OOM class that surfaced with
Qwen3.5-VL. Replaces PR #1253 in full: absorbs its Qwen stress helpers
(minus the empty_cache call that released the measured peak), adds the
min-max bug fix at visualserver/manager.py:87, tightens visual+audio
concurrency semaphores from x8 to x1, ports _check_decode_infer from
origin/qw35_stable, and re-shapes the LLM init into a two-pass
probe-measure-rebuild-validate auto-profile that eliminates --mem_fraction
as a tuning knob.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent bbc9eba commit ab4ebd4
1 file changed
Lines changed: 759 additions & 0 deletions