From 1f77b45b86567f7d11dfd91b4865adc5f2e9a9f5 Mon Sep 17 00:00:00 2001
From: "Paul S. Schweigert"
Date: Fri, 29 May 2026 13:36:06 -0400
Subject: [PATCH] update gpu recs for speech demo notebook

Signed-off-by: Paul S. Schweigert
---
 tutorials/notebooks/granite_speech_demo.ipynb | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tutorials/notebooks/granite_speech_demo.ipynb b/tutorials/notebooks/granite_speech_demo.ipynb
index ea41a20..2b5c04c 100644
--- a/tutorials/notebooks/granite_speech_demo.ipynb
+++ b/tutorials/notebooks/granite_speech_demo.ipynb
@@ -18,7 +18,7 @@
  "\n",
  "## Prerequisites\n",
  "\n",
- "- **GPU runtime: A100 (Colab Pro) recommended.** L4 works. T4 will OOM — both Granite models won't fit.\n",
+ "- **GPU runtime: A100 (Colab Pro) required.** Smaller GPUs won't have enough VRAM to hold both Granite models simultaneously.\n",
  "- **HuggingFace read token.** Free; create one at https://huggingface.co/settings/tokens. Add it as a Colab Secret named `HF_TOKEN` (sidebar → 🔑 → New secret). Used for two things: downloading the Granite model weights, *and* minting per-session WebRTC TURN credentials so audio reaches your browser.\n",
  "- **Browser:** Chrome, Edge, or Firefox. Safari may behave oddly with WebRTC.\n",
  "\n",
@@ -30,7 +30,7 @@
  "## What to do\n",
  "\n",
  "1. Set the `HF_TOKEN` Colab Secret.\n",
- "2. Switch the runtime to a GPU (Runtime → Change runtime type → A100/L4).\n",
+ "2. Switch the runtime to an A100 GPU (Runtime → Change runtime type → A100).\n",
  "3. **Runtime → Run all.**\n",
  "4. When the last cell prints a `*.trycloudflare.com` URL, open it, allow mic access, and start talking.\n",
  "\n",
@@ -225,7 +225,7 @@
  "View one with `!tail -100 logs/vllm-speech.log` (or open the file from the Colab file browser).\n",
  "\n",
  "**Common failures:**\n",
- "- *T4 OOM:* switch the runtime to A100 or L4. Both Granite models won't fit on a T4.\n",
+ "- *GPU OOM:* switch the runtime to an A100. Both Granite models won't fit on smaller GPUs (T4, L4).\n",
  "- *`HF_TOKEN` missing:* re-run Cell 3 after adding the secret. Without it, the backend falls back to STUN-only and audio likely won't connect through the cloudflared tunnel.\n",
  "- *Stuck \"waiting for vLLM\":* model weights are downloading. The cell waits up to 20 min — let it run.\n",
  "- *Re-running cells without cleaning up:* old processes still hold the ports. Run the kill-switch cell below, then re-run from the top.\n",