
fix: add Cerebras models zai-glm-4.7 #679

Open
github-actions[bot] wants to merge 1 commit into main from chore/autofix-issue-674

Conversation

@github-actions
Contributor

fix: add Cerebras models zai-glm-4.7

Closes #674

Source issue: #674

Summary

| Field | Value |
|---|---|
| Provider | cerebras |
| Primary model | zai-glm-4.7 |
| Changed models | zai-glm-4.7 |
| Added models | zai-glm-4.7 |
| Updated models | None |
| Verification sources | 1, 2, 3 |

Verified metadata

| Model | Display name | Parent | Providers | Format | Flavor | Token limits | Pricing (USD per 1M tokens) | Lifecycle |
|---|---|---|---|---|---|---|---|---|
| zai-glm-4.7 | Z.ai GLM 4.7 | | cerebras | openai | chat | input=131072, output=40960 | in/out=2.25/2.75 | reasoning=true |
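Because pricing is quoted per 1M tokens, the cost of one request scales linearly with its token counts. A minimal sketch, assuming plain linear per-token billing within the published limits; how Cerebras bills cached prompts or reasoning tokens is not covered, and the helper name `request_cost_usd` is hypothetical:

```python
# Published Cerebras limits and prices for zai-glm-4.7, as listed in the table above.
MAX_INPUT_TOKENS = 131_072
MAX_OUTPUT_TOKENS = 40_960
INPUT_USD_PER_1M = 2.25
OUTPUT_USD_PER_1M = 2.75


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under linear per-1M-token pricing.

    Hypothetical helper: assumes no caching discounts or other billing rules
    beyond the per-1M input/output prices.
    """
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError(f"input_tokens out of range: {input_tokens}")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"output_tokens out of range: {output_tokens}")
    return (input_tokens * INPUT_USD_PER_1M + output_tokens * OUTPUT_USD_PER_1M) / 1_000_000


if __name__ == "__main__":
    # 10k prompt tokens + 2k completion tokens: 0.0225 + 0.0055 = 0.028 USD
    print(f"{request_cost_usd(10_000, 2_000):.4f}")
```

Rejecting out-of-range token counts keeps the estimate tied to the limits this PR proposes rather than silently pricing an impossible request.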

Verification notes

Verification

Sources and fields verified

  1. https://inference-docs.cerebras.ai/models/overview — verified: model ID (zai-glm-4.7), parameters (355B), speed (~1000 tok/s), Preview status
  2. https://inference-docs.cerebras.ai/models/zai-glm-47 — verified: context window paid tier (131k), max output (40k), pricing ($2.25/$2.75 per MTok), reasoning enabled by default, tool calling, structured outputs
  3. https://cerebras.ai/pricing — confirmed general pricing tiers exist; per-model token pricing is on the model-specific page rather than the pricing landing page

sync_models (LiteLLM) cross-check

The model cerebras/zai-glm-4.7 does not appear in the LiteLLM model_prices_and_context_window_backup.json catalog. However, a related entry exists under the zai provider (Z.AI's own hosted endpoint, not Cerebras):

  • zai/glm-4.7 in sync_models: max_input_tokens=200000, max_output_tokens=128000, input_cost_per_token=6e-07 ($0.60/MTok), output_cost_per_token=2.2e-06 ($2.20/MTok)
  • Proposed zai-glm-4.7 (Cerebras-hosted): max_input_tokens=131072, max_output_tokens=40960, input_cost_per_mil_tokens=2.25, output_cost_per_mil_tokens=2.75
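LiteLLM stores prices per token, while this catalog uses per-million-token fields, so the sync_models values must be multiplied by 1,000,000 before they can be compared. A minimal sketch of that conversion; rounding to 6 decimal places is an assumption made here to absorb floating-point noise, and the function name is hypothetical:

```python
def per_token_to_per_mil(cost_per_token: float, ndigits: int = 6) -> float:
    """Convert a LiteLLM-style per-token price to a per-1M-token price.

    Rounding is applied because values such as 2.2e-06 are not exactly
    representable in binary floating point.
    """
    return round(cost_per_token * 1_000_000, ndigits)


if __name__ == "__main__":
    # zai/glm-4.7 prices as listed in sync_models
    print(per_token_to_per_mil(6e-07))    # input price per 1M tokens
    print(per_token_to_per_mil(2.2e-06))  # output price per 1M tokens
```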

All four numeric fields differ because the sync_models entry reflects Z.AI's own infrastructure limits and pricing, whereas this issue covers the Cerebras-hosted version, which has its own context limits and pricing:

| Field | Proposed (Cerebras) | sync_models zai/glm-4.7 (Z.AI) | Justification |
|---|---|---|---|
| max_input_tokens | 131072 | 200000 | Cerebras model page states "131k tokens" context window for paid tiers (https://inference-docs.cerebras.ai/models/zai-glm-47). Different provider, different limits. |
| max_output_tokens | 40960 | 128000 | Cerebras model page states "40k tokens" max output (https://inference-docs.cerebras.ai/models/zai-glm-47). Different provider, different limits. |
| input_cost_per_mil_tokens | 2.25 | 0.60 | Cerebras charges $2.25/MTok (https://inference-docs.cerebras.ai/models/zai-glm-47); Z.AI charges $0.60/MTok on its own platform. |
| output_cost_per_mil_tokens | 2.75 | 2.20 | Cerebras charges $2.75/MTok (https://inference-docs.cerebras.ai/models/zai-glm-47); Z.AI charges $2.20/MTok on its own platform. |

The Cerebras official documentation is preferred because this catalog entry is specifically for the Cerebras-hosted version of the model, which has its own independently published limits and pricing. The sync_models zai/glm-4.7 entry is for a different provider (zai) and is not applicable to the cerebras provider mapping.
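The field-by-field comparison can be reproduced mechanically: normalize both entries to per-million pricing, then report every shared field whose values disagree. A sketch using plain dicts as a hypothetical stand-in for the two catalog entries; the real catalog files may be structured differently:

```python
from typing import Dict, List, Tuple

# Proposed Cerebras-hosted entry (values from the Cerebras model page).
proposed = {
    "max_input_tokens": 131072,
    "max_output_tokens": 40960,
    "input_cost_per_mil_tokens": 2.25,
    "output_cost_per_mil_tokens": 2.75,
}

# sync_models entry for zai/glm-4.7, with per-token prices already converted to per-1M.
sync_zai = {
    "max_input_tokens": 200000,
    "max_output_tokens": 128000,
    "input_cost_per_mil_tokens": 0.60,
    "output_cost_per_mil_tokens": 2.20,
}


def diff_fields(a: Dict[str, float], b: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (field, a_value, b_value) for every shared field whose values differ."""
    return [(key, a[key], b[key]) for key in a if key in b and a[key] != b[key]]


if __name__ == "__main__":
    for field, ours, theirs in diff_fields(proposed, sync_zai):
        print(f"{field}: proposed={ours} sync_models={theirs}")
```

A non-empty diff is not an error in itself; as argued above, it is expected whenever the two entries describe different hosting providers.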

Fields not published or not applicable

  • multimodal: Omitted; the docs list input and output as "Text only".
  • parent: Not applicable — this is a standalone model, not a dated snapshot or alias.
  • input_cache_read_cost_per_mil_tokens / input_cache_write_cost_per_mil_tokens: Prompt caching is mentioned as supported, but no cache-specific pricing is published; omitted.
  • deprecation_date: Not applicable — model is in Preview, no deprecation announced.
  • supported_regions: Not applicable — Cerebras is not a Vertex provider.
  • locations: Not applicable — Cerebras models do not use location-scoped routing.
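The rule applied in this list is to leave a field out entirely, rather than write a placeholder, when the provider publishes no value. A minimal sketch of that rule, assuming unpublished values are first collected as `None`; the key names mirror the fields named in this PR, but the dict shape is illustrative and not the catalog's actual schema:

```python
from typing import Any, Dict, Optional


def build_entry(fields: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Drop fields with no published value (None) so they are omitted, not nulled.

    Uses `is not None` so legitimate falsy values such as False or 0 are kept.
    """
    return {key: value for key, value in fields.items() if value is not None}


raw = {
    "max_input_tokens": 131072,
    "max_output_tokens": 40960,
    "input_cost_per_mil_tokens": 2.25,
    "output_cost_per_mil_tokens": 2.75,
    "reasoning": True,
    "multimodal": None,                            # docs list text only
    "parent": None,                                # standalone model
    "input_cache_read_cost_per_mil_tokens": None,  # cache pricing unpublished
    "deprecation_date": None,                      # Preview, none announced
}
entry = build_entry(raw)

if __name__ == "__main__":
    print(sorted(entry))
```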

Token limit interpretation

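The Cerebras docs state "131k tokens" for the context window and "40k tokens" for max output. These are interpreted as:

  • 131k → 131,072 (2^17 = 128 × 1024; consistent with the existing gpt-oss-120b entry, which uses 131072)
  • 40k → 40,960 (40 × 1024, reading "k" as 1024)

These two readings do not follow a single convention: 131,072 rounds to "131k" in decimal thousands (a strict binary reading of "131k" would be 134,144), whereas 40,960 matches "40k" only when k = 1024 (in decimal thousands it would round to 41k). If Cerebras means a literal 40,000, the proposed value overstates max output by 960 tokens, a small difference. The 1024-based value is used here for consistency with the existing catalog convention.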
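The two readings of a "k" suffix can be compared directly. This sketch shows why 131,072 and 40,960 correspond to different rounding rules; both helper names are hypothetical:

```python
def k_readings(k: int) -> dict:
    """Candidate token counts for a provider-quoted "<k>k" limit."""
    return {"decimal": k * 1000, "binary": k * 1024}


def decimal_k_label(tokens: int) -> str:
    """How a token count reads when rounded to decimal thousands."""
    return f"{round(tokens / 1000)}k"


if __name__ == "__main__":
    print(k_readings(131))          # {'decimal': 131000, 'binary': 134144}
    print(k_readings(40))           # {'decimal': 40000, 'binary': 40960}
    print(decimal_k_label(131072))  # 131k  (131072 == 2**17 == 128 * 1024)
    print(decimal_k_label(40960))   # 41k
    print(40960 - 40000)            # 960
```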

sync_models vs proposed update

The sync_models cross-check found differences. Values from the official provider documentation were applied; the sync_models discrepancies are listed below for review.

| Model | Field | Proposed update | sync_models | sync_models source models |
|---|---|---|---|---|
| zai-glm-4.7 | max_input_tokens | 131072 | 128000 | cerebras/zai-glm-4.7 |
| zai-glm-4.7 | max_output_tokens | 40960 | 128000 | cerebras/zai-glm-4.7 |

github-actions[bot] requested a review from aswink on May 29, 2026 15:20
vercel[bot] commented on May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| ai-proxy | Ready | Preview, Comment | May 29, 2026 3:21pm |



Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BOT ISSUE] Cerebras: add missing zai-glm-4.7 model

1 participant