fix: add Cerebras models zai-glm-4.7 #679
Open
github-actions[bot] wants to merge 1 commit into
Closes #674
Source issue: #674
Summary
zai-glm-4.7zai-glm-4.72
3
Verified metadata
Verification notes
Verification
Sources and fields verified
zai-glm-4.7), parameters (355B), speed (~1000 tok/s), Preview status
sync_models (LiteLLM) cross-check
The model cerebras/zai-glm-4.7 does not appear in the LiteLLM model_prices_and_context_window_backup.json catalog. However, a related entry exists under the zai provider (Z.AI's own hosted endpoint, not Cerebras):
- zai/glm-4.7 in sync_models: max_input_tokens=200000, max_output_tokens=128000, input_cost_per_token=6e-07 ($0.60/MTok), output_cost_per_token=2.2e-06 ($2.20/MTok)
- zai-glm-4.7 (Cerebras-hosted): max_input_tokens=131072, max_output_tokens=40960, input_cost_per_mil_tokens=2.25, output_cost_per_mil_tokens=2.75

All four numeric fields differ because the sync_models entry reflects Z.AI's own infrastructure limits and pricing, whereas this issue covers the Cerebras-hosted version, which has different context limits and pricing:
| Field | zai/glm-4.7 (Z.AI) | zai-glm-4.7 (Cerebras) |
| --- | --- | --- |
| max_input_tokens | 200000 | 131072 |
| max_output_tokens | 128000 | 40960 |
| input_cost_per_mil_tokens | 0.60 | 2.25 |
| output_cost_per_mil_tokens | 2.20 | 2.75 |

The Cerebras official documentation is preferred because this catalog entry is specifically for the Cerebras-hosted version of the model, which has its own independently published limits and pricing. The sync_models zai/glm-4.7 entry belongs to a different provider (zai) and does not apply to the cerebras provider mapping.

Fields not published or not applicable
- multimodal: Not specified in the docs (input and output are listed as "Text only"); omitted.
- parent: Not applicable; this is a standalone model, not a dated snapshot or alias.
- input_cache_read_cost_per_mil_tokens / input_cache_write_cost_per_mil_tokens: Prompt caching is mentioned as supported, but no cache-specific pricing is published; omitted.
- deprecation_date: Not applicable; the model is in Preview and no deprecation has been announced.
- supported_regions: Not applicable; Cerebras is not a Vertex provider.
- locations: Not applicable; Cerebras models do not use location-scoped routing.

Token limit interpretation
The Cerebras docs state "131k tokens" for the context window and "40k tokens" for max output. These are interpreted as:
gpt-oss-120b entry, which uses 131072)

If Cerebras means a literal 40,000 rather than 40,960, the difference is minor (960 tokens). The binary interpretation is used here for consistency with the existing catalog convention.
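As a quick arithmetic check of the "k" readings, here is a small Python sketch. The helper name k_readings is hypothetical and only contrasts a decimal (x1000) reading with a binary (x1024) reading of a "<n>k" figure; the constants are the values applied in this PR.

```python
# Sketch: compare decimal vs binary readings of the "k" suffix in the Cerebras docs.
# CONTEXT_APPLIED and MAX_OUTPUT_APPLIED are the values used in this PR;
# k_readings is a hypothetical helper for illustration only.

CONTEXT_APPLIED = 131_072
MAX_OUTPUT_APPLIED = 40_960


def k_readings(k: int) -> dict[str, int]:
    """Return the decimal (x1000) and binary (x1024) readings of '<k>k'."""
    return {"decimal": k * 1000, "binary": k * 1024}


context = k_readings(131)
output = k_readings(40)

# 131072 is 2**17; rounded to decimal thousands it reads as "131k".
assert CONTEXT_APPLIED == 2**17
assert round(CONTEXT_APPLIED / 1000) == 131

# 40960 is the binary (x1024) reading of "40k".
assert MAX_OUTPUT_APPLIED == output["binary"]

# Gap between the binary and literal (decimal) readings of "40k".
print(output["binary"] - output["decimal"])  # 960
```

Note that 131072 is exactly 2**17, so "131k" is its decimal rounding, while 40960 is the x1024 reading of "40k"; the 960-token gap matches the figure quoted above.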
sync_models vs proposed update
The sync_models cross-check found differences. Values from the official provider documentation were applied, and the sync_models discrepancies are listed below for review.
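The cross-check can be sketched in a few lines of Python: convert the LiteLLM per-token prices to per-million-token prices, then list every field where the Z.AI-hosted zai/glm-4.7 values differ from the Cerebras values applied here. The dict layouts and the to_catalog_fields / diff_fields helper names are hypothetical illustrations, not the actual sync_models implementation; the numbers are the ones quoted in the cross-check above.

```python
from math import isclose

# Hypothetical entry shapes; numbers copied from the cross-check in this PR.
LITELLM_ZAI_GLM_47 = {  # zai/glm-4.7 as listed in LiteLLM (per-token prices)
    "max_input_tokens": 200_000,
    "max_output_tokens": 128_000,
    "input_cost_per_token": 6e-07,
    "output_cost_per_token": 2.2e-06,
}

CEREBRAS_ZAI_GLM_47 = {  # zai-glm-4.7 values applied from the Cerebras docs
    "max_input_tokens": 131_072,
    "max_output_tokens": 40_960,
    "input_cost_per_mil_tokens": 2.25,
    "output_cost_per_mil_tokens": 2.75,
}


def to_catalog_fields(litellm_entry: dict) -> dict:
    """Map LiteLLM per-token fields onto per-million-token catalog fields."""
    return {
        "max_input_tokens": litellm_entry["max_input_tokens"],
        "max_output_tokens": litellm_entry["max_output_tokens"],
        "input_cost_per_mil_tokens": litellm_entry["input_cost_per_token"] * 1_000_000,
        "output_cost_per_mil_tokens": litellm_entry["output_cost_per_token"] * 1_000_000,
    }


def diff_fields(proposed: dict, reference: dict) -> dict:
    """Return {field: (proposed, reference)} for fields whose values differ."""
    diffs = {}
    for field, value in proposed.items():
        other = reference.get(field)
        # Use a tolerance so float rounding (e.g. 2.2e-06 * 1e6) is not reported as a diff.
        if other is None or not isclose(value, other, rel_tol=1e-9):
            diffs[field] = (value, other)
    return diffs


reference = to_catalog_fields(LITELLM_ZAI_GLM_47)
for field, (ours, theirs) in diff_fields(CEREBRAS_ZAI_GLM_47, reference).items():
    # reference always carries all four fields here, so theirs is never None.
    print(f"{field}: cerebras={ours} zai={theirs:g}")
```

Running it reports all four numeric fields, consistent with the note that the Z.AI and Cerebras entries differ on every limit and price.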