Do you need to file an issue?
- I have searched the existing issues and this bug is not already filed.
- My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
When using DRIFT search with certain LLM providers (specifically `moonshotai/kimi-k2-0905` via OpenRouter), GraphRAG fails with `json.decoder.JSONDecodeError: Invalid control character`. This occurs because the LLM returns a JSON response containing malformed fragments (specifically trailing characters like `", ":", "}`) outside the valid JSON structure.
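For context, the exception class itself is easy to reproduce with the standard `json` module; an unescaped control character inside a string value triggers it (illustrative only, not GraphRAG code):

```python
import json

# An unescaped control character (here a raw newline) inside a JSON string
# raises the same exception class reported in this issue.
payload = '{"intermediate_answer":"first line\nsecond line"}'
try:
    json.loads(payload)
except json.JSONDecodeError as exc:
    print(exc)  # Invalid control character at: line 1 column 35 (char 34)
```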
Steps to reproduce
- Configuration: Set up GraphRAG v2.x to use OpenRouter as the LLM provider.
- Model: Configure the model to `moonshotai/kimi-k2-0905:moonshotai`.
- Search Mode: Initiate a DRIFT search.
- Query: Execute the following query (failure rate is approximately 1 in 8):

  ```json
  {
    "question_type": "data_global",
    "search_mode": "drift",
    "question_text": "How should grass hay be positioned within the Litter Box Strategy to maximize the effectiveness of the litter training program for shelter rabbits?"
  }
  ```

- Observation: The search fails with a `JSONDecodeError`. Retrying the same query results in the same error at the exact same character position.
Expected Behavior
GraphRAG should gracefully handle or sanitize LLM responses before attempting to parse them as JSON, or provide a retry mechanism that catches JSONDecodeError specifically when the LLM provider returns slightly malformed output.
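A rough sketch of the retry idea (hypothetical names like `call_llm`, not GraphRAG APIs; since the failure reproduces deterministically here, a retry would likely need to be combined with sanitization such as the fix suggested below):

```python
import json
from typing import Any, Awaitable, Callable

async def parse_with_retry(
    call_llm: Callable[[], Awaitable[str]],  # hypothetical: re-invokes the model
    max_retries: int = 3,
) -> Any:
    """Sketch: retry the LLM call when its output fails to parse as JSON."""
    last_error: Exception = ValueError("no attempts made")
    for _ in range(max_retries):
        raw = await call_llm()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed output; ask the model again
    raise last_error
```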
GraphRAG Config Used
```yaml
# Paste your config here
```
Logs and screenshots
Error Trace:
```
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 25 (char 24)
```
Raw Response (Debug Log):
The LLM provider returns JSON responses with malformed fragments. Note the stray `\", \":\", \"}` characters at the end of the `content` string:

```
2026-01-02 12:24:26,759 - LiteLLM - DEBUG - RAW RESPONSE:
{"id": "gen-1767353063-Ho0rIH8LK3FLrGBfQT64", "choices": [{"finish_reason": null, "index": 0,
"logprobs": null, "message": {"content": "{\"intermediate_answer\":\"# How the Spring Pines Micro Farm community relates to dry-bath instructions\\n\\n...\", \":\", \"}", "refusal": null,
"role": "assistant"...
```
Additional Information
Environment Information
- GraphRAG Version: v2.x (async LiteLLM streaming)
- Python Version: 3.12.2
- LLM Provider: OpenRouter (openrouter.ai)
- LLM Model: `moonshotai/kimi-k2-0905:moonshotai`
- OS: macOS
Additional context
- Root Cause Analysis: The error is caused strictly by the LLM appending invalid JSON fragments (`", ":", "}`) to the end of the `intermediate_answer` content string; see the reduced reproduction after this list.
- Scope: This does not affect local or global search modes using the same model; it appears specific to the prompt structure or parsing logic of DRIFT search.
- Source Data: The source question JSON files are valid; the corruption is introduced during generation.
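A reduced reproduction of the root cause, assuming the fragment shape recovered from the debug log above (the long markdown answer is shortened to `...`):

```python
import json

# Shape of the content string from the debug log: a valid object followed
# by the stray trailing fragment `, ":", "}`.
content = '{"intermediate_answer":"# How the community relates...", ":", "}'
try:
    json.loads(content)
except json.JSONDecodeError as exc:
    print(exc)  # parsing fails on the trailing fragment
```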
Suggested Fix
I have identified a potential fix. Adding a sanitization step before parsing the JSON response resolves the issue.
```python
import re

def sanitize_json_response(text: str) -> str:
    """Remove invalid JSON control characters from an LLM response."""
    if not text:
        return text
    # Remove control characters (ASCII < 32) except valid JSON whitespace
    sanitized = ''.join(
        char for char in text
        if ord(char) >= 32 or char in '\n\r\t'
    )
    # Remove common malformed patterns like trailing `, ":", "}`
    sanitized = re.sub(r',\s*":\s*",\s*"}\s*$', '}', sanitized)
    return sanitized.strip()
```
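And a quick check that the sanitizer makes the reduced payload from above parseable:

```python
import json

raw = '{"intermediate_answer":"# How the community relates...", ":", "}'
cleaned = sanitize_json_response(raw)
print(json.loads(cleaned)["intermediate_answer"])  # now parses successfully
```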