[Bug]: JSONDecodeError in DRIFT Search due to malformed LLM response (Invalid control character) #2163

@arvin8

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

When using DRIFT search with certain LLM providers (specifically moonshotai/kimi-k2-0905 via OpenRouter), GraphRAG encounters a JSONDecodeError: Invalid control character. This occurs because the LLM returns a JSON response containing malformed fragments (specifically trailing characters like ", ":", "}) outside the valid JSON structure.

Steps to reproduce

  1. Configuration: Set up GraphRAG v2.x to use OpenRouter as the LLM provider.
  2. Model: Configure the model to moonshotai/kimi-k2-0905:moonshotai.
  3. Search Mode: Initiate a DRIFT search.
  4. Query: Execute the following query (failure rate is approximately 1 in 8):
    {
      "question_type": "data_global",
      "search_mode": "drift",
      "question_text": "How should grass hay be positioned within the Litter Box Strategy to maximize the effectiveness of the litter training program for shelter rabbits?"
    }
  5. Observation: The search fails with a JSONDecodeError. Retrying the same query results in the same error at the exact same character position.

Expected Behavior

GraphRAG should gracefully handle or sanitize LLM responses before attempting to parse them as JSON, or provide a retry mechanism that catches JSONDecodeError specifically when the LLM provider returns slightly malformed output.
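
The retry behavior described above could look roughly like the sketch below. It is not GraphRAG code: `parse_with_retry` and the `generate` callable are hypothetical names standing in for whatever issues the LLM request. Since step 5 reports that retrying the same query reproduced the same error, a retry on its own may not be sufficient and would likely need to be combined with sanitization.

```python
import json
from typing import Any, Callable


def parse_with_retry(generate: Callable[[], str], max_attempts: int = 3) -> Any:
    """Call `generate` until its output parses as JSON or attempts run out.

    `generate` is a hypothetical zero-argument callable standing in for the
    LLM request; it is not a GraphRAG or LiteLLM function.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # remember the failure and ask again
    raise last_error


# Usage with canned responses: the first is malformed, the second is valid.
responses = iter([
    '{"intermediate_answer":"text", ":", "}',
    '{"intermediate_answer":"text"}',
])
result = parse_with_retry(lambda: next(responses))
```

Only `json.JSONDecodeError` is caught, so unrelated failures (network errors, timeouts) still surface immediately.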

GraphRAG Config Used

# Paste your config here

Logs and screenshots

Error Trace:

json.decoder.JSONDecodeError: Invalid control character at: line 1 column 25 (char 24)
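
For context, Python's `json` module reports this message when a string value contains an unescaped character below U+0020 and the default `strict=True` is in effect. The sketch below reproduces the message with made-up payloads (neither is the actual model output) and shows that `strict=False` accepts a raw newline inside a string but does not accept a stray trailing fragment. The helper name `try_parse` is hypothetical.

```python
import json


def try_parse(text: str, strict: bool = True):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text, strict=strict), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


# Made-up payloads for illustration; neither is the real model output.
raw_newline = '{"intermediate_answer":"\n# Heading"}'   # real newline inside the string
stray_tail = '{"intermediate_answer":"text", ":", "}'   # fragment like the one in the log

strict_result = try_parse(raw_newline)                  # fails under strict parsing
lenient_result = try_parse(raw_newline, strict=False)   # control characters allowed
tail_result = try_parse(stray_tail, strict=False)       # still fails: structure is broken
```

This suggests the two symptoms may need separate handling: lenient parsing for raw control characters, and a structural repair for the trailing fragment.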

Raw Response (Debug Log):
The LLM provider returns JSON responses with malformed fragments. Note the stray fragments at the end of the message content, after the answer string closes:

2026-01-02 12:24:26,759 - LiteLLM - DEBUG - RAW RESPONSE:
{"id": "gen-1767353063-Ho0rIH8LK3FLrGBfQT64", "choices": [{"finish_reason": null, "index": 0, 
"logprobs": null, "message": {"content": "{\"intermediate_answer\":\"# How the Spring Pines Micro 
Farm community relates to dry-bath instructions\\n\\n...\", \":\", \"}", "refusal": null, 
"role": "assistant"...

Additional Information

Environment Information

  • GraphRAG Version: v2.x (async LiteLLM streaming)
  • Python Version: 3.12.2
  • LLM Provider: OpenRouter (openrouter.ai)
  • LLM Model: moonshotai/kimi-k2-0905:moonshotai
  • OS: macOS

Additional context

  • Root Cause Analysis: The error is caused by the LLM appending stray, invalid JSON fragments (", ":", "}) to the end of the intermediate_answer content string.
  • Scope: This does not affect local or global search modes using the same model; it appears specific to the prompt structure or parsing logic of DRIFT search.
  • Source Data: The source question JSON files are valid; the corruption is introduced during generation.

Suggested Fix

I have identified a potential fix. Adding a sanitization step before parsing the JSON response resolves the issue.

import re

def sanitize_json_response(text: str) -> str:
    """Strip control characters and a known trailing fragment from an LLM response."""
    if not text:
        return text

    # Drop control characters (code points below 32) but keep \n, \r and \t.
    # Caveat: those three are legal only between JSON tokens; a raw one inside
    # a string value still fails json.loads unless strict=False is passed.
    sanitized = ''.join(
        char for char in text
        if ord(char) >= 32 or char in '\n\r\t'
    )

    # Replace the malformed tail `, ":", "}` seen in the debug log with `}`.
    sanitized = re.sub(r',\s*":\s*",\s*"}\s*$', '}', sanitized)

    return sanitized.strip()
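
As a variation on the function above, the repair can be applied only when parsing actually fails, so that well-formed responses pass through untouched. This is a sketch under assumptions: `parse_llm_json` and `_STRAY_TAIL` are hypothetical names, the sample string is a shortened stand-in for the logged content, and the pattern covers only the one tail shown in the debug log.

```python
import json
import re

# Hypothetical pattern for the tail seen in the debug log: , ":", "}
_STRAY_TAIL = re.compile(r',\s*":\s*",\s*"}\s*$')


def parse_llm_json(text: str):
    """Parse JSON leniently, repairing the known stray tail only if needed."""
    try:
        # strict=False tolerates raw control characters inside string values.
        return json.loads(text, strict=False)
    except json.JSONDecodeError:
        repaired = _STRAY_TAIL.sub("}", text)
        return json.loads(repaired, strict=False)  # re-raises if still invalid


# Shortened stand-in for the logged content, with real newlines in the value.
sample = '{"intermediate_answer":"# Title\n\nBody", ":", "}'
parsed = parse_llm_json(sample)
```

Responses that remain invalid after the repair still raise `json.JSONDecodeError`, so callers can decide whether to retry or fail.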

Metadata

Labels: bug, triage
