RubricBasedEvaluator _normalize_text too basic — fails on judge model markdown output #6072

@tottenjordan

Description

_normalize_text() in rubric_based_evaluator.py only does text.lower().strip(). When LLM judge models return rubric verdicts with markdown formatting (bullets, smart quotes, extra whitespace, non-ASCII characters), the exact-match rubric lookup fails silently, producing incorrect scores.

Reproduction

Judge model returns: "• The response correctly identifies the tool"
Expected rubric: "the response correctly identifies the tool"
_normalize_text() produces: "• the response correctly identifies the tool"
Result: No match → score defaults to 0 or lowest rubric

Common Patterns That Fail

  • Leading bullets: •, *, -
  • Smart quotes: “...”, ‘...’
  • Non-ASCII: accented characters, em-dashes
  • Multi-space: "the  response" vs "the response"
  • Trailing whitespace/newlines
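
A quick way to check which of the patterns above actually break the lookup is to run one sample per pattern through the current behaviour. This is a minimal sketch: `current_normalize` is a stand-in written from the description in this issue (not copied from rubric_based_evaluator.py), and the sample strings are made up for illustration.

```python
def current_normalize(text: str) -> str:
    # Stand-in for the behaviour described in this issue: lowercase and strip only.
    return text.lower().strip()

expected = "the response correctly identifies the tool"

# One hypothetical judge output per pattern (escapes used for the non-ASCII characters).
samples = {
    "leading bullet": "\u2022 The response correctly identifies the tool",
    "smart quotes": "\u201cThe response correctly identifies the tool\u201d",
    "em-dash": "The response correctly identifies the tool \u2014",
    "multi-space": "The  response correctly identifies the tool",
    "trailing newline": "The response correctly identifies the tool\n",
}

# Names of the patterns whose normalized form no longer equals the rubric text.
mismatches = [
    name for name, raw in samples.items()
    if current_normalize(raw) != expected
]
print(mismatches)
```

In this sketch a bare trailing newline is already handled by strip(), so that pattern probably only causes a mismatch when the whitespace is followed by another character, such as a closing quote or bullet.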

Suggested Fix

Enhanced normalization:

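import re

def _normalize_text(text: str) -> str:
    if not isinstance(text, str):
        return ""
    # Remove non-ASCII first, so any gaps it leaves are collapsed below
    text = text.encode('ascii', 'ignore').decode()
    text = re.sub(r'^[\s*•\-]+', '', text)   # Strip leading bullets
    text = re.sub(r'[\s*•\-]+$', '', text)   # Strip trailing bullets
    text = re.sub(r'\s+', ' ', text)          # Collapse whitespace
    return text.lower().strip()

The rubric text must be passed through the same function before the lookup, otherwise the two sides can still differ.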
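
One caveat with the ASCII-only step: encode('ascii', 'ignore') drops accented letters and smart quotes entirely, so "café" becomes "caf". A possible variant, sketched here on the assumption that folding to the nearest ASCII character is acceptable for rubric text, maps typographic punctuation explicitly and uses unicodedata to strip only the accent marks. The name `normalize_unicode_aware` and the `_PUNCT_MAP` table are hypothetical, not part of google-adk.

```python
import re
import unicodedata

# Hypothetical mapping of typographic punctuation to ASCII equivalents.
_PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single smart quotes
    "\u201c": '"', "\u201d": '"',   # double smart quotes
    "\u2013": "-", "\u2014": "-",   # en-dash, em-dash
})

def normalize_unicode_aware(text: str) -> str:
    """Fold non-ASCII characters to ASCII where possible instead of dropping them."""
    if not isinstance(text, str):
        return ""
    text = text.translate(_PUNCT_MAP)
    # NFKD splits an accented letter into base letter + combining mark;
    # removing the combining marks keeps the base letter.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"^[\s*•\-]+", "", text)   # strip leading bullets
    text = re.sub(r"[\s*•\-]+$", "", text)   # strip trailing bullets
    text = re.sub(r"\s+", " ", text)          # collapse whitespace
    return text.lower().strip()

print(normalize_unicode_aware("• The café’s  tool"))
```

Both sides of the comparison would need to go through the same function, and characters without an ASCII counterpart are left as-is rather than removed.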

Additionally, a substring fallback when exact match fails would prevent silent scoring failures:

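# If exact match fails, try substring match.
# Skip empty strings: "" is a substring of every string and would match any rubric.
if normalized_response:
    for rubric_text, score in rubric_map.items():
        if rubric_text and (normalized_response in rubric_text or rubric_text in normalized_response):
            return score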
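
For illustration, the fallback could be wrapped in a small lookup helper that prefers the longest matching rubric and logs when the fallback is used, so failures stop being silent. This is a sketch under assumptions: `lookup_rubric_score` is a hypothetical name, and `rubric_map` is assumed to map already-normalized rubric text to a numeric score.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)

def lookup_rubric_score(
    normalized_response: str, rubric_map: Dict[str, float]
) -> Optional[float]:
    """Hypothetical helper: exact match first, then a guarded substring fallback."""
    if normalized_response in rubric_map:
        return rubric_map[normalized_response]
    if not normalized_response:
        return None  # an empty string would be a substring of every rubric
    candidates = [
        key for key in rubric_map
        if key and (key in normalized_response or normalized_response in key)
    ]
    if not candidates:
        logger.warning("No rubric matched judge output: %r", normalized_response)
        return None
    # Prefer the longest rubric text, so a short rubric that is a prefix
    # of a longer one does not win by iteration order.
    best = max(candidates, key=len)
    logger.warning("Substring fallback used: %r -> %r", normalized_response, best)
    return rubric_map[best]

rubric_map = {
    "the response correctly identifies the tool": 1.0,
    "the response": 0.0,
}
print(lookup_rubric_score("verdict: the response correctly identifies the tool", rubric_map))
```

Returning None instead of a default score leaves it to the caller to decide how an unmatched verdict should be counted.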

Impact

Without this fix, GEPA optimization produces unreliable rubric-based scores (rubric_based_final_response_quality_v1, rubric_based_tool_use_quality_v1), leading to suboptimal prompt evolution.

Environment

  • google-adk 2.2.0
  • Judge models: gemini-2.5-pro, gemini-3.5-flash

Metadata

Labels

  • eval: [Component] This issue is related to evaluation
  • request clarification: [Status] The maintainer need clarification or more information from the author
