Skip to content

fix(analyzer): parse structured output from text so OpenAI-compatible endpoints don't crash LLM analysis#71

Open
bmd1905 wants to merge 1 commit into
NVIDIA:mainfrom
bmd1905:fix/openai-compatible-structured-output
Open

fix(analyzer): parse structured output from text so OpenAI-compatible endpoints don't crash LLM analysis#71
bmd1905 wants to merge 1 commit into
NVIDIA:mainfrom
bmd1905:fix/openai-compatible-structured-output

Conversation

@bmd1905

@bmd1905 bmd1905 commented Jun 15, 2026

Copy link
Copy Markdown

What this fixes

Closes #69.

When you point SKILLSPECTOR_PROVIDER=openai at an OpenAI-compatible endpoint through OPENAI_BASE_URL, the semantic stage can crash the whole scan with a JSON decode error. Static analysis still runs, but every LLM finding is silently lost.

The cause is in how the analyzers build their structured-output client:

self._structured_llm = self._llm.with_structured_output(self.response_schema)

ChatOpenAI.with_structured_output defaults to method="json_schema", i.e. strict response_format. An endpoint can implement the OpenAI API faithfully and still not honor that, and plenty don't. The ones I hit return valid JSON, just wrapped in a markdown fence:

```json
{"findings": []}

or with a sentence of prose on either side. The strict path hands that straight to `json.loads`, which raises, and the scan exits with `Error: Expecting value: line 1 column 1 (char 0)` or `Invalid JSON ... for LLMAnalysisResult`. Every analyzer that inherits `LLMAnalyzerBase` is affected, including the meta-analyzer.

## The change

I dropped the strict client in favor of the canonical langchain chain:

```python
self._parser = PydanticOutputParser(pydantic_object=self.response_schema)
self._structured_llm = self._llm | self._parser
self._format_instructions = self._parser.get_format_instructions()

PydanticOutputParser already uses langchain's fence-tolerant parse_json_markdown, and it validates into the same Pydantic schema, so parse_response does not change for either the per-file analyzers or the meta-analyzer. Strict endpoints keep working exactly as before; lenient ones now work too.

I also moved the format guidance into one place. A small _with_format helper appends get_format_instructions() at the run-loop chokepoint, so the default prompt and the meta-analyzer's own prompt both get it without each template having to embed JSON instructions by hand. It is a no-op in raw-string mode.

I deliberately did not hand-roll a fence stripper or a brace slicer. langchain's parser already covers the fenced, bare, and prose-wrapped cases, and reusing it keeps the surface small.

Tests

  • Added a regression suite that pushes bare, fenced, and prose-wrapped JSON through the real PydanticOutputParser via a tiny fake Runnable, plus an empty-findings case and a check that the format instructions land on the prompt.
  • Updated the three existing tests that asserted the now-removed with_structured_output seam. They use the same direct _structured_llm mock pattern that the other run-loop tests already rely on.
  • make test: 605 passed, 11 skipped. make lint and make format are clean.

How I verified it end to end

Against a non-strict OpenAI-compatible endpoint, main aborts in the semantic stage with the JSON error above. With this change the same scan completes and returns LLM-backed findings. Repeated runs show no more decode crashes; the only non-zero exits are the intended risk_score > 50 gate (exit 1), not parser errors (exit 2).

…e endpoints

The LLM analyzers built their structured-output client with
`ChatOpenAI.with_structured_output(schema)`, which defaults to
`method="json_schema"` (strict `response_format`). Endpoints that speak the
OpenAI API but do not enforce that contract are free to ignore it, and several
return the JSON wrapped in a markdown code fence or surrounded by prose. The
strict parser then receives non-JSON and raises, so every semantic analyzer
fails and the whole scan exits with a JSON decode error. Static analysis still
runs, but all LLM findings are lost.

Swap the strict client for the canonical langchain chain `llm | PydanticOutputParser`.
The parser uses langchain's fence-tolerant `parse_json_markdown` under the hood
and validates into the same Pydantic schema, so `parse_response` is unchanged
for both the per-file analyzers and the meta-analyzer. Strict endpoints keep
working; lenient ones now work too.

Inject the parser's `get_format_instructions()` through a single `_with_format`
helper at the run-loop chokepoint so the default prompt and the meta-analyzer's
overridden prompt both get JSON guidance without duplicating it.

Add regression tests that drive fenced and prose-wrapped JSON through the real
parser, and update the three tests that asserted the removed
`with_structured_output` seam to the direct-mock pattern the other tests use.

Closes NVIDIA#69

Signed-off-by: Minh-Duc Bui <minhduc2k31905@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenAI-compatible endpoints that don't strictly honor response_format crash all LLM analyzers (fenced/prose JSON)

1 participant