Fix an issue where LLM responses are not streamed or rendered properly in the AI Assistant. Fixes #9734, #9735

Open
dpage wants to merge 2 commits into pgadmin-org:master from dpage:fix_assistant_streaming_rendering

Conversation

@dpage (Contributor) commented Mar 11, 2026

Fix #9734

Summary by CodeRabbit

  • New Features

    • Real-time streaming for AI-assisted chat with live text updates, intermediate “thinking” indicators, and final aggregated responses.
    • Incremental Markdown rendering including fenced SQL code blocks and multi-statement results; improved SQL extraction and per-block rendering/actions.
  • Bug Fixes

    • Fixed issue preventing AI assistant responses from streaming or rendering correctly.
  • Style

    • Added hyperlink color tokens to dark, light, and high-contrast themes.
  • Documentation

    • Release notes updated to include the new bug-fix entry.

@dpage changed the title from "Fixean issue where LLM responses are not streamed or rendered properly in the AI Assistant. Fixes #9734" to "Fix an issue where LLM responses are not streamed or rendered properly in the AI Assistant. Fixes #9734" on Mar 11, 2026

coderabbitai bot commented Mar 11, 2026

Walkthrough

Adds end-to-end LLM streaming: streaming APIs in client/providers/chat, streaming NLQ backend pipeline, incremental SSE events, frontend incremental Markdown/code rendering for streaming responses, tests for extraction/streaming, theme color tokens, and a release-note entry for issue #9734.

Changes

  • Core LLM API (web/pgadmin/llm/chat.py, web/pgadmin/llm/client.py): Added chat_with_database_stream and LLMClient.chat_stream generator interfaces for incremental chunks and final LLMResponse tuples.
  • Provider implementations (web/pgadmin/llm/providers/openai.py, web/pgadmin/llm/providers/anthropic.py, web/pgadmin/llm/providers/docker.py, web/pgadmin/llm/providers/ollama.py): Implemented chat_stream, _process_stream, and provider-specific stream parsers (_read_*_stream/NDJSON/SSE); added streaming error handling and usage/tool_call aggregation; Docker adds loopback URL validation.
  • NLQ prompt (web/pgadmin/llm/prompts/nlq.py): Replaced the strict JSON-output instruction with a fenced Markdown SQL response format in NLQ_SYSTEM_PROMPT.
  • NLQ backend SSE pipeline (web/pgadmin/tools/sqleditor/__init__.py): Switched to the streaming backend chat_with_database_stream; emits SSE text_delta, thinking, and complete events; extracts SQL from fenced code blocks with a JSON fallback; event payload shape adjusted.
  • NLQ frontend streaming UI (web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx): Added streaming message type/state, incremental Markdown parsing/rendering, code-block segmentation, a blinking cursor, and handlers for text_delta/thinking/complete SSE events; the UI accepts formatSqlWithPrefs.
  • Tests: streaming & extraction (web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py): Updated tests to mock the streaming generator; added tests for fenced-code SQL extraction, SSE streaming events, and prompt format expectations.
  • Theme tokens (web/pgadmin/static/js/Theme/light.js, web/pgadmin/static/js/Theme/dark.js, web/pgadmin/static/js/Theme/high_contrast.js): Added a hyperlinkColor token to theme otherVars across themes.
  • Docs (docs/en_US/release_notes_9_14.rst): Appended a bugfix entry for issue #9734 about streamed LLM responses rendering.
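The generator contract described for chat_with_database_stream can be sketched as follows. This is an illustration of the yield protocol only (plain strings for incremental text, a final aggregated tuple at the end); the real implementation talks to a provider over HTTP and also yields ('tool_use', ...) tuples while tools run, which this sketch omits.

```python
# Sketch of the streaming generator contract summarized above. The chunk
# source is faked so the yield shapes are visible: plain strings are
# incremental text deltas; the final yield is a (response_text, history)
# tuple, matching the "final aggregated responses" behavior described.
from typing import Generator, Union

Chunk = Union[str, tuple]

def chat_with_database_stream_sketch(
        user_message: str) -> Generator[Chunk, None, None]:
    chunks = ['SELECT ', '1;']          # stand-in for provider deltas
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        yield chunk                      # incremental text delta
    history = [
        {'role': 'user', 'content': user_message},
        {'role': 'assistant', 'content': ''.join(parts)},
    ]
    yield (''.join(parts), history)      # final aggregated response

events = list(chat_with_database_stream_sketch('count users'))
```

A caller can treat every `str` yield as a text_delta and the trailing tuple as the completion event.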

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant NLQ_UI as NLQ Chat UI
    participant Backend as NLQ Backend
    participant ChatPipe as Chat Pipeline
    participant Provider as LLM Provider Client
    participant LLM as LLM Service

    User->>NLQ_UI: Submit question
    NLQ_UI->>Backend: POST user message
    Backend->>ChatPipe: chat_with_database_stream(...)
    ChatPipe->>Provider: chat_stream(messages, tools, system_prompt)
    Provider->>LLM: HTTP streaming request
    LLM-->>Provider: Stream chunks (SSE/NDJSON)

    loop per chunk
        alt text chunk
            Provider-->>ChatPipe: yield "text" (string)
            ChatPipe-->>Backend: emit SSE {type: "text_delta", content}
            Backend-->>NLQ_UI: SSE text_delta
            NLQ_UI->>NLQ_UI: parseMarkdownSegments + render
        else tool call
            Provider-->>ChatPipe: yield ('tool_use', ...)
            ChatPipe-->>Backend: emit SSE {type: "thinking"}
            Backend-->>NLQ_UI: SSE thinking
        end
    end

    LLM-->>Provider: stream end + final response
    Provider-->>ChatPipe: yield final LLMResponse
    ChatPipe->>ChatPipe: extract SQL from fenced code blocks (or JSON fallback)
    ChatPipe-->>Backend: final tuple (response_text, history)
    Backend-->>NLQ_UI: SSE {type: "complete", content, sql}
    NLQ_UI->>User: render final content + SQL
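The SSE framing used in the loop above can be sketched in plain Python. The event names (text_delta, thinking, complete) come from the walkthrough; the helper name and payload keys here are illustrative, not pgAdmin's actual handler.

```python
# Sketch of Server-Sent Events framing for the event types in the
# diagram. Each SSE frame is a "data:" line with a JSON payload,
# terminated by a blank line.
import json

def sse_event(event_type: str, **payload) -> str:
    body = json.dumps({'type': event_type, **payload})
    return f'data: {body}\n\n'

frames = [
    sse_event('text_delta', content='SELECT '),
    sse_event('thinking'),
    sse_event('complete', content='SELECT 1;', sql='SELECT 1'),
]
```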

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • PR #9472 — modifies the same LLM/chat streaming subsystems (client/providers/chat) and introduces related streaming interfaces and parsing.

Suggested reviewers

  • anilsahoo20
  • akshay-joshi
🚥 Pre-merge checks: 5 passed
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: The PR title accurately describes the main objective: fixing streaming and rendering of LLM responses in the AI Assistant, which aligns with the primary changes across chat.py, client.py, and all LLM provider implementations.
  • Linked Issues check ✅ Passed: The PR comprehensively addresses the issue #9734 requirements: it implements complete SSE streaming integration via chat_stream methods in client.py and all provider modules (anthropic.py, openai.py, docker.py, ollama.py); ensures proper streaming to the client through chat_with_database_stream in chat.py and nlq_chat_stream in __init__.py; and handles incremental rendering of streamed content, including code-block re-rendering on fence closure, via parseMarkdownSegments and renderMarkdownText in NLQChatPanel.jsx.
  • Out of Scope Changes check ✅ Passed: All changes directly support the streaming and rendering objectives: LLM provider streaming implementations, chat layer updates, NLQ integration, frontend rendering, the system prompt adjustment for code-fence output, and theme color additions for hyperlinks are all within scope.
  • Docstring Coverage ✅ Passed: Docstring coverage is 80.00%, which meets the required threshold of 80.00%.
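The code-fence segmentation attributed to parseMarkdownSegments above is JSX in the actual PR; the splitting logic it is described as performing can be sketched in Python. The segment labels and the handling of an unterminated fence are assumptions based on the description of incremental rendering and re-rendering on fence closure.

```python
# Sketch of streaming-friendly Markdown segmentation. An unterminated
# fence is kept as a "code-open" segment so partially streamed code can
# render incrementally and be re-rendered once the closing fence arrives.
FENCE = '`' * 3  # the "```" delimiter, built dynamically

def parse_markdown_segments(text: str):
    segments = []
    while True:
        start = text.find(FENCE)
        if start == -1:                  # no (more) fences: plain text
            if text:
                segments.append(('text', text))
            return segments
        if start:
            segments.append(('text', text[:start]))
        end = text.find(FENCE, start + 3)
        if end == -1:                    # fence not closed yet
            segments.append(('code-open', text[start:]))
            return segments
        segments.append(('code', text[start:end + 3]))
        text = text[end + 3:]
```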


@coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (3)
web/pgadmin/llm/providers/ollama.py (2)

321-329: Preserve exception context with raise ... from e.

The exception re-raise loses the original traceback. Use from e (or from None if you intentionally want to suppress the original) to maintain the exception chain for easier debugging.

♻️ Proposed fix
         try:
             yield from self._process_stream(payload)
         except LLMClientError:
             raise
         except Exception as e:
             raise LLMClientError(LLMError(
-                message=f"Streaming request failed: {str(e)}",
+                message=f"Streaming request failed: {e!s}",
                 provider=self.provider_name
-            ))
+            )) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/ollama.py` around lines 321 - 329, The exception
handler in the streaming path currently raises a new LLMClientError losing the
original traceback; update the except Exception as e block to re-raise the
constructed LLMClientError using exception chaining (raise ... from e) so the
original exception context is preserved; apply this change where the yield from
self._process_stream(payload) is wrapped and the LLMClientError(LLMError(...,
provider=self.provider_name)) is created so the new raise uses "from e" to keep
the original traceback.

346-364: Preserve exception context in _process_stream error handlers.

Both HTTPError and URLError handlers lose the original exception context. Add from e to maintain the traceback chain.

♻️ Proposed fix
         except urllib.error.HTTPError as e:
             error_body = e.read().decode('utf-8')
             try:
                 error_data = json.loads(error_body)
                 error_msg = error_data.get('error', str(e))
             except json.JSONDecodeError:
                 error_msg = error_body or str(e)
             raise LLMClientError(LLMError(
                 message=error_msg,
                 code=str(e.code),
                 provider=self.provider_name,
                 retryable=e.code in (429, 500, 502, 503, 504)
-            ))
+            )) from e
         except urllib.error.URLError as e:
             raise LLMClientError(LLMError(
                 message=f"Cannot connect to Ollama: {e.reason}",
                 provider=self.provider_name,
                 retryable=True
-            ))
+            )) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/ollama.py` around lines 346 - 364, In
_process_stream, the except handlers for urllib.error.HTTPError and
urllib.error.URLError (the blocks that build and raise LLMClientError wrapping
LLMError) are losing the original exception context; update both raises to use
exception chaining (raise LLMClientError(...) from e) so the original traceback
is preserved — specifically modify the HTTPError handler that parses
error_body/error_data and raises LLMClientError(LLMError(...)) and the URLError
handler that raises LLMClientError(LLMError(...)) to append "from e".
web/pgadmin/llm/chat.py (1)

157-166: Generator return type annotation could be more precise.

The type annotation Generator[Union[str, tuple[str, list[Message]]], None, None] doesn't capture the ('tool_use', list[str]) tuple variant that's yielded at line 229. Consider using a more explicit union or documenting the intermediate tuple type.

💡 Suggested type improvement
+# Type alias for streaming yields
+StreamYield = Union[str, tuple[str, list[str]], tuple[str, list[Message]]]
+
 def chat_with_database_stream(
     user_message: str,
     sid: int,
     did: int,
     conversation_history: Optional[list[Message]] = None,
     system_prompt: Optional[str] = None,
     max_tool_iterations: Optional[int] = None,
     provider: Optional[str] = None,
     model: Optional[str] = None
-) -> Generator[Union[str, tuple[str, list[Message]]], None, None]:
+) -> Generator[StreamYield, None, None]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/chat.py` around lines 157 - 166, The Generator return type
for chat_with_database_stream is too broad and doesn't include the ('tool_use',
list[str]) intermediate yield; update the annotation to explicitly enumerate all
yielded variants (e.g. include Tuple[Literal['tool_use'], List[str]] and the
existing Tuple[str, List[Message]] plus str) using typing.Tuple, typing.List and
typing.Literal (or a small Union alias) so the signature reads something like
Generator[Union[str, Tuple[Literal['tool_use'], List[str]], Tuple[str,
List[Message]]], None, None]; change only the annotation on
chat_with_database_stream and import any required typing names.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/pgadmin/llm/providers/anthropic.py`:
- Around line 422-457: The streaming handler inside _parse_response currently
concatenates every text_delta which can collapse boundaries between content
blocks; modify the event handling around event_type == 'content_block_start' /
'content_block_delta' / 'content_block_stop' to preserve block separators by
tracking when a new content block begins (use current_tool_block or a new flag
like in_block) and, if content_parts already contains data, emit a separator
(e.g., '\n\n' or the same separator used when the top-level response joins
blocks) before yielding subsequent text deltas; ensure this logic integrates
with the existing variables content_parts, tool_input_json and yield text so the
final LLMResponse reconstruction retains original block boundaries.

In `@web/pgadmin/llm/providers/docker.py`:
- Around line 535-566: The streaming finalizer currently always yields an
LLMResponse (the block constructing content, tool_calls, stop_reason and
yielding LLMResponse), which can convert stream failures into empty assistant
messages; change it to mirror _parse_response() safeguards by: after building
content, tool_calls (using the same json.loads with JSONDecodeError fallback as
used for ToolCall construction) and mapping finish_reason via stop_reason_map,
detect error/empty-stream conditions (no content and no tool_calls OR
finish_reason indicates an error/blank result) and raise LLMClientError instead
of yielding an empty LLMResponse; otherwise yield the LLMResponse as before,
keeping the same fields (content, tool_calls, stop_reason, model, usage).
- Around line 411-423: The code currently uses self._api_url directly for
server-side requests; add a shared helper (e.g., validate_loopback_api_url(url))
and call it from both the sync and streaming request paths before constructing
the urllib Request. The helper should parse the URL (urllib.parse.urlparse),
ensure scheme is 'http' or 'https', and ensure the netloc hostname is loopback
(127.0.0.1, ::1, or 'localhost' and their bracketed IPv6 forms) — otherwise
raise ValueError or return an error so the request is rejected; update the
places that use self._api_url (the streaming path and the sync path that builds
url = f'{self._api_url}/engines/v1/chat/completions') to call this helper and
fail-fast if validation fails.

In `@web/pgadmin/llm/providers/openai.py`:
- Around line 524-556: The streaming finalizer currently always yields an
LLMResponse even if the stream produced no usable output; mirror the safeguards
in _parse_response() by validating content_parts, tool_calls_data and
finish_reason before yielding: after building content = ''.join(content_parts)
and tool_calls from tool_calls_data, if both content is empty and tool_calls is
empty or the finish_reason indicates an error, raise the same exception
type/logic used by _parse_response() (instead of yielding) so callers receive an
error; use the same variables (content_parts, tool_calls_data, finish_reason,
model_name, usage) and only yield LLMResponse when those checks pass.

In `@web/pgadmin/tools/sqleditor/__init__.py`:
- Around line 2929-2931: The current join that builds the variable sql from
sql_blocks uses "';\n\n'.join(block.strip() for block in sql_blocks)" which can
produce double-semicolons if blocks already end with ';'; update the logic that
constructs sql to first strip trailing semicolons and whitespace from each block
(e.g. normalize each block with something like
block.rstrip().rstrip(';').strip()) before joining, then join with a single
';\n\n' to ensure only a single semicolon separates blocks; modify the
assignment that computes sql (the sql and sql_blocks variables) accordingly.

In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 193-201: The SQL extraction is joining blocks that already include
trailing semicolons, causing duplicated semicolons; update the join that assigns
to the variable named sql (the expression iterating over sql_blocks) to remove
trailing semicolons from each block before joining (e.g., call strip() then
rstrip(';') on each block) so the final join uses cleaned blocks separated by
';\n\n'.
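The normalization both of these comments describe (strip trailing semicolons per block, then join with a single ';\n\n') can be sketched as below. The regex and helper name are illustrative, not the endpoint's actual code.

```python
# Sketch of fenced-SQL extraction plus the semicolon normalization the
# review proposes, so joined blocks never produce ";;".
import re

FENCE = '`' * 3  # the "```" delimiter, built dynamically

_FENCE_RE = re.compile(
    FENCE + r'(?:sql|pgsql|postgresql)\n(.*?)' + FENCE, re.DOTALL)

def extract_sql(response_text: str):
    sql_blocks = _FENCE_RE.findall(response_text)
    if not sql_blocks:
        return None
    # Strip whitespace and trailing semicolons before joining.
    return ';\n\n'.join(
        block.strip().rstrip(';') for block in sql_blocks)

text = ('Here you go:\n\n' + FENCE + 'sql\nSELECT 1;\n' + FENCE +
        '\n\nAnd the second:\n\n' + FENCE + 'sql\nSELECT 2;\n' + FENCE)
```

Note that under this normalization a single block like `SELECT 1;` extracts as `SELECT 1`, which is why the test expectations above drop their trailing semicolons.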


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 45878a7f-70de-4cc0-82bd-4bb0c8b48eb3

📥 Commits

Reviewing files that changed from the base of the PR and between c8bd75c and fc0af6f.

📒 Files selected for processing (14)
  • docs/en_US/release_notes_9_14.rst
  • web/pgadmin/llm/chat.py
  • web/pgadmin/llm/client.py
  • web/pgadmin/llm/prompts/nlq.py
  • web/pgadmin/llm/providers/anthropic.py
  • web/pgadmin/llm/providers/docker.py
  • web/pgadmin/llm/providers/ollama.py
  • web/pgadmin/llm/providers/openai.py
  • web/pgadmin/static/js/Theme/dark.js
  • web/pgadmin/static/js/Theme/high_contrast.js
  • web/pgadmin/static/js/Theme/light.js
  • web/pgadmin/tools/sqleditor/__init__.py
  • web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py

dpage and others added 2 commits March 12, 2026 10:50
- Anthropic: preserve separators between text blocks in streaming to
  match _parse_response() behavior.
- Docker: validate that the API URL points to a loopback address to
  constrain the request surface.
- Docker/OpenAI: raise LLMClientError on empty streams instead of
  yielding blank LLMResponse objects, matching non-streaming behavior.
- SQL extraction: strip trailing semicolons before joining blocks to
  avoid double semicolons in output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
web/pgadmin/tools/sqleditor/__init__.py (1)

2875-2877: ⚠️ Potential issue | 🟠 Major

conversation_id is stateless right now.

chat_with_database_stream() returns (response_text, messages), but this handler drops messages and only echoes/generates an ID. Follow-up turns therefore cannot replay prior tool calls/results, so multi-turn NLQ chats will silently lose context. Persist the returned history by conversation_id, or return it to the client and feed it back into conversation_history on the next request.

Also applies to: 2919-2954

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/__init__.py` around lines 2875 - 2877, The
handler currently discards the messages returned by chat_with_database_stream()
and only emits a generated conversation_id, losing tool-call history; update the
request flow in the sqleditor handler to persist or round-trip the returned
conversation history: capture the (response_text, messages) tuple from
chat_with_database_stream(), then either (a) store messages keyed by
conversation_id in a persistent/in-memory store (so future requests can load
conversation_history by conversation_id) or (b) include the messages in the JSON
response to the client and accept conversation_history on the next request;
apply the same change to the identical logic block referenced around lines
2919-2954 so conversation_history is preserved across turns (use the symbols
conversation_id, conversation_history, and chat_with_database_stream to locate
and modify the code).
♻️ Duplicate comments (1)
web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py (1)

185-242: ⚠️ Potential issue | 🟠 Major

Mirror the production SQL normalization in this test.

This helper still uses block.strip(), while the endpoint now uses block.strip().rstrip(';'), which is why CI is hitting SELECT * FROM users;;. Update the join here to match production and drop the trailing-semicolon expectations for the single/final block cases as well.

🔧 Proposed fix
         ('SQL Extraction - Single SQL block', dict(
             response_text=(
                 'Here is the query:\n\n'
                 '```sql\nSELECT * FROM users;\n```\n\n'
                 'This returns all users.'
             ),
-            expected_sql='SELECT * FROM users;'
+            expected_sql='SELECT * FROM users'
         )),
@@
         ('SQL Extraction - pgsql language tag', dict(
             response_text='```pgsql\nSELECT 1;\n```',
-            expected_sql='SELECT 1;'
+            expected_sql='SELECT 1'
         )),
         ('SQL Extraction - postgresql language tag', dict(
             response_text='```postgresql\nSELECT 1;\n```',
-            expected_sql='SELECT 1;'
+            expected_sql='SELECT 1'
         )),
@@
             expected_sql=(
                 'SELECT u.name, o.total\n'
                 'FROM users u\n'
                 'JOIN orders o ON u.id = o.user_id\n'
-                'WHERE o.total > 100;'
+                'WHERE o.total > 100'
             )
         )),
@@
         sql = ';\n\n'.join(
-            block.strip() for block in sql_blocks
+            block.strip().rstrip(';') for block in sql_blocks
         ) if sql_blocks else None

Also applies to: 253-261

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py` around lines 185 - 242,
The test expectations and the SQL-joining helper in test_nlq_chat.py must mirror
production normalization: change the join that builds sql from sql_blocks to
strip trailing semicolons (use block.strip().rstrip(';') when joining into the
variable sql) and update expected_sql strings for the cases 'SQL Extraction -
Single SQL block', 'SQL Extraction - pgsql language tag', 'SQL Extraction -
postgresql language tag' and the multiline case so they no longer include a
trailing semicolon (e.g. 'SELECT * FROM users' instead of 'SELECT * FROM
users;'); apply the same change to the other helper occurrence referenced around
lines 253-261.
🧹 Nitpick comments (11)
web/pgadmin/llm/providers/openai.py (3)

383-387: Preserve exception chain for debugging.

When re-raising exceptions, use raise ... from e to preserve the original traceback. This pattern appears multiple times in the streaming methods.

♻️ Suggested fix
         except Exception as e:
             raise LLMClientError(LLMError(
-                message=f"Streaming request failed: {str(e)}",
+                message=f"Streaming request failed: {e!s}",
                 provider=self.provider_name
-            ))
+            )) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 383 - 387, The exception
handling in the OpenAI streaming code wraps errors into LLMClientError/LLMError
but drops the original traceback; update the except blocks (e.g. the handler
that currently does "except Exception as e: raise LLMClientError(LLMError(...))"
which references self.provider_name and constructs LLMError) to re-raise using
"raise LLMClientError(LLMError(...)) from e" so the original exception chain is
preserved; apply the same change to all similar streaming-method handlers in
this file that catch Exception and raise LLMClientError.

550-555: The empty response safeguard is implemented - consider aligning error detail with non-streaming path.

The check for empty responses now correctly raises an error, addressing the previous review concern. However, the non-streaming _parse_response provides more specific error messages based on finish_reason (e.g., detailed guidance when MAX_TOKENS is hit about model context limits). Consider mirroring that logic for consistency:

♻️ Optional: align error handling with _parse_response
         if not content and not tool_calls:
+            if stop_reason == StopReason.MAX_TOKENS:
+                raise LLMClientError(LLMError(
+                    message=f'Response truncated due to token limit '
+                            f'(input: {usage.input_tokens} tokens). '
+                            f'The request is too large for model '
+                            f'{self._model}. '
+                            f'Try using a model with a larger context '
+                            f'window, or analyze a smaller scope.',
+                    code='max_tokens',
+                    provider=self.provider_name,
+                    retryable=False
+                ))
+            elif finish_reason and finish_reason not in ('stop', 'tool_calls'):
+                raise LLMClientError(LLMError(
+                    message=f'Empty response with finish reason: {finish_reason}',
+                    code=finish_reason,
+                    provider=self.provider_name,
+                    retryable=False
+                ))
             raise LLMClientError(LLMError(
                 message='No response content returned from API',
                 provider=self.provider_name,
                 retryable=False
             ))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 550 - 555, The
empty-response check currently raises a generic LLMClientError; update it to
mirror the non-streaming _parse_response logic by inspecting the response's
finish_reason (and any available model error metadata) before raising so you
emit the same detailed guidance (e.g., MAX_TOKENS / context-limit hints) and
include provider=self.provider_name and any diagnostic fields; modify the block
that raises LLMClientError (the conditional that checks content and tool_calls)
to build the LLMError message using the same finish_reason-driven messages and
details used in _parse_response so streaming and non-streaming paths produce
consistent errors.

418-436: Add exception chaining to preserve tracebacks.

Similar to the catch-all handler above, these exception handlers should chain the original exception.

♻️ Suggested fixes
         except urllib.error.HTTPError as e:
             # ... error parsing ...
             raise LLMClientError(LLMError(
                 message=error_msg,
                 code=str(e.code),
                 provider=self.provider_name,
                 retryable=e.code in (429, 500, 502, 503, 504)
-            ))
+            )) from e
         except urllib.error.URLError as e:
             raise LLMClientError(LLMError(
                 message=f"Connection error: {e.reason}",
                 provider=self.provider_name,
                 retryable=True
-            ))
+            )) from e
         except socket.timeout:
             raise LLMClientError(LLMError(
                 message="Request timed out.",
                 code='timeout',
                 provider=self.provider_name,
                 retryable=True
-            ))
+            )) from None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 418 - 436, The exception
handlers that raise LLMClientError/LLMError should preserve original tracebacks
by using exception chaining: in the HTTP/URLError handlers add "raise
LLMClientError(... ) from e" (they already reference e), and change "except
socket.timeout:" to "except socket.timeout as e:" then raise the LLMClientError
with "from e". Ensure this is applied to the handlers that build LLMError (the
ones referencing e.code, the urllib.error.URLError handler, and the
socket.timeout handler) so the original exception is chained to the new
LLMClientError.
web/pgadmin/llm/providers/docker.py (5)

51-64: ValueError propagation may be inconsistent with caller expectations.

From the context in client.py, get_llm_client() instantiates DockerClient without catching ValueError. Other error paths in this module raise LLMClientError. A ValueError here will propagate as an unexpected exception type to higher-level handlers.

Consider wrapping the validation in __init__ to convert ValueError to LLMClientError, or catch it in get_llm_client() for consistency.

♻️ Option: Wrap in __init__ for consistency
         self._api_url = (api_url or DEFAULT_API_URL).rstrip('/')
-        _validate_loopback_url(self._api_url)
+        try:
+            _validate_loopback_url(self._api_url)
+        except ValueError as e:
+            raise LLMClientError(LLMError(
+                message=str(e),
+                provider='docker',
+                retryable=False
+            )) from e
         self._model = model or DEFAULT_MODEL
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/docker.py` around lines 51 - 64, The
_validate_loopback_url function currently raises ValueError which can leak to
callers; convert that into the module's LLMClientError before it escapes by
catching ValueError in the DockerClient initialization (DockerClient.__init__)
or by having get_llm_client() wrap the call that triggers _validate_loopback_url
in a try/except that catches ValueError and re-raises LLMClientError with a
clear message; ensure you reference _validate_loopback_url, DockerClient,
get_llm_client, and LLMClientError so callers always receive LLMClientError for
validation failures.

418-422: Add exception chaining for better debugging.

Use raise ... from e to preserve the original exception traceback, which helps with debugging.

♻️ Proposed fix
         except Exception as e:
             raise LLMClientError(LLMError(
-                message=f"Streaming request failed: {str(e)}",
+                message=f"Streaming request failed: {e!s}",
                 provider=self.provider_name
-            ))
+            )) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/docker.py` around lines 418 - 422, The except block
that currently raises LLMClientError(LLMError(...)) should preserve the original
exception traceback by using exception chaining; modify the raise to use "raise
LLMClientError(LLMError(message=f'Streaming request failed: {str(e)}',
provider=self.provider_name)) from e" so the original exception e is attached
for better debugging.
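A minimal illustration of what chaining buys you: the wrapped exception keeps the original on `__cause__`, so tracebacks show both. Names here are illustrative, not the provider's real API:

```python
class LLMClientError(Exception):
    """Stand-in for the provider error type; illustrative only."""


def stream_request(raw):
    try:
        return int(raw)          # stand-in for the real network call
    except ValueError as e:
        # 'from e' records the original exception on __cause__, so the
        # traceback prints both errors joined by "The above exception
        # was the direct cause of ...".
        raise LLMClientError(f'Streaming request failed: {e!s}') from e


try:
    stream_request('not-a-number')
except LLMClientError as err:
    assert isinstance(err.__cause__, ValueError)
```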

515-521: Simplify redundant key check.

The if 'usage' in data and data['usage'] check is redundant. Using data.get('usage') achieves the same result more concisely.

♻️ Proposed fix
-            if 'usage' in data and data['usage']:
-                u = data['usage']
+            usage_data = data.get('usage')
+            if usage_data:
                 usage = Usage(
-                    input_tokens=u.get('prompt_tokens', 0),
-                    output_tokens=u.get('completion_tokens', 0),
-                    total_tokens=u.get('total_tokens', 0)
+                    input_tokens=usage_data.get('prompt_tokens', 0),
+                    output_tokens=usage_data.get('completion_tokens', 0),
+                    total_tokens=usage_data.get('total_tokens', 0)
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/docker.py` around lines 515 - 521, Replace the
redundant check "if 'usage' in data and data['usage']" with a single call using
data.get('usage') to retrieve the usage dict (e.g., u = data.get('usage')), then
proceed to construct the Usage(...) only when u is truthy; update the block
around the variables data, u, and Usage to use this simplified check so behavior
remains identical but the code is more concise.
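The equivalence is easy to confirm in isolation, since dict.get() returns None (falsy) for a missing key:

```python
# Representative payloads: key missing, key None, key empty, key populated.
for data in ({}, {'usage': None}, {'usage': {}},
             {'usage': {'prompt_tokens': 3}}):
    old_check = 'usage' in data and bool(data['usage'])
    new_check = bool(data.get('usage'))
    assert old_check == new_check
```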

454-474: Add exception chaining in error handlers.

Multiple except blocks re-raise without preserving the original exception chain. Adding from e or from None improves debuggability.

♻️ Proposed fix for exception chaining
             raise LLMClientError(LLMError(
                 message=error_msg,
                 code=str(e.code),
                 provider=self.provider_name,
                 retryable=e.code in (429, 500, 502, 503, 504)
-            ))
+            )) from e
         except urllib.error.URLError as e:
             raise LLMClientError(LLMError(
                 message=f"Connection error: {e.reason}. "
                         f"Is Docker Model Runner running at "
                         f"{self._api_url}?",
                 provider=self.provider_name,
                 retryable=True
-            ))
+            )) from e
         except socket.timeout:
             raise LLMClientError(LLMError(
                 message="Request timed out.",
                 code='timeout',
                 provider=self.provider_name,
                 retryable=True
-            ))
+            )) from None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/docker.py` around lines 454 - 474, The except
blocks that raise LLMClientError wrapping an LLMError (the urllib.error.URLError
and socket.timeout handlers and the earlier exception handler that uses variable
e) must preserve exception chaining so the original traceback is kept; update
each raise to use exception chaining (e.g., raise LLMClientError(LLMError(...))
from e for handlers that capture exception as e, and use an explicit "from None"
only where you intentionally want to suppress context), targeting the raise
statements that construct LLMClientError/LLMError in the docker provider method
where urllib.error.URLError, socket.timeout, and the earlier exception variable
e are handled.

581-594: Consider mirroring the MAX_TOKENS truncation check from _parse_response.

The empty-stream check is good, but _parse_response() (lines 344-372) has additional logic that distinguishes MAX_TOKENS truncation with a specific error message including token counts. The streaming path could benefit from similar handling for better user feedback when responses are truncated.

♻️ Example enhancement
         if not content and not tool_calls:
+            if stop_reason == StopReason.MAX_TOKENS:
+                raise LLMClientError(LLMError(
+                    message=(
+                        f'Response truncated due to token limit '
+                        f'(input: {usage.input_tokens} tokens). '
+                        f'Try using a model with a larger context '
+                        f'window, or analyze a smaller scope.'
+                    ),
+                    code='max_tokens',
+                    provider=self.provider_name,
+                    retryable=False
+                ))
             raise LLMClientError(LLMError(
                 message='No response content returned from API',
                 provider=self.provider_name,
                 retryable=False
             ))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/docker.py` around lines 581 - 594, When handling
the streaming path after the empty-stream check, mirror the MAX_TOKENS
truncation handling from _parse_response by detecting when stop_reason (or
usage) indicates the model hit the token limit (compare against MAX_TOKENS) and
surface the same specific LLMClientError message that includes token counts;
update the block that currently yields LLMResponse (referencing content,
tool_calls, stop_reason, model_name, usage) to either raise LLMClientError with
the detailed "truncated by MAX_TOKENS" message (like _parse_response) or augment
the yielded response to include that truncation detail so callers get identical
feedback. Ensure you use the same MAX_TOKENS constant and message format as in
_parse_response and preserve existing behavior for non-truncated responses.
web/pgadmin/llm/providers/anthropic.py (2)

308-314: Add exception chaining to preserve traceback context.

When re-raising exceptions, use from e (or from None if intentionally suppressing) to maintain the exception chain. This helps with debugging by preserving the original traceback.

♻️ Proposed fix
         try:
             yield from self._process_stream(payload)
         except LLMClientError:
             raise
         except Exception as e:
             raise LLMClientError(LLMError(
                 message=f"Streaming request failed: {e}",
                 provider=self.provider_name
-            ))
+            )) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/anthropic.py` around lines 308 - 314, The except
block catching Exception in the Anthropic provider currently raises a new
LLMClientError without chaining the original exception; update the catch in the
method handling streaming (the block referencing LLMClientError, LLMError and
self.provider_name) to re-raise the new LLMClientError using "from e" (i.e.,
raise LLMClientError(LLMError(...)) from e) so the original traceback is
preserved for debugging.

337-364: Add exception chaining to all exception handlers.

All three exception handlers re-raise without chaining, which loses the original traceback context.

♻️ Proposed fix
         except urllib.error.HTTPError as e:
             error_body = e.read().decode('utf-8')
             try:
                 error_data = json.loads(error_body)
                 error_msg = error_data.get(
                     'error', {}
                 ).get('message', str(e))
             except json.JSONDecodeError:
                 error_msg = error_body or str(e)
             raise LLMClientError(LLMError(
                 message=error_msg,
                 code=str(e.code),
                 provider=self.provider_name,
                 retryable=e.code in (429, 500, 502, 503, 504)
-            ))
+            )) from e
         except urllib.error.URLError as e:
             raise LLMClientError(LLMError(
                 message=f"Connection error: {e.reason}",
                 provider=self.provider_name,
                 retryable=True
-            ))
+            )) from e
         except socket.timeout:
             raise LLMClientError(LLMError(
                 message="Request timed out.",
                 code='timeout',
                 provider=self.provider_name,
                 retryable=True
-            ))
+            )) from None

Note: For socket.timeout, use from None since the original exception doesn't add useful context (the message already explains it's a timeout).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/anthropic.py` around lines 337 - 364, The exception
handlers for urllib.error.HTTPError, urllib.error.URLError, and socket.timeout
in the anthropic provider currently re-raise LLMClientError without chaining the
original exception and thus lose traceback; update the three raises so the
HTTPError and URLError handlers use "raise LLMClientError(... ) from e"
(referencing urllib.error.HTTPError and urllib.error.URLError and the
LLMClientError/LLMError construction and self.provider_name) and change the
socket.timeout handler to "raise LLMClientError(... ) from None" per the note so
the original exception context is preserved or suppressed appropriately.
web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py (1)

326-332: Add a tool_use streaming scenario.

This generator never emits ('tool_use', ...), so the new branch that should surface a second thinking event is still untested. One scenario that injects a tool-use tuple between chunks would lock down the SSE protocol added in this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py` around lines 326 - 332,
The mock_stream_gen generator never emits a ('tool_use', ...) tuple so the new
branch that should produce a second "thinking" event is not exercised; update
mock_stream_gen to yield the same text chunks but inject a ('tool_use', <tool
payload>) tuple between chunk yields (e.g., after the first or second chunk)
before yielding the final (self.mock_response, []) tuple so the test for the SSE
tool_use streaming path and the extra thinking event is exercised; refer to
mock_stream_gen and the existing yields to locate where to insert the
('tool_use', ...) emission.
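A sketch of what such a generator might look like. The event shapes follow the tuples described in this review (plain strings are text chunks, ('tool_use', ...) carries tool events, and the last item is the (response_text, messages) pair); the tool payload and final message list are made up:

```python
def mock_stream_gen():
    yield 'SELECT '
    # Inject a tool-use event between text chunks so the SSE branch
    # that emits a second "thinking" event is exercised.
    yield ('tool_use', [{'name': 'execute_sql_query',
                         'input': {'query': 'SELECT 1'}}])
    yield '* FROM users;'
    # Final item: (response_text, updated_messages).
    yield ('SELECT * FROM users;', [])


events = list(mock_stream_gen())
tool_events = [e for e in events
               if isinstance(e, tuple) and e[0] == 'tool_use']
assert len(tool_events) == 1
# The final item is the completion tuple, not a text chunk.
assert isinstance(events[-1], tuple) and isinstance(events[-1][1], list)
```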
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@web/pgadmin/tools/sqleditor/__init__.py`:
- Around line 2875-2877: The handler currently discards the messages returned by
chat_with_database_stream() and only emits a generated conversation_id, losing
tool-call history; update the request flow in the sqleditor handler to persist
or round-trip the returned conversation history: capture the (response_text,
messages) tuple from chat_with_database_stream(), then either (a) store messages
keyed by conversation_id in a persistent/in-memory store (so future requests can
load conversation_history by conversation_id) or (b) include the messages in the
JSON response to the client and accept conversation_history on the next request;
apply the same change to the identical logic block referenced around lines
2919-2954 so conversation_history is preserved across turns (use the symbols
conversation_id, conversation_history, and chat_with_database_stream to locate
and modify the code).

---

Duplicate comments:
In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 185-242: The test expectations and the SQL-joining helper in
test_nlq_chat.py must mirror production normalization: change the join that
builds sql from sql_blocks to strip trailing semicolons (use
block.strip().rstrip(';') when joining into the variable sql) and update
expected_sql strings for the cases 'SQL Extraction - Single SQL block', 'SQL
Extraction - pgsql language tag', 'SQL Extraction - postgresql language tag' and
the multiline case so they no longer include a trailing semicolon (e.g. 'SELECT
* FROM users' instead of 'SELECT * FROM users;'); apply the same change to the
other helper occurrence referenced around lines 253-261.

---

Nitpick comments:
In `@web/pgadmin/llm/providers/anthropic.py`:
- Around line 308-314: The except block catching Exception in the Anthropic
provider currently raises a new LLMClientError without chaining the original
exception; update the catch in the method handling streaming (the block
referencing LLMClientError, LLMError and self.provider_name) to re-raise the new
LLMClientError using "from e" (i.e., raise LLMClientError(LLMError(...)) from e)
so the original traceback is preserved for debugging.
- Around line 337-364: The exception handlers for urllib.error.HTTPError,
urllib.error.URLError, and socket.timeout in the anthropic provider currently
re-raise LLMClientError without chaining the original exception and thus lose
traceback; update the three raises so the HTTPError and URLError handlers use
"raise LLMClientError(... ) from e" (referencing urllib.error.HTTPError and
urllib.error.URLError and the LLMClientError/LLMError construction and
self.provider_name) and change the socket.timeout handler to "raise
LLMClientError(... ) from None" per the note so the original exception context
is preserved or suppressed appropriately.

In `@web/pgadmin/llm/providers/docker.py`:
- Around line 51-64: The _validate_loopback_url function currently raises
ValueError which can leak to callers; convert that into the module's
LLMClientError before it escapes by catching ValueError in the DockerClient
initialization (DockerClient.__init__) or by having get_llm_client() wrap the
call that triggers _validate_loopback_url in a try/except that catches
ValueError and re-raises LLMClientError with a clear message; ensure you
reference _validate_loopback_url, DockerClient, get_llm_client, and
LLMClientError so callers always receive LLMClientError for validation failures.
- Around line 418-422: The except block that currently raises
LLMClientError(LLMError(...)) should preserve the original exception traceback
by using exception chaining; modify the raise to use "raise
LLMClientError(LLMError(message=f'Streaming request failed: {str(e)}',
provider=self.provider_name)) from e" so the original exception e is attached
for better debugging.
- Around line 515-521: Replace the redundant check "if 'usage' in data and
data['usage']" with a single call using data.get('usage') to retrieve the usage
dict (e.g., u = data.get('usage')), then proceed to construct the Usage(...)
only when u is truthy; update the block around the variables data, u, and Usage
to use this simplified check so behavior remains identical but the code is more
concise.
- Around line 454-474: The except blocks that raise LLMClientError wrapping an
LLMError (the urllib.error.URLError and socket.timeout handlers and the earlier
exception handler that uses variable e) must preserve exception chaining so the
original traceback is kept; update each raise to use exception chaining (e.g.,
raise LLMClientError(LLMError(...)) from e for handlers that capture exception
as e, and use an explicit "from None" only where you intentionally want to
suppress context), targeting the raise statements that construct
LLMClientError/LLMError in the docker provider method where
urllib.error.URLError, socket.timeout, and the earlier exception variable e are
handled.
- Around line 581-594: When handling the streaming path after the empty-stream
check, mirror the MAX_TOKENS truncation handling from _parse_response by
detecting when stop_reason (or usage) indicates the model hit the token limit
(compare against MAX_TOKENS) and surface the same specific LLMClientError
message that includes token counts; update the block that currently yields
LLMResponse (referencing content, tool_calls, stop_reason, model_name, usage) to
either raise LLMClientError with the detailed "truncated by MAX_TOKENS" message
(like _parse_response) or augment the yielded response to include that
truncation detail so callers get identical feedback. Ensure you use the same
MAX_TOKENS constant and message format as in _parse_response and preserve
existing behavior for non-truncated responses.

In `@web/pgadmin/llm/providers/openai.py`:
- Around line 383-387: The exception handling in the OpenAI streaming code wraps
errors into LLMClientError/LLMError but drops the original traceback; update the
except blocks (e.g. the handler that currently does "except Exception as e:
raise LLMClientError(LLMError(...))" which references self.provider_name and
constructs LLMError) to re-raise using "raise LLMClientError(LLMError(...)) from
e" so the original exception chain is preserved; apply the same change to all
similar streaming-method handlers in this file that catch Exception and raise
LLMClientError.
- Around line 550-555: The empty-response check currently raises a generic
LLMClientError; update it to mirror the non-streaming _parse_response logic by
inspecting the response's finish_reason (and any available model error metadata)
before raising so you emit the same detailed guidance (e.g., MAX_TOKENS /
context-limit hints) and include provider=self.provider_name and any diagnostic
fields; modify the block that raises LLMClientError (the conditional that checks
content and tool_calls) to build the LLMError message using the same
finish_reason-driven messages and details used in _parse_response so streaming
and non-streaming paths produce consistent errors.
- Around line 418-436: The exception handlers that raise LLMClientError/LLMError
should preserve original tracebacks by using exception chaining: in the
HTTP/URLError handlers add "raise LLMClientError(... ) from e" (they already
reference e), and change "except socket.timeout:" to "except socket.timeout as
e:" then raise the LLMClientError with "from e". Ensure this is applied to the
handlers that build LLMError (the ones referencing e.code, the
urllib.error.URLError handler, and the socket.timeout handler) so the original
exception is chained to the new LLMClientError.

In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 326-332: The mock_stream_gen generator never emits a ('tool_use',
...) tuple so the new branch that should produce a second "thinking" event is
not exercised; update mock_stream_gen to yield the same text chunks but inject a
('tool_use', <tool payload>) tuple between chunk yields (e.g., after the first
or second chunk) before yielding the final (self.mock_response, []) tuple so the
test for the SSE tool_use streaming path and the extra thinking event is
exercised; refer to mock_stream_gen and the existing yields to locate where to
insert the ('tool_use', ...) emission.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 99402f45-993b-4c75-a066-600d0fd81ef5

📥 Commits

Reviewing files that changed from the base of the PR and between fc0af6f and 04b846c.

📒 Files selected for processing (5)
  • web/pgadmin/llm/providers/anthropic.py
  • web/pgadmin/llm/providers/docker.py
  • web/pgadmin/llm/providers/openai.py
  • web/pgadmin/tools/sqleditor/__init__.py
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py

@dpage dpage force-pushed the fix_assistant_streaming_rendering branch from 04b846c to 82a9bc6 Compare March 12, 2026 10:50

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (1)
web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py (1)

259-261: ⚠️ Potential issue | 🟡 Minor

Keep the test extractor in sync with the endpoint.

The helper here still joins raw blocks, so the multi-block case reintroduces the double-semicolon output and the pipeline is already failing on it. Mirror the production cleanup before joining.

🧪 Suggested fix
-        sql = ';\n\n'.join(
-            block.strip() for block in sql_blocks
-        ) if sql_blocks else None
+        sql = ';\n\n'.join(
+            block.strip().rstrip(';') for block in sql_blocks
+        ) if sql_blocks else None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py` around lines 259 - 261,
The test helper constructs sql from sql_blocks but doesn't apply the production
cleanup, causing double-semicolons; update the logic that builds sql (the sql
variable from sql_blocks) to mirror the endpoint's cleanup: trim each block,
remove any trailing semicolons/empty blocks (e.g., strip() then rstrip(';') and
skip empties), then join the cleaned blocks with ';\n\n' (or call the shared
cleanup helper if one exists) so the multi-block output matches production.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/pgadmin/llm/chat.py`:
- Around line 166-179: The current generator emits ambiguous 2-tuples for both
tool events and final completions which causes misclassification (e.g., in
nlq_chat_stream); change the event shapes so they are distinct: emit a
clearly-typed tool event like ('tool_use', tool_events_list) and a distinct
completion event like ('complete', final_text, updated_messages) (or use a
simple Event namedtuple/dataclass) wherever the generator yields tool events and
the final result (references: the generator function in web/pgadmin/llm/chat.py
that yields tool-use tuples and the nlq_chat_stream consumer); update consumers
to match the new 3-tuple completion shape and handle the explicit 'tool_use' and
'complete' tags accordingly.

In `@web/pgadmin/tools/sqleditor/__init__.py`:
- Around line 2896-2903: The call to chat_with_database_stream currently omits
the conversation history so follow-up NLQ turns lose context; update the call to
pass the request's history into the conversation_history parameter (e.g.,
conversation_history=request.history or the existing history variable in scope)
alongside user_message, sid, did, and system_prompt so chat_with_database_stream
receives prior messages for proper context.
- Around line 2924-2931: The regex used in the re.findall call that populates
sql_blocks is case-sensitive and misses fenced blocks with uppercase fence tags;
update the re.findall invocation in __init__.py (the call that assigns
sql_blocks) to add the re.IGNORECASE flag alongside re.DOTALL so the pattern
(?:sql|pgsql|postgresql)\s*\n(.*?)``` matches tags like SQL/POSTGRESQL as well,
then leave the subsequent sql construction (the sql variable) unchanged.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`:
- Around line 874-880: The current streaming branch inside the component uses
streamingTextRef.current and setMessages to replace the assistant message by
appending '\n\n(Generation stopped)' to the partial markdown, which can end up
inside an open fenced code block; instead preserve the partial response exactly
(do not append the stop notice to streamingTextRef.current) and call setMessages
twice (or add two entries) so you first update/commit the assistant message with
content: streamingTextRef.current and type MESSAGE_TYPES.ASSISTANT (filtering
out thinkingId and streamId as before), then append a separate assistant-status
message whose content is gettext('(Generation stopped)') (or use a distinct type
if you prefer) so the stop marker is rendered as a standalone message; apply the
same pattern to the analogous block handling stream end around
streamId/thinkingId (the other occurrence at the 905-910 region).
- Around line 552-556: The MarkdownContent is being forced into a span
(component="span") while renderMarkdownText(content) can output block-level
HTML, causing invalid DOM; change MarkdownContent in NLQChatPanel.jsx to render
as a block-level container (remove or set component to a block element) so block
nodes (paragraphs, lists, tables) are not injected into a span, and render the
streaming cursor separately (keep the Box wrapper and the cursor rendering logic
outside/adjacent to MarkdownContent) so streaming markdown displays correctly.
- Around line 1035-1043: Snapshot streamingIdRef.current into a local (e.g.,
const streamingId = streamingIdRef.current) before calling setMessages so the
functional updater filters using that captured id (alongside thinkingId) instead
of reading streamingIdRef.current later; then update setMessages to filter out
that local streamingId and add the error message (MESSAGE_TYPES.ERROR), and only
after setMessages clear streamingTextRef.current and set streamingIdRef.current
= null to keep behavior consistent with the abort/complete paths.

---

Duplicate comments:
In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 259-261: The test helper constructs sql from sql_blocks but
doesn't apply the production cleanup, causing double-semicolons; update the
logic that builds sql (the sql variable from sql_blocks) to mirror the
endpoint's cleanup: trim each block, remove any trailing semicolons/empty blocks
(e.g., strip() then rstrip(';') and skip empties), then join the cleaned blocks
with ';\n\n' (or call the shared cleanup helper if one exists) so the
multi-block output matches production.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 03a8611a-98cf-454d-94d9-14ef26b9bbe7

📥 Commits

Reviewing files that changed from the base of the PR and between 04b846c and 82a9bc6.

📒 Files selected for processing (14)
  • docs/en_US/release_notes_9_14.rst
  • web/pgadmin/llm/chat.py
  • web/pgadmin/llm/client.py
  • web/pgadmin/llm/prompts/nlq.py
  • web/pgadmin/llm/providers/anthropic.py
  • web/pgadmin/llm/providers/docker.py
  • web/pgadmin/llm/providers/ollama.py
  • web/pgadmin/llm/providers/openai.py
  • web/pgadmin/static/js/Theme/dark.js
  • web/pgadmin/static/js/Theme/high_contrast.js
  • web/pgadmin/static/js/Theme/light.js
  • web/pgadmin/tools/sqleditor/__init__.py
  • web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py
✅ Files skipped from review due to trivial changes (1)
  • docs/en_US/release_notes_9_14.rst
🚧 Files skipped from review as they are similar to previous changes (3)
  • web/pgadmin/static/js/Theme/high_contrast.js
  • web/pgadmin/static/js/Theme/light.js
  • web/pgadmin/llm/client.py

Comment on lines +166 to +179
) -> Generator[Union[str, tuple[str, list[Message]]], None, None]:
    """
    Stream an LLM chat conversation with database tool access.

    Like chat_with_database, but yields text chunks as the final
    response streams in. During tool-use iterations, no text is
    yielded (tools are executed silently).

    Yields:
        str: Text content chunks from the final LLM response.

        The last item yielded is a tuple of
        (final_response_text, updated_conversation_history).


⚠️ Potential issue | 🟡 Minor

Don't overload 2-tuples for both tool and completion events.

('tool_use', [...]) and (response_text, messages) share the same shape, so callers have to guess which one they got by inspecting item[0]. In nlq_chat_stream, a legitimate final response whose content is exactly "tool_use" would be misclassified and dropped. A small typed event, or at least a distinct completion shape like ('complete', text, messages), would remove that ambiguity.

Also applies to: 227-229

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/chat.py` around lines 166 - 179, The current generator emits
ambiguous 2-tuples for both tool events and final completions which causes
misclassification (e.g., in nlq_chat_stream); change the event shapes so they
are distinct: emit a clearly-typed tool event like ('tool_use',
tool_events_list) and a distinct completion event like ('complete', final_text,
updated_messages) (or use a simple Event namedtuple/dataclass) wherever the
generator yields tool events and the final result (references: the generator
function in web/pgadmin/llm/chat.py that yields tool-use tuples and the
nlq_chat_stream consumer); update consumers to match the new 3-tuple completion
shape and handle the explicit 'tool_use' and 'complete' tags accordingly.
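One possible shape for such a typed event, sketched here with illustrative names — consumers branch on the tag rather than guessing from the payload:

```python
from typing import Any, NamedTuple


class StreamEvent(NamedTuple):
    kind: str            # 'text', 'tool_use', or 'complete'
    payload: Any = None
    messages: list = []


def classify(event: StreamEvent) -> str:
    # Consumers branch on the explicit tag, never on payload contents.
    return event.kind


# A text chunk whose content happens to be the string 'tool_use' is
# no longer ambiguous.
chunk = StreamEvent('text', 'tool_use')
tool = StreamEvent('tool_use', [{'name': 'execute_sql_query'}])
done = StreamEvent('complete', 'final answer', [{'role': 'assistant'}])

assert classify(chunk) == 'text'
assert classify(tool) == 'tool_use'
assert classify(done) == 'complete'
```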

Comment on lines +2896 to +2903
# Stream the LLM response with database tools
response_text = ''
for item in chat_with_database_stream(
    user_message=user_message,
    sid=trans_obj.sid,
    did=trans_obj.did,
    system_prompt=NLQ_SYSTEM_PROMPT
):

⚠️ Potential issue | 🟠 Major

Pass request history into chat_with_database_stream().

This new streaming call always starts from a fresh message list because it never forwards the request's history. That drops context on follow-up NLQ turns even though this endpoint advertises history and chat_with_database_stream() already supports conversation_history.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/__init__.py` around lines 2896 - 2903, The call
to chat_with_database_stream currently omits the conversation history so
follow-up NLQ turns lose context; update the call to pass the request's history
into the conversation_history parameter (e.g.,
conversation_history=request.history or the existing history variable in scope)
alongside user_message, sid, did, and system_prompt so chat_with_database_stream
receives prior messages for proper context.

Comment on lines +2924 to +2931
sql_blocks = re.findall(
    r'```(?:sql|pgsql|postgresql)\s*\n(.*?)```',
    response_text,
    re.DOTALL
)
sql = ';\n\n'.join(
    block.strip().rstrip(';') for block in sql_blocks
) if sql_blocks else None

⚠️ Potential issue | 🟡 Minor

Handle uppercase fence tags when extracting sql.

This regex is case-sensitive, so fenced SQL/PostgreSQL blocks still render fine in content but leave the structured sql field empty. Adding re.IGNORECASE here makes extraction much more tolerant of normal model variation.

💡 Suggested tweak
             sql_blocks = re.findall(
                 r'```(?:sql|pgsql|postgresql)\s*\n(.*?)```',
                 response_text,
-                re.DOTALL
+                re.DOTALL | re.IGNORECASE
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/__init__.py` around lines 2924 - 2931, The regex
used in the re.findall call that populates sql_blocks is case-sensitive and
misses fenced blocks with uppercase fence tags; update the re.findall invocation
in __init__.py (the call that assigns sql_blocks) to add the re.IGNORECASE flag
alongside re.DOTALL so the pattern (?:sql|pgsql|postgresql)\s*\n(.*?)``` matches
tags like SQL/POSTGRESQL as well, then leave the subsequent sql construction
(the sql variable) unchanged.
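The tweak can be sanity-checked in isolation. This is a sketch rather than pgAdmin's actual code: `extract_sql_blocks` and the sample text are invented for the demo, and the fence literal is assembled at runtime so the example nests inside this comment.

```python
import re

FENCE = '`' * 3  # the literal triple-backtick fence, built up so it nests here


def extract_sql_blocks(response_text):
    # Case-insensitive per the suggested tweak, so SQL/PostgreSQL fence tags
    # match as well as their lowercase forms.
    pattern = FENCE + r'(?:sql|pgsql|postgresql)\s*\n(.*?)' + FENCE
    sql_blocks = re.findall(pattern, response_text, re.DOTALL | re.IGNORECASE)
    return ';\n\n'.join(
        block.strip().rstrip(';') for block in sql_blocks
    ) if sql_blocks else None


text = ('Here you go:\n' + FENCE + 'SQL\nSELECT 1;\n' + FENCE +
        '\nand\n' + FENCE + 'PostgreSQL\nSELECT 2;\n' + FENCE + '\n')
print(extract_sql_blocks(text))  # SELECT 1;\n\nSELECT 2
```

Without `re.IGNORECASE`, both uppercase-tagged blocks above are missed and the function returns `None`.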

Comment on lines +552 to +556
<Box key={idx} sx={{ marginTop: idx > 0 ? 1 : 0, display: 'inline' }}>
<MarkdownContent
component="span"
dangerouslySetInnerHTML={{ __html: renderMarkdownText(content || '') }}
/>

⚠️ Potential issue | 🟠 Major

Render streamed markdown in a block container, not a span.

renderMarkdownText() can produce block nodes like paragraphs, lists, and tables. On Lines 553-555 those nodes are injected into component="span", which creates invalid DOM and will misrender/collapse once the model streams anything beyond inline text. Keep MarkdownContent block-level here and render the cursor separately after it.

Proposed fix
-          return (
-            <Box key={idx} sx={{ marginTop: idx > 0 ? 1 : 0, display: 'inline' }}>
-              <MarkdownContent
-                component="span"
-                dangerouslySetInnerHTML={{ __html: renderMarkdownText(content || '') }}
-              />
-              {cursor}
-            </Box>
-          );
+          return (
+            <Box key={idx} sx={{ marginTop: idx > 0 ? 1 : 0 }}>
+              <MarkdownContent
+                dangerouslySetInnerHTML={{ __html: renderMarkdownText(content || '') }}
+              />
+              {cursor}
+            </Box>
+          );
🧰 Tools
🪛 Biome (2.4.6)

[error] 555-555: Avoid passing content using the dangerouslySetInnerHTML prop.

(lint/security/noDangerouslySetInnerHtml)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`
around lines 552 - 556, The MarkdownContent is being forced into a span
(component="span") while renderMarkdownText(content) can output block-level
HTML, causing invalid DOM; change MarkdownContent in NLQChatPanel.jsx to render
as a block-level container (remove or set component to a block element) so block
nodes (paragraphs, lists, tables) are not injected into a span, and render the
streaming cursor separately (keep the Box wrapper and the cursor rendering logic
outside/adjacent to MarkdownContent) so streaming markdown displays correctly.
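To illustrate why the wrapper element matters: browsers auto-close a span when block-level nodes such as paragraphs or lists are injected into it. The helper below is purely hypothetical (`pickWrapper` is not part of the patch); it only demonstrates the inline-vs-block distinction the fix relies on.

```javascript
// Hypothetical helper: decide whether rendered markdown needs a block wrapper.
// Block nodes (<p>, <ul>, <table>, ...) are invalid inside a <span>, so any
// markdown that renders to block-level HTML should get a <div> container.
function pickWrapper(html) {
  const blockStart = /^<(p|div|ul|ol|table|pre|blockquote|h[1-6])\b/i;
  return blockStart.test(html.trim()) ? 'div' : 'span';
}

console.log(pickWrapper('<p>streamed paragraph</p>')); // 'div'
console.log(pickWrapper('<em>inline only</em>'));      // 'span'
```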

Comment on lines +874 to +880
if (streamingTextRef.current) {
setMessages((prev) => [
...prev.filter((m) => m.id !== thinkingId && m.id !== streamId),
{
type: MESSAGE_TYPES.ASSISTANT,
content: streamingTextRef.current + '\n\n' + gettext('(Generation stopped)'),
},

⚠️ Potential issue | 🟠 Major

Keep the stop notice separate from the partial markdown body.

On Lines 879 and 909, the stop marker is appended to the streamed content itself. If generation stops inside an open fenced block, markdown parsing will treat (Generation stopped) as part of that code block, so the stop state is no longer rendered as a status message. Preserve the partial response unchanged and add the stop notice as a separate message.

Proposed fix
-            {
-              type: MESSAGE_TYPES.ASSISTANT,
-              content: streamingTextRef.current + '\n\n' + gettext('(Generation stopped)'),
-            },
+            {
+              type: MESSAGE_TYPES.ASSISTANT,
+              content: streamingTextRef.current,
+            },
+            {
+              type: MESSAGE_TYPES.ASSISTANT,
+              content: gettext('Generation stopped.'),
+            },

Also applies to: 905-910

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`
around lines 874 - 880, The current streaming branch inside the component uses
streamingTextRef.current and setMessages to replace the assistant message by
appending '\n\n(Generation stopped)' to the partial markdown, which can end up
inside an open fenced code block; instead preserve the partial response exactly
(do not append the stop notice to streamingTextRef.current) and call setMessages
twice (or add two entries) so you first update/commit the assistant message with
content: streamingTextRef.current and type MESSAGE_TYPES.ASSISTANT (filtering
out thinkingId and streamId as before), then append a separate assistant-status
message whose content is gettext('(Generation stopped)') (or use a distinct type
if you prefer) so the stop marker is rendered as a standalone message; apply the
same pattern to the analogous block handling stream end around
streamId/thinkingId (the other occurrence at the 905-910 region).
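A minimal sketch of the two-message shape (the `finalizeOnStop` helper and the message literals are assumptions for illustration, not the patch itself): the partial text is committed untouched, so an unclosed code fence in it cannot swallow the status line.

```javascript
// Keep the partial response unchanged and emit the stop notice as its own
// message, so markdown parsing of the partial body never absorbs the notice.
function finalizeOnStop(messages, thinkingId, streamId, partialText) {
  const kept = messages.filter((m) => m.id !== thinkingId && m.id !== streamId);
  return [
    ...kept,
    { type: 'assistant', content: partialText },
    { type: 'assistant', content: 'Generation stopped.' },
  ];
}

const stopped = finalizeOnStop(
  [{ id: 'thinking-1' }, { id: 'stream-1' }],
  'thinking-1', 'stream-1',
  'Here is a partial answer with an unclosed code fence'
);
console.log(stopped.map((m) => m.content));
// ['Here is a partial answer with an unclosed code fence', 'Generation stopped.']
```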

Comment on lines 1035 to +1043
setMessages((prev) => [
...prev.filter((m) => m.id !== thinkingId),
...prev.filter((m) => m.id !== thinkingId && m.id !== streamingIdRef.current),
{
type: MESSAGE_TYPES.ERROR,
content: event.message,
},
]);
streamingTextRef.current = '';
streamingIdRef.current = null;

⚠️ Potential issue | 🟠 Major

Snapshot the streaming message id before the error-state update.

Line 1036 reads streamingIdRef.current inside the functional setMessages updater, but Lines 1042-1043 clear that ref immediately after. Because React may execute the updater later, the filter can see null and leave the stale streaming bubble on screen alongside the error. Capture the id in a local first, like the abort/complete paths already do.

Proposed fix
-    case 'error':
+    case 'error': {
+      const streamId = streamingIdRef.current;
       setMessages((prev) => [
-        ...prev.filter((m) => m.id !== thinkingId && m.id !== streamingIdRef.current),
+        ...prev.filter((m) => m.id !== thinkingId && m.id !== streamId),
         {
           type: MESSAGE_TYPES.ERROR,
           content: event.message,
         },
       ]);
       streamingTextRef.current = '';
       streamingIdRef.current = null;
       break;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`
around lines 1035 - 1043, Snapshot streamingIdRef.current into a local (e.g.,
const streamingId = streamingIdRef.current) before calling setMessages so the
functional updater filters using that captured id (alongside thinkingId) instead
of reading streamingIdRef.current later; then update setMessages to filter out
that local streamingId and add the error message (MESSAGE_TYPES.ERROR), and only
after setMessages clear streamingTextRef.current and set streamingIdRef.current
= null to keep behavior consistent with the abort/complete paths.
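The race can be modelled without React at all. In this sketch (all names hypothetical), the queued updater runs after the ref has been cleared, yet still filters correctly because the id was snapshotted into a local first:

```javascript
// Minimal model of React's deferred functional updater.
const streamingIdRef = { current: 'stream-1' };
let queuedUpdater = null;
function setMessages(updater) { queuedUpdater = updater; } // runs later, like React

// Fixed pattern: snapshot the ref BEFORE queueing the updater.
const streamId = streamingIdRef.current;
setMessages((prev) => prev.filter((m) => m.id !== 'thinking' && m.id !== streamId));
streamingIdRef.current = null; // clearing the ref no longer affects the filter

const result = queuedUpdater([{ id: 'thinking' }, { id: 'stream-1' }, { id: 'a' }]);
console.log(result); // [{ id: 'a' }]: the stale streaming bubble is gone
```

Reading `streamingIdRef.current` inside the updater instead would see `null` here and leave `stream-1` in the list.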


Development

Successfully merging this pull request may close these issues.

LLM responses are not streamed or rendered properly in the AI Assistant
