[LEADS-241] judge llm token counter for deepeval #180
Conversation
Important: Review skipped. This PR was authored by the user configured for CodeRabbit reviews. CodeRabbit does not review PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
Walkthrough
Extracts token-tracking into a dedicated `TokenTracker`.
Changes
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client as Client Code
    participant Patch as litellm_patch.wrapper
    participant LiteLLM as original_litellm.completion
    participant Tracker as TokenTracker
    Client->>Patch: call litellm.completion(...)
    Patch->>LiteLLM: call original_completion(...)
    LiteLLM-->>Patch: return response (with usage / cache info)
    Patch->>Tracker: track_tokens(response)
    Tracker->>Tracker: extract prompt/completion tokens and add_tokens
    Patch-->>Client: return original response
```
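The flow in the diagram can be sketched as a plain wrapper. The names below (`TokenTracker`, `with_token_tracking`, `fake_completion`) are illustrative stand-ins, not the PR's actual code:

```python
from types import SimpleNamespace


class TokenTracker:
    """Accumulates prompt/completion token counts across LLM calls."""

    def __init__(self) -> None:
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def add_tokens(self, prompt: int, completion: int) -> None:
        self.prompt_tokens += prompt
        self.completion_tokens += completion


def with_token_tracking(original_completion, tracker):
    """Wrap a completion callable so usage is recorded on every call."""

    def wrapper(*args, **kwargs):
        response = original_completion(*args, **kwargs)
        usage = getattr(response, "usage", None)
        if usage is not None:
            tracker.add_tokens(
                getattr(usage, "prompt_tokens", 0),
                getattr(usage, "completion_tokens", 0),
            )
        return response  # the caller sees the original response unchanged

    return wrapper


# Stand-in for original_litellm.completion
def fake_completion(**kwargs):
    return SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=10, completion_tokens=5)
    )


tracker = TokenTracker()
completion = with_token_tracking(fake_completion, tracker)
completion(model="stub")
print(tracker.prompt_tokens, tracker.completion_tokens)  # 10 5
```

The key property the diagram asserts is the last line of `wrapper`: tracking is a side channel, and the original response is returned untouched.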
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 6
🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/llm/llm_patch.py (1)
15-18: Make patching idempotent to prevent double-wrapping.
Current import-time patching is unconditional. Module reloads (especially in tests) can wrap wrappers again and inflate token counts.
💡 Suggested fix
```diff
-# Store original functions before patching
-_original_completion = litellm.completion
-_original_acompletion = litellm.acompletion
+# Store originals once and avoid double-patching
+if not getattr(litellm, "_lightspeed_token_patch_applied", False):
+    litellm._lightspeed_original_completion = litellm.completion
+    litellm._lightspeed_original_acompletion = litellm.acompletion
+    litellm._lightspeed_token_patch_applied = True
+
+_original_completion = litellm._lightspeed_original_completion
+_original_acompletion = litellm._lightspeed_original_acompletion
@@
-# Patch litellm's completion functions to include token tracking
-litellm.completion = _completion_with_token_tracking
-litellm.acompletion = _acompletion_with_token_tracking
+if litellm.completion is not _completion_with_token_tracking:
+    litellm.completion = _completion_with_token_tracking
+if litellm.acompletion is not _acompletion_with_token_tracking:
+    litellm.acompletion = _acompletion_with_token_tracking
```
Also applies to: 37-38
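The sentinel-attribute guard suggested above can be demonstrated in isolation. Here `fake` stands in for the `litellm` module and the attribute names are illustrative:

```python
from types import SimpleNamespace


def apply_patch_once(mod, make_wrapper):
    """Patch mod.completion exactly once, even if this runs on every reload."""
    if getattr(mod, "_patch_applied", False):
        return  # already wrapped; a reload must not wrap the wrapper again
    mod._original_completion = mod.completion
    mod.completion = make_wrapper(mod._original_completion)
    mod._patch_applied = True


fake = SimpleNamespace(completion=lambda: "real")
make_wrapper = lambda orig: (lambda: "wrapped:" + orig())

apply_patch_once(fake, make_wrapper)
apply_patch_once(fake, make_wrapper)  # no-op: simulates a module reload

print(fake.completion())  # wrapped:real — exactly one wrapper layer
```

Without the guard, the second call would produce `wrapped:wrapped:real`, which is exactly the double-counting the review warns about.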
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/llm_patch.py` around lines 15 - 18, The patching logic is currently unconditional and can double-wrap litellm.completion and litellm.acompletion; change it to be idempotent by only saving and replacing the originals if they haven't been wrapped yet: check whether litellm.completion and litellm.acompletion are already the wrapper (or whether _original_completion/_original_acompletion are already set) before assigning; use unique symbols _original_completion and _original_acompletion to hold the true originals and skip patching if those variables are non-None or if the target functions have an indicator (e.g., a __wrapped__ attribute or identity check) that they’re already wrapped, then only wrap once.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lightspeed_evaluation/core/llm/__init__.py`:
- Line 30: The lazy-import mapping points to a non-existent top-level symbol
"llm_patch" in module lightspeed_evaluation.core.llm.llm_patch; either export a
top-level symbol named llm_patch from llm_patch.py (for example by assigning
llm_patch = _completion_with_token_tracking or creating a wrapper function that
calls _acompletion_with_token_tracking/_completion_with_token_tracking) or
change the mapping to reference an actual existing symbol such as
"_completion_with_token_tracking" or "_acompletion_with_token_tracking" so
create_lazy_getattr can successfully getattr the target.
In `@src/lightspeed_evaluation/core/llm/custom.py`:
- Around line 10-13: Remove the inline lint suppressions and keep the
side-effect import for llm_patch but mark it as used so linters don't flag it;
replace the current suppressed import of llm_patch with an import that either
aliases it to a leading-underscore name (e.g., import as _llm_patch) or
references it immediately (e.g., an assert or simple no-op reference) so the
symbol llm_patch is considered used without changing runtime behavior.
In `@src/lightspeed_evaluation/core/llm/deepeval.py`:
- Line 49: Replace the stdout print in DeepEvalLLMManager initialization with a
structured logger call: ensure a module-level logger
(logging.getLogger(__name__) or the project logger) is used and call
logger.info/platform-appropriate-level with a clear message and fields (include
self.model_name and token-tracking status) instead of print; update the code in
the DeepEvalLLMManager (constructor/__init__) where print(f"✅ DeepEval LLM
Manager (with token tracking): {self.model_name}") appears to use logger and
attach model_name as a structured field or in the message.
- Around line 12-17: Move the llm_patch import to execute before DeepEval's
LiteLLMModel is imported: replace the current top-of-file sequence so you first
call importlib.import_module("lightspeed_evaluation.core.llm.llm_patch") (or
import_module("lightspeed_evaluation.core.llm") if llm_patch is executed on
module import) to run the patch for side effects, then import LiteLLMModel from
deepeval.models; remove the "noqa: F401" and "pylint: disable=unused-import"
comments since importlib.import_module avoids creating an unused name binding
and thus fixes the underlying lint warnings.
In `@src/lightspeed_evaluation/core/llm/token_tracker.py`:
- Around line 90-101: Replace the print and pylint suppression with structured,
defensive logging: add a module logger via logging.getLogger(__name__) (if not
already present), remove the "# pylint: disable=protected-access" comment, and
compute cache_hit using cache_hit = getattr(getattr(response, "_hidden_params",
{}), "get", lambda *a, **k: False)("cache_hit", False) or simply cp =
getattr(response, "_hidden_params", {}) and cache_hit = cp.get("cache_hit",
False) to avoid protected-access suppression; then replace the print(...) with
logger.debug("TokenTracker: Tracking tokens for response (cache_hit=%s)",
cache_hit) and keep the existing guarded call to
tracker.add_tokens(getattr(response.usage, "prompt_tokens", 0),
getattr(response.usage, "completion_tokens", 0)) so tokens are only added when
not cached and response.usage is present.
In `@tests/unit/core/llm/test_token_tracker.py`:
- Line 1: Remove the "# pylint: disable=protected-access" and other
pylint-disable comments in tests/unit/core/llm/test_token_tracker.py and fix
each suppressed case by using the test patch fixture and safe attribute setting:
replace direct writes to protected attributes (e.g., any usage of
target._protected_attr = value) with setattr(target, "protected_attr", value),
and switch manual patch targets to the llm_patch fixture where the test
currently mutates internals; ensure no "# pylint: disable" comments remain and
update all affected spots referenced in the review (the initial file header and
the other occurrences called out) accordingly.
---
Nitpick comments:
In `@src/lightspeed_evaluation/core/llm/llm_patch.py`:
- Around line 15-18: The patching logic is currently unconditional and can
double-wrap litellm.completion and litellm.acompletion; change it to be
idempotent by only saving and replacing the originals if they haven't been
wrapped yet: check whether litellm.completion and litellm.acompletion are
already the wrapper (or whether _original_completion/_original_acompletion are
already set) before assigning; use unique symbols _original_completion and
_original_acompletion to hold the true originals and skip patching if those
variables are non-None or if the target functions have an indicator (e.g., a
__wrapped__ attribute or identity check) that they’re already wrapped, then only
wrap once.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3412ec78-95f8-42fa-96c3-d248f98d34e6
📒 Files selected for processing (9)
- src/lightspeed_evaluation/core/llm/__init__.py
- src/lightspeed_evaluation/core/llm/custom.py
- src/lightspeed_evaluation/core/llm/deepeval.py
- src/lightspeed_evaluation/core/llm/llm_patch.py
- src/lightspeed_evaluation/core/llm/token_tracker.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- tests/unit/core/llm/test_custom.py
- tests/unit/core/llm/test_token_tracker.py
- tests/unit/pipeline/evaluation/test_evaluator.py
Force-pushed ba9cfe6 to 3a44146 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lightspeed_evaluation/core/llm/custom.py (1)
88-89: ⚠️ Potential issue | 🟡 Minor
Replace `# type: ignore` comments with proper typing.
Lines 88-89 use `# type: ignore`, which violates the coding guideline. Consider adding type annotations or using `cast()` from the `typing` module to properly type the litellm response.
💡 Suggested fix using cast
```diff
+from typing import Any, Optional, Union, cast
@@
     # Extract content from all choices
     results = []
-    for choice in response.choices:  # type: ignore
-        content = choice.message.content  # type: ignore
+    for choice in cast(list, response.choices):
+        content = cast(str, choice.message.content)
         if content is None:
             content = ""
         results.append(content.strip())
```
As per coding guidelines:
`**/*.py`: Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/custom.py` around lines 88 - 89, The loop is silencing type checks with "# type: ignore"; fix by properly typing the litellm response and choice objects (e.g., annotate the variable `response` with the correct response type returned by the litellm client and annotate `choice` or use typing.cast to the expected choice/message type) so you can access `choice.message.content` without ignores. Update the signature or local variable where `response` is obtained to have the precise type, import typing.cast (or the correct litellm models) and replace the two `# type: ignore` comments by casting `response`/`choice` to that type (refer to `response`, `response.choices`, `choice`, and `choice.message.content` to locate the spots).
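A runnable sketch of the `cast()` approach, using a stand-in response object (the real litellm response types may differ):

```python
from types import SimpleNamespace
from typing import Any, cast


def extract_contents(response: Any) -> list[str]:
    """Read choice contents without '# type: ignore' comments."""
    results: list[str] = []
    for choice in cast(list, response.choices):
        # cast() is a static-typing assertion only; at runtime it just
        # returns the value, so the None check below still works.
        content = cast(str, choice.message.content)
        if content is None:
            content = ""
        results.append(content.strip())
    return results


resp = SimpleNamespace(
    choices=[
        SimpleNamespace(message=SimpleNamespace(content="  hello ")),
        SimpleNamespace(message=SimpleNamespace(content=None)),
    ]
)
print(extract_contents(resp))  # ['hello', '']
```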
🧹 Nitpick comments (2)
src/lightspeed_evaluation/core/llm/deepeval.py (1)
55-57: Consider extracting shared SSL setup to avoid duplicate-code suppression.
The `setup_ssl_verify` method duplicates logic from `BaseCustomLLM`. Consider extracting this to a shared utility function in a common module to eliminate the `pylint: disable=duplicate-code` suppression.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/deepeval.py` around lines 55 - 57, The SSL setup in deepeval.py duplicates logic from BaseCustomLLM: extract the shared logic that reads ssl_verify from llm params (currently in setup_ssl_verify and the ssl_verify = self.llm_params.get(...) line) into a single utility function (e.g., get_ssl_verify or setup_ssl_verify) in a common module, update both BaseCustomLLM and DeepEval (deepeval.py) to call that utility, and remove the pylint: disable=duplicate-code suppression; ensure the utility accepts the llm_params dict (and any needed defaults) and returns the boolean ssl_verify value so callers simply replace their inline logic with a call to that function.
src/lightspeed_evaluation/core/llm/custom.py (1)
18-18: Logger is defined but never used.
The `logger` is initialized but no logging calls exist in this module. Either add appropriate debug/info logging for LLM calls (e.g., logging call parameters, response times, or errors) or remove the unused import.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/custom.py` at line 18, The module defines logger = logging.getLogger(__name__) but never uses it; either remove the unused logger import or instrument the LLM call sites in this module to use logger (e.g., add logger.debug before invoking the LLM with call parameters, logger.info or logger.debug on successful responses including response time, and logger.exception or logger.error inside exception handlers). Locate the functions/methods that perform LLM calls in this file (use their names to find call sites) and add concise logging statements around request parameters, start/end timestamps to compute latency, and any caught exceptions using the existing logger variable; if no logging is desired, simply delete the logger definition and unused logging import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/core/llm/test_custom.py`:
- Around line 107-112: The test class inconsistently patches litellm: update the
two tests that currently patch litellm.completion (including
test_call_does_not_add_tokens_on_cache_hit and
test_call_does_not_capture_tokens_without_active_tracker) to instead patch
lightspeed_evaluation.core.llm.llm_patch._original_completion so the
token-tracking wrapper (track_tokens) still executes; specifically, replace
mocks of custom.litellm.completion with mocks targeting _original_completion,
ensure return_value stays as mock_response, and keep assertions that verify
track_tokens/cache behaviour so the cache-hit path in track_tokens is actually
exercised.
---
Outside diff comments:
In `@src/lightspeed_evaluation/core/llm/custom.py`:
- Around line 88-89: The loop is silencing type checks with "# type: ignore";
fix by properly typing the litellm response and choice objects (e.g., annotate
the variable `response` with the correct response type returned by the litellm
client and annotate `choice` or use typing.cast to the expected choice/message
type) so you can access `choice.message.content` without ignores. Update the
signature or local variable where `response` is obtained to have the precise
type, import typing.cast (or the correct litellm models) and replace the two `#
type: ignore` comments by casting `response`/`choice` to that type (refer to
`response`, `response.choices`, `choice`, and `choice.message.content` to locate
the spots).
---
Nitpick comments:
In `@src/lightspeed_evaluation/core/llm/custom.py`:
- Line 18: The module defines logger = logging.getLogger(__name__) but never
uses it; either remove the unused logger import or instrument the LLM call sites
in this module to use logger (e.g., add logger.debug before invoking the LLM
with call parameters, logger.info or logger.debug on successful responses
including response time, and logger.exception or logger.error inside exception
handlers). Locate the functions/methods that perform LLM calls in this file (use
their names to find call sites) and add concise logging statements around
request parameters, start/end timestamps to compute latency, and any caught
exceptions using the existing logger variable; if no logging is desired, simply
delete the logger definition and unused logging import.
In `@src/lightspeed_evaluation/core/llm/deepeval.py`:
- Around line 55-57: The SSL setup in deepeval.py duplicates logic from
BaseCustomLLM: extract the shared logic that reads ssl_verify from llm params
(currently in setup_ssl_verify and the ssl_verify = self.llm_params.get(...)
line) into a single utility function (e.g., get_ssl_verify or setup_ssl_verify)
in a common module, update both BaseCustomLLM and DeepEval (deepeval.py) to call
that utility, and remove the pylint: disable=duplicate-code suppression; ensure
the utility accepts the llm_params dict (and any needed defaults) and returns
the boolean ssl_verify value so callers simply replace their inline logic with a
call to that function.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 82af2b1f-6680-4746-889c-34462cb7c1ea
📒 Files selected for processing (9)
- src/lightspeed_evaluation/core/llm/__init__.py
- src/lightspeed_evaluation/core/llm/custom.py
- src/lightspeed_evaluation/core/llm/deepeval.py
- src/lightspeed_evaluation/core/llm/llm_patch.py
- src/lightspeed_evaluation/core/llm/token_tracker.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- tests/unit/core/llm/test_custom.py
- tests/unit/core/llm/test_token_tracker.py
- tests/unit/pipeline/evaluation/test_evaluator.py
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/unit/pipeline/evaluation/test_evaluator.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- tests/unit/core/llm/test_token_tracker.py
- src/lightspeed_evaluation/core/llm/__init__.py
Force-pushed 3a44146 to e88e255 (Compare)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lightspeed_evaluation/core/llm/deepeval.py (1)
47-50: ⚠️ Potential issue | 🟠 Major
Remove lint suppression by fixing duplicate SSL setup at the source.
Line 49 introduces `# pylint: disable=duplicate-code`, which is not allowed. Please extract shared SSL setup logic (used in both custom/deepeval managers) into a common helper and call it here.
As per coding guidelines:
`**/*.py`: Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/deepeval.py` around lines 47 - 50, Extract the duplicated SSL setup into a shared helper instead of suppressing the lint rule: create a helper function (e.g., get_ssl_verify(llm_params) or setup_ssl_config(llm_params)) in a common module used by both the custom manager and deepeval manager, move the logic that reads ssl_verify from llm_params into that helper, update deepeval.py to call the helper to obtain ssl_verify (replace the local ssl_verify = self.llm_params.get("ssl_verify", True) and remove the # pylint: disable=duplicate-code), and update the other manager to call the same helper so both use the centralized implementation. Ensure the helper has the same default behavior and update imports accordingly.
♻️ Duplicate comments (3)
src/lightspeed_evaluation/core/llm/deepeval.py (1)
43-43: ⚠️ Potential issue | 🟠 Major
Use structured logger instead of `print()`.
Line 43 should log via the module logger at an appropriate level, not stdout.
Proposed fix
```diff
+import logging
 import os
 from typing import Any

 import litellm
 from deepeval.models import LiteLLMModel

+logger = logging.getLogger(__name__)
+
@@
-        print(f"✅ DeepEval LLM Manager: {self.model_name}")
+        logger.info(
+            "DeepEval LLM Manager initialized",
+            extra={"model_name": self.model_name, "token_tracking": True},
+        )
```
As per coding guidelines:
`src/lightspeed_evaluation/**/*.py`: Use structured logging with appropriate log levels in Python code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/deepeval.py` at line 43, Replace the direct print call that prints self.model_name in the DeepEval manager initialization with a structured logger call (e.g., logger.info) using the module logger; include the model name as structured context (so the message and model_name are logged) and ensure a module-level logger (logger = logging.getLogger(__name__)) exists or is imported. Target the print(f"✅ DeepEval LLM Manager: {self.model_name}") in the DeepEvalLLMManager initializer (or the method that sets self.model_name) and swap it for a logger call at an appropriate level (info/debug) with structured fields.
src/lightspeed_evaluation/core/llm/token_tracker.py (1)
90-98: ⚠️ Potential issue | 🟠 Major
Remove protected-access suppression and make hidden-param parsing defensive.
This block currently relies on `# pylint: disable=protected-access`. You can avoid suppression by defensively reading `_hidden_params` and keeping token parsing resilient.
Proposed fix
```diff
-        cache_hit = getattr(
-            response, "_hidden_params", {}
-        ).get(  # pylint: disable=protected-access
-            "cache_hit", False
-        )
+        hidden_params = getattr(response, "_hidden_params", {})
+        cache_hit = (
+            hidden_params.get("cache_hit", False)
+            if isinstance(hidden_params, dict)
+            else False
+        )
@@
-        prompt_tokens = int(getattr(response.usage, "prompt_tokens", 0))
-        completion_tokens = int(getattr(response.usage, "completion_tokens", 0))
+        prompt_tokens = int(getattr(response.usage, "prompt_tokens", 0) or 0)
+        completion_tokens = int(
+            getattr(response.usage, "completion_tokens", 0) or 0
+        )
```
As per coding guidelines:
`**/*.py`: Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/token_tracker.py` around lines 90 - 98, The code is using a protected attribute with a pylint suppression and assumes _hidden_params is a dict and usage fields exist; update the block in token_tracker.py to defensively read hidden params and parse tokens without disabling lint: use getattr(response, "_hidden_params", None) and if it's not a dict replace with {} before reading "cache_hit", remove the "# pylint: disable=protected-access" comment, and only proceed to parse tokens from response.usage after confirming response.usage is not None; convert prompt_tokens and completion_tokens using safe defaults and int-casting inside a guarded path (or with try/except) so malformed or missing values fall back to 0.
src/lightspeed_evaluation/core/llm/__init__.py (1)
5-7: ⚠️ Potential issue | 🟠 Major
Remove lint suppression on side-effect import.
Line 6 uses `# noqa: F401`, which violates repo policy. Keep the side-effect import, but do it without lint suppression.
Proposed fix
```diff
+import importlib
 from typing import TYPE_CHECKING

 # Apply litellm patching globally before any litellm usage in this package
-import lightspeed_evaluation.core.llm.llm_patch  # noqa: F401
+importlib.import_module("lightspeed_evaluation.core.llm.llm_patch")
```
As per coding guidelines:
`**/*.py`: Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/__init__.py` around lines 5 - 7, Remove the lint suppression by replacing the bare side-effect import of lightspeed_evaluation.core.llm.llm_patch with an explicit dynamic import call (e.g., via importlib.import_module or __import__) so the module is actually used at runtime without triggering F401; update the top-level import statement that currently references lightspeed_evaluation.core.llm.llm_patch to perform importlib.import_module("lightspeed_evaluation.core.llm.llm_patch") (or equivalent) so the side-effect still runs but no # noqa is needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/lightspeed_evaluation/core/llm/deepeval.py`:
- Around line 47-50: Extract the duplicated SSL setup into a shared helper
instead of suppressing the lint rule: create a helper function (e.g.,
get_ssl_verify(llm_params) or setup_ssl_config(llm_params)) in a common module
used by both the custom manager and deepeval manager, move the logic that reads
ssl_verify from llm_params into that helper, update deepeval.py to call the
helper to obtain ssl_verify (replace the local ssl_verify =
self.llm_params.get("ssl_verify", True) and remove the # pylint:
disable=duplicate-code), and update the other manager to call the same helper so
both use the centralized implementation. Ensure the helper has the same default
behavior and update imports accordingly.
---
Duplicate comments:
In `@src/lightspeed_evaluation/core/llm/__init__.py`:
- Around line 5-7: Remove the lint suppression by replacing the bare side-effect
import of lightspeed_evaluation.core.llm.llm_patch with an explicit dynamic
import call (e.g., via importlib.import_module or __import__) so the module is
actually used at runtime without triggering F401; update the top-level import
statement that currently references lightspeed_evaluation.core.llm.llm_patch to
perform importlib.import_module("lightspeed_evaluation.core.llm.llm_patch") (or
equivalent) so the side-effect still runs but no # noqa is needed.
In `@src/lightspeed_evaluation/core/llm/deepeval.py`:
- Line 43: Replace the direct print call that prints self.model_name in the
DeepEval manager initialization with a structured logger call (e.g.,
logger.info) using the module logger; include the model name as structured
context (so the message and model_name are logged) and ensure a module-level
logger (logger = logging.getLogger(__name__)) exists or is imported. Target the
print(f"✅ DeepEval LLM Manager: {self.model_name}") in the DeepEvalLLMManager
initializer (or the method that sets self.model_name) and swap it for a logger
call at an appropriate level (info/debug) with structured fields.
In `@src/lightspeed_evaluation/core/llm/token_tracker.py`:
- Around line 90-98: The code is using a protected attribute with a pylint
suppression and assumes _hidden_params is a dict and usage fields exist; update
the block in token_tracker.py to defensively read hidden params and parse tokens
without disabling lint: use getattr(response, "_hidden_params", None) and if
it's not a dict replace with {} before reading "cache_hit", remove the "#
pylint: disable=protected-access" comment, and only proceed to parse tokens from
response.usage after confirming response.usage is not None; convert
prompt_tokens and completion_tokens using safe defaults and int-casting inside a
guarded path (or with try/except) so malformed or missing values fall back to 0.
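The defensive read described above can be sketched against a stand-in response object (`read_cache_hit` is an illustrative helper name, not the PR's code):

```python
from types import SimpleNamespace


def read_cache_hit(response) -> bool:
    """Read litellm's private _hidden_params without lint suppressions."""
    hidden = getattr(response, "_hidden_params", None)
    if not isinstance(hidden, dict):
        hidden = {}  # tolerate missing or malformed hidden params
    return bool(hidden.get("cache_hit", False))


print(read_cache_hit(SimpleNamespace()))                                    # False
print(read_cache_hit(SimpleNamespace(_hidden_params={"cache_hit": True})))  # True
print(read_cache_hit(SimpleNamespace(_hidden_params="garbage")))            # False
```

Fetching the attribute by string via `getattr` and type-checking the result covers all three failure modes the review lists: attribute absent, attribute not a dict, and key missing.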
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 164186e3-abdd-4518-aee2-f6ac221f3dbd
📒 Files selected for processing (9)
- src/lightspeed_evaluation/core/llm/__init__.py
- src/lightspeed_evaluation/core/llm/custom.py
- src/lightspeed_evaluation/core/llm/deepeval.py
- src/lightspeed_evaluation/core/llm/llm_patch.py
- src/lightspeed_evaluation/core/llm/token_tracker.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- tests/unit/core/llm/test_custom.py
- tests/unit/core/llm/test_token_tracker.py
- tests/unit/pipeline/evaluation/test_evaluator.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unit/core/llm/test_token_tracker.py
- tests/unit/pipeline/evaluation/test_evaluator.py
VladimirKadlec
left a comment
LGTM 💪
nit: I'd rename core/llm/llm_patch.py to core/llm/litellm_patch.py
Force-pushed e88e255 to eb59c92 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/lightspeed_evaluation/core/llm/__init__.py (1)
5-7: ⚠️ Potential issue | 🟠 Major
Remove lint suppression on the patch side-effect import.
Using `# noqa: F401` here violates the no-suppression rule. Use `importlib.import_module(...)` to keep the side effect without creating an unused binding.
Suggested fix
```diff
+import importlib
 from typing import TYPE_CHECKING

 # Apply litellm patching globally before any litellm usage in this package
-import lightspeed_evaluation.core.llm.litellm_patch  # noqa: F401
+importlib.import_module("lightspeed_evaluation.core.llm.litellm_patch")
```
As per coding guidelines:
`**/*.py`: Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/__init__.py` around lines 5 - 7, The module currently does a side-effect import using "import lightspeed_evaluation.core.llm.litellm_patch # noqa: F401"; remove the noqa suppression and replace this static import with a dynamic import using importlib.import_module('lightspeed_evaluation.core.llm.litellm_patch') so the patch is applied without creating an unused binding; update the top-level import in __init__.py accordingly and ensure importlib is imported if not already.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 23-25: The token-tracking call track_tokens(response) is unguarded
and can raise, causing successful LLM completions to fail; wrap the call in a
try/except in both wrapper locations (the block that calls _original_completion
and the second wrapper around lines 31-33) so any exception from track_tokens is
caught and logged (use the module logger, e.g., logging.getLogger(__name__) or
an existing logger) and does not prevent returning the successful response from
_original_completion; add an import for logging if needed and ensure the catch
only logs the exception and continues to return the original response.
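The guard this comment asks for can be sketched in isolation; the tracker below deliberately raises to show the response still comes back (all names are illustrative):

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def completion_with_tracking(original_completion, track_tokens):
    """Token tracking is best-effort: it must never break a completion."""

    def wrapper(*args, **kwargs):
        response = original_completion(*args, **kwargs)
        try:
            track_tokens(response)
        except Exception:  # tracking failures are logged, not raised
            logger.exception("Token tracking failed; returning response anyway")
        return response

    return wrapper


def broken_tracker(_response):
    raise RuntimeError("tracker bug")


wrapped = completion_with_tracking(
    lambda **kw: SimpleNamespace(ok=True), broken_tracker
)
print(wrapped(model="stub").ok)  # True despite the tracker raising
```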
---
Duplicate comments:
In `@src/lightspeed_evaluation/core/llm/__init__.py`:
- Around line 5-7: The module currently does a side-effect import using "import
lightspeed_evaluation.core.llm.litellm_patch # noqa: F401"; remove the noqa
suppression and replace this static import with a dynamic import using
importlib.import_module('lightspeed_evaluation.core.llm.litellm_patch') so the
patch is applied without creating an unused binding; update the top-level import
in __init__.py accordingly and ensure importlib is imported if not already.
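The `importlib.import_module` behavior relied on here is easy to verify, with a stdlib module standing in for the project's patch module:

```python
import importlib
import sys

# Stand-in for "lightspeed_evaluation.core.llm.litellm_patch"
mod = importlib.import_module("json")

# The module's top-level code has run and it is cached in sys.modules,
# without creating a lint-flagged unused name binding at module scope.
print("json" in sys.modules)                   # True
print(importlib.import_module("json") is mod)  # True: cached, not re-executed
```

Because `sys.modules` caches the result, calling `import_module` on every package import is harmless; the patch module's side effects run exactly once.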
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2a5d6927-9329-4e0e-a62c-bd2e69223885
📒 Files selected for processing (9)
- src/lightspeed_evaluation/core/llm/__init__.py
- src/lightspeed_evaluation/core/llm/custom.py
- src/lightspeed_evaluation/core/llm/deepeval.py
- src/lightspeed_evaluation/core/llm/litellm_patch.py
- src/lightspeed_evaluation/core/llm/token_tracker.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- tests/unit/core/llm/test_custom.py
- tests/unit/core/llm/test_token_tracker.py
- tests/unit/pipeline/evaluation/test_evaluator.py
✅ Files skipped from review due to trivial changes (1)
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unit/pipeline/evaluation/test_evaluator.py
- src/lightspeed_evaluation/core/llm/token_tracker.py
Force-pushed eb59c92 to 172bd00 (Compare)
Actionable comments posted: 3
♻️ Duplicate comments (2)
src/lightspeed_evaluation/core/llm/token_tracker.py (1)
90-94: 🛠️ Refactor suggestion | 🟠 Major
Remove lint suppression by accessing `_hidden_params` defensively.
The `# pylint: disable=protected-access` on line 92 violates coding guidelines. Use `vars()` or a defensive approach to avoid accessing protected attributes directly.
♻️ Proposed fix
```diff
     if tracker and response is not None:
-        cache_hit = getattr(
-            response, "_hidden_params", {}
-        ).get(  # pylint: disable=protected-access
-            "cache_hit", False
-        )
+        # Access hidden params defensively without triggering protected-access warning
+        hidden_params = vars(response).get("_hidden_params", {})
+        cache_hit = (
+            hidden_params.get("cache_hit", False)
+            if isinstance(hidden_params, dict)
+            else False
+        )
```
As per coding guidelines:
`**/*.py`: Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/token_tracker.py` around lines 90 - 94, Replace the protected-attribute access to response._hidden_params with a defensive use of vars() (or getattr(response, "__dict__", {})) so you don't disable pylint; specifically change the cache_hit assignment that references response._hidden_params in token_tracker.py to fetch the dict via vars(response).get("_hidden_params", {}) and then .get("cache_hit", False), and remove the "# pylint: disable=protected-access" comment.src/lightspeed_evaluation/core/llm/deepeval.py (1)
43-43: 🛠️ Refactor suggestion | 🟠 Major

Use structured logger instead of print().

Line 43 uses print() for initialization confirmation, which violates the coding guideline requiring structured logging with appropriate log levels.

♻️ Proposed fix

```diff
+import logging
 import os
 from typing import Any

 import litellm
 from deepeval.models import LiteLLMModel

+logger = logging.getLogger(__name__)
+
 class DeepEvalLLMManager:
     ...
-        print(f"✅ DeepEval LLM Manager: {self.model_name}")
+        logger.info("DeepEval LLM Manager initialized: %s", self.model_name)
```

As per coding guidelines:

`src/lightspeed_evaluation/**/*.py`: Use structured logging with appropriate log levels in Python code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/deepeval.py` at line 43, Replace the plain print in the DeepEval LLM manager initialization with the module/class structured logger: locate the print(f"✅ DeepEval LLM Manager: {self.model_name}") in the DeepEvalLLMManager (likely __init__) and change it to use a logger (e.g., module logger via logging.getLogger(__name__) or the instance logger self.logger if one exists) at an appropriate level (info) and log the model_name as a structured field or formatted argument rather than printing to stdout.
🧹 Nitpick comments (3)
src/lightspeed_evaluation/core/llm/litellm_patch.py (1)
34-42: Streaming responses may not have a `.usage` attribute.

Per the context snippet from `token_tracker.py`, `track_tokens()` expects a response object with a `.usage` attribute. When `litellm.acompletion` returns an async generator (streaming mode), calling `track_tokens()` will silently skip tracking because async generators don't have `.usage`.

This is likely acceptable behavior (streaming responses don't provide usage until completion), but consider adding a debug log in `track_tokens` when `response` lacks `.usage` to aid troubleshooting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/litellm_patch.py` around lines 34 - 42, The wrapper _acompletion_with_token_tracking currently calls track_tokens(response) but streaming responses (async generators returned by litellm.acompletion) won't have a .usage attribute; update the track_tokens function in token_tracker.py to detect missing usage (e.g., via hasattr(response, "usage") or inspect.isasyncgen) and emit a debug log indicating that token tracking was skipped for a streaming/usage-less response (include the response type/class in the log for troubleshooting) so callers like _acompletion_with_token_tracking still quietly work while providing a clear debug signal.src/lightspeed_evaluation/core/llm/deepeval.py (1)
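A minimal sketch of the suggested guard. `TokenTracker` here is a simplified stand-in for the class in `token_tracker.py`, which has a richer interface:

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


class TokenTracker:
    """Simplified stand-in for the tracker introduced in this PR."""

    def __init__(self) -> None:
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def track_tokens(self, response: object) -> None:
        usage = getattr(response, "usage", None)
        if usage is None:
            # Streaming responses (async generators) expose no usage payload,
            # so skip quietly but leave a breadcrumb for troubleshooting.
            logger.debug(
                "Token tracking skipped: %s has no usage attribute",
                type(response).__name__,
            )
            return
        self.prompt_tokens += getattr(usage, "prompt_tokens", 0)
        self.completion_tokens += getattr(usage, "completion_tokens", 0)


tracker = TokenTracker()
# A normal completion response carries usage counts.
tracker.track_tokens(
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=10, completion_tokens=5))
)
# A streaming response (generator) has no .usage: skipped with a debug log.
tracker.track_tokens(iter(()))
```

The debug log costs nothing in normal operation but makes the silent-skip path visible when troubleshooting missing token counts.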
45-56: Remove lint suppression and consider extracting SSL setup to shared utility.

Line 49 uses `# pylint: disable=duplicate-code`, which violates coding guidelines. The `setup_ssl_verify` method is duplicated between `BaseCustomLLM` and `DeepEvalLLMManager`.

Consider extracting this to a shared utility function or mixin to eliminate duplication and the need for suppression.

♻️ Proposed fix - extract to utility

```python
# In a shared module (e.g., lightspeed_evaluation/core/llm/ssl_utils.py):
import os

import litellm


def configure_ssl_verify(llm_params: dict) -> None:
    """Configure SSL verification for litellm based on parameters."""
    ssl_verify = llm_params.get("ssl_verify", True)
    if ssl_verify:
        litellm.ssl_verify = os.environ.get("SSL_CERTIFI_BUNDLE", True)
    else:
        litellm.ssl_verify = False
```

Then both classes can call `configure_ssl_verify(self.llm_params)` instead of duplicating the method.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/llm/deepeval.py` around lines 45 - 56, The setup_ssl_verify implementation is duplicated (and currently silences pylint) — remove the "# pylint: disable=duplicate-code" and extract the logic into a shared utility function (e.g., configure_ssl_verify(llm_params: dict)) that sets litellm.ssl_verify using os.environ.get("SSL_CERTIFI_BUNDLE", True) when llm_params.get("ssl_verify", True) is truthy, otherwise sets it to False; then replace the existing DeepEvalLLMManager.setup_ssl_verify and BaseCustomLLM.setup_ssl_verify implementations to simply call configure_ssl_verify(self.llm_params).tests/unit/pipeline/evaluation/test_evaluator.py (1)
907-964: Consider relocating TokenTracker unit tests to mirror source structure.

These `TokenTracker` unit tests are placed in `test_evaluator.py`, but per coding guidelines, test directory structure should mirror source code structure. Since `TokenTracker` now lives in `src/lightspeed_evaluation/core/llm/token_tracker.py`, basic unit tests should be in `tests/unit/core/llm/test_token_tracker.py`.

The context snippets indicate that `tests/unit/core/llm/test_token_tracker.py` exists and imports `TokenTracker`. Consider moving these basic tests there to maintain consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/pipeline/evaluation/test_evaluator.py` around lines 907 - 964, The TokenTracker unit tests are placed in test_evaluator.py but should live in the dedicated TokenTracker test module; move the TestTokenTracker class (and its methods) out of test_evaluator.py and into the existing test_token_tracker.py test module, ensure the new file imports the TokenTracker symbol and contains TestTokenTracker, remove the duplicate from test_evaluator.py, update any imports if needed, and run the test suite to confirm no duplicate tests remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lightspeed_evaluation/core/llm/__init__.py`:
- Around line 5-6: The file currently does a side-effect import of
lightspeed_evaluation.core.llm.litellm_patch with a noqa to silence "unused
import" lint; replace that line with an explicit runtime import using
importlib.import_module("lightspeed_evaluation.core.llm.litellm_patch") (ensure
importlib is imported at top) so the patch is executed without creating an
unused name binding and then remove the "# noqa: F401" comment; reference the
module name lightspeed_evaluation.core.llm.litellm_patch and the package
__init__.py where the change should be made.
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 27-31: The try/except around track_tokens(response) is silencing
all exceptions with a pylint disable; replace the broad except with explicit
exception types (catch AttributeError, TypeError, and ValueError) and remove the
"# pylint: disable=broad-exception-caught" comment; do the same for the other
occurrence in this file (the other try/except that logs via logger.exception),
keeping the existing logger.exception call and message but only catching the
specified exceptions for track_tokens-related failures.
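The narrowed handling described above can be sketched as follows; `track_tokens` and the original completion callable are stand-ins for the real symbols in `litellm_patch.py` and `token_tracker.py`:

```python
import logging

logger = logging.getLogger(__name__)


def track_tokens(response: object) -> None:
    """Stand-in tracking hook: reads response.usage, may raise AttributeError."""
    _ = response.usage


def completion_with_token_tracking(original_completion, *args, **kwargs):
    """Wrapper sketch: a tracking failure must never break the completion call."""
    response = original_completion(*args, **kwargs)
    try:
        track_tokens(response)
    except (AttributeError, TypeError, ValueError):
        # Explicit exception types instead of a bare Exception + pylint disable;
        # anything unexpected still propagates rather than being swallowed.
        logger.exception("Failed to track tokens")
    return response


# A response object without `usage` still flows back to the caller intact.
sentinel = object()
result = completion_with_token_tracking(lambda: sentinel)
```

Catching only the shape-related errors keeps genuine bugs (e.g. a `KeyboardInterrupt` or an unrelated `RuntimeError`) visible instead of hiding them behind the tracking path.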
In `@tests/unit/core/llm/test_token_tracker.py`:
- Around line 71-90: The test starts tracker1 and tracker2 (via thread_work) but
only calls tracker1.stop() at the end, risking leaked active trackers if an
assertion fails; update the test to ensure both tracker1 and the thread's
tracker2 are always stopped by wrapping their lifecycles in try/finally blocks
(or a single try/finally that stops both) around the sections that call
tracker.start() and Thread(...).start(), ensuring you call tracker1.stop() and
tracker2.stop() in the finally; keep references to the existing symbols
thread_work, tracker1, tracker2, TokenTracker.get_active, tracker.start, and
tracker.stop to locate and modify the test.
---
Duplicate comments:
In `@src/lightspeed_evaluation/core/llm/deepeval.py`:
- Line 43: Replace the plain print in the DeepEval LLM manager initialization
with the module/class structured logger: locate the print(f"✅ DeepEval LLM
Manager: {self.model_name}") in the DeepEvalLLMManager (likely __init__) and
change it to use a logger (e.g., module logger via logging.getLogger(__name__)
or the instance logger self.logger if one exists) at an appropriate level (info)
and log the model_name as a structured field or formatted argument rather than
printing to stdout.
In `@src/lightspeed_evaluation/core/llm/token_tracker.py`:
- Around line 90-94: Replace the protected-attribute access to
response._hidden_params with a defensive use of vars() (or getattr(response,
"__dict__", {})) so you don't disable pylint; specifically change the cache_hit
assignment that references response._hidden_params in token_tracker.py to fetch
the dict via vars(response).get("_hidden_params", {}) and then .get("cache_hit",
False), and remove the "# pylint: disable=protected-access" comment.
---
Nitpick comments:
In `@src/lightspeed_evaluation/core/llm/deepeval.py`:
- Around line 45-56: The setup_ssl_verify implementation is duplicated (and
currently silences pylint) — remove the "# pylint: disable=duplicate-code" and
extract the logic into a shared utility function (e.g.,
configure_ssl_verify(llm_params: dict)) that sets litellm.ssl_verify using
os.environ.get("SSL_CERTIFI_BUNDLE", True) when llm_params.get("ssl_verify",
True) is truthy, otherwise sets it to False; then replace the existing
DeepEvalLLMManager.setup_ssl_verify and BaseCustomLLM.setup_ssl_verify
implementations to simply call configure_ssl_verify(self.llm_params).
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 34-42: The wrapper _acompletion_with_token_tracking currently
calls track_tokens(response) but streaming responses (async generators returned
by litellm.acompletion) won't have a .usage attribute; update the track_tokens
function in token_tracker.py to detect missing usage (e.g., via
hasattr(response, "usage") or inspect.isasyncgen) and emit a debug log
indicating that token tracking was skipped for a streaming/usage-less response
(include the response type/class in the log for troubleshooting) so callers like
_acompletion_with_token_tracking still quietly work while providing a clear
debug signal.
In `@tests/unit/pipeline/evaluation/test_evaluator.py`:
- Around line 907-964: The TokenTracker unit tests are placed in
test_evaluator.py but should live in the dedicated TokenTracker test module;
move the TestTokenTracker class (and its methods) out of test_evaluator.py and
into the existing test_token_tracker.py test module, ensure the new file imports
the TokenTracker symbol and contains TestTokenTracker, remove the duplicate from
test_evaluator.py, update any imports if needed, and run the test suite to
confirm no duplicate tests remain.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2f28d73f-9f45-4506-881b-a98c2ac7721a
📒 Files selected for processing (9)
- src/lightspeed_evaluation/core/llm/__init__.py
- src/lightspeed_evaluation/core/llm/custom.py
- src/lightspeed_evaluation/core/llm/deepeval.py
- src/lightspeed_evaluation/core/llm/litellm_patch.py
- src/lightspeed_evaluation/core/llm/token_tracker.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- tests/unit/core/llm/test_custom.py
- tests/unit/core/llm/test_token_tracker.py
- tests/unit/pipeline/evaluation/test_evaluator.py
Force-pushed 172bd00 to d09b606
…LM-token-counter-for-Deepeval [LEADS-241] judge llm token counter for deepeval
Description
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Improvements
Tests