feat: add health check endpoint and related schemas with tests#1521
feat: add health check endpoint and related schemas with tests#1521
Conversation
📝 WalkthroughWalkthroughAdds a new health-check feature: collects system metrics (DB size, memory, load, storage, CPU temp), exposes a secured GET /health endpoint with a HealthCheckResponse schema, updates startup registration, and adds comprehensive endpoint tests and code-standard guidance. Changes
Sequence DiagramsequenceDiagram
participant Client
participant API as API Server
participant Handler as check_health()
participant Collector as get_health_status()
participant System as System Resources
Client->>API: GET /health (Authorization)
API->>API: validate auth
alt auth valid
API->>Handler: invoke check_health()
Handler->>Collector: request aggregated metrics
Collector->>System: get_db_size_mb()
System-->>Collector: db_size_mb
Collector->>System: get_mem_usage_pct()
System-->>Collector: mem_usage_pct
Collector->>System: get_load_avg_1m()
System-->>Collector: load_1m
Collector->>System: get_storage_pct()
System-->>Collector: storage_pct
Collector->>System: get_cpu_temp()
System-->>Collector: cpu_temp
Collector-->>Handler: health_dict
Handler-->>API: 200 HealthCheckResponse
API-->>Client: 200 OK with metrics
else auth invalid
API-->>Client: 403 Forbidden
end
alt exception during collection
Handler-->>API: error
API-->>Client: 500 Error payload
end
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
.github/skills/code-standards/SKILL.md (1)
8-14: Duplicated guidance within the same document.Lines 12–14 restate the "DRY" and "< 500 LOC" rules that already appear in the dedicated sections at Lines 18–22. Consider removing the duplicates from the bullet list to keep this skill file itself DRY, or consolidating into one place.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/skills/code-standards/SKILL.md around lines 8 - 14, The SKILL.md bullet list duplicates the "DRY" and "< 500 LOC" rules that also appear in the dedicated sections at Lines 18–22; remove the duplicate bullets from the top-level list (the lines containing "follow DRY principle" and "code files should be less than 500 LOC for better maintainability") or consolidate them by keeping the guidance only in the dedicated "DRY" and "500 LOC" sections and replacing the bullets with a single cross-reference to those sections so the document itself follows DRY.server/api_server/health_endpoint.py (2)
110-127: Minor static-analysis clean-ups inget_cpu_temp.A few quick wins flagged by Ruff:
- RUF019 (Line 116): Replace the redundant key-check with
dict.get.- B007 (Line 120): Rename the unused loop variable
sensor_typeto_.♻️ Proposed diff
- if "coretemp" in temps and temps["coretemp"]: - return int(temps["coretemp"][0].current) + coretemp = temps.get("coretemp") + if coretemp: + return int(coretemp[0].current) # Fallback to first sensor with data - for sensor_type, readings in temps.items(): + for _, readings in temps.items(): if readings: return int(readings[0].current)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/api_server/health_endpoint.py` around lines 110 - 127, In get_cpu_temp, simplify the 'coretemp' presence check by using temps.get("coretemp") instead of temps["coretemp"] and guard with truthiness, and rename the unused loop variable sensor_type to _ in the fallback for-loop (for _, readings in temps.items()) to address the B007 unused-variable warning while keeping readings as-is and preserving existing behavior and exception handling.
49-52: Usepsutil.virtual_memory().percentdirectly.
psutil.virtual_memory()already exposes a.percentattribute, avoiding the manualused / total * 100calculation and potential floating-point edge cases.♻️ Proposed simplification
try: vm = psutil.virtual_memory() - pct = int((vm.used / vm.total) * 100) - return max(0, min(100, pct)) # Clamp to 0-100 + return max(0, min(100, int(vm.percent))) except Exception as e:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/api_server/health_endpoint.py` around lines 49 - 52, Replace the manual calculation using vm.used/vm.total with psutil's built-in percent: get vm via psutil.virtual_memory(), set pct = int(vm.percent) (or int(psutil.virtual_memory().percent) directly) and then clamp with max(0, min(100, pct)). Update the references to vm and pct in the current function (the block that currently computes pct = int((vm.used / vm.total) * 100)) to use vm.percent to avoid floating-point edge cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/api_server/api_server_start.py`:
- Around line 1948-1959: In check_health, avoid returning the raw exception text
to callers; retain the detailed error in the server log via the existing mylog
call and change the response body to use a fixed, generic error message (e.g.,
"Internal server error" or "Failed to retrieve health status") instead of str(e)
so it matches other 500 handlers like handle_500_error; update the return in the
except block of check_health (which calls get_health_status and logs via mylog)
to remove str(e) from the "message" field and return only the generic message
and success: False.
In `@server/api_server/health_endpoint.py`:
- Around line 42-55: get_mem_usage_pct and get_storage_pct currently return -1
on exception which violates HealthCheckResponse.mem_usage_pct/storage_pct (ge=0,
le=100); change both functions to return None on error (instead of -1) and
update the OpenAPI schema types for HealthCheckResponse.mem_usage_pct and
storage_pct to Optional[int] (allow null) while keeping the ge=0, le=100
constraints for non-null values so the response stays compliant and the
generated JSON schema permits null values.
---
Nitpick comments:
In @.github/skills/code-standards/SKILL.md:
- Around line 8-14: The SKILL.md bullet list duplicates the "DRY" and "< 500
LOC" rules that also appear in the dedicated sections at Lines 18–22; remove the
duplicate bullets from the top-level list (the lines containing "follow DRY
principle" and "code files should be less than 500 LOC for better
maintainability") or consolidate them by keeping the guidance only in the
dedicated "DRY" and "500 LOC" sections and replacing the bullets with a single
cross-reference to those sections so the document itself follows DRY.
In `@server/api_server/health_endpoint.py`:
- Around line 110-127: In get_cpu_temp, simplify the 'coretemp' presence check
by using temps.get("coretemp") instead of temps["coretemp"] and guard with
truthiness, and rename the unused loop variable sensor_type to _ in the fallback
for-loop (for _, readings in temps.items()) to address the B007 unused-variable
warning while keeping readings as-is and preserving existing behavior and
exception handling.
- Around line 49-52: Replace the manual calculation using vm.used/vm.total with
psutil's built-in percent: get vm via psutil.virtual_memory(), set pct =
int(vm.percent) (or int(psutil.virtual_memory().percent) directly) and then
clamp with max(0, min(100, pct)). Update the references to vm and pct in the
current function (the block that currently computes pct = int((vm.used /
vm.total) * 100)) to use vm.percent to avoid floating-point edge cases.
…y and storage usage
…ertX into chore_timestamps
There was a problem hiding this comment.
🧹 Nitpick comments (1)
server/api_server/health_endpoint.py (1)
106-112: Two minor style issues inget_cpu_temp(Ruff RUF019 + B007)
- Line 106 — the
"coretemp" in temps andguard is redundant;temps.get("coretemp")expresses the same intent in one step (Ruff RUF019).- Line 110 —
sensor_typeis never referenced inside the loop body; rename it to_(Ruff B007).♻️ Proposed fix
- if "coretemp" in temps and temps["coretemp"]: - return int(temps["coretemp"][0].current) + coretemp = temps.get("coretemp") + if coretemp: + return int(coretemp[0].current) # Fallback to first sensor with data - for sensor_type, readings in temps.items(): + for _, readings in temps.items(): if readings: return int(readings[0].current)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/api_server/health_endpoint.py` around lines 106 - 112, get_cpu_temp contains two stylistic issues: replace the redundant membership check "coretemp" in temps and temps["coretemp"] with temps.get("coretemp") to simplify the guard, and rename the unused loop variable sensor_type to _ in the for-loop that iterates over temps.items() so the unused variable is explicit; update the return logic to use temps.get("coretemp") and change "for sensor_type, readings in temps.items()" to "for _, readings in temps.items()" in the get_cpu_temp function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@server/api_server/health_endpoint.py`:
- Around line 106-112: get_cpu_temp contains two stylistic issues: replace the
redundant membership check "coretemp" in temps and temps["coretemp"] with
temps.get("coretemp") to simplify the guard, and rename the unused loop variable
sensor_type to _ in the for-loop that iterates over temps.items() so the unused
variable is explicit; update the return logic to use temps.get("coretemp") and
change "for sensor_type, readings in temps.items()" to "for _, readings in
temps.items()" in the get_cpu_temp function.
Summary by CodeRabbit
New Features
Tests
Documentation