Skip to content

feat: add health check endpoint and related schemas with tests#1521

Merged
jokob-sk merged 4 commits intomainfrom
chore_timestamps
Feb 17, 2026
Merged

feat: add health check endpoint and related schemas with tests#1521
jokob-sk merged 4 commits intomainfrom
chore_timestamps

Conversation

@jokob-sk
Copy link
Collaborator

@jokob-sk jokob-sk commented Feb 17, 2026

Summary by CodeRabbit

  • New Features

    • Added a new /health API endpoint returning system health metrics (DB size, memory use, 1m CPU load, disk usage, CPU temp); accessible via authenticated requests.
  • Tests

    • Added comprehensive tests validating endpoint auth, response structure, metric ranges, error handling, and HTTP method behavior.
  • Documentation

    • Updated Code Standards with pre-step review, planning steps, and explicit maintainability principles (DRY, no duplicates, size guideline).

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

Adds a new health-check feature: collects system metrics (DB size, memory, load, storage, CPU temp), exposes a secured GET /health endpoint with a HealthCheckResponse schema, updates startup registration, and adds comprehensive endpoint tests and code-standard guidance.

Changes

Cohort / File(s) Summary
Health check implementation
server/api_server/health_endpoint.py
New module providing functions to collect system metrics: get_db_size_mb, get_mem_usage_pct, get_load_avg_1m, get_storage_pct, get_cpu_temp, and an aggregator get_health_status. Includes error handling and logging.
API integration
server/api_server/api_server_start.py
Registers a new authenticated GET /health route, imports the health collector, exposes check_health route handler and HealthCheckResponse in the public OpenAPI surface. Returns 200 with metrics or 500 on error.
OpenAPI schema
server/api_server/openapi/schemas.py
Adds HealthCheckResponse (inherits BaseResponse) with fields: db_size_mb, mem_usage_pct, load_1m, storage_pct, cpu_temp; includes example and extra="allow".
Tests
test/api_endpoints/test_health_endpoints.py
New comprehensive tests: authentication (missing/invalid/valid), response shape and types, metric range/consistency checks, error-mocking path, HTTP method restrictions, and token-via-query support.
Documentation / Standards
.github/skills/code-standards/SKILL.md
Adds pre-step review and planning guidance, clarifies review process, mandates test creation as final step, and adds maintainability bullets (avoid duplicates, DRY, sub-500 LOC guideline).

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as API Server
    participant Handler as check_health()
    participant Collector as get_health_status()
    participant System as System Resources

    Client->>API: GET /health (Authorization)
    API->>API: validate auth
    alt auth valid
        API->>Handler: invoke check_health()
        Handler->>Collector: request aggregated metrics
        Collector->>System: get_db_size_mb()
        System-->>Collector: db_size_mb
        Collector->>System: get_mem_usage_pct()
        System-->>Collector: mem_usage_pct
        Collector->>System: get_load_avg_1m()
        System-->>Collector: load_1m
        Collector->>System: get_storage_pct()
        System-->>Collector: storage_pct
        Collector->>System: get_cpu_temp()
        System-->>Collector: cpu_temp
        Collector-->>Handler: health_dict
        Handler-->>API: 200 HealthCheckResponse
        API-->>Client: 200 OK with metrics
    else auth invalid
        API-->>Client: 403 Forbidden
    end
    alt exception during collection
        Handler-->>API: error
        API-->>Client: 500 Error payload
    end
Loading

Poem

🐰 I hopped through bytes and midnight logs,
I fetched the bays where data clogs,
DB size, load, and memory hum,
CPU whispers, disk drums drum,
A rabbit cheers — the system's plumb!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: adding a health check endpoint with schemas and tests across multiple files.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore_timestamps

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
.github/skills/code-standards/SKILL.md (1)

8-14: Duplicated guidance within the same document.

Lines 12–14 restate the "DRY" and "< 500 LOC" rules that already appear in the dedicated sections at Lines 18–22. Consider removing the duplicates from the bullet list to keep this skill file itself DRY, or consolidating into one place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/skills/code-standards/SKILL.md around lines 8 - 14, The SKILL.md
bullet list duplicates the "DRY" and "< 500 LOC" rules that also appear in the
dedicated sections at Lines 18–22; remove the duplicate bullets from the
top-level list (the lines containing "follow DRY principle" and "code files
should be less than 500 LOC for better maintainability") or consolidate them by
keeping the guidance only in the dedicated "DRY" and "500 LOC" sections and
replacing the bullets with a single cross-reference to those sections so the
document itself follows DRY.
server/api_server/health_endpoint.py (2)

110-127: Minor static-analysis clean-ups in get_cpu_temp.

A few quick wins flagged by Ruff:

  1. RUF019 (Line 116): Replace the redundant key-check with dict.get.
  2. B007 (Line 120): Rename the unused loop variable sensor_type to _.
♻️ Proposed diff
-        if "coretemp" in temps and temps["coretemp"]:
-            return int(temps["coretemp"][0].current)
+        coretemp = temps.get("coretemp")
+        if coretemp:
+            return int(coretemp[0].current)
 
         # Fallback to first sensor with data
-        for sensor_type, readings in temps.items():
+        for _, readings in temps.items():
             if readings:
                 return int(readings[0].current)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api_server/health_endpoint.py` around lines 110 - 127, In
get_cpu_temp, simplify the 'coretemp' presence check by using
temps.get("coretemp") instead of temps["coretemp"] and guard with truthiness,
and rename the unused loop variable sensor_type to _ in the fallback for-loop
(for _, readings in temps.items()) to address the B007 unused-variable warning
while keeping readings as-is and preserving existing behavior and exception
handling.

49-52: Use psutil.virtual_memory().percent directly.

psutil.virtual_memory() already exposes a .percent attribute, avoiding the manual used / total * 100 calculation and potential floating-point edge cases.

♻️ Proposed simplification
     try:
         vm = psutil.virtual_memory()
-        pct = int((vm.used / vm.total) * 100)
-        return max(0, min(100, pct))  # Clamp to 0-100
+        return max(0, min(100, int(vm.percent)))
     except Exception as e:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api_server/health_endpoint.py` around lines 49 - 52, Replace the
manual calculation using vm.used/vm.total with psutil's built-in percent: get vm
via psutil.virtual_memory(), set pct = int(vm.percent) (or
int(psutil.virtual_memory().percent) directly) and then clamp with max(0,
min(100, pct)). Update the references to vm and pct in the current function (the
block that currently computes pct = int((vm.used / vm.total) * 100)) to use
vm.percent to avoid floating-point edge cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/api_server/api_server_start.py`:
- Around line 1948-1959: In check_health, avoid returning the raw exception text
to callers; retain the detailed error in the server log via the existing mylog
call and change the response body to use a fixed, generic error message (e.g.,
"Internal server error" or "Failed to retrieve health status") instead of str(e)
so it matches other 500 handlers like handle_500_error; update the return in the
except block of check_health (which calls get_health_status and logs via mylog)
to remove str(e) from the "message" field and return only the generic message
and success: False.

In `@server/api_server/health_endpoint.py`:
- Around line 42-55: get_mem_usage_pct and get_storage_pct currently return -1
on exception which violates HealthCheckResponse.mem_usage_pct/storage_pct (ge=0,
le=100); change both functions to return None on error (instead of -1) and
update the OpenAPI schema types for HealthCheckResponse.mem_usage_pct and
storage_pct to Optional[int] (allow null) while keeping the ge=0, le=100
constraints for non-null values so the response stays compliant and the
generated JSON schema permits null values.

---

Nitpick comments:
In @.github/skills/code-standards/SKILL.md:
- Around line 8-14: The SKILL.md bullet list duplicates the "DRY" and "< 500
LOC" rules that also appear in the dedicated sections at Lines 18–22; remove the
duplicate bullets from the top-level list (the lines containing "follow DRY
principle" and "code files should be less than 500 LOC for better
maintainability") or consolidate them by keeping the guidance only in the
dedicated "DRY" and "500 LOC" sections and replacing the bullets with a single
cross-reference to those sections so the document itself follows DRY.

In `@server/api_server/health_endpoint.py`:
- Around line 110-127: In get_cpu_temp, simplify the 'coretemp' presence check
by using temps.get("coretemp") instead of temps["coretemp"] and guard with
truthiness, and rename the unused loop variable sensor_type to _ in the fallback
for-loop (for _, readings in temps.items()) to address the B007 unused-variable
warning while keeping readings as-is and preserving existing behavior and
exception handling.
- Around line 49-52: Replace the manual calculation using vm.used/vm.total with
psutil's built-in percent: get vm via psutil.virtual_memory(), set pct =
int(vm.percent) (or int(psutil.virtual_memory().percent) directly) and then
clamp with max(0, min(100, pct)). Update the references to vm and pct in the
current function (the block that currently computes pct = int((vm.used /
vm.total) * 100)) to use vm.percent to avoid floating-point edge cases.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
server/api_server/health_endpoint.py (1)

106-112: Two minor style issues in get_cpu_temp (Ruff RUF019 + B007)

  1. Line 106 — the "coretemp" in temps and guard is redundant; temps.get("coretemp") expresses the same intent in one step (Ruff RUF019).
  2. Line 110sensor_type is never referenced inside the loop body; rename it to _ (Ruff B007).
♻️ Proposed fix
-        if "coretemp" in temps and temps["coretemp"]:
-            return int(temps["coretemp"][0].current)
+        coretemp = temps.get("coretemp")
+        if coretemp:
+            return int(coretemp[0].current)

         # Fallback to first sensor with data
-        for sensor_type, readings in temps.items():
+        for _, readings in temps.items():
             if readings:
                 return int(readings[0].current)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api_server/health_endpoint.py` around lines 106 - 112, get_cpu_temp
contains two stylistic issues: replace the redundant membership check "coretemp"
in temps and temps["coretemp"] with temps.get("coretemp") to simplify the guard,
and rename the unused loop variable sensor_type to _ in the for-loop that
iterates over temps.items() so the unused variable is explicit; update the
return logic to use temps.get("coretemp") and change "for sensor_type, readings
in temps.items()" to "for _, readings in temps.items()" in the get_cpu_temp
function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@server/api_server/health_endpoint.py`:
- Around line 106-112: get_cpu_temp contains two stylistic issues: replace the
redundant membership check "coretemp" in temps and temps["coretemp"] with
temps.get("coretemp") to simplify the guard, and rename the unused loop variable
sensor_type to _ in the for-loop that iterates over temps.items() so the unused
variable is explicit; update the return logic to use temps.get("coretemp") and
change "for sensor_type, readings in temps.items()" to "for _, readings in
temps.items()" in the get_cpu_temp function.

@jokob-sk jokob-sk merged commit 32e2d57 into main Feb 17, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments