Skip to content

Improve validation and robustness of GitHub webhook handler#276

Open
Sagarshresti18 wants to merge 3 commits intoAOSSIE-Org:mainfrom
Sagarshresti18:improve-webhook-validation
Open

Improve validation and robustness of GitHub webhook handler#276
Sagarshresti18 wants to merge 3 commits intoAOSSIE-Org:mainfrom
Sagarshresti18:improve-webhook-validation

Conversation

@Sagarshresti18
Copy link

@Sagarshresti18 Sagarshresti18 commented Feb 13, 2026

This PR introduces Pydantic-based schema validation and improved error handling
for the /github/webhook endpoint.

Changes:

  • Added GitHubWebhookPayload model for structured validation
  • Prevent crashes on malformed webhook payloads
  • Improved logging and error handling
  • Safer attribute access using validated payload fields

Closes #275

Summary by CodeRabbit

  • Bug Fixes

    • Malformed or invalid webhook payloads now return clear HTTP 400/422 responses; improved warning logs for unmatched events or missing actor info.
  • New Features

    • More accurate detection and typing of webhook events (issues, comments, PR create/update/close/merge) and preservation of original payload for inspection.
  • Refactor

    • Centralized event handler registration and safer payload handling to prevent dispatch for unknown event types.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

Walkthrough

Added a Pydantic GitHubWebhookPayload model and centralized register_event_handlers(); the webhook endpoint now parses raw JSON with 400 on parse failure, validates payload with 422 on schema errors, maps GitHub actions to explicit event types, safely extracts actor/raw data, logs unmatched events, and only dispatches when an event type is determined.

Changes

Cohort / File(s) Summary
GitHub webhook handling
backend/routes.py
Added GitHubWebhookPayload (Pydantic) with action, sender, pull_request; JSON parse guarded with 400 response and validation with 422; introduced register_event_handlers() and centralized event mappings for issues, issue_comment, pull_request, pull_request_review, pull_request_review_comment; actor_id extracted from payload.sender.id with fallback; raw_data set to original payload; warnings logged for unmatched events; imports updated (ValidationError, Optional, Dict, Any).

Sequence Diagram(s)

sequenceDiagram
    participant GitHub
    participant Server
    participant Validator
    participant Mapper
    participant Dispatcher
    participant Handlers

    GitHub->>Server: POST /github/webhook (JSON + headers)
    Server->>Validator: parse raw JSON
    alt JSON invalid
        Validator-->>Server: parse error
        Server-->>GitHub: 400 Bad Request
    else JSON parsed
        Server->>Validator: validate as GitHubWebhookPayload
        alt validation error
            Validator-->>Server: validation error
            Server-->>GitHub: 422 Unprocessable Entity
        else valid payload
            Validator-->>Mapper: provide typed payload
            Mapper->>Mapper: determine event_type (action, pull_request.merged, etc.)
            alt no matching event_type
                Mapper-->>Server: log warning
                Server-->>GitHub: 200 OK (no dispatch)
            else matching event_type
                Mapper->>Dispatcher: build Event(actor_id, raw_data, event_type)
                Dispatcher->>Handlers: dispatch to registered handlers
                Handlers-->>Dispatcher: handler results
                Dispatcher-->>Server: dispatch outcome
                Server-->>GitHub: 200 OK
            end
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I munch on payloads neat and bright,
I check each field from morning to night.
Action, sender, merge—I'll verify,
Then hop the event off, swift and spry.
A rabbit waves: "Validated and right!"

🚥 Pre-merge checks | ✅ 5 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly summarizes the main change: improving validation and robustness of the GitHub webhook handler, which aligns with the primary objective of adding schema validation and error handling.
Linked Issues check ✅ Passed The PR implementation addresses all coding requirements from issue #275: adds Pydantic-based schema validation with GitHubWebhookPayload model, improves error handling for JSON parsing and validation failures, implements structured logging for unsupported event types, and ensures graceful handling of unexpected payloads.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the linked issue objectives; no out-of-scope modifications were introduced beyond the webhook validation improvements, event handler registration, and error handling enhancements.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
backend/routes.py (3)

17-20: Pydantic model with all-optional fields provides no meaningful validation.

Since every field has a default of None, any JSON object (including {}) will pass validation, making the ValidationError catch on line 57 effectively unreachable for typical webhook payloads. Consider adding at least a few required or constrained fields that GitHub always sends (e.g., action is present on most event payloads), or use a model_validator to enforce that at least some expected structure exists.

 class GitHubWebhookPayload(BaseModel):
-    action: Optional[str] = None
-    sender: Optional[Dict[str, Any]] = None
-    pull_request: Optional[Dict[str, Any]] = None
+    action: str
+    sender: Optional[Dict[str, Any]] = None
+    pull_request: Optional[Dict[str, Any]] = None
+
+    model_config = {"extra": "allow"}

Making action required would catch truly malformed payloads, while extra = "allow" ensures unknown fields from GitHub don't cause rejection. Note that a few GitHub events (e.g., ping) may not have action, so you may want to handle those separately or keep action optional but add a root validator with custom logic.


57-59: Use logging.exception to include the traceback for validation failures.

Per Ruff TRY400, logging.exception automatically appends the traceback, which is more useful for debugging schema mismatches than the stringified error alone.

     except ValidationError as e:
-        logging.error(f"Webhook schema validation failed: {e}")
+        logging.exception("Webhook schema validation failed")
         raise HTTPException(status_code=422, detail="Invalid webhook schema") from e

33-45: Missing handler registration for PR_REVIEWED and module-level side effect.

If you add handling for pull_request_reviewPR_REVIEWED (per the comment above), you'll also need to register a handler for it here. Additionally, calling register_event_handlers() at module import time (line 45) means importing this module has side effects, which can complicate testing. Consider deferring this to app startup (e.g., a FastAPI lifespan or on_event("startup") hook).


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/routes.py`:
- Around line 49-54: Narrow the broad except: catch the specific errors from
parsing and validation (ValueError for request.json() and
pydantic.ValidationError for GitHubWebhookPayload) instead of Exception, log the
failure with logging.exception to include the traceback, and re-raise the
HTTPException using raise HTTPException(status_code=400, detail="Invalid webhook
payload") from e to preserve the exception chain; update the try/except around
request.json() and GitHubWebhookPayload(**raw_payload) accordingly.
🧹 Nitpick comments (3)
backend/routes.py (3)

17-20: Webhook payload model validates almost nothing in practice.

Since every field is Optional with a None default, any JSON object (including {}) passes validation. The model also omits commonly needed fields like issue, comment, repository, and number, so it doesn't guard against structurally malformed payloads for most event types.

Consider:

  1. Adding fields for issue, comment, and repository (at minimum) so they're accessible via the model.
  2. Adding a model-level validator or required field (e.g., making action required for actionable events) to reject truly empty/garbage payloads.
♻️ Suggested expanded model
 class GitHubWebhookPayload(BaseModel):
+    model_config = {"extra": "allow"}
+
     action: Optional[str] = None
     sender: Optional[Dict[str, Any]] = None
     pull_request: Optional[Dict[str, Any]] = None
+    issue: Optional[Dict[str, Any]] = None
+    comment: Optional[Dict[str, Any]] = None
+    repository: Optional[Dict[str, Any]] = None

Note: extra = "allow" preserves any additional fields GitHub sends, which is useful since webhook schemas evolve.


33-45: Module-level register_event_handlers() call risks duplicate handler registration.

register_event_handlers() is invoked at import time (line 45). If this module is imported more than once (e.g., during testing or hot-reload), event_bus.register_handler likely appends to a list, resulting in sample_handler being called multiple times per event.

Consider adding an idempotency guard:

🛡️ Suggested guard
+_handlers_registered = False
+
 def register_event_handlers():
+    global _handlers_registered
+    if _handlers_registered:
+        return
     # Issue events
     event_bus.register_handler(EventType.ISSUE_CREATED, sample_handler, PlatformType.GITHUB)
     ...
+    _handlers_registered = True

28-31: sample_handler appears to be placeholder/debug code.

This handler only logs the event and does no real processing. If this is intentional scaffolding, consider adding a comment or a TODO to replace it with actual handler implementations. If it's meant for production, it should be moved to its own module under the handler package for better separation of concerns.

Comment on lines 49 to 54
try:
raw_payload = await request.json()
payload = GitHubWebhookPayload(**raw_payload)
except Exception as e:
logging.error(f"Invalid webhook payload: {e}")
raise HTTPException(status_code=400, detail="Invalid webhook payload")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Improve exception handling per static analysis findings.

Three valid concerns flagged by Ruff:

  1. Blind Exception catch (BLE001): Narrow to the expected types.
  2. logging.errorlogging.exception (TRY400): Includes the traceback automatically.
  3. Missing raise ... from (B904): Preserves the exception chain for debugging.
🔧 Proposed fix
+    from pydantic import ValidationError
     try:
         raw_payload = await request.json()
         payload = GitHubWebhookPayload(**raw_payload)
-    except Exception as e:
-        logging.error(f"Invalid webhook payload: {e}")
-        raise HTTPException(status_code=400, detail="Invalid webhook payload")
+    except (ValueError, ValidationError) as e:
+        logging.exception("Invalid webhook payload")
+        raise HTTPException(status_code=400, detail="Invalid webhook payload") from e

ValueError covers JSON decode failures from request.json(), and ValidationError covers Pydantic schema violations. The from e clause chains the original exception.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
raw_payload = await request.json()
payload = GitHubWebhookPayload(**raw_payload)
except Exception as e:
logging.error(f"Invalid webhook payload: {e}")
raise HTTPException(status_code=400, detail="Invalid webhook payload")
from pydantic import ValidationError
try:
raw_payload = await request.json()
payload = GitHubWebhookPayload(**raw_payload)
except (ValueError, ValidationError) as e:
logging.exception("Invalid webhook payload")
raise HTTPException(status_code=400, detail="Invalid webhook payload") from e
🧰 Tools
🪛 Ruff (0.15.0)

[warning] 52-52: Do not catch blind exception: Exception

(BLE001)


[warning] 53-53: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


[warning] 54-54: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In `@backend/routes.py` around lines 49 - 54, Narrow the broad except: catch the
specific errors from parsing and validation (ValueError for request.json() and
pydantic.ValidationError for GitHubWebhookPayload) instead of Exception, log the
failure with logging.exception to include the traceback, and re-raise the
HTTPException using raise HTTPException(status_code=400, detail="Invalid webhook
payload") from e to preserve the exception chain; update the try/except around
request.json() and GitHubWebhookPayload(**raw_payload) accordingly.

@Sagarshresti18
Copy link
Author

Thanks for the detailed automated review!

I’ll address the static analysis suggestions by refining the exception handling:

  • Separate JSON parsing errors and schema validation errors
  • Improve logging clarity for malformed payloads
  • Preserve original exception context using raise ... from e

I’ll push an updated commit shortly to incorporate these improvements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@backend/routes.py`:
- Around line 47-48: The github_webhook handler lacks HMAC verification of the
X-Hub-Signature-256 header, so update the github_webhook function to read the
raw request body bytes, compute the expected "sha256=" HMAC using a shared
secret (e.g., GITHUB_WEBHOOK_SECRET from env), and use hmac.compare_digest to
compare the header to the expected value; if it fails, raise
HTTPException(status_code=403) and return early, otherwise parse the JSON from
the raw body and continue processing. Ensure you reference the header name
"X-Hub-Signature-256", avoid calling request.json() before verification, and use
hashlib.sha256 for the HMAC calculation inside github_webhook.
- Around line 97-100: The branch is handling a non-existent
"pull_request_comment" event; remove that dead branch and instead add handling
for the real review-submission event "pull_request_review": keep the existing
handling of "pull_request_review_comment" mapping created ->
EventType.PR_COMMENTED, and add a new branch for event_header ==
"pull_request_review" that checks payload.action == "submitted" and sets
event_type = EventType.PR_REVIEWED (use the existing EventType.PR_REVIEWED
enum); ensure you remove references to "pull_request_comment" so only valid
GitHub event names are used.
🧹 Nitpick comments (2)
backend/routes.py (2)

17-20: Model with all-optional fields provides minimal validation.

Since every field defaults to None, any JSON object (including {}) passes validation — the except ValidationError block on line 57 is essentially unreachable for well-formed JSON. Consider making at minimum action required (GitHub always sends it for supported events), or adding a model validator to enforce that at least one expected field is present.

♻️ Suggested improvement
 class GitHubWebhookPayload(BaseModel):
-    action: Optional[str] = None
+    action: str
     sender: Optional[Dict[str, Any]] = None
     pull_request: Optional[Dict[str, Any]] = None

33-45: Handler registration uses only a placeholder — consider a TODO or docstring.

All event types are wired to sample_handler, which only logs. This is fine for scaffolding, but a brief docstring or TODO would clarify that real handlers need to be plugged in before this is production-ready.

Comment on lines 47 to 48
@router.post("/github/webhook")
async def github_webhook(request: Request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if webhook signature verification exists anywhere in the codebase
rg -n -i "hub.signature\|webhook.secret\|hmac\|compare_digest" --type=py

Repository: AOSSIE-Org/Devr.AI

Length of output: 44


🏁 Script executed:

#!/bin/bash
# Also check for X-Hub-Signature references
rg -n "X-Hub-Signature\|signature" --type=py

Repository: AOSSIE-Org/Devr.AI

Length of output: 44


🏁 Script executed:

#!/bin/bash
# Get the full implementation of the webhook handler
cat -n backend/routes.py | head -100

Repository: AOSSIE-Org/Devr.AI

Length of output: 4624


🏁 Script executed:

#!/bin/bash
# Check if there's more content in routes.py after line 100
wc -l backend/routes.py

Repository: AOSSIE-Org/Devr.AI

Length of output: 83


🏁 Script executed:

#!/bin/bash
# Get the rest of the webhook handler
tail -n +100 backend/routes.py | head -50

Repository: AOSSIE-Org/Devr.AI

Length of output: 645


🏁 Script executed:

#!/bin/bash
# Search for any middleware or decorators that might handle signature verification
rg -n "middleware\|verify\|decorator" --type=py backend/

Repository: AOSSIE-Org/Devr.AI

Length of output: 44


Missing webhook signature verification (HMAC).

The endpoint does not verify the X-Hub-Signature-256 header against a shared secret. Without this, anyone can POST to this endpoint and trigger event processing. This is a significant security gap for a production webhook handler. Consider verifying the signature before processing the payload.

🔧 Example approach
import hmac
import hashlib

GITHUB_WEBHOOK_SECRET = os.environ.get("GITHUB_WEBHOOK_SECRET", "")

`@router.post`("/github/webhook")
async def github_webhook(request: Request):
    body = await request.body()
    signature = request.headers.get("X-Hub-Signature-256", "")
    expected = "sha256=" + hmac.new(
        GITHUB_WEBHOOK_SECRET.encode(), body, hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise HTTPException(status_code=403, detail="Invalid signature")
    
    raw_payload = json.loads(body)
    # ... rest of handler

Note: when reading the body for HMAC verification, you'll need to parse JSON from the raw bytes yourself rather than calling request.json() twice.

🤖 Prompt for AI Agents
In `@backend/routes.py` around lines 47 - 48, The github_webhook handler lacks
HMAC verification of the X-Hub-Signature-256 header, so update the
github_webhook function to read the raw request body bytes, compute the expected
"sha256=" HMAC using a shared secret (e.g., GITHUB_WEBHOOK_SECRET from env), and
use hmac.compare_digest to compare the header to the expected value; if it
fails, raise HTTPException(status_code=403) and return early, otherwise parse
the JSON from the raw body and continue processing. Ensure you reference the
header name "X-Hub-Signature-256", avoid calling request.json() before
verification, and use hashlib.sha256 for the HMAC calculation inside
github_webhook.

Comment on lines 97 to 100
elif event_header in ["pull_request_review_comment", "pull_request_comment"]:
action = payload.get("action")
action = payload.action
if action == "created":
event_type = EventType.PR_COMMENTED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

GitHub webhook pull_request events documentation

💡 Result:

GitHub’s official documentation for pull_request webhook events is here (section “pull_request”):

https://docs.github.com/en/enterprise-cloud@latest/webhooks/webhook-events-and-payloads#pull_request

Key points from the docs:

  • The delivery includes headers like X-GitHub-Event: pull_request (plus delivery ID and signature headers). (docs.github.com)
  • The payload includes an action field indicating what happened (for example: opened, edited, closed, reopened, assigned, unassigned, labeled, unlabeled, review_requested, review_request_removed, synchronize). (docs.github.com)
  • How to create/configure a webhook in the GitHub UI is documented here: (docs.github.com)

Citations:


🏁 Script executed:

# Find and check the EventType enum in the codebase
find . -type f -name "*.py" | xargs rg -l "EventType" | head -5

Repository: AOSSIE-Org/Devr.AI

Length of output: 264


🏁 Script executed:

# Find EventType enum definition
rg -n "class EventType|enum EventType" backend/

Repository: AOSSIE-Org/Devr.AI

Length of output: 125


🌐 Web query:

GitHub webhook pull_request_review_comment pull_request_review event types

💡 Result:

For GitHub webhooks (and the same “activity types” you can filter on in GitHub Actions), the action field can be:

  • pull_request_review

    • submitted
    • edited
    • dismissed [1]
  • pull_request_review_comment (comments on the PR diff / “Files changed”)

    • created
    • edited
    • deleted [1]

Notes: “Approved” / “changes_requested” are not action values; they appear as review.state when pull_request_review has action: submitted. [1]

Sources
[1] GitHub Docs — “Events that trigger workflows” (pull_request_review, pull_request_review_comment) https://docs.github.com/actions/automating-your-workflow-with-github-actions/events-that-trigger-workflows


🏁 Script executed:

# Read the EventType enum to check available event types
cat -n backend/app/core/events/enums.py

Repository: AOSSIE-Org/Devr.AI

Length of output: 1217


"pull_request_comment" is not a valid GitHub webhook event.

GitHub sends "pull_request_review_comment" for inline review comments and "pull_request_review" for review submissions — there is no "pull_request_comment" event. This branch is dead code. If the intent is to also handle review submissions, use "pull_request_review" and map it to PR_REVIEWED (which exists in your EventType enum).

🔧 Suggested fix
-    elif event_header in ["pull_request_review_comment", "pull_request_comment"]:
+    elif event_header in ["pull_request_review_comment"]:
         action = payload.action
         if action == "created":
             event_type = EventType.PR_COMMENTED

And optionally add a separate handler for review submissions:

    elif event_header == "pull_request_review":
        action = payload.action
        if action == "submitted":
            event_type = EventType.PR_REVIEWED
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
elif event_header in ["pull_request_review_comment", "pull_request_comment"]:
action = payload.get("action")
action = payload.action
if action == "created":
event_type = EventType.PR_COMMENTED
elif event_header in ["pull_request_review_comment"]:
action = payload.action
if action == "created":
event_type = EventType.PR_COMMENTED
🤖 Prompt for AI Agents
In `@backend/routes.py` around lines 97 - 100, The branch is handling a
non-existent "pull_request_comment" event; remove that dead branch and instead
add handling for the real review-submission event "pull_request_review": keep
the existing handling of "pull_request_review_comment" mapping created ->
EventType.PR_COMMENTED, and add a new branch for event_header ==
"pull_request_review" that checks payload.action == "submitted" and sets
event_type = EventType.PR_REVIEWED (use the existing EventType.PR_REVIEWED
enum); ensure you remove references to "pull_request_comment" so only valid
GitHub event names are used.

@Sagarshresti18
Copy link
Author

Thanks for the suggestion!

You’re right that verifying the X-Hub-Signature-256 header would improve security by ensuring that webhook requests originate from GitHub. Since this PR focuses on adding schema validation and structured error handling for webhook payloads, I’ve kept the scope limited to validation improvements.

I’d be happy to open a follow-up PR to add HMAC signature verification for the webhook endpoint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve validation and robustness of GitHub webhook handler

1 participant

Comments