Improve validation and robustness of GitHub webhook handler#276
Improve validation and robustness of GitHub webhook handler#276Sagarshresti18 wants to merge 3 commits intoAOSSIE-Org:mainfrom
Conversation
📝 WalkthroughWalkthroughAdded a Pydantic Changes
Sequence Diagram(s)sequenceDiagram
participant GitHub
participant Server
participant Validator
participant Mapper
participant Dispatcher
participant Handlers
GitHub->>Server: POST /github/webhook (JSON + headers)
Server->>Validator: parse raw JSON
alt JSON invalid
Validator-->>Server: parse error
Server-->>GitHub: 400 Bad Request
else JSON parsed
Server->>Validator: validate as GitHubWebhookPayload
alt validation error
Validator-->>Server: validation error
Server-->>GitHub: 422 Unprocessable Entity
else valid payload
Validator-->>Mapper: provide typed payload
Mapper->>Mapper: determine event_type (action, pull_request.merged, etc.)
alt no matching event_type
Mapper-->>Server: log warning
Server-->>GitHub: 200 OK (no dispatch)
else matching event_type
Mapper->>Dispatcher: build Event(actor_id, raw_data, event_type)
Dispatcher->>Handlers: dispatch to registered handlers
Handlers-->>Dispatcher: handler results
Dispatcher-->>Server: dispatch outcome
Server-->>GitHub: 200 OK
end
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 🧹 Recent nitpick comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/routes.py`:
- Around line 49-54: Narrow the broad except: catch the specific errors from
parsing and validation (ValueError for request.json() and
pydantic.ValidationError for GitHubWebhookPayload) instead of Exception, log the
failure with logging.exception to include the traceback, and re-raise the
HTTPException using raise HTTPException(status_code=400, detail="Invalid webhook
payload") from e to preserve the exception chain; update the try/except around
request.json() and GitHubWebhookPayload(**raw_payload) accordingly.
🧹 Nitpick comments (3)
backend/routes.py (3)
17-20: Webhook payload model validates almost nothing in practice.Since every field is
Optionalwith aNonedefault, any JSON object (including{}) passes validation. The model also omits commonly needed fields likeissue,comment,repository, andnumber, so it doesn't guard against structurally malformed payloads for most event types.Consider:
- Adding fields for
issue,comment, andrepository(at minimum) so they're accessible via the model.- Adding a model-level validator or required field (e.g., making
actionrequired for actionable events) to reject truly empty/garbage payloads.♻️ Suggested expanded model
class GitHubWebhookPayload(BaseModel): + model_config = {"extra": "allow"} + action: Optional[str] = None sender: Optional[Dict[str, Any]] = None pull_request: Optional[Dict[str, Any]] = None + issue: Optional[Dict[str, Any]] = None + comment: Optional[Dict[str, Any]] = None + repository: Optional[Dict[str, Any]] = NoneNote:
extra = "allow"preserves any additional fields GitHub sends, which is useful since webhook schemas evolve.
33-45: Module-levelregister_event_handlers()call risks duplicate handler registration.
register_event_handlers()is invoked at import time (line 45). If this module is imported more than once (e.g., during testing or hot-reload),event_bus.register_handlerlikely appends to a list, resulting insample_handlerbeing called multiple times per event.Consider adding an idempotency guard:
🛡️ Suggested guard
+_handlers_registered = False + def register_event_handlers(): + global _handlers_registered + if _handlers_registered: + return # Issue events event_bus.register_handler(EventType.ISSUE_CREATED, sample_handler, PlatformType.GITHUB) ... + _handlers_registered = True
28-31:sample_handlerappears to be placeholder/debug code.This handler only logs the event and does no real processing. If this is intentional scaffolding, consider adding a comment or a
TODOto replace it with actual handler implementations. If it's meant for production, it should be moved to its own module under the handler package for better separation of concerns.
backend/routes.py
Outdated
| try: | ||
| raw_payload = await request.json() | ||
| payload = GitHubWebhookPayload(**raw_payload) | ||
| except Exception as e: | ||
| logging.error(f"Invalid webhook payload: {e}") | ||
| raise HTTPException(status_code=400, detail="Invalid webhook payload") |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Improve exception handling per static analysis findings.
Three valid concerns flagged by Ruff:
- Blind
Exceptioncatch (BLE001): Narrow to the expected types. logging.error→logging.exception(TRY400): Includes the traceback automatically.- Missing
raise ... from(B904): Preserves the exception chain for debugging.
🔧 Proposed fix
+ from pydantic import ValidationError
try:
raw_payload = await request.json()
payload = GitHubWebhookPayload(**raw_payload)
- except Exception as e:
- logging.error(f"Invalid webhook payload: {e}")
- raise HTTPException(status_code=400, detail="Invalid webhook payload")
+ except (ValueError, ValidationError) as e:
+ logging.exception("Invalid webhook payload")
+ raise HTTPException(status_code=400, detail="Invalid webhook payload") from eValueError covers JSON decode failures from request.json(), and ValidationError covers Pydantic schema violations. The from e clause chains the original exception.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| raw_payload = await request.json() | |
| payload = GitHubWebhookPayload(**raw_payload) | |
| except Exception as e: | |
| logging.error(f"Invalid webhook payload: {e}") | |
| raise HTTPException(status_code=400, detail="Invalid webhook payload") | |
| from pydantic import ValidationError | |
| try: | |
| raw_payload = await request.json() | |
| payload = GitHubWebhookPayload(**raw_payload) | |
| except (ValueError, ValidationError) as e: | |
| logging.exception("Invalid webhook payload") | |
| raise HTTPException(status_code=400, detail="Invalid webhook payload") from e |
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 52-52: Do not catch blind exception: Exception
(BLE001)
[warning] 53-53: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 54-54: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In `@backend/routes.py` around lines 49 - 54, Narrow the broad except: catch the
specific errors from parsing and validation (ValueError for request.json() and
pydantic.ValidationError for GitHubWebhookPayload) instead of Exception, log the
failure with logging.exception to include the traceback, and re-raise the
HTTPException using raise HTTPException(status_code=400, detail="Invalid webhook
payload") from e to preserve the exception chain; update the try/except around
request.json() and GitHubWebhookPayload(**raw_payload) accordingly.
|
Thanks for the detailed automated review! I’ll address the static analysis suggestions by refining the exception handling:
I’ll push an updated commit shortly to incorporate these improvements. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@backend/routes.py`:
- Around line 47-48: The github_webhook handler lacks HMAC verification of the
X-Hub-Signature-256 header, so update the github_webhook function to read the
raw request body bytes, compute the expected "sha256=" HMAC using a shared
secret (e.g., GITHUB_WEBHOOK_SECRET from env), and use hmac.compare_digest to
compare the header to the expected value; if it fails, raise
HTTPException(status_code=403) and return early, otherwise parse the JSON from
the raw body and continue processing. Ensure you reference the header name
"X-Hub-Signature-256", avoid calling request.json() before verification, and use
hashlib.sha256 for the HMAC calculation inside github_webhook.
- Around line 97-100: The branch is handling a non-existent
"pull_request_comment" event; remove that dead branch and instead add handling
for the real review-submission event "pull_request_review": keep the existing
handling of "pull_request_review_comment" mapping created ->
EventType.PR_COMMENTED, and add a new branch for event_header ==
"pull_request_review" that checks payload.action == "submitted" and sets
event_type = EventType.PR_REVIEWED (use the existing EventType.PR_REVIEWED
enum); ensure you remove references to "pull_request_comment" so only valid
GitHub event names are used.
🧹 Nitpick comments (2)
backend/routes.py (2)
17-20: Model with all-optional fields provides minimal validation.Since every field defaults to
None, any JSON object (including{}) passes validation — theexcept ValidationErrorblock on line 57 is essentially unreachable for well-formed JSON. Consider making at minimumactionrequired (GitHub always sends it for supported events), or adding a model validator to enforce that at least one expected field is present.♻️ Suggested improvement
class GitHubWebhookPayload(BaseModel): - action: Optional[str] = None + action: str sender: Optional[Dict[str, Any]] = None pull_request: Optional[Dict[str, Any]] = None
33-45: Handler registration uses only a placeholder — consider a TODO or docstring.All event types are wired to
sample_handler, which only logs. This is fine for scaffolding, but a brief docstring or TODO would clarify that real handlers need to be plugged in before this is production-ready.
| @router.post("/github/webhook") | ||
| async def github_webhook(request: Request): |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if webhook signature verification exists anywhere in the codebase
rg -n -i "hub.signature\|webhook.secret\|hmac\|compare_digest" --type=pyRepository: AOSSIE-Org/Devr.AI
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Also check for X-Hub-Signature references
rg -n "X-Hub-Signature\|signature" --type=pyRepository: AOSSIE-Org/Devr.AI
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Get the full implementation of the webhook handler
cat -n backend/routes.py | head -100Repository: AOSSIE-Org/Devr.AI
Length of output: 4624
🏁 Script executed:
#!/bin/bash
# Check if there's more content in routes.py after line 100
wc -l backend/routes.pyRepository: AOSSIE-Org/Devr.AI
Length of output: 83
🏁 Script executed:
#!/bin/bash
# Get the rest of the webhook handler
tail -n +100 backend/routes.py | head -50Repository: AOSSIE-Org/Devr.AI
Length of output: 645
🏁 Script executed:
#!/bin/bash
# Search for any middleware or decorators that might handle signature verification
rg -n "middleware\|verify\|decorator" --type=py backend/Repository: AOSSIE-Org/Devr.AI
Length of output: 44
Missing webhook signature verification (HMAC).
The endpoint does not verify the X-Hub-Signature-256 header against a shared secret. Without this, anyone can POST to this endpoint and trigger event processing. This is a significant security gap for a production webhook handler. Consider verifying the signature before processing the payload.
🔧 Example approach
import hmac
import hashlib
GITHUB_WEBHOOK_SECRET = os.environ.get("GITHUB_WEBHOOK_SECRET", "")
`@router.post`("/github/webhook")
async def github_webhook(request: Request):
body = await request.body()
signature = request.headers.get("X-Hub-Signature-256", "")
expected = "sha256=" + hmac.new(
GITHUB_WEBHOOK_SECRET.encode(), body, hashlib.sha256
).hexdigest()
if not hmac.compare_digest(signature, expected):
raise HTTPException(status_code=403, detail="Invalid signature")
raw_payload = json.loads(body)
# ... rest of handlerNote: when reading the body for HMAC verification, you'll need to parse JSON from the raw bytes yourself rather than calling request.json() twice.
🤖 Prompt for AI Agents
In `@backend/routes.py` around lines 47 - 48, The github_webhook handler lacks
HMAC verification of the X-Hub-Signature-256 header, so update the
github_webhook function to read the raw request body bytes, compute the expected
"sha256=" HMAC using a shared secret (e.g., GITHUB_WEBHOOK_SECRET from env), and
use hmac.compare_digest to compare the header to the expected value; if it
fails, raise HTTPException(status_code=403) and return early, otherwise parse
the JSON from the raw body and continue processing. Ensure you reference the
header name "X-Hub-Signature-256", avoid calling request.json() before
verification, and use hashlib.sha256 for the HMAC calculation inside
github_webhook.
backend/routes.py
Outdated
| elif event_header in ["pull_request_review_comment", "pull_request_comment"]: | ||
| action = payload.get("action") | ||
| action = payload.action | ||
| if action == "created": | ||
| event_type = EventType.PR_COMMENTED |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
GitHub webhook pull_request events documentation
💡 Result:
GitHub’s official documentation for pull_request webhook events is here (section “pull_request”):
https://docs.github.com/en/enterprise-cloud@latest/webhooks/webhook-events-and-payloads#pull_request
Key points from the docs:
- The delivery includes headers like
X-GitHub-Event: pull_request(plus delivery ID and signature headers). (docs.github.com) - The payload includes an
actionfield indicating what happened (for example:opened,edited,closed,reopened,assigned,unassigned,labeled,unlabeled,review_requested,review_request_removed,synchronize). (docs.github.com) - How to create/configure a webhook in the GitHub UI is documented here: (docs.github.com)
Citations:
- 1: https://docs.github.com/en/enterprise-cloud%40latest/webhooks/webhook-events-and-payloads
- 2: https://docs.github.com/developers/webhooks-and-events/events?utm_source=openai
- 3: https://docs.github.com/webhooks-and-events/webhooks/creating-webhooks?utm_source=openai
🏁 Script executed:
# Find and check the EventType enum in the codebase
find . -type f -name "*.py" | xargs rg -l "EventType" | head -5Repository: AOSSIE-Org/Devr.AI
Length of output: 264
🏁 Script executed:
# Find EventType enum definition
rg -n "class EventType|enum EventType" backend/Repository: AOSSIE-Org/Devr.AI
Length of output: 125
🌐 Web query:
GitHub webhook pull_request_review_comment pull_request_review event types
💡 Result:
For GitHub webhooks (and the same “activity types” you can filter on in GitHub Actions), the action field can be:
-
pull_request_reviewsubmittedediteddismissed[1]
-
pull_request_review_comment(comments on the PR diff / “Files changed”)createdediteddeleted[1]
Notes: “Approved” / “changes_requested” are not action values; they appear as review.state when pull_request_review has action: submitted. [1]
Sources
[1] GitHub Docs — “Events that trigger workflows” (pull_request_review, pull_request_review_comment) https://docs.github.com/actions/automating-your-workflow-with-github-actions/events-that-trigger-workflows
🏁 Script executed:
# Read the EventType enum to check available event types
cat -n backend/app/core/events/enums.pyRepository: AOSSIE-Org/Devr.AI
Length of output: 1217
"pull_request_comment" is not a valid GitHub webhook event.
GitHub sends "pull_request_review_comment" for inline review comments and "pull_request_review" for review submissions — there is no "pull_request_comment" event. This branch is dead code. If the intent is to also handle review submissions, use "pull_request_review" and map it to PR_REVIEWED (which exists in your EventType enum).
🔧 Suggested fix
- elif event_header in ["pull_request_review_comment", "pull_request_comment"]:
+ elif event_header in ["pull_request_review_comment"]:
action = payload.action
if action == "created":
event_type = EventType.PR_COMMENTEDAnd optionally add a separate handler for review submissions:
elif event_header == "pull_request_review":
action = payload.action
if action == "submitted":
event_type = EventType.PR_REVIEWED📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| elif event_header in ["pull_request_review_comment", "pull_request_comment"]: | |
| action = payload.get("action") | |
| action = payload.action | |
| if action == "created": | |
| event_type = EventType.PR_COMMENTED | |
| elif event_header in ["pull_request_review_comment"]: | |
| action = payload.action | |
| if action == "created": | |
| event_type = EventType.PR_COMMENTED |
🤖 Prompt for AI Agents
In `@backend/routes.py` around lines 97 - 100, The branch is handling a
non-existent "pull_request_comment" event; remove that dead branch and instead
add handling for the real review-submission event "pull_request_review": keep
the existing handling of "pull_request_review_comment" mapping created ->
EventType.PR_COMMENTED, and add a new branch for event_header ==
"pull_request_review" that checks payload.action == "submitted" and sets
event_type = EventType.PR_REVIEWED (use the existing EventType.PR_REVIEWED
enum); ensure you remove references to "pull_request_comment" so only valid
GitHub event names are used.
|
Thanks for the suggestion! You’re right that verifying the I’d be happy to open a follow-up PR to add HMAC signature verification for the webhook endpoint. |
This PR introduces Pydantic-based schema validation and improved error handling
for the /github/webhook endpoint.
Changes:
Closes #275
Summary by CodeRabbit
Bug Fixes
New Features
Refactor