SSRF clone-URL validation by whisper67265 · Pull Request #117 · cppalliance/cppa-weblate-plugin

whisper67265 · 2026-06-15T21:36:38Z

Close #113.

Summary by CodeRabbit

New Features
- Added BOOST_ALLOWED_CLONE_HOSTS to limit which HTTPS git clone hosts are accepted (comma-separated allowlist; empty/blank disables host-based cloning).
- Strengthened endpoint validation for organization/submodule segments and clone URLs, including safer hostname/IP resolution checks.
Bug Fixes
- Improved structured error reporting with a dedicated invalid clone URL code and corrected submodule/clone-url error mapping.
Documentation
- Updated endpoint API documentation and the deployment runbook to reflect the new configuration and validation behavior.
Tests
- Expanded validation and error-handling coverage for clone URL safety and new error codes.

coderabbitai · 2026-06-15T21:40:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c6acf1fb-4db1-4ca9-952d-649f06327f6d

📥 Commits

Reviewing files that changed from the base of the PR and between 9c0d03b and 30cac5e.

📒 Files selected for processing (2)

src/boost_weblate/endpoint/serializers.py
tests/endpoint/test_serializers.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/boost_weblate/endpoint/serializers.py

📝 Walkthrough

Walkthrough

A new validators.py module introduces SSRF-mitigating validation for git clone URLs: repo-segment character restrictions, URL scheme enforcement, private/reserved IP blocking, DNS rebinding detection, and host allowlisting via ALLOWED_CLONE_HOSTS (configurable through BOOST_ALLOWED_CLONE_HOSTS). The serializers and service layer are updated to use these validators before any git operation. A new INVALID_CLONE_URL error code is added, with tests and documentation updated throughout.

Changes

SSRF Clone-URL Validation

Layer / File(s)	Summary
Core validator module with SSRF checks and URL builders `src/boost_weblate/endpoint/validators.py`	Introduces `validate_repo_segment` (character restrictions), `_normalize_clone_url` (scheme enforcement), `_hostname_allowed` (allowlist matching), `_resolve_addresses` (DNS resolution), `_reject_unsafe_ip` (IP range blocking), `_check_literal_host_ip` (literal IP detection), `validate_clone_url` (orchestrates SSRF checks including DNS rebinding), and URL builder helpers `github_https_clone_url` and `github_ssh_repo_url`.
ALLOWED_CLONE_HOSTS configuration and INVALID_CLONE_URL error code `src/boost_weblate/settings_override.py`, `src/boost_weblate/endpoint/errors.py`	`settings_override.py` adds `allowed_clone_hosts()` to parse `BOOST_ALLOWED_CLONE_HOSTS` environment variable with whitespace/blank handling and lowercase normalization, exposing `ALLOWED_CLONE_HOSTS` constant. `errors.py` adds `INVALID_CLONE_URL` enum member.
Serializer organization and submodule validation `src/boost_weblate/endpoint/serializers.py`	Introduces `DrfValidationCode`/`RequestField` enums for standardized error code and field name mapping; adds `_custom_error_fields` tracking to prevent DRF-generated message duplication; refactors `_to_structured_errors` and `_flatten_field_errors`; adds `validate_organization` method emitting `INVALID_CLONE_URL` on unsafe organization segments; extends `validate_add_or_update` to iterate and validate each submodule entry, emitting `INVALID_SUBMODULE_LIST` or `INVALID_SUBMODULE` errors.
Service layer use of validators `src/boost_weblate/endpoint/services.py`	`clone_repository` now calls `github_https_clone_url` and propagates `ValidationError`; `create_or_update_component` uses `github_ssh_repo_url` with early-exit on failure; `process_submodule` validates organization and submodule early and wraps `clone_repository` call to map `ValidationError` to `INVALID_CLONE_URL`.
Comprehensive validator test coverage `tests/endpoint/test_validators.py`	Introduces DNS-mocking helpers and test classes covering repo segment validation, non-HTTPS scheme rejection, private/unspecified IP blocking, allowlist enforcement, DNS rebinding detection, happy-path HTTPS/SCP normalization, and GitHub URL builder output and error handling.
Serializer and service integration tests `tests/endpoint/test_serializers.py`, `tests/endpoint/test_services.py`	Serializer tests adjust import structure and assertion patterns for enum-based codes, then assert rejection of invalid organization and submodule segments with correct error codes and error accumulation. Service tests patch URL builders, verify `clone_repository` propagation of `ValidationError`, and assert `process_submodule` returns `INVALID_CLONE_URL` (not `CLONE_FAILED`) when clone fails.
Settings parsing tests and Django test wiring `tests/test_settings_override.py`, `tests/django_qbk_format_settings.py`	`test_settings_override.py` tests `allowed_clone_hosts()` default value, environment variable parsing, and empty-string handling. `django_qbk_format_settings.py` imports and assigns `ALLOWED_CLONE_HOSTS` in Django settings.
Environment variable documentation and endpoint API updates `.env.example`, `docs/boost-endpoint-api.md`, `docs/deployment-runbook.md`	`.env.example` adds `BOOST_ALLOWED_CLONE_HOSTS` with comments. API docs expand path validation step and update error codes. Runbook documents the environment variable in runtime plugin settings and Weblate environment variables table.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AddOrUpdateRequestSerializer
    participant validators_py as validators.py
    participant BoostComponentService
    participant GitSubprocess

    Client->>AddOrUpdateRequestSerializer: POST /boost/add-or-update (organization, submodules)
    AddOrUpdateRequestSerializer->>validators_py: validate_repo_segment(organization, field="organization")
    validators_py-->>AddOrUpdateRequestSerializer: ValidationError or ok
    AddOrUpdateRequestSerializer->>validators_py: validate_repo_segment(submodule, field="submodule") for each submodule
    validators_py-->>AddOrUpdateRequestSerializer: ValidationError or ok
    AddOrUpdateRequestSerializer-->>Client: 400 INVALID_CLONE_URL / INVALID_SUBMODULE (if invalid)

    AddOrUpdateRequestSerializer->>BoostComponentService: process_submodule(organization, submodule)
    BoostComponentService->>validators_py: validate_repo_segment(organization), validate_repo_segment(submodule)
    validators_py-->>BoostComponentService: ValidationError or ok
    BoostComponentService->>validators_py: github_https_clone_url(organization, submodule)
    validators_py->>validators_py: validate_clone_url (scheme, allowlist, DNS/IP, rebind checks)
    validators_py-->>BoostComponentService: ValidationError or clone URL
    BoostComponentService->>GitSubprocess: git clone <validated_url>
    GitSubprocess-->>BoostComponentService: result
    BoostComponentService-->>Client: success or INVALID_CLONE_URL error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

cppalliance/cppa-weblate-plugin#48: Introduced AddOrUpdateRequestSerializer in serializers.py, which this PR extends with validate_organization and per-submodule validate_repo_segment logic.
cppalliance/cppa-weblate-plugin#50: Implemented the BoostComponentService flow including clone_repository and process_submodule, which this PR updates to use the new URL builder validators.
cppalliance/cppa-weblate-plugin#71: Modified tests/endpoint/test_services.py coverage for clone_repository and process_submodule, the same test areas updated here for clone URL validation.

Suggested reviewers

henry0816191
wpak-ai

Poem

🐇 Hop hop, the rabbit checks the URL,
No sneaky IPs slipping past the rule!
SSRF blocked, the allowlist stands strong,
DNS rebinding? That won't last long.
With validators new and tests in a row,
Only safe clones get the green light to go! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 21.54% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'SSRF clone-URL validation' directly describes the primary change: implementing SSRF vulnerability mitigation for git clone URL validation.
Linked Issues check	✅ Passed	All acceptance criteria from issue `#113` are met: organization/submodule validation against pattern [`#113`], HTTPS-only enforcement with IP range restrictions [`#113`], hostname allowlisting via ALLOWED_CLONE_HOSTS [`#113`], and DNS rebinding mitigation [`#113`].
Out of Scope Changes check	✅ Passed	All changes are directly scoped to SSRF mitigation: validators module, error codes, serializer validation, service layer integration, environment configuration, documentation, and comprehensive test coverage.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

tests/endpoint/test_validators.py (1)
24-25: ⚡ Quick win

Add explicit unspecified-IP coverage to prevent SSRF regression.

Please add a case for 0.0.0.0 (and ideally ::) in the rejected-host IP tests so the security behavior is pinned by tests, not just implementation.

Also applies to: 106-124
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/endpoint/test_validators.py` around lines 24 - 25, Add explicit test
coverage for unspecified IP addresses to prevent SSRF regression. Include
`0.0.0.0` and `::` in the _PRIVATE_IPS tuple (lines 24-25) and ensure that the
rejected-host IP validation tests at lines 106-124 also validate these
unspecified IP addresses are properly rejected. This pins the security behavior
by tests rather than relying solely on implementation details.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/boost_weblate/endpoint/serializers.py`:
- Around line 158-173: The `_custom_validation_errors` attribute is being
replaced rather than accumulated across multiple validation methods. When both
the `validate_organization` method and other field validators (such as
`validate_add_or_update`) encounter errors, one overwrites the other, causing
validation errors to be lost or misclassified. Fix this by changing all
assignments to `_custom_validation_errors` from replacement operations to
accumulation operations (extend or merge the existing errors with new ones) so
that errors from all invalid fields are preserved and reported together.

In `@src/boost_weblate/endpoint/services.py`:
- Around line 320-324: The ValidationError exception at lines 320-324 in the
github_https_clone_url call is being caught and converted to a False return
value, which causes the caller in process_submodule() at lines 953-964 to treat
it as a generic CLONE_FAILED error, losing the distinction between validation
failures and actual clone errors. Remove or modify the try-except block at lines
320-324 to allow ValidationError to propagate from clone_repository(), then in
process_submodule() at lines 953-964, add a separate except handler for
ValidationError that maps it to BoostEndpointErrorCode.INVALID_CLONE_URL instead
of letting it fall through to the generic clone failure handling.

In `@src/boost_weblate/endpoint/validators.py`:
- Around line 98-109: The _reject_unsafe_ip function is missing validation for
unspecified IP addresses (0.0.0.0 and ::), which could allow SSRF attacks. Add a
check for addr.is_unspecified alongside the existing checks for is_private,
is_loopback, is_link_local, is_reserved, and is_multicast, raising a
ValidationError with an appropriate message if the address is unspecified.

In `@tests/test_settings_override.py`:
- Around line 82-90: The test_allowed_clone_hosts_default function depends on
the ambient BOOST_ALLOWED_CLONE_HOSTS environment variable and will produce
flaky results since ALLOWED_CLONE_HOSTS is computed at import time. Before the
import statements at line 82, explicitly remove the BOOST_ALLOWED_CLONE_HOSTS
environment variable (using os.environ.pop or similar) and then reload the
module to ensure ALLOWED_CLONE_HOSTS is recomputed with a clean environment
state. This makes the test deterministic and independent of the calling
environment.

---

Nitpick comments:
In `@tests/endpoint/test_validators.py`:
- Around line 24-25: Add explicit test coverage for unspecified IP addresses to
prevent SSRF regression. Include `0.0.0.0` and `::` in the _PRIVATE_IPS tuple
(lines 24-25) and ensure that the rejected-host IP validation tests at lines
106-124 also validate these unspecified IP addresses are properly rejected. This
pins the security behavior by tests rather than relying solely on implementation
details.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f91ffc74-3a19-4a0d-a85f-b8fba9a2a5df

📥 Commits

Reviewing files that changed from the base of the PR and between f425069 and 0478b7f.

📒 Files selected for processing (13)

.env.example
docs/boost-endpoint-api.md
docs/deployment-runbook.md
src/boost_weblate/endpoint/errors.py
src/boost_weblate/endpoint/serializers.py
src/boost_weblate/endpoint/services.py
src/boost_weblate/endpoint/validators.py
src/boost_weblate/settings_override.py
tests/django_qbk_format_settings.py
tests/endpoint/test_serializers.py
tests/endpoint/test_services.py
tests/endpoint/test_validators.py
tests/test_settings_override.py

coderabbitai

♻️ Duplicate comments (1)

src/boost_weblate/endpoint/serializers.py (1)

75-79: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Custom error accumulation is fixed, but field-level error suppression is still inconsistent.

Line 78 suppresses add_or_update DRF errors whenever any custom error exists, while organization custom errors (Line 163) still allow DRF fallback entries. This can both hide unrelated add_or_update errors and emit duplicate/misclassified required_field-style entries for organization failures. Please track custom errors per field and suppress flattening only for those fields.

Suggested fix

 class AddOrUpdateRequestSerializer(serializers.Serializer):
@@
     def __init__(self, *args: Any, **kwargs: Any) -> None:
         super().__init__(*args, **kwargs)
         self._custom_validation_errors: list[dict[str, Any]] = []
+        self._custom_error_fields: set[str] = set()
         self._structured_errors: list[dict[str, Any]] = []
@@
     def is_valid(self, *, raise_exception: bool = False) -> bool:
         self._custom_validation_errors = []
+        self._custom_error_fields = set()
         valid = super().is_valid(raise_exception=False)
@@
     def _to_structured_errors(self) -> list[dict[str, Any]]:
         structured = list(self._custom_validation_errors)
         for field, messages in self.errors.items():
-            if field == "add_or_update" and self._custom_validation_errors:
+            if field in self._custom_error_fields:
                 continue
@@
         except ValidationError as exc:
             self._custom_validation_errors.extend(
                 boost_validation_errors(
@@
                 )
             )
+            self._custom_error_fields.add("organization")
             raise serializers.ValidationError(str(exc)) from exc
@@
         if items:
             self._custom_validation_errors.extend(boost_validation_errors(items))
+            self._custom_error_fields.add("add_or_update")
             raise serializers.ValidationError({"add_or_update": "invalid"})

Also applies to: 163-173, 251-252

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/boost_weblate/endpoint/serializers.py` around lines 75 - 79, The custom
error tracking in the serializer is using a global check for any custom errors
rather than tracking per field, causing inconsistent suppression across
different fields. Change the internal representation of custom validation errors
from a flat list to a per-field dictionary structure (e.g., from
self._custom_validation_errors to self._custom_validation_errors_by_field where
keys are field names). Update the suppression logic in the _to_structured_errors
method at line 78 to check if the specific field has custom errors before
skipping DRF fallback entries, and similarly update the organization custom
error handling logic at lines 163-173 and any related field-level error
filtering at lines 251-252 to use the per-field tracking structure instead of
the global check.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/boost_weblate/endpoint/serializers.py`:
- Around line 75-79: The custom error tracking in the serializer is using a
global check for any custom errors rather than tracking per field, causing
inconsistent suppression across different fields. Change the internal
representation of custom validation errors from a flat list to a per-field
dictionary structure (e.g., from self._custom_validation_errors to
self._custom_validation_errors_by_field where keys are field names). Update the
suppression logic in the _to_structured_errors method at line 78 to check if the
specific field has custom errors before skipping DRF fallback entries, and
similarly update the organization custom error handling logic at lines 163-173
and any related field-level error filtering at lines 251-252 to use the
per-field tracking structure instead of the global check.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b33e08fc-5577-4743-9326-177908ef74f7

📥 Commits

Reviewing files that changed from the base of the PR and between 0478b7f and 9c0d03b.

📒 Files selected for processing (7)

src/boost_weblate/endpoint/serializers.py
src/boost_weblate/endpoint/services.py
src/boost_weblate/endpoint/validators.py
tests/endpoint/test_serializers.py
tests/endpoint/test_services.py
tests/endpoint/test_validators.py
tests/test_settings_override.py

🚧 Files skipped from review as they are similar to previous changes (4)

tests/endpoint/test_serializers.py
tests/test_settings_override.py
tests/endpoint/test_validators.py
src/boost_weblate/endpoint/validators.py

SSRF clone-URL validation

0478b7f

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread src/boost_weblate/endpoint/serializers.py

Comment thread src/boost_weblate/endpoint/services.py Outdated

Comment thread src/boost_weblate/endpoint/validators.py

Comment thread tests/test_settings_override.py Outdated

fix coderabbbitai review

9c0d03b

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

whisper67265 requested a review from henry0816191 June 15, 2026 22:36

henry0816191 reviewed Jun 16, 2026

View reviewed changes

Comment thread src/boost_weblate/endpoint/serializers.py

Comment thread src/boost_weblate/endpoint/serializers.py Outdated

Comment thread src/boost_weblate/endpoint/serializers.py Outdated

fix first reviewer's review

30cac5e

henry0816191 approved these changes Jun 16, 2026

View reviewed changes

henry0816191 requested a review from wpak-ai June 16, 2026 16:00

wpak-ai approved these changes Jun 16, 2026

View reviewed changes

wpak-ai merged commit 7026f96 into cppalliance:develop Jun 16, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SSRF clone-URL validation#117

SSRF clone-URL validation#117
wpak-ai merged 3 commits into
cppalliance:developfrom
whisper67265:fix/SSRF-clone

whisper67265 commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

whisper67265 commented Jun 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

whisper67265 commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading