Skip to content

SSRF clone-URL validation#117

Merged
wpak-ai merged 3 commits into
cppalliance:developfrom
whisper67265:fix/SSRF-clone
Jun 16, 2026
Merged

SSRF clone-URL validation#117
wpak-ai merged 3 commits into
cppalliance:developfrom
whisper67265:fix/SSRF-clone

Conversation

@whisper67265

@whisper67265 whisper67265 commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Close #113.

Summary by CodeRabbit

  • New Features
    • Added BOOST_ALLOWED_CLONE_HOSTS to limit which HTTPS git clone hosts are accepted (comma-separated allowlist; empty/blank disables host-based cloning).
    • Strengthened endpoint validation for organization/submodule segments and clone URLs, including safer hostname/IP resolution checks.
  • Bug Fixes
    • Improved structured error reporting with a dedicated invalid clone URL code and corrected submodule/clone-url error mapping.
  • Documentation
    • Updated endpoint API documentation and the deployment runbook to reflect the new configuration and validation behavior.
  • Tests
    • Expanded validation and error-handling coverage for clone URL safety and new error codes.

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c6acf1fb-4db1-4ca9-952d-649f06327f6d

📥 Commits

Reviewing files that changed from the base of the PR and between 9c0d03b and 30cac5e.

📒 Files selected for processing (2)
  • src/boost_weblate/endpoint/serializers.py
  • tests/endpoint/test_serializers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/boost_weblate/endpoint/serializers.py

📝 Walkthrough

Walkthrough

A new validators.py module introduces SSRF-mitigating validation for git clone URLs: repo-segment character restrictions, URL scheme enforcement, private/reserved IP blocking, DNS rebinding detection, and host allowlisting via ALLOWED_CLONE_HOSTS (configurable through BOOST_ALLOWED_CLONE_HOSTS). The serializers and service layer are updated to use these validators before any git operation. A new INVALID_CLONE_URL error code is added, with tests and documentation updated throughout.

Changes

SSRF Clone-URL Validation

Layer / File(s) Summary
Core validator module with SSRF checks and URL builders
src/boost_weblate/endpoint/validators.py
Introduces validate_repo_segment (character restrictions), _normalize_clone_url (scheme enforcement), _hostname_allowed (allowlist matching), _resolve_addresses (DNS resolution), _reject_unsafe_ip (IP range blocking), _check_literal_host_ip (literal IP detection), validate_clone_url (orchestrates SSRF checks including DNS rebinding), and URL builder helpers github_https_clone_url and github_ssh_repo_url.
ALLOWED_CLONE_HOSTS configuration and INVALID_CLONE_URL error code
src/boost_weblate/settings_override.py, src/boost_weblate/endpoint/errors.py
settings_override.py adds allowed_clone_hosts() to parse BOOST_ALLOWED_CLONE_HOSTS environment variable with whitespace/blank handling and lowercase normalization, exposing ALLOWED_CLONE_HOSTS constant. errors.py adds INVALID_CLONE_URL enum member.
Serializer organization and submodule validation
src/boost_weblate/endpoint/serializers.py
Introduces DrfValidationCode/RequestField enums for standardized error code and field name mapping; adds _custom_error_fields tracking to prevent DRF-generated message duplication; refactors _to_structured_errors and _flatten_field_errors; adds validate_organization method emitting INVALID_CLONE_URL on unsafe organization segments; extends validate_add_or_update to iterate and validate each submodule entry, emitting INVALID_SUBMODULE_LIST or INVALID_SUBMODULE errors.
Service layer use of validators
src/boost_weblate/endpoint/services.py
clone_repository now calls github_https_clone_url and propagates ValidationError; create_or_update_component uses github_ssh_repo_url with early-exit on failure; process_submodule validates organization and submodule early and wraps clone_repository call to map ValidationError to INVALID_CLONE_URL.
Comprehensive validator test coverage
tests/endpoint/test_validators.py
Introduces DNS-mocking helpers and test classes covering repo segment validation, non-HTTPS scheme rejection, private/unspecified IP blocking, allowlist enforcement, DNS rebinding detection, happy-path HTTPS/SCP normalization, and GitHub URL builder output and error handling.
Serializer and service integration tests
tests/endpoint/test_serializers.py, tests/endpoint/test_services.py
Serializer tests adjust import structure and assertion patterns for enum-based codes, then assert rejection of invalid organization and submodule segments with correct error codes and error accumulation. Service tests patch URL builders, verify clone_repository propagation of ValidationError, and assert process_submodule returns INVALID_CLONE_URL (not CLONE_FAILED) when clone fails.
Settings parsing tests and Django test wiring
tests/test_settings_override.py, tests/django_qbk_format_settings.py
test_settings_override.py tests allowed_clone_hosts() default value, environment variable parsing, and empty-string handling. django_qbk_format_settings.py imports and assigns ALLOWED_CLONE_HOSTS in Django settings.
Environment variable documentation and endpoint API updates
.env.example, docs/boost-endpoint-api.md, docs/deployment-runbook.md
.env.example adds BOOST_ALLOWED_CLONE_HOSTS with comments. API docs expand path validation step and update error codes. Runbook documents the environment variable in runtime plugin settings and Weblate environment variables table.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AddOrUpdateRequestSerializer
    participant validators_py as validators.py
    participant BoostComponentService
    participant GitSubprocess

    Client->>AddOrUpdateRequestSerializer: POST /boost/add-or-update (organization, submodules)
    AddOrUpdateRequestSerializer->>validators_py: validate_repo_segment(organization, field="organization")
    validators_py-->>AddOrUpdateRequestSerializer: ValidationError or ok
    AddOrUpdateRequestSerializer->>validators_py: validate_repo_segment(submodule, field="submodule") for each submodule
    validators_py-->>AddOrUpdateRequestSerializer: ValidationError or ok
    AddOrUpdateRequestSerializer-->>Client: 400 INVALID_CLONE_URL / INVALID_SUBMODULE (if invalid)

    AddOrUpdateRequestSerializer->>BoostComponentService: process_submodule(organization, submodule)
    BoostComponentService->>validators_py: validate_repo_segment(organization), validate_repo_segment(submodule)
    validators_py-->>BoostComponentService: ValidationError or ok
    BoostComponentService->>validators_py: github_https_clone_url(organization, submodule)
    validators_py->>validators_py: validate_clone_url (scheme, allowlist, DNS/IP, rebind checks)
    validators_py-->>BoostComponentService: ValidationError or clone URL
    BoostComponentService->>GitSubprocess: git clone <validated_url>
    GitSubprocess-->>BoostComponentService: result
    BoostComponentService-->>Client: success or INVALID_CLONE_URL error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • cppalliance/cppa-weblate-plugin#48: Introduced AddOrUpdateRequestSerializer in serializers.py, which this PR extends with validate_organization and per-submodule validate_repo_segment logic.
  • cppalliance/cppa-weblate-plugin#50: Implemented the BoostComponentService flow including clone_repository and process_submodule, which this PR updates to use the new URL builder validators.
  • cppalliance/cppa-weblate-plugin#71: Modified tests/endpoint/test_services.py coverage for clone_repository and process_submodule, the same test areas updated here for clone URL validation.

Suggested reviewers

  • henry0816191
  • wpak-ai

Poem

🐇 Hop hop, the rabbit checks the URL,
No sneaky IPs slipping past the rule!
SSRF blocked, the allowlist stands strong,
DNS rebinding? That won't last long.
With validators new and tests in a row,
Only safe clones get the green light to go! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.54% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'SSRF clone-URL validation' directly describes the primary change: implementing SSRF vulnerability mitigation for git clone URL validation.
Linked Issues check ✅ Passed All acceptance criteria from issue #113 are met: organization/submodule validation against pattern [#113], HTTPS-only enforcement with IP range restrictions [#113], hostname allowlisting via ALLOWED_CLONE_HOSTS [#113], and DNS rebinding mitigation [#113].
Out of Scope Changes check ✅ Passed All changes are directly scoped to SSRF mitigation: validators module, error codes, serializer validation, service layer integration, environment configuration, documentation, and comprehensive test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
tests/endpoint/test_validators.py (1)

24-25: ⚡ Quick win

Add explicit unspecified-IP coverage to prevent SSRF regression.

Please add a case for 0.0.0.0 (and ideally ::) in the rejected-host IP tests so the security behavior is pinned by tests, not just implementation.

Also applies to: 106-124

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/endpoint/test_validators.py` around lines 24 - 25, Add explicit test
coverage for unspecified IP addresses to prevent SSRF regression. Include
`0.0.0.0` and `::` in the _PRIVATE_IPS tuple (lines 24-25) and ensure that the
rejected-host IP validation tests at lines 106-124 also validate these
unspecified IP addresses are properly rejected. This pins the security behavior
by tests rather than relying solely on implementation details.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/boost_weblate/endpoint/serializers.py`:
- Around line 158-173: The `_custom_validation_errors` attribute is being
replaced rather than accumulated across multiple validation methods. When both
the `validate_organization` method and other field validators (such as
`validate_add_or_update`) encounter errors, one overwrites the other, causing
validation errors to be lost or misclassified. Fix this by changing all
assignments to `_custom_validation_errors` from replacement operations to
accumulation operations (extend or merge the existing errors with new ones) so
that errors from all invalid fields are preserved and reported together.

In `@src/boost_weblate/endpoint/services.py`:
- Around line 320-324: The ValidationError exception at lines 320-324 in the
github_https_clone_url call is being caught and converted to a False return
value, which causes the caller in process_submodule() at lines 953-964 to treat
it as a generic CLONE_FAILED error, losing the distinction between validation
failures and actual clone errors. Remove or modify the try-except block at lines
320-324 to allow ValidationError to propagate from clone_repository(), then in
process_submodule() at lines 953-964, add a separate except handler for
ValidationError that maps it to BoostEndpointErrorCode.INVALID_CLONE_URL instead
of letting it fall through to the generic clone failure handling.

In `@src/boost_weblate/endpoint/validators.py`:
- Around line 98-109: The _reject_unsafe_ip function is missing validation for
unspecified IP addresses (0.0.0.0 and ::), which could allow SSRF attacks. Add a
check for addr.is_unspecified alongside the existing checks for is_private,
is_loopback, is_link_local, is_reserved, and is_multicast, raising a
ValidationError with an appropriate message if the address is unspecified.

In `@tests/test_settings_override.py`:
- Around line 82-90: The test_allowed_clone_hosts_default function depends on
the ambient BOOST_ALLOWED_CLONE_HOSTS environment variable and will produce
flaky results since ALLOWED_CLONE_HOSTS is computed at import time. Before the
import statements at line 82, explicitly remove the BOOST_ALLOWED_CLONE_HOSTS
environment variable (using os.environ.pop or similar) and then reload the
module to ensure ALLOWED_CLONE_HOSTS is recomputed with a clean environment
state. This makes the test deterministic and independent of the calling
environment.

---

Nitpick comments:
In `@tests/endpoint/test_validators.py`:
- Around line 24-25: Add explicit test coverage for unspecified IP addresses to
prevent SSRF regression. Include `0.0.0.0` and `::` in the _PRIVATE_IPS tuple
(lines 24-25) and ensure that the rejected-host IP validation tests at lines
106-124 also validate these unspecified IP addresses are properly rejected. This
pins the security behavior by tests rather than relying solely on implementation
details.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f91ffc74-3a19-4a0d-a85f-b8fba9a2a5df

📥 Commits

Reviewing files that changed from the base of the PR and between f425069 and 0478b7f.

📒 Files selected for processing (13)
  • .env.example
  • docs/boost-endpoint-api.md
  • docs/deployment-runbook.md
  • src/boost_weblate/endpoint/errors.py
  • src/boost_weblate/endpoint/serializers.py
  • src/boost_weblate/endpoint/services.py
  • src/boost_weblate/endpoint/validators.py
  • src/boost_weblate/settings_override.py
  • tests/django_qbk_format_settings.py
  • tests/endpoint/test_serializers.py
  • tests/endpoint/test_services.py
  • tests/endpoint/test_validators.py
  • tests/test_settings_override.py

Comment thread src/boost_weblate/endpoint/serializers.py
Comment thread src/boost_weblate/endpoint/services.py Outdated
Comment thread src/boost_weblate/endpoint/validators.py
Comment thread tests/test_settings_override.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/boost_weblate/endpoint/serializers.py (1)

75-79: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Custom error accumulation is fixed, but field-level error suppression is still inconsistent.

Line 78 suppresses add_or_update DRF errors whenever any custom error exists, while organization custom errors (Line 163) still allow DRF fallback entries. This can both hide unrelated add_or_update errors and emit duplicate/misclassified required_field-style entries for organization failures. Please track custom errors per field and suppress flattening only for those fields.

Suggested fix
 class AddOrUpdateRequestSerializer(serializers.Serializer):
@@
     def __init__(self, *args: Any, **kwargs: Any) -> None:
         super().__init__(*args, **kwargs)
         self._custom_validation_errors: list[dict[str, Any]] = []
+        self._custom_error_fields: set[str] = set()
         self._structured_errors: list[dict[str, Any]] = []
@@
     def is_valid(self, *, raise_exception: bool = False) -> bool:
         self._custom_validation_errors = []
+        self._custom_error_fields = set()
         valid = super().is_valid(raise_exception=False)
@@
     def _to_structured_errors(self) -> list[dict[str, Any]]:
         structured = list(self._custom_validation_errors)
         for field, messages in self.errors.items():
-            if field == "add_or_update" and self._custom_validation_errors:
+            if field in self._custom_error_fields:
                 continue
@@
         except ValidationError as exc:
             self._custom_validation_errors.extend(
                 boost_validation_errors(
@@
                 )
             )
+            self._custom_error_fields.add("organization")
             raise serializers.ValidationError(str(exc)) from exc
@@
         if items:
             self._custom_validation_errors.extend(boost_validation_errors(items))
+            self._custom_error_fields.add("add_or_update")
             raise serializers.ValidationError({"add_or_update": "invalid"})

Also applies to: 163-173, 251-252

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/boost_weblate/endpoint/serializers.py` around lines 75 - 79, The custom
error tracking in the serializer is using a global check for any custom errors
rather than tracking per field, causing inconsistent suppression across
different fields. Change the internal representation of custom validation errors
from a flat list to a per-field dictionary structure (e.g., from
self._custom_validation_errors to self._custom_validation_errors_by_field where
keys are field names). Update the suppression logic in the _to_structured_errors
method at line 78 to check if the specific field has custom errors before
skipping DRF fallback entries, and similarly update the organization custom
error handling logic at lines 163-173 and any related field-level error
filtering at lines 251-252 to use the per-field tracking structure instead of
the global check.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/boost_weblate/endpoint/serializers.py`:
- Around line 75-79: The custom error tracking in the serializer is using a
global check for any custom errors rather than tracking per field, causing
inconsistent suppression across different fields. Change the internal
representation of custom validation errors from a flat list to a per-field
dictionary structure (e.g., from self._custom_validation_errors to
self._custom_validation_errors_by_field where keys are field names). Update the
suppression logic in the _to_structured_errors method at line 78 to check if the
specific field has custom errors before skipping DRF fallback entries, and
similarly update the organization custom error handling logic at lines 163-173
and any related field-level error filtering at lines 251-252 to use the
per-field tracking structure instead of the global check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b33e08fc-5577-4743-9326-177908ef74f7

📥 Commits

Reviewing files that changed from the base of the PR and between 0478b7f and 9c0d03b.

📒 Files selected for processing (7)
  • src/boost_weblate/endpoint/serializers.py
  • src/boost_weblate/endpoint/services.py
  • src/boost_weblate/endpoint/validators.py
  • tests/endpoint/test_serializers.py
  • tests/endpoint/test_services.py
  • tests/endpoint/test_validators.py
  • tests/test_settings_override.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/endpoint/test_serializers.py
  • tests/test_settings_override.py
  • tests/endpoint/test_validators.py
  • src/boost_weblate/endpoint/validators.py

Comment thread src/boost_weblate/endpoint/serializers.py
Comment thread src/boost_weblate/endpoint/serializers.py Outdated
Comment thread src/boost_weblate/endpoint/serializers.py Outdated
@henry0816191 henry0816191 requested a review from wpak-ai June 16, 2026 16:00
@wpak-ai wpak-ai merged commit 7026f96 into cppalliance:develop Jun 16, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SSRF clone-URL validation

3 participants