SSRF clone-URL validation#117
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughA new ChangesSSRF Clone-URL Validation
Sequence Diagram(s)sequenceDiagram
participant Client
participant AddOrUpdateRequestSerializer
participant validators_py as validators.py
participant BoostComponentService
participant GitSubprocess
Client->>AddOrUpdateRequestSerializer: POST /boost/add-or-update (organization, submodules)
AddOrUpdateRequestSerializer->>validators_py: validate_repo_segment(organization, field="organization")
validators_py-->>AddOrUpdateRequestSerializer: ValidationError or ok
AddOrUpdateRequestSerializer->>validators_py: validate_repo_segment(submodule, field="submodule") for each submodule
validators_py-->>AddOrUpdateRequestSerializer: ValidationError or ok
AddOrUpdateRequestSerializer-->>Client: 400 INVALID_CLONE_URL / INVALID_SUBMODULE (if invalid)
AddOrUpdateRequestSerializer->>BoostComponentService: process_submodule(organization, submodule)
BoostComponentService->>validators_py: validate_repo_segment(organization), validate_repo_segment(submodule)
validators_py-->>BoostComponentService: ValidationError or ok
BoostComponentService->>validators_py: github_https_clone_url(organization, submodule)
validators_py->>validators_py: validate_clone_url (scheme, allowlist, DNS/IP, rebind checks)
validators_py-->>BoostComponentService: ValidationError or clone URL
BoostComponentService->>GitSubprocess: git clone <validated_url>
GitSubprocess-->>BoostComponentService: result
BoostComponentService-->>Client: success or INVALID_CLONE_URL error
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
tests/endpoint/test_validators.py (1)
24-25: ⚡ Quick winAdd explicit unspecified-IP coverage to prevent SSRF regression.
Please add a case for
0.0.0.0(and ideally::) in the rejected-host IP tests so the security behavior is pinned by tests, not just implementation.Also applies to: 106-124
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/endpoint/test_validators.py` around lines 24 - 25, Add explicit test coverage for unspecified IP addresses to prevent SSRF regression. Include `0.0.0.0` and `::` in the _PRIVATE_IPS tuple (lines 24-25) and ensure that the rejected-host IP validation tests at lines 106-124 also validate these unspecified IP addresses are properly rejected. This pins the security behavior by tests rather than relying solely on implementation details.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/boost_weblate/endpoint/serializers.py`:
- Around line 158-173: The `_custom_validation_errors` attribute is being
replaced rather than accumulated across multiple validation methods. When both
the `validate_organization` method and other field validators (such as
`validate_add_or_update`) encounter errors, one overwrites the other, causing
validation errors to be lost or misclassified. Fix this by changing all
assignments to `_custom_validation_errors` from replacement operations to
accumulation operations (extend or merge the existing errors with new ones) so
that errors from all invalid fields are preserved and reported together.
In `@src/boost_weblate/endpoint/services.py`:
- Around line 320-324: The ValidationError exception at lines 320-324 in the
github_https_clone_url call is being caught and converted to a False return
value, which causes the caller in process_submodule() at lines 953-964 to treat
it as a generic CLONE_FAILED error, losing the distinction between validation
failures and actual clone errors. Remove or modify the try-except block at lines
320-324 to allow ValidationError to propagate from clone_repository(), then in
process_submodule() at lines 953-964, add a separate except handler for
ValidationError that maps it to BoostEndpointErrorCode.INVALID_CLONE_URL instead
of letting it fall through to the generic clone failure handling.
In `@src/boost_weblate/endpoint/validators.py`:
- Around line 98-109: The _reject_unsafe_ip function is missing validation for
unspecified IP addresses (0.0.0.0 and ::), which could allow SSRF attacks. Add a
check for addr.is_unspecified alongside the existing checks for is_private,
is_loopback, is_link_local, is_reserved, and is_multicast, raising a
ValidationError with an appropriate message if the address is unspecified.
In `@tests/test_settings_override.py`:
- Around line 82-90: The test_allowed_clone_hosts_default function depends on
the ambient BOOST_ALLOWED_CLONE_HOSTS environment variable and will produce
flaky results since ALLOWED_CLONE_HOSTS is computed at import time. Before the
import statements at line 82, explicitly remove the BOOST_ALLOWED_CLONE_HOSTS
environment variable (using os.environ.pop or similar) and then reload the
module to ensure ALLOWED_CLONE_HOSTS is recomputed with a clean environment
state. This makes the test deterministic and independent of the calling
environment.
---
Nitpick comments:
In `@tests/endpoint/test_validators.py`:
- Around line 24-25: Add explicit test coverage for unspecified IP addresses to
prevent SSRF regression. Include `0.0.0.0` and `::` in the _PRIVATE_IPS tuple
(lines 24-25) and ensure that the rejected-host IP validation tests at lines
106-124 also validate these unspecified IP addresses are properly rejected. This
pins the security behavior by tests rather than relying solely on implementation
details.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f91ffc74-3a19-4a0d-a85f-b8fba9a2a5df
📒 Files selected for processing (13)
.env.exampledocs/boost-endpoint-api.mddocs/deployment-runbook.mdsrc/boost_weblate/endpoint/errors.pysrc/boost_weblate/endpoint/serializers.pysrc/boost_weblate/endpoint/services.pysrc/boost_weblate/endpoint/validators.pysrc/boost_weblate/settings_override.pytests/django_qbk_format_settings.pytests/endpoint/test_serializers.pytests/endpoint/test_services.pytests/endpoint/test_validators.pytests/test_settings_override.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/boost_weblate/endpoint/serializers.py (1)
75-79:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCustom error accumulation is fixed, but field-level error suppression is still inconsistent.
Line 78 suppresses
add_or_updateDRF errors whenever any custom error exists, whileorganizationcustom errors (Line 163) still allow DRF fallback entries. This can both hide unrelatedadd_or_updateerrors and emit duplicate/misclassifiedrequired_field-style entries for organization failures. Please track custom errors per field and suppress flattening only for those fields.Suggested fix
class AddOrUpdateRequestSerializer(serializers.Serializer): @@ def __init__(self, *args: Any, **kwargs: Any) -> None: super().__init__(*args, **kwargs) self._custom_validation_errors: list[dict[str, Any]] = [] + self._custom_error_fields: set[str] = set() self._structured_errors: list[dict[str, Any]] = [] @@ def is_valid(self, *, raise_exception: bool = False) -> bool: self._custom_validation_errors = [] + self._custom_error_fields = set() valid = super().is_valid(raise_exception=False) @@ def _to_structured_errors(self) -> list[dict[str, Any]]: structured = list(self._custom_validation_errors) for field, messages in self.errors.items(): - if field == "add_or_update" and self._custom_validation_errors: + if field in self._custom_error_fields: continue @@ except ValidationError as exc: self._custom_validation_errors.extend( boost_validation_errors( @@ ) ) + self._custom_error_fields.add("organization") raise serializers.ValidationError(str(exc)) from exc @@ if items: self._custom_validation_errors.extend(boost_validation_errors(items)) + self._custom_error_fields.add("add_or_update") raise serializers.ValidationError({"add_or_update": "invalid"})Also applies to: 163-173, 251-252
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/boost_weblate/endpoint/serializers.py` around lines 75 - 79, The custom error tracking in the serializer is using a global check for any custom errors rather than tracking per field, causing inconsistent suppression across different fields. Change the internal representation of custom validation errors from a flat list to a per-field dictionary structure (e.g., from self._custom_validation_errors to self._custom_validation_errors_by_field where keys are field names). Update the suppression logic in the _to_structured_errors method at line 78 to check if the specific field has custom errors before skipping DRF fallback entries, and similarly update the organization custom error handling logic at lines 163-173 and any related field-level error filtering at lines 251-252 to use the per-field tracking structure instead of the global check.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/boost_weblate/endpoint/serializers.py`:
- Around line 75-79: The custom error tracking in the serializer is using a
global check for any custom errors rather than tracking per field, causing
inconsistent suppression across different fields. Change the internal
representation of custom validation errors from a flat list to a per-field
dictionary structure (e.g., from self._custom_validation_errors to
self._custom_validation_errors_by_field where keys are field names). Update the
suppression logic in the _to_structured_errors method at line 78 to check if the
specific field has custom errors before skipping DRF fallback entries, and
similarly update the organization custom error handling logic at lines 163-173
and any related field-level error filtering at lines 251-252 to use the
per-field tracking structure instead of the global check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b33e08fc-5577-4743-9326-177908ef74f7
📒 Files selected for processing (7)
src/boost_weblate/endpoint/serializers.pysrc/boost_weblate/endpoint/services.pysrc/boost_weblate/endpoint/validators.pytests/endpoint/test_serializers.pytests/endpoint/test_services.pytests/endpoint/test_validators.pytests/test_settings_override.py
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/endpoint/test_serializers.py
- tests/test_settings_override.py
- tests/endpoint/test_validators.py
- src/boost_weblate/endpoint/validators.py
Close #113.
Summary by CodeRabbit
BOOST_ALLOWED_CLONE_HOSTSto limit which HTTPS git clone hosts are accepted (comma-separated allowlist; empty/blank disables host-based cloning).