Staging to main #45

Open
thomashebrard wants to merge 34 commits into main from staging

thomashebrard (Member) commented Apr 10, 2026

Summary by cubic

Adds agent build endpoints (concepts, pipe specs), a models listing API, and async pipeline runs via Temporal with optional webhook/storage delivery and HMAC. Expands model support (OpenRouter, Linkup), standardizes API inputs to arrays, unifies auth with AUTH_MODE, adds /api/v1/api_version, and introduces dev/staging deploy workflows.

  • New Features

    • Deploy: dev/staging GitHub workflows; production role set to pipelex-api-ecr-push-production; make deploy-api (prod) and make deploy-api-staging; Docker image bundles .pipelex defaults.
    • Agent APIs: build concepts and pipe specs; list models by category (incl. search).
    • Pipeline: async runs via Temporal (sync execute remains); webhook/storage delivery targets with HMAC; data URLs normalize to storage by default; /api/v1/api_version.
    • API shape: mthds_contents: list[str] across validate/build/runner/pipeline; bundle_uri → bundle_uris.
    • Inference: added openrouter backend and linkup search backend; new search deck and routing profiles; LLM defaults updated (claude-4.6-opus, gpt-4o-mini); expanded catalogs; standardized thinking_mode metadata across backends.
    • Auth: unified AUTH_MODE (none, api_key, jwt); request user exposed to routes and used in uploads.
    • Config/UX: environment-specific Pipelex configs for Temporal; ReactFlow view spec removed; dark theme for graphs.
    • Docs: endpoints under /api/v1; builder (inputs/outputs/runner), validate, model listing, and async run docs updated.
  • Migration

    • Send mthds_contents: [content] and handle bundle_uris arrays.
    • Set AUTH_MODE and provide API_KEY or JWT_SECRET as needed.
    • For deploy, set AWS_ACCOUNT_ID and use new role names; use make deploy-api (prod) or make deploy-api-staging.
    • Add provider keys (OPENROUTER_API_KEY, LINKUP_API_KEY) if using new backends.
    • Expect new LLM defaults and updated extract aliases.
    • Data URLs now normalize to storage URIs by default; adjust consumers if relying on raw data URLs.
    • Legacy /build/pipe endpoint removed; use agent build endpoints instead.
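
For clients that still build request bodies in the old shape, the first migration step could be sketched as below. This is a minimal sketch, not code from the PR: `migrate_payload` is a hypothetical helper, and the legacy singular key `mthds_content` is an assumption (the summary only names the new `mthds_contents` and `bundle_uris` fields).

```python
from typing import Any


def migrate_payload(legacy: dict[str, Any]) -> dict[str, Any]:
    """Convert a pre-migration request body to the array-based shape."""
    migrated = dict(legacy)  # shallow copy so the caller's dict is untouched

    # Assumption: the old API took a single string under "mthds_content".
    if "mthds_content" in migrated:
        content = migrated.pop("mthds_content")
        migrated.setdefault("mthds_contents", [content])

    # Wrap a single bundle URI into the new "bundle_uris" array.
    if "bundle_uri" in migrated:
        uri = migrated.pop("bundle_uri")
        migrated.setdefault("bundle_uris", [uri])

    return migrated


if __name__ == "__main__":
    print(migrate_payload({"mthds_content": "example", "bundle_uri": "s3://bucket/bundle"}))
```

If a body already carries the plural key, `setdefault` leaves it alone and the legacy key is simply dropped.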

Written for commit 3051754. Summary will update on new commits.
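
For webhook consumers, the HMAC mentioned in the summary could be checked roughly as follows. The summary does not state the algorithm, the encoding, or the header that carries the signature, so HMAC-SHA256 with a hex digest over the raw request body is an assumption here, and `sign_body` / `verify_signature` are hypothetical helper names.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare the received signature with a freshly computed one."""
    expected = sign_body(secret, body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


if __name__ == "__main__":
    secret = b"shared-webhook-secret"  # placeholder value
    body = b'{"status": "completed"}'  # placeholder payload
    signature = sign_body(secret, body)
    print(verify_signature(secret, body, signature))
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialization.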

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR merges staging into main, introducing multi-file PLX content support (mthds_contents array), four new agent-builder API endpoints (/build/concept, /build/pipe-spec, /assemble, /models), a reworked three-mode auth system (none/jwt/api_key), and real user identity propagation to the uploader.

  • P1 — dry_run_pipeline unprotected: In validate.py, dry_run_pipeline is called outside the try/except ValidateBundleError block; any failure there becomes an unhandled 500 instead of a clean {"success": false} response.
  • P1 — Identity header spoofing: no_auth in security.py unconditionally trusts gateway-forwarded identity headers from any caller; if the server is reachable directly, callers can inject arbitrary user identities affecting storage scoping in the uploader.
  • P2 — Missing response_model: /validate and /models endpoints lack response_model declarations, contrary to the project coding standard.

Confidence Score: 4/5

Two P1 issues should be addressed before merging to main: unprotected dry_run_pipeline and spoofable identity headers in no_auth.

The P1 in validate.py causes silent 500 errors for valid PLX content that fails dry-run; the P1 in security.py is a real identity-spoofing vector when the server is misconfigured. Both are straightforward to fix. All remaining findings are P2.

api/routes/pipelex/validate.py (unprotected dry_run_pipeline), api/security.py (spoofable identity headers in no_auth)

Security Review

  • Identity header injection (api/security.py L154–165): no_auth trusts forwarded gateway headers from any caller with no origin validation. If the server is exposed without an API Gateway, callers can impersonate arbitrary users, affecting all identity-scoped operations including file uploads.

Important Files Changed

| Filename | Overview |
|---|---|
| api/security.py | Refactored to support three auth modes; the no_auth mode trusts forwarded identity headers from any caller, enabling impersonation when not behind an API Gateway. |
| api/routes/pipelex/validate.py | Extended validate endpoint with dry_run_pipeline and pipe_structures, but dry_run_pipeline is outside the error-handling block and the endpoint is missing response_model. |
| api/routes/pipelex/pipeline.py | Updated to support mthds_contents array; execute now returns JSONResponse for proper serialization. |
| api/routes/pipelex/agent/models.py | New endpoint to list available model presets; missing response_model on the route decorator. |
| api/routes/uploader.py | Now uses get_request_user dependency to set user_id from authenticated identity rather than a hardcoded placeholder. |
| Dockerfile | Adds cp of .pipelex config into /root/.pipelex to ensure pipelex global config is available at runtime. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant FastAPI
    participant AuthLayer
    participant RouteHandler
    participant PipelexCore

    Client->>FastAPI: HTTP Request
    FastAPI->>AuthLayer: get_auth_dependency() selects mode
    alt no_auth mode
        AuthLayer->>AuthLayer: Read forwarded identity headers
        AuthLayer->>FastAPI: Set request.state.user if headers present
    else jwt mode
        AuthLayer->>AuthLayer: Decode and verify JWT token
        AuthLayer->>FastAPI: Set request.state.user from JWT payload
    else api_key mode
        AuthLayer->>AuthLayer: Validate static API key
        FastAPI->>FastAPI: request.state.user remains None
    end
    FastAPI->>RouteHandler: Dispatch to route
    RouteHandler->>RouteHandler: get_request_user() returns RequestUser or None
    RouteHandler->>PipelexCore: validate_bundle / dry_run_pipeline / assemble_bundle
    PipelexCore-->>RouteHandler: Result
    RouteHandler-->>Client: JSONResponse
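
The mode selection at the top of the diagram could be validated at startup along these lines. This is a sketch under assumptions: `AuthMode`, `REQUIRED_SECRET`, and `resolve_auth_mode` are hypothetical names, and treating a missing `AUTH_MODE` as an error is a design choice made here, not something the PR states. Only the three mode values and the `API_KEY` / `JWT_SECRET` variables come from the PR summary.

```python
from __future__ import annotations

from enum import Enum
from typing import Mapping


class AuthMode(str, Enum):
    NONE = "none"
    API_KEY = "api_key"
    JWT = "jwt"


# Environment variable each mode needs (None means no secret is required).
REQUIRED_SECRET: dict[AuthMode, str | None] = {
    AuthMode.NONE: None,
    AuthMode.API_KEY: "API_KEY",
    AuthMode.JWT: "JWT_SECRET",
}


def resolve_auth_mode(env: Mapping[str, str]) -> AuthMode:
    """Parse AUTH_MODE and check that the matching secret is set."""
    raw = env.get("AUTH_MODE", "")
    try:
        mode = AuthMode(raw)
    except ValueError:
        raise ValueError(f"Unsupported AUTH_MODE: {raw!r}") from None

    required = REQUIRED_SECRET[mode]
    if required is not None and not env.get(required):
        raise ValueError(f"AUTH_MODE={mode.value} requires {required} to be set")
    return mode


if __name__ == "__main__":
    import os

    try:
        print(resolve_auth_mode(os.environ))
    except ValueError as exc:
        print(f"configuration error: {exc}")
```

Failing fast on a bad configuration keeps a misconfigured server from silently falling back to an unauthenticated mode.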
Prompt To Fix All With AI
This is a comment left during a code review.
Path: api/routes/pipelex/validate.py
Line: 97

Comment:
**`dry_run_pipeline` called outside error-handling block**

`dry_run_pipeline` is called after the `try/except ValidateBundleError` block, so any exception it raises propagates as an unhandled HTTP 500 rather than returning a clean `{"success": false, ...}` JSON response — breaking the error contract established for `validate_bundle` just above.

```suggestion
    try:
        graph_spec, _ = await dry_run_pipeline(mthds_contents=mthds_contents)
    except Exception as exc:
        return JSONResponse(
            content={
                "success": False,
                "mthds_contents": mthds_contents,
                "message": str(exc),
            }
        )
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: api/security.py
Line: 154-165

Comment:
**Forwarded identity headers are spoofable without API Gateway enforcement**

`no_auth` unconditionally trusts the email and identity headers sent by the caller. If the server is reachable directly rather than via an API Gateway, any caller can inject arbitrary values and receive a fully-populated `RequestUser`. Downstream handlers that use the identity to scope operations (such as the uploader endpoint) will act on the spoofed identity. The docstring documents the API Gateway requirement but provides no runtime enforcement.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: api/routes/pipelex/validate.py
Line: 84-86

Comment:
**Potential `IndexError` if `blueprints` is empty**

If `_find_main_blueprint` returns `None` and `validate_bundle_result.blueprints` happens to be an empty list, `blueprints[0]` will raise an `IndexError`. Adding a guard here makes the error message explicit:

```suggestion
    primary_blueprint = _find_main_blueprint(validate_bundle_result.blueprints)
    if not primary_blueprint:
        if not validate_bundle_result.blueprints:
            return JSONResponse(
                content={
                    "success": False,
                    "mthds_contents": mthds_contents,
                    "message": "No bundle blueprints found after validation",
                }
            )
        primary_blueprint = validate_bundle_result.blueprints[0]
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: api/routes/pipelex/validate.py
Line: 67-68

Comment:
**Missing `response_model` and return type annotation**

Per the coding standards, all endpoints must declare `response_model` and every function must be fully type-annotated. Both are absent here:

```suggestion
@router.post("/validate", response_model=ValidateResponse)
async def validate_plx(request_data: ValidateRequest) -> JSONResponse:
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: api/routes/pipelex/agent/models.py
Line: 14-15

Comment:
**Missing `response_model` on `/models` endpoint**

Per coding standards, all endpoints must declare `response_model`. The return type `dict[str, Any]` is too loose for a well-typed OpenAPI schema, but at minimum an explicit annotation keeps the codebase consistent:

```suggestion
@router.get("/models", response_model=dict[str, Any])
async def get_models(
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "dev to staging (#42)"

}
)

graph_spec, _ = await dry_run_pipeline(mthds_contents=mthds_contents)

P1 dry_run_pipeline called outside error-handling block

dry_run_pipeline is called after the try/except ValidateBundleError block, so any exception it raises propagates as an unhandled HTTP 500 rather than returning a clean {"success": false, ...} JSON response — breaking the error contract established for validate_bundle just above.

Suggested change
-    graph_spec, _ = await dry_run_pipeline(mthds_contents=mthds_contents)
+    try:
+        graph_spec, _ = await dry_run_pipeline(mthds_contents=mthds_contents)
+    except Exception as exc:
+        return JSONResponse(
+            content={
+                "success": False,
+                "mthds_contents": mthds_contents,
+                "message": str(exc),
+            }
+        )
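
The effect of the suggested guard can be tried in isolation with stubs. This is a sketch only: the `dry_run_pipeline` below is a stand-in that fails on empty input, `DryRunError` and `validate` are hypothetical names, and the shape of the success payload is an assumption. Only the failure payload keys come from the suggestion.

```python
import asyncio
from typing import Any


class DryRunError(Exception):
    """Hypothetical stand-in for whatever the real dry run may raise."""


async def dry_run_pipeline(mthds_contents: list[str]) -> tuple[dict[str, Any], None]:
    # Stub: the real function lives in Pipelex; this one fails on empty input.
    if not mthds_contents:
        raise DryRunError("nothing to run")
    return {"nodes": len(mthds_contents)}, None


async def validate(mthds_contents: list[str]) -> dict[str, Any]:
    try:
        graph_spec, _ = await dry_run_pipeline(mthds_contents=mthds_contents)
    except Exception as exc:
        # Same error contract as the validate_bundle failure path.
        return {
            "success": False,
            "mthds_contents": mthds_contents,
            "message": str(exc),
        }
    # Assumed success shape, for illustration only.
    return {"success": True, "mthds_contents": mthds_contents, "graph_spec": graph_spec}


if __name__ == "__main__":
    print(asyncio.run(validate([])))
    print(asyncio.run(validate(["example"])))
```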

Comment on lines +154 to +165
async def no_auth(request: Request) -> None:
    """No-op auth dependency for AUTH_MODE=none.

    When behind API Gateway, reads identity from forwarded headers
    (X-User-Email, X-User-Sub, X-Auth-Method) set by the Lambda authorizer.
    """
    email = request.headers.get("x-user-email")
    sub = request.headers.get("x-user-sub")
    auth_method = request.headers.get("x-auth-method")

    if email and sub:
        _set_request_user(request, email=email, sub=sub, auth_method=auth_method or "gateway")

P1 security Forwarded identity headers are spoofable without API Gateway enforcement

no_auth unconditionally trusts the email and identity headers sent by the caller. If the server is reachable directly rather than via an API Gateway, any caller can inject arbitrary values and receive a fully-populated RequestUser. Downstream handlers that use the identity to scope operations (such as the uploader endpoint) will act on the spoofed identity. The docstring documents the API Gateway requirement but provides no runtime enforcement.
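
One possible form of runtime enforcement is to accept the forwarded identity only when the request also carries a secret that the gateway alone knows. The sketch below is an assumption-heavy illustration, not the project's fix: the `x-gateway-secret` header, the `Identity` type, and `read_gateway_identity` are all hypothetical, and header keys are assumed to be lower-cased already.

```python
import hmac
from typing import Mapping, NamedTuple, Optional


class Identity(NamedTuple):
    email: str
    sub: str
    auth_method: str


def read_gateway_identity(headers: Mapping[str, str], gateway_secret: str) -> Optional[Identity]:
    """Return the forwarded identity only if the gateway secret matches."""
    presented = headers.get("x-gateway-secret", "")  # hypothetical header name
    # An unset server-side secret never matches, so direct callers get no identity.
    if not gateway_secret or not hmac.compare_digest(
        presented.encode(), gateway_secret.encode()
    ):
        return None

    email = headers.get("x-user-email")
    sub = headers.get("x-user-sub")
    if not (email and sub):
        return None
    return Identity(email, sub, headers.get("x-auth-method") or "gateway")


if __name__ == "__main__":
    spoofed = {"x-user-email": "admin@example.com", "x-user-sub": "1"}
    print(read_gateway_identity(spoofed, "expected-secret"))  # no secret header
```

Network-level restrictions (only the gateway can reach the server) remain the stronger control; a check like this only narrows the window when that assumption fails.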


Comment on lines +84 to +86
    primary_blueprint = _find_main_blueprint(validate_bundle_result.blueprints)
    if not primary_blueprint:
        primary_blueprint = validate_bundle_result.blueprints[0]

P2 Potential IndexError if blueprints is empty

If _find_main_blueprint returns None and validate_bundle_result.blueprints happens to be an empty list, blueprints[0] will raise an IndexError. Adding a guard here makes the error message explicit:

Suggested change
-    primary_blueprint = _find_main_blueprint(validate_bundle_result.blueprints)
-    if not primary_blueprint:
-        primary_blueprint = validate_bundle_result.blueprints[0]
+    primary_blueprint = _find_main_blueprint(validate_bundle_result.blueprints)
+    if not primary_blueprint:
+        if not validate_bundle_result.blueprints:
+            return JSONResponse(
+                content={
+                    "success": False,
+                    "mthds_contents": mthds_contents,
+                    "message": "No bundle blueprints found after validation",
+                }
+            )
+        primary_blueprint = validate_bundle_result.blueprints[0]
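
The same fallback logic can be factored into a small helper that is easy to unit test. This is a sketch: `pick_primary` is a hypothetical name, and it checks `is not None` rather than truthiness, which differs slightly from the `if not primary_blueprint` test in the route.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def pick_primary(
    blueprints: Sequence[T],
    find_main: Callable[[Sequence[T]], Optional[T]],
) -> Optional[T]:
    """Return the main blueprint, else the first one, else None when empty."""
    main = find_main(blueprints)
    if main is not None:
        return main
    # Guard the index access so an empty list yields None instead of IndexError.
    return blueprints[0] if blueprints else None


if __name__ == "__main__":
    print(pick_primary([], lambda items: None))
    print(pick_primary(["first", "second"], lambda items: None))
```

The caller can then map a `None` result to the `{"success": false}` response in one place.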

Comment on lines +67 to 68
@router.post("/validate")
async def validate_plx(request_data: ValidateRequest):

P2 Missing response_model and return type annotation

Per the coding standards, all endpoints must declare response_model and every function must be fully type-annotated. Both are absent here:

Suggested change
-@router.post("/validate")
-async def validate_plx(request_data: ValidateRequest):
+@router.post("/validate", response_model=ValidateResponse)
+async def validate_plx(request_data: ValidateRequest) -> JSONResponse:

Comment on lines +14 to +15
model_type: Annotated[list[str] | None, Query(alias="type", description="Filter by model category: llm, extract, img_gen, search")] = None,
) -> dict[str, Any]:

P2 Missing response_model on /models endpoint

Per coding standards, all endpoints must declare response_model. The return type dict[str, Any] is too loose for a well-typed OpenAPI schema, but at minimum an explicit annotation keeps the codebase consistent:

Suggested change
-model_type: Annotated[list[str] | None, Query(alias="type", description="Filter by model category: llm, extract, img_gen, search")] = None,
-) -> dict[str, Any]:
+@router.get("/models", response_model=dict[str, Any])
+async def get_models(
