ci(railway): add Railway OSS deployment framework and preview environment CI #3787

Merged
mmabrouk merged 16 commits into main from ci/railway-preview-environments
Feb 23, 2026

Conversation

@mmabrouk
Member

@mmabrouk mmabrouk commented Feb 19, 2026

Summary

  • Add complete Railway OSS deployment infrastructure under hosting/railway/oss/
  • Add 3 GitHub Actions workflows for automated per-PR preview environments
  • Add design docs covering architecture decisions, caveats, and phased rollout plan

What's included

Deployment scripts (hosting/railway/oss/scripts/)

  • bootstrap.sh -- create Railway project, services, volumes (idempotent)
  • configure.sh -- set all environment variables per service
  • deploy-from-images.sh -- full deploy flow from pre-built GHCR images
  • smoke.sh -- health check validation for /w, /api/health, /services/health
  • preview-create-or-update.sh -- create/update PR preview project
  • preview-destroy.sh -- delete PR preview project
  • preview-cleanup-stale.sh -- delete previews older than configurable TTL
  • Plus: build-and-push-images.sh, deploy-gateway.sh, deploy-services.sh, init-databases.sh, upgrade.sh

Gateway (hosting/railway/oss/gateway/)

  • Nginx config with Railway IPv6 DNS resolver ([fd12::10])
  • Variable-based proxy_pass for dynamic DNS re-resolution
  • Rewrite rules for path prefix stripping

CI Workflows (.github/workflows/)

  • 06-railway-preview-build.yml -- build and push PR-tagged images to GHCR (Docker Buildx + GHA cache)
  • 07-railway-preview-deploy.yml -- deploy preview and post URL as PR comment
  • 08-railway-preview-cleanup.yml -- destroy on PR close + daily stale cleanup cron

Design docs (docs/design/railway-preview-environments/)

  • Context, research, plan, status, deployment notes, QA strategy

Testing

This PR itself tests the CI workflows. The build workflow should trigger on this PR and build the 3 images; the deploy workflow should then deploy a preview environment and post its URL as a PR comment.

Requires RAILWAY_TOKEN GitHub Actions secret (already configured).


…ment CI

Add complete Railway OSS deployment infrastructure:
- Bootstrap, configure, deploy, and smoke test scripts
- Nginx gateway with Railway IPv6 DNS resolver and dynamic proxy_pass
- Wrapper Dockerfiles for all 11 services (api, web, services, workers, cron, alembic, etc.)
- Preview lifecycle scripts (create/update, destroy, stale cleanup)
- Three GitHub Actions workflows for automated PR preview environments:
  - 06: build and push PR-tagged images to GHCR
  - 07: deploy preview environment and post URL as PR comment
  - 08: destroy on PR close + daily stale cleanup cron
- Design docs covering architecture, caveats, and phased rollout plan
@dosubot dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Feb 19, 2026
@vercel

vercel bot commented Feb 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: agenta-documentation
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Feb 22, 2026 4:57pm

The deploy job calls a reusable workflow that posts PR comments.
The caller's permissions block must include pull-requests: write,
because a called workflow cannot be granted more GITHUB_TOKEN
permissions than its caller has.
devin-ai-integration[bot]

This comment was marked as resolved.

@github-actions
Contributor

github-actions bot commented Feb 19, 2026

Railway Preview Environment

Status: Destroyed (PR closed)

Updated at 2026-02-23T11:19:05.466Z

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

The Railway CLI uses the --version flag, not a version subcommand.
devin-ai-integration[bot]

This comment was marked as resolved.

Railway CLI uses two different env vars:
- RAILWAY_TOKEN: project-scoped actions only
- RAILWAY_API_TOKEN: account/workspace-level actions (create/list/delete projects)

Our preview scripts need account-level access. Updated all scripts
to accept either variable, and CI workflows to set RAILWAY_API_TOKEN.
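
Accepting either variable could look roughly like the sketch below. This is an illustration rather than the code in this PR: the function name `resolve_railway_token` is hypothetical, and preferring RAILWAY_API_TOKEN when both are set is an assumption. Only the two variable names come from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whichever Railway token variable is set.
# Preferring RAILWAY_API_TOKEN over RAILWAY_TOKEN is an assumption of this sketch.
resolve_railway_token() {
  if [ -n "${RAILWAY_API_TOKEN:-}" ]; then
    printf '%s\n' "$RAILWAY_API_TOKEN"
  elif [ -n "${RAILWAY_TOKEN:-}" ]; then
    printf '%s\n' "$RAILWAY_TOKEN"
  else
    echo "error: set RAILWAY_API_TOKEN or RAILWAY_TOKEN" >&2
    return 1
  fi
}
```

A script would call this once at startup and abort if it returns non-zero, so a missing token is reported before any CLI call is made.
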
Passes through COMPOSIO_API_KEY to the api service if set.
Skipped silently if not provided.
- preview-cleanup-stale.sh: use process substitution instead of
  pipe-to-while so DELETED/SKIPPED counters are not lost in subshell
- smoke.sh: propagate check_endpoint exit code after repair instead
  of unconditional return 0
- 06-railway-preview-build.yml: add path filters so docs-only PRs
  don't trigger full image builds and Railway deploys
- README.md: add security note about placeholder auth/crypt keys
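
The subshell problem behind the preview-cleanup-stale.sh fix can be reproduced with a small bash sketch. The two function names are hypothetical and the loop only counts its arguments; the real script iterates over Railway projects.

```shell
#!/usr/bin/env bash
# Illustrative only: count arguments two ways to show why pipe-to-while loses state.

count_with_pipe() {
  local count=0 line
  printf '%s\n' "$@" | while read -r line; do
    count=$((count + 1))   # runs in a subshell; the increment is discarded
  done
  echo "$count"
}

count_with_process_substitution() {
  local count=0 line
  while read -r line; do
    count=$((count + 1))   # runs in the current shell; the increment persists
  done < <(printf '%s\n' "$@")
  echo "$count"
}
```

With bash's default options, `count_with_pipe a b c` prints 0 and `count_with_process_substitution a b c` prints 3, which is the same effect that made the DELETED/SKIPPED counters read as zero.
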
devin-ai-integration[bot]

This comment was marked as resolved.

- Use updatedAt instead of createdAt in stale cleanup to avoid deleting
  active previews that were created more than 24h ago
- Skip preview builds for draft PRs; destroy preview on convert-to-draft;
  rebuild on ready_for_review
- Warn on stderr when default placeholder auth/crypt keys are in use
- Add user-facing Railway deploy guide under self-host docs (image-based
  deploy with latest tags as the default flow)
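
The difference between the two timestamps in the stale check can be sketched as follows. The helper `is_stale` is hypothetical, and it takes epoch seconds so the example does not depend on any particular date parser; how the PR's script converts updatedAt is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the reference timestamp is older than the TTL.
# Usage: is_stale <reference_epoch_seconds> <now_epoch_seconds> <ttl_hours>
is_stale() {
  local reference_epoch=$1 now_epoch=$2 ttl_hours=$3
  [ $((now_epoch - reference_epoch)) -gt $((ttl_hours * 3600)) ]
}
```

For a preview created 30 hours ago and last updated 2 hours ago, a 24-hour TTL marks it stale when the creation time is passed in and keeps it when the update time is passed in, which is the behaviour the fix is after.
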
devin-ai-integration[bot]

This comment was marked as resolved.

Only increment DELETED when railway delete actually succeeds. Count
failed deletes under SKIPPED instead. In dry-run mode, DELETED counts
would-be deletions and the summary clearly labels it as dry-run.
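
The counting rule described above might be sketched like this. The function names and summary wording are hypothetical, and the delete command is passed in as arguments so the sketch runs without the Railway CLI.

```shell
#!/usr/bin/env bash
# Sketch of the counting rule; the command after the dry-run flag stands in
# for the real delete call.
DELETED=0
SKIPPED=0

# Usage: delete_preview <dry_run: true|false> <delete_cmd> [args...]
delete_preview() {
  local dry_run=$1
  shift
  if [ "$dry_run" = "true" ]; then
    DELETED=$((DELETED + 1))   # would-be deletion; nothing is executed
  elif "$@"; then
    DELETED=$((DELETED + 1))   # the delete command actually succeeded
  else
    SKIPPED=$((SKIPPED + 1))   # a failed delete is counted as skipped
  fi
}

print_summary() {
  local dry_run=$1
  if [ "$dry_run" = "true" ]; then
    echo "dry-run: would delete $DELETED, skipped $SKIPPED"
  else
    echo "deleted $DELETED, skipped $SKIPPED"
  fi
}
```

Running the delete command inside the `elif` condition also keeps a failed delete from aborting a script that uses `set -e`.
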
devin-ai-integration[bot]

This comment was marked as resolved.

…nfigure

Add lib.sh with a railway_call wrapper that detects Railway's rate-limit
response ('You are being ratelimited') and retries with exponential
backoff (default: 5 attempts, starting at 10s).

Bootstrap and configure now use railway_call for all CLI invocations.
deploy-from-images.sh adds a 5s pause between bootstrap and configure
to reduce the burst of API calls that triggers the rate limiter on fresh
project deploys.
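
A retry wrapper of this kind could be sketched as below. This is not the lib.sh from this PR: the function name `retry_on_ratelimit` and the two RETRY_* variables are hypothetical, and merging stderr into stdout is a simplification. The rate-limit message and the defaults (5 attempts, 10s initial delay) come from the commit message.

```shell
#!/usr/bin/env bash
# Sketch of a rate-limit retry wrapper with exponential backoff.
RETRY_MAX_ATTEMPTS="${RETRY_MAX_ATTEMPTS:-5}"
RETRY_INITIAL_DELAY="${RETRY_INITIAL_DELAY:-10}"

# Usage: retry_on_ratelimit <command> [args...]
retry_on_ratelimit() {
  local attempt=1 delay=$RETRY_INITIAL_DELAY output status
  while :; do
    status=0
    # "|| status=$?" records the exit code without tripping `set -e` in the caller.
    output=$("$@" 2>&1) || status=$?
    if printf '%s' "$output" | grep -q 'You are being ratelimited'; then
      if [ "$attempt" -lt "$RETRY_MAX_ATTEMPTS" ]; then
        echo "rate limited (attempt $attempt); retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))        # exponential backoff: 10s, 20s, 40s, ...
        attempt=$((attempt + 1))
        continue
      fi
      [ "$status" -ne 0 ] || status=1   # attempts exhausted: report failure
    fi
    if [ -n "$output" ]; then
      printf '%s\n' "$output"
    fi
    return "$status"
  done
}
```

A caller would prefix each CLI invocation with the wrapper, for example `retry_on_ratelimit railway whoami`. Errors other than the rate-limit message are returned immediately with the command's own exit code.
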
The static web Dockerfile was using CMD ["node", ...] directly, which
skips entrypoint.sh. That script generates __env.js with runtime config
(API URLs, auth flags, etc.). Without it the frontend loads with missing
configuration. The dynamic wrapper in deploy-from-images.sh already did
this correctly.
devin-ai-integration[bot]

This comment was marked as resolved.

- Skip unset_vars in preview flow (CONFIGURE_SKIP_UNSETS=true), saving
  ~73 API calls per deploy on fresh projects
- Fix railway_call to not trigger set -e on non-zero exit codes
- Use railway_call for all remaining bare railway calls in configure.sh
  and preview-create-or-update.sh
bootstrap.sh now calls 'railway whoami' to verify the token works before
proceeding. Previously, an invalid or revoked token would silently cause
'railway project list --json' to return non-JSON output, triggering a
confusing 'jq: parse error' message. The script would then fall through
to 'railway init' which also failed, crashing with exit code 1 and no
clear indication of the root cause.

Also suppress jq stderr in ensure_project_linked so invalid JSON from
transient failures does not pollute CI logs.
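
The fail-fast idea can be sketched as follows. The helper name `require_auth` and the error wording are hypothetical; the check command is passed in as arguments so the sketch runs without the Railway CLI, and in this PR's context that command would be `railway whoami`.

```shell
#!/usr/bin/env bash
# Sketch: run an authentication check first and stop with a clear message if it fails,
# instead of letting a later command fail with an unrelated-looking parse error.
# Usage: require_auth <check_cmd> [args...]
require_auth() {
  if ! "$@" >/dev/null 2>&1; then
    echo "error: authentication check failed ('$*'); is the token valid and not revoked?" >&2
    return 1
  fi
}
```

A bootstrap script would call `require_auth railway whoami || exit 1` before listing or creating projects, so a bad token is reported as the root cause.
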
…raints

Switch Railway CLI installation from curl-based install script to
npm install. The install script fetches the latest GitHub release and
gets rate-limited in CI runners, causing 'Failed to fetch latest
version from GitHub' errors that crash the deploy job.

Add a 'Rate Limits and Token Types' section to the README documenting
the workspace token CLI limitation and a future TODO to migrate
high-call operations to direct GraphQL mutations for Pro rate limits.
devin-ai-integration[bot]

This comment was marked as resolved.

The render_api_like_wrapper function generates Dockerfiles for
worker-tracing, worker-evaluations, and cron but was missing
AGENTA_API_URL and AGENTA_API_INTERNAL_URL. Without these, workers
fall back to the default http://localhost/api which does not work on
Railway where the API runs in a separate container. The static
Dockerfiles already set these to http://api.railway.internal:8000/api.
Member

@jp-agenta jp-agenta left a comment

lgtm @mmabrouk

@dosubot dosubot bot added the lgtm label (This PR has been approved by a maintainer) Feb 23, 2026
@mmabrouk mmabrouk merged commit 3924b5b into main Feb 23, 2026
9 checks passed

Labels

ci/cd, lgtm, size:XXL

2 participants