ci(railway): add Railway OSS deployment framework and preview environment CI #3787
Merged
…ment CI
Add complete Railway OSS deployment infrastructure:
- Bootstrap, configure, deploy, and smoke test scripts
- Nginx gateway with Railway IPv6 DNS resolver and dynamic proxy_pass
- Wrapper Dockerfiles for all 11 services (api, web, services, workers, cron, alembic, etc.)
- Preview lifecycle scripts (create/update, destroy, stale cleanup)
- Three GitHub Actions workflows for automated PR preview environments:
  - 06: build and push PR-tagged images to GHCR
  - 07: deploy preview environment and post URL as PR comment
  - 08: destroy on PR close + daily stale cleanup cron
- Design docs covering architecture, caveats, and phased rollout plan
The deploy job calls a reusable workflow that posts PR comments. The caller's permissions block must include pull-requests: write, because a called workflow's GITHUB_TOKEN cannot hold more permissions than the caller grants.
Railway Preview Environment
Updated at 2026-02-23T11:19:05.466Z
The Railway CLI uses --version flag, not a version subcommand.
Railway CLI uses two different env vars:
- RAILWAY_TOKEN: project-scoped actions only
- RAILWAY_API_TOKEN: account/workspace-level actions (create/list/delete projects)
Our preview scripts need account-level access. Updated all scripts to accept either variable, and CI workflows to set RAILWAY_API_TOKEN.
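One way a script could accept either variable is sketched below. The helper name `resolve_railway_token` is hypothetical and is not taken from the PR, and preferring `RAILWAY_API_TOKEN` over `RAILWAY_TOKEN` is an assumption based on the preview scripts needing account-level access.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): pick whichever Railway token is set.
# Assumption: the account-level RAILWAY_API_TOKEN wins when both are present.
resolve_railway_token() {
  local token="${RAILWAY_API_TOKEN:-${RAILWAY_TOKEN:-}}"
  if [ -z "$token" ]; then
    echo "error: set RAILWAY_API_TOKEN (or RAILWAY_TOKEN)" >&2
    return 1
  fi
  printf '%s\n' "$token"
}
```

A caller might run `token="$(resolve_railway_token)" || exit 1` near the top of a script so that a missing token fails early with a clear message.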
Passes through COMPOSIO_API_KEY to the api service if set. Skipped silently if not provided.
- preview-cleanup-stale.sh: use process substitution instead of pipe-to-while so DELETED/SKIPPED counters are not lost in subshell
- smoke.sh: propagate check_endpoint exit code after repair instead of unconditional return 0
- 06-railway-preview-build.yml: add path filters so docs-only PRs don't trigger full image builds and Railway deploys
- README.md: add security note about placeholder auth/crypt keys
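The subshell issue behind the preview-cleanup-stale.sh fix can be reproduced in a few lines of bash. This is an illustrative sketch, not the script's actual code; both function names are made up for the example, and it assumes bash with its default settings (`lastpipe` off).

```shell
#!/usr/bin/env bash
# Piping into `while` runs the loop in a subshell, so the counter
# incremented inside the loop is discarded when the pipeline ends.
count_with_pipe() {
  local count=0 line
  printf '%s\n' "$@" | while read -r line; do
    count=$((count + 1))
  done
  echo "$count"
}

# Process substitution keeps the loop in the current shell,
# so the counter survives after the loop finishes.
count_with_process_substitution() {
  local count=0 line
  while read -r line; do
    count=$((count + 1))
  done < <(printf '%s\n' "$@")
  echo "$count"
}
```

Calling `count_with_pipe a b c` reports 0, while `count_with_process_substitution a b c` reports 3, which is the same effect that caused the DELETED/SKIPPED counters to be lost.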
- Use updatedAt instead of createdAt in stale cleanup to avoid deleting active previews that were created more than 24h ago
- Skip preview builds for draft PRs; destroy preview on convert-to-draft; rebuild on ready_for_review
- Warn on stderr when default placeholder auth/crypt keys are in use
- Add user-facing Railway deploy guide under self-host docs (image-based deploy with latest tags as the default flow)
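To illustrate why the choice of timestamp matters for stale cleanup, here is a minimal sketch of a staleness check. The function `is_stale` is hypothetical, and it takes epoch seconds directly to sidestep date parsing; the real script's handling of Railway's timestamp fields is not shown in this PR conversation.

```shell
#!/usr/bin/env bash
# Hypothetical check (not from the PR): a preview is stale when the
# reference timestamp is older than the TTL. All values are epoch seconds.
is_stale() {
  local reference_epoch="$1" now_epoch="$2" ttl_seconds="$3"
  [ $((now_epoch - reference_epoch)) -gt "$ttl_seconds" ]
}

# Example: a preview created 30h ago but updated 1h ago, with a 24h TTL.
now=1000000
ttl=$((24 * 3600))
created=$((now - 30 * 3600))
updated=$((now - 3600))
is_stale "$created" "$now" "$ttl" && echo "createdAt: would delete"
is_stale "$updated" "$now" "$ttl" || echo "updatedAt: kept"
```

With the creation time as the reference, the still-active preview is flagged for deletion; with the update time it is kept.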
Only increment DELETED when railway delete actually succeeds. Count failed deletes under SKIPPED instead. In dry-run mode, DELETED counts would-be deletions and the summary clearly labels it as dry-run.
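The counting rules described here could look roughly like the following. This is a sketch under assumptions: `record_delete` and `print_summary` are hypothetical names, and the delete command is passed in as arguments so the example does not guess at the exact CLI invocation the script uses.

```shell
#!/usr/bin/env bash
DELETED=0
SKIPPED=0

# Hypothetical helper: first argument is "true" for dry-run, the rest is
# the delete command to run. Only a successful delete counts as DELETED.
record_delete() {
  local dry_run="$1"
  shift
  if [ "$dry_run" = "true" ]; then
    DELETED=$((DELETED + 1))   # would-be deletion, nothing is executed
    return 0
  fi
  if "$@"; then
    DELETED=$((DELETED + 1))
  else
    SKIPPED=$((SKIPPED + 1))   # failed deletes are reported as skipped
  fi
}

# Label the summary so dry-run counts are not mistaken for real deletions.
print_summary() {
  if [ "$1" = "true" ]; then
    echo "dry-run: would delete ${DELETED}, skipped ${SKIPPED}"
  else
    echo "deleted ${DELETED}, skipped ${SKIPPED}"
  fi
}
```

Using plain arithmetic assignment rather than `((DELETED++))` also avoids a non-zero exit status when the counter is 0, which matters for scripts running under `set -e`.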
…nfigure
Add lib.sh with a railway_call wrapper that detects Railway's rate-limit
response ('You are being ratelimited') and retries with exponential
backoff (default: 5 attempts, starting at 10s).
Bootstrap and configure now use railway_call for all CLI invocations.
deploy-from-images.sh adds a 5s pause between bootstrap and configure
to reduce the burst of API calls that triggers the rate limiter on fresh
project deploys.
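A retry wrapper of the kind described above might be sketched as follows. This is not the PR's `railway_call` from lib.sh; the function name `retry_on_ratelimit` and the `RETRY_MAX_ATTEMPTS` / `RETRY_INITIAL_DELAY` variables are hypothetical, and merging stderr into the captured output is a simplification made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical defaults mirroring the commit message: 5 attempts, 10s start.
RETRY_MAX_ATTEMPTS="${RETRY_MAX_ATTEMPTS:-5}"
RETRY_INITIAL_DELAY="${RETRY_INITIAL_DELAY:-10}"

# Run a command; if its output contains the rate-limit message,
# wait and retry, doubling the delay after each attempt.
retry_on_ratelimit() {
  local attempt=1 delay="$RETRY_INITIAL_DELAY" output status
  while :; do
    status=0
    # `|| status=$?` records the exit code without tripping `set -e`.
    output="$("$@" 2>&1)" || status=$?
    case "$output" in
      *"You are being ratelimited"*) ;;  # fall through to the retry logic
      *)
        if [ -n "$output" ]; then printf '%s\n' "$output"; fi
        return "$status"
        ;;
    esac
    if [ "$attempt" -ge "$RETRY_MAX_ATTEMPTS" ]; then
      echo "still rate-limited after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "rate-limited; retrying in ${delay}s (attempt ${attempt}/${RETRY_MAX_ATTEMPTS})" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

A call site would simply prefix the command, for example `retry_on_ratelimit railway whoami`. Non-rate-limit failures are returned immediately with the command's own exit code, so only the rate-limit response triggers the backoff.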
The static web Dockerfile was using CMD ["node", ...] directly, which skips entrypoint.sh. That script generates __env.js with runtime config (API URLs, auth flags, etc.). Without it the frontend loads with missing configuration. The dynamic wrapper in deploy-from-images.sh already did this correctly.
- Skip unset_vars in preview flow (CONFIGURE_SKIP_UNSETS=true), saving ~73 API calls per deploy on fresh projects
- Fix railway_call to not trigger set -e on non-zero exit codes
- Use railway_call for all remaining bare railway calls in configure.sh and preview-create-or-update.sh
bootstrap.sh now calls 'railway whoami' to verify the token works before proceeding. Previously, an invalid or revoked token would silently cause 'railway project list --json' to return non-JSON output, triggering a confusing 'jq: parse error' message. The script would then fall through to 'railway init' which also failed, crashing with exit code 1 and no clear indication of the root cause. Also suppress jq stderr in ensure_project_linked so invalid JSON from transient failures does not pollute CI logs.
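The fail-fast check could be sketched like this. The function name `verify_railway_auth` is hypothetical; the only CLI call assumed is `railway whoami`, which the commit message names. The probe command can be overridden through arguments, which makes the helper testable without a real token.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an auth probe and stop with a clear message
# instead of letting later commands fail with confusing parse errors.
verify_railway_auth() {
  # Default probe is `railway whoami`; callers may pass another command.
  if [ "$#" -eq 0 ]; then
    set -- railway whoami
  fi
  if ! "$@" >/dev/null 2>&1; then
    echo "error: Railway authentication failed ('$*'); check that the token is valid and not revoked" >&2
    return 1
  fi
}
```

Placing `verify_railway_auth || exit 1` before any project listing means an invalid token surfaces as an authentication error rather than a `jq: parse error`.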
…raints
Switch Railway CLI installation from curl-based install script to npm install. The install script fetches the latest GitHub release and gets rate-limited in CI runners, causing 'Failed to fetch latest version from GitHub' errors that crash the deploy job.
Add a 'Rate Limits and Token Types' section to the README documenting the workspace token CLI limitation and a future TODO to migrate high-call operations to direct GraphQL mutations for Pro rate limits.
The render_api_like_wrapper function generates Dockerfiles for worker-tracing, worker-evaluations, and cron but was missing AGENTA_API_URL and AGENTA_API_INTERNAL_URL. Without these, workers fall back to the default http://localhost/api which does not work on Railway where the API runs in a separate container. The static Dockerfiles already set these to http://api.railway.internal:8000/api.
jp-agenta approved these changes on Feb 23, 2026
Summary
hosting/railway/oss/
What's included
Deployment scripts (hosting/railway/oss/scripts/)
- bootstrap.sh -- create Railway project, services, volumes (idempotent)
- configure.sh -- set all environment variables per service
- deploy-from-images.sh -- full deploy flow from pre-built GHCR images
- smoke.sh -- health check validation for /w, /api/health, /services/health
- preview-create-or-update.sh -- create/update PR preview project
- preview-destroy.sh -- delete PR preview project
- preview-cleanup-stale.sh -- delete previews older than configurable TTL
- build-and-push-images.sh, deploy-gateway.sh, deploy-services.sh, init-databases.sh, upgrade.sh
Gateway (hosting/railway/oss/gateway/)
- Nginx gateway with Railway IPv6 DNS resolver ([fd12::10])
- proxy_pass for dynamic DNS re-resolution
CI Workflows (.github/workflows/)
- 06-railway-preview-build.yml -- build and push PR-tagged images to GHCR (Docker Buildx + GHA cache)
- 07-railway-preview-deploy.yml -- deploy preview and post URL as PR comment
- 08-railway-preview-cleanup.yml -- destroy on PR close + daily stale cleanup cron
Design docs (docs/design/railway-preview-environments/)
Testing
This PR itself tests the CI workflows. The build workflow should trigger on this PR, build the 3 images, then deploy a preview environment and post the URL as a comment.
Requires RAILWAY_TOKEN GitHub Actions secret (already configured).