Local run #123
Open
calavera wants to merge 27 commits into
Adds opt-in 100k-sandbox burst benchmark module alongside the daily ~100-burst path. Includes the design plan + implementation checklist, idempotent Postgres schema, and a coordinator (types, e2b provider, pg/R2 sinks, p-limit ramp runner) bundled via esbuild. Local smoke validated against e2b/Neon/R2; launch script + workflow are next.
scripts/burst-100k-launch.sh provisions a Namespace VM, applies the schema, uploads the bundled coordinator, records the run, and starts the coordinator detached. Uses --bare/--cidfile for nsc, installs nodejs via apk, passes env via an uploaded chmod-600 startup script (printf %q-quoted) that self-destructs after detaching node, and pgrep-verifies the hand-off succeeded. Validated end-to-end at N=10 against e2b/Neon/R2.
Switches object storage from R2 to Tigris: same S3-compatible API, different provider. Renames sinks/r2.ts → sinks/tigris.ts (R2Sink → TigrisSink), env vars R2_* → TIGRIS_STORAGE_*, and the runs.r2_prefix column → tigris_prefix. Also fixes launch.sh's pgrep false-negative (it now retries for up to 10s and matches against coordinator.cjs) and updates the plan doc to reflect Tigris and the current --bare/--cidfile nsc flags. Validated end-to-end at N=10: 10/10 sandboxes ok, all three Tigris objects (raw.jsonl, heartbeat.json, meta.json) written.
Coordinator now reads $COORDINATOR_LOG_PATH (set to /root/run.log by launch.sh) and pushes its own stdout/stderr to Tigris on every heartbeat and at shutdown. Closes the "logs die with the VM" gap. Local runs skip silently when the env var is unset.
Coordinator tallies per-status counts during the burst and writes them to new columns on runs (timeouts, http_errors, network_errors) plus an error_histogram object in Tigris meta.json. Schema migration is idempotent (ALTER TABLE ADD COLUMN IF NOT EXISTS), so re-running the launch script catches up existing DBs.
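A minimal sketch of the tallying step, assuming the status set this PR uses at this point ('ok | timeout | http_error | network_error'). The `SandboxOutcome` shape, the `tallyOutcomes` name, and the choice of histogram key are hypothetical; the coordinator's real types are not shown here.

```typescript
// Status values as named in this PR before the later lifecycle rework.
type BurstStatus = "ok" | "timeout" | "http_error" | "network_error";

// Hypothetical per-sandbox record; the real SandboxResult has more fields.
interface SandboxOutcome {
  status: BurstStatus;
  // Assumed free-form label (e.g. "HTTP 429") used as the histogram key.
  errorMessage?: string;
}

interface StatusTally {
  timeouts: number;
  http_errors: number;
  network_errors: number;
  error_histogram: Record<string, number>;
}

function tallyOutcomes(outcomes: SandboxOutcome[]): StatusTally {
  const tally: StatusTally = {
    timeouts: 0,
    http_errors: 0,
    network_errors: 0,
    error_histogram: {},
  };
  for (const outcome of outcomes) {
    if (outcome.status === "ok") continue;
    if (outcome.status === "timeout") tally.timeouts += 1;
    else if (outcome.status === "http_error") tally.http_errors += 1;
    else tally.network_errors += 1;
    // Fall back to the status name when no error label is available.
    const key = outcome.errorMessage ?? outcome.status;
    tally.error_histogram[key] = (tally.error_histogram[key] ?? 0) + 1;
  }
  return tally;
}
```

The three counters map onto the new runs columns, while the histogram would be serialized into meta.json as-is.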
Coordinator now tracks every sandbox's start/end timestamps and builds an interval-overlap sweep at run-end. Writes concurrency_summary (peak_concurrent, peak_t_ms, mean_concurrent, total_run_ms) and a 1Hz concurrency_timeline to Tigris meta.json. Lets us tell whether the ramp actually behaved and where the burst saturates.
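The interval-overlap sweep can be sketched roughly as follows. This is an illustration under assumptions, not the coordinator's code: `SandboxInterval` and `summarizeConcurrency` are hypothetical names, timestamps are assumed to be in milliseconds, and the 1Hz timeline is left out.

```typescript
// Hypothetical input: one start/end pair per sandbox, in milliseconds.
interface SandboxInterval {
  startMs: number;
  endMs: number;
}

// Field names follow the concurrency_summary keys listed in this PR.
interface ConcurrencySummary {
  peak_concurrent: number;
  peak_t_ms: number; // offset from the earliest start
  mean_concurrent: number; // time-weighted average over the run
  total_run_ms: number;
}

function summarizeConcurrency(intervals: SandboxInterval[]): ConcurrencySummary {
  if (intervals.length === 0) {
    return { peak_concurrent: 0, peak_t_ms: 0, mean_concurrent: 0, total_run_ms: 0 };
  }
  const events: Array<[number, number]> = [];
  let runStart = Infinity;
  let runEnd = -Infinity;
  for (const iv of intervals) {
    events.push([iv.startMs, 1]);
    events.push([iv.endMs, -1]);
    runStart = Math.min(runStart, iv.startMs);
    runEnd = Math.max(runEnd, iv.endMs);
  }
  // Sort by time; on ties, process ends (-1) before starts (+1) so that
  // back-to-back sandboxes are not counted as overlapping.
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);

  let active = 0;
  let peak = 0;
  let peakT = runStart;
  let area = 0; // integral of the active count over time
  let prevT = runStart;
  for (const [t, delta] of events) {
    area += active * (t - prevT);
    prevT = t;
    active += delta;
    if (active > peak) {
      peak = active;
      peakT = t;
    }
  }
  const total = runEnd - runStart;
  return {
    peak_concurrent: peak,
    peak_t_ms: peakT - runStart,
    mean_concurrent: total > 0 ? area / total : 0,
    total_run_ms: total,
  };
}
```

Sorting dominates the cost at O(n log n), which should stay small even at 100k intervals.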
Runner reflects every primitive prop off the adapter's returned
sandbox object (skipping credential-shaped keys) and stores the
result as a JSONB column on sandbox_results and as a field in
Tigris raw.jsonl. Verified on e2b and runloop — both expose
{ provider, sandboxId }, which lets us cross-reference against
provider dashboards. Schema migration is idempotent.
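One way the reflection step could look, as a sketch. The `CREDENTIAL_KEY` pattern is an assumption (this PR does not show which keys count as credential-shaped), and `reflectPrimitiveProps` is a hypothetical name.

```typescript
type Primitive = string | number | boolean | null;

// Assumed definition of "credential-shaped"; the real skip list is not shown.
const CREDENTIAL_KEY = /(token|secret|password|key|auth|credential)/i;

function reflectPrimitiveProps(sandbox: object): Record<string, Primitive> {
  const out: Record<string, Primitive> = {};
  // Object.entries only visits own enumerable properties, so prototype
  // getters and methods are not reflected.
  for (const [name, value] of Object.entries(sandbox)) {
    if (CREDENTIAL_KEY.test(name)) continue;
    if (
      value === null ||
      typeof value === "string" ||
      typeof value === "number" ||
      typeof value === "boolean"
    ) {
      out[name] = value;
    }
  }
  return out;
}
```

Because only primitives survive, the result is safe to pass to JSON.stringify for both the JSONB column and raw.jsonl.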
Coordinator samples every 5s (process CPU, memory, event-loop lag percentiles, load averages, /proc/self/fd count, /proc/net/sockstat) into <run_id>/metrics.jsonl. Uploaded on every 30s heartbeat for partial-result durability plus a final flush at shutdown. Headline peaks land in meta.json.metrics_summary for at-a-glance review.
Adds a small logger module (ISO-timestamped, level-tagged lines with phase markers) and replaces ad-hoc console.* calls throughout the coordinator and runner. Per-sandbox events are sampled at high N (pickSamplingPeriod) so coordinator.log stays bounded — every sandbox at N<=1000, ~100 sampled + every error at higher N. Adds milestone progress lines with rate/ETA every ~10% of work done.
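For illustration, a line formatter and a rate/ETA progress message in the spirit of the logger described above. The exact line layout, the `formatLogLine` and `progressLine` names, and the level set are assumptions; the actual module may differ.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Produces one ISO-timestamped, level-tagged line with a phase marker.
// `now` is injectable so the output is reproducible in tests.
function formatLogLine(
  level: LogLevel,
  phase: string,
  message: string,
  now: Date = new Date(),
): string {
  return `${now.toISOString()} [${level.toUpperCase()}] [${phase}] ${message}`;
}

// Builds a milestone message with throughput and a naive linear ETA.
function progressLine(done: number, total: number, elapsedMs: number): string {
  const ratePerSec = elapsedMs > 0 ? done / (elapsedMs / 1000) : 0;
  const etaSec = ratePerSec > 0 ? Math.round((total - done) / ratePerSec) : 0;
  const pct = total > 0 ? Math.round((done / total) * 100) : 0;
  return `progress ${done}/${total} (${pct}%) rate=${ratePerSec.toFixed(1)}/s eta=${etaSec}s`;
}
```

A caller would emit `formatLogLine("info", "create", progressLine(done, total, elapsedMs))` each time the completed count crosses another tenth of the total.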
Removes pickSamplingPeriod() so every sandbox gets a [ok]/[error] line in coordinator.log regardless of N. At full 100k this produces a ~14 MB log file (still cheap to upload + store via the existing heartbeat-cadence Tigris flush). Trade-off documented in the data inventory.
Runner now runs `node -v` after each successful sandbox.create() and records the two phases separately. SandboxResult gains first_command_ms; Postgres sandbox_results gets a matching nullable column (idempotent ALTER). meta.json adds first_command_distribution and tti_distribution alongside the existing (allocate-only) latency_distribution. Mirrors the daily benchmark's readiness check so numbers are directly comparable.
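A sketch of how the three distributions might be derived from per-sandbox timings. Several things here are assumptions: that TTI is the sum of the create time and first_command_ms, that percentiles use the nearest-rank method, and the `create_ms` field name and `Distribution` shape.

```typescript
// Hypothetical per-sandbox timings; first_command_ms is null when the
// readiness check did not complete, matching the nullable column.
interface PhaseTiming {
  create_ms: number;
  first_command_ms: number | null;
}

interface Distribution {
  count: number;
  p50: number;
  p95: number;
  p99: number;
  max: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  const idx = Math.min(sortedAsc.length - 1, Math.max(0, rank - 1));
  return sortedAsc[idx] ?? NaN;
}

function distribution(values: number[]): Distribution {
  const sorted = [...values].sort((a, b) => a - b);
  return {
    count: sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    max: percentile(sorted, 100),
  };
}

function summarizeTimings(timings: PhaseTiming[]) {
  const firstCommand: number[] = [];
  const tti: number[] = [];
  for (const t of timings) {
    if (t.first_command_ms === null) continue;
    firstCommand.push(t.first_command_ms);
    // Assumption: time-to-interactive = allocate time + first command time.
    tti.push(t.create_ms + t.first_command_ms);
  }
  return {
    latency_distribution: distribution(timings.map((t) => t.create_ms)),
    first_command_distribution: distribution(firstCommand),
    tti_distribution: distribution(tti),
  };
}
```

Keeping the allocate-only distribution separate from TTI preserves comparability with earlier runs that only measured sandbox.create().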
Remove the 60s linear ramp from the 100k burst — all sandbox-create requests now go out as fast as the event loop dispatches them. The ramp was hiding the very provider-side overload behaviour we want to measure. Drops `rampSeconds` from BurstProviderConfig, renames the meta.json `ramp_segments` bucket to `submission_segments` (idx now reflects event-loop submission order, not ramp position), and removes the now-stale `ramp_seconds_configured` field from concurrency_summary.
…ncurrency
The old runner destroyed each sandbox as soon as its create+readiness check
returned, so peak concurrency was bounded by per-sandbox lifetime, not by
the provider's actual capacity to hold N sandboxes simultaneously. This
made the headline number a measure of churn, not concurrency.
Reshape the runner into two phases:
1. parallel create + `node -v` readiness; survivors stay alive
2. after all phase-1 tasks settle, run a final `node -v` liveness probe
   against every survivor, then destroy
Replace the 'ok | timeout | http_error | network_error' status with a
four-state lifecycle taxonomy:
- success           created, readiness passed, alive at end-of-test
- partial           created, readiness passed, died before end-of-test
- readiness_failed  created, but first `node -v` never returned
- failed            sandbox.create() errored
Move the timeout/http_error/network_error sub-classification into a new
`failure_class` column so it works across any non-success status. Bump
sandboxOptions.timeoutMs to 30 min on providers that support it so they
don't auto-destroy mid-burst. Schema updated idempotently: CHECK
constraints swapped to the new values (NOT VALID for back-compat),
`failure_class` added to sandbox_results, `partials` + `readiness_failures`
added to runs.
Verified end-to-end with a 100-sandbox modal smoke run: 100/100 success,
peak_concurrent=100 (vs. the old model where peak depended on destroy
timing), Postgres + Tigris meta.json both write the new shape.
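The four-state taxonomy above can be expressed as a small decision function. This is a sketch: the `LifecycleObservation` shape and the `classifyLifecycle` and `countLifecycle` names are hypothetical, while the status values and the `partials` / `readiness_failures` counters come from the commit message.

```typescript
type LifecycleStatus = "success" | "partial" | "readiness_failed" | "failed";

// Hypothetical record of what the runner observed for one sandbox.
interface LifecycleObservation {
  created: boolean; // sandbox.create() returned
  readinessPassed: boolean; // first `node -v` returned
  aliveAtEnd: boolean; // final `node -v` liveness probe returned
}

function classifyLifecycle(o: LifecycleObservation): LifecycleStatus {
  if (!o.created) return "failed";
  if (!o.readinessPassed) return "readiness_failed";
  return o.aliveAtEnd ? "success" : "partial";
}

// Rolls per-sandbox statuses up into run-level counters.
function countLifecycle(observations: LifecycleObservation[]) {
  const counts = { successes: 0, partials: 0, readiness_failures: 0, failures: 0 };
  for (const o of observations) {
    const status = classifyLifecycle(o);
    if (status === "success") counts.successes += 1;
    else if (status === "partial") counts.partials += 1;
    else if (status === "readiness_failed") counts.readiness_failures += 1;
    else counts.failures += 1;
  }
  return counts;
}
```

The checks run in lifecycle order, so a sandbox that never passed readiness is not reported as partial even when its final probe also failed.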
…d watch scripts
Add scripts/burst-100k-launch-multi.ts (provisions one Namespace VM per provider via launch.sh, defaults to all 5 × 1000) and scripts/burst-100k-watch.ts (polls Postgres for one-or-more RUN_IDs until all reach a terminal state). Both wired up as npm scripts.
Adds an alternate way to run the burst when a single VM can't hold the target concurrency (file descriptors, NIC queue depth, event-loop lag). The sharded launcher spawns N Namespace VMs in parallel, each firing total/N sandboxes at t=0, all tagged with a shared group_id. An aggregator collapses the per-shard rows back into the same metrics shape an unsharded burst produces.

    npm run bench:burst-100k:sharded -- --provider e2b --total 100000 --vms 20
    npm run bench:burst-100k:aggregate -- --recent

Persistence
* runs gains group_id / shard_index / shard_count columns so shards in a group are queryable as a unit.
* New run_groups table holds one row per group with the aggregate scalars + full meta.json (JSONB). Mirrors runs' columns so dashboards can union per-VM and per-group views.
* Aggregator also uploads the full meta.json to s3://<bucket>/groups/<group_id>/meta.json. Both writes are on by default; opt out with --no-pg / --no-tigris.

Sharding mechanics
* burst-100k-launch-sharded.ts validates total % vms == 0, generates a group_id, and spawn()s N burst-100k-launch.sh children in parallel with per-child stdout prefixed [sNN]. Schema is applied once up-front and children get SKIP_SCHEMA=1, because CREATE TABLE/INDEX IF NOT EXISTS isn't race-safe under parallel applies and Neon's -pooler endpoint breaks session-level advisory locks.
* burst-100k-launch.sh forwards GROUP_ID / SHARD_INDEX / SHARD_COUNT to the VM startup script and the pre-handoff INSERT.
* Coordinator reads the shard env, threads it through pg.bootstrap, and tags its own Tigris meta.json with the group fields.

Single-VM runs (bench:burst-100k:local, :multi) are unchanged: every new field is optional.
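The divisibility check and per-shard assignment might look like the sketch below. `planShards` and `ShardPlan` are hypothetical names, the group id is passed in rather than generated, and spawning the launch.sh children is omitted.

```typescript
// Hypothetical description of one shard; the real launcher passes these
// values to children as GROUP_ID / SHARD_INDEX / SHARD_COUNT.
interface ShardPlan {
  groupId: string;
  shardIndex: number;
  shardCount: number;
  sandboxes: number;
}

function planShards(total: number, vms: number, groupId: string): ShardPlan[] {
  if (!Number.isInteger(total) || !Number.isInteger(vms) || total <= 0 || vms <= 0) {
    throw new Error("total and vms must be positive integers");
  }
  // Mirrors the launcher's total % vms == 0 validation, so every shard
  // fires the same number of sandboxes.
  if (total % vms !== 0) {
    throw new Error(`total (${total}) must be divisible by vms (${vms})`);
  }
  const perShard = total / vms;
  return Array.from({ length: vms }, (_, shardIndex) => ({
    groupId,
    shardIndex,
    shardCount: vms,
    sandboxes: perShard,
  }));
}
```

With `--total 100000 --vms 20` this yields 20 shards of 5000 sandboxes each, indexed 0 through 19.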
SHARD_IDX_LIT="$( [ -n "${SHARD_INDEX:-}" ] && printf "%d" "$SHARD_INDEX" || echo NULL )"
SHARD_CNT_LIT="$( [ -n "${SHARD_COUNT:-}" ] && printf "%d" "$SHARD_COUNT" || echo NULL )"
psql "$PG_URL" -v ON_ERROR_STOP=1 -q -c "
INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix,
P2: SQL injection via unescaped variable interpolation in psql command
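Shell variables are interpolated directly into the psql -c string without escaping or parameterization.
Pass the values via psql -v flags and reference them with :'variable' syntax instead of shell interpolation. Note that psql does not expand variables inside a -c command string, so the SQL would need to be supplied on stdin or via -f.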
AI prompt
Check if this security scanner issue is valid. If so, understand the root cause and fix it. If appropriate, update or add tests. Keep the change focused and preserve intended behavior.
<file name="scripts/burst-100k-launch.sh">
<violation number="1" location="scripts/burst-100k-launch.sh:89">
<priority>P2</priority>
<title>SQL injection via unescaped variable interpolation in psql command</title>
<evidence>psql "$PG_URL" -v ON_ERROR_STOP=1 -q -c "
INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix,
group_id, shard_index, shard_count)
VALUES ('$RUN_ID', '$PROVIDER', '$GITHUB_SHA', '$INSTANCE_ID', now(), 'running',
's3://${TIGRIS_STORAGE_BUCKET}/${RUN_ID}/',
$GROUP_ID_LIT, $SHARD_IDX_LIT, $SHARD_CNT_LIT)
ON CONFLICT (id) DO NOTHING;
"</evidence>
<recommendation>Use psql's -v flag to pass variables safely and reference them as :variable in the SQL, or execute the INSERT via a parameterized query through the Node pg client. If shell-level execution is required, at minimum validate and sanitize $PROVIDER and $RUN_ID before interpolation.</recommendation>
</violation>
</file>
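The second option in the recommendation, a parameterized query through the Node pg client, could be sketched as follows. `buildRunInsert` and `RunRow` are hypothetical; the function only builds a `{ text, values }` object, which is the query-config shape node-postgres accepts in `client.query()`, and the column list is copied from the INSERT quoted above.

```typescript
// Hypothetical input mirroring the shell variables used by launch.sh.
interface RunRow {
  id: string;
  provider: string;
  commitSha: string;
  instanceId: string;
  tigrisBucket: string;
  groupId?: string | null;
  shardIndex?: number | null;
  shardCount?: number | null;
}

interface QueryConfig {
  text: string;
  values: Array<string | number | null>;
}

function buildRunInsert(row: RunRow): QueryConfig {
  return {
    // $1..$8 are placeholders; values travel separately from the SQL text,
    // so quotes inside them cannot change the statement.
    text: `INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix,
  group_id, shard_index, shard_count)
VALUES ($1, $2, $3, $4, now(), 'running', $5, $6, $7, $8)
ON CONFLICT (id) DO NOTHING`,
    values: [
      row.id,
      row.provider,
      row.commitSha,
      row.instanceId,
      `s3://${row.tigrisBucket}/${row.id}/`,
      row.groupId ?? null,
      row.shardIndex ?? null,
      row.shardCount ?? null,
    ],
  };
}
```

Unset shard fields become SQL NULL through the values array, which replaces the hand-built `*_LIT` literals in the shell version.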
No description provided.