Local run #123

Open
calavera wants to merge 27 commits into computesdk:master from tensorlakeai:local-run

@calavera
Contributor

@calavera calavera commented Jun 1, 2026

No description provided.

kisernl and others added 27 commits May 13, 2026 09:59
Adds opt-in 100k-sandbox burst benchmark module alongside the daily
~100-burst path. Includes the design plan + implementation checklist,
idempotent Postgres schema, and a coordinator (types, e2b provider,
pg/R2 sinks, p-limit ramp runner) bundled via esbuild. Local smoke
validated against e2b/Neon/R2; launch script + workflow are next.

scripts/burst-100k-launch.sh provisions a Namespace VM, applies the
schema, uploads the bundled coordinator, records the run, and starts
the coordinator detached. Uses --bare/--cidfile for nsc, installs
nodejs via apk, passes env via an uploaded chmod-600 startup script
(printf %q-quoted) that self-destructs after detaching node, and
pgrep-verifies the hand-off succeeded.

Validated end-to-end at N=10 against e2b/Neon/R2.
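
The launch script quotes each env value with bash's `printf %q` before writing the chmod-600 startup script. For comparison, here is a minimal TypeScript sketch of the same idea using POSIX single-quote escaping, a different (but also safe) quoting technique swapped in because its output is easy to check by eye. `shellQuote` and `renderStartupEnv` are hypothetical helpers, not code from this PR.

```typescript
// POSIX single-quote escaping (NOT bash's printf %q): wrap the value in
// single quotes and replace every embedded ' with '\'' (close, escaped quote, reopen).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Env var names must be plain identifiers; anything else is rejected outright.
const ENV_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Render an env map as `export NAME='value'` lines for a startup script
// that would later be uploaded with mode 600 and sourced on the VM.
function renderStartupEnv(env: Record<string, string>): string {
  const lines: string[] = [];
  for (const name of Object.keys(env).sort()) {
    if (!ENV_NAME.test(name)) {
      throw new Error(`invalid env var name: ${name}`);
    }
    lines.push(`export ${name}=${shellQuote(env[name])}`);
  }
  return lines.join("\n") + "\n";
}
```

Sorting the keys keeps the rendered script deterministic, which makes it easy to diff between runs.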

Same S3-compatible API, different provider. Renames sinks/r2.ts →
sinks/tigris.ts (R2Sink → TigrisSink), env vars R2_* → TIGRIS_STORAGE_*,
and the runs.r2_prefix column → tigris_prefix. Also fixes launch.sh's
pgrep false-negative (now retries up to 10s and matches against
coordinator.cjs) and updates the plan doc to reflect Tigris and the
current --bare/--cidfile nsc flags.

Validated end-to-end at N=10: 10/10 sandboxes ok, all three Tigris
objects (raw.jsonl, heartbeat.json, meta.json) written.

Coordinator now reads $COORDINATOR_LOG_PATH (set to /root/run.log by
launch.sh) and pushes its own stdout/stderr to Tigris on every
heartbeat and at shutdown. Closes the "logs die with the VM" gap.
Local runs skip silently when the env var is unset.

Coordinator tallies per-status counts during the burst and writes them
to new columns on runs (timeouts, http_errors, network_errors) plus
an error_histogram object in Tigris meta.json. Schema migration is
idempotent (ALTER TABLE ADD COLUMN IF NOT EXISTS), so re-running the
launch script catches up existing DBs.
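
A minimal sketch of the per-status tally, assuming the four statuses this commit uses (`ok`, `timeout`, `http_error`, `network_error`) and assuming `error_histogram` is keyed by error message. The `Outcome` shape and the `tally` function are illustrative, not the coordinator's actual code.

```typescript
type Status = "ok" | "timeout" | "http_error" | "network_error";

interface Outcome {
  status: Status;
  error?: string; // error message when status !== "ok" (assumed field)
}

interface Tally {
  ok: number;
  timeouts: number;
  http_errors: number;
  network_errors: number;
  error_histogram: Record<string, number>; // message -> count (assumed keying)
}

function tally(outcomes: Outcome[]): Tally {
  const t: Tally = { ok: 0, timeouts: 0, http_errors: 0, network_errors: 0, error_histogram: {} };
  for (const o of outcomes) {
    switch (o.status) {
      case "ok": t.ok++; break;
      case "timeout": t.timeouts++; break;
      case "http_error": t.http_errors++; break;
      case "network_error": t.network_errors++; break;
    }
    if (o.status !== "ok") {
      // Fall back to the status itself when no message was captured.
      const key = o.error ?? o.status;
      t.error_histogram[key] = (t.error_histogram[key] ?? 0) + 1;
    }
  }
  return t;
}
```

The three error counters map onto the new `runs` columns; the histogram is the piece that would be serialized into meta.json.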

Coordinator now tracks every sandbox's start/end timestamps and builds
an interval-overlap sweep at run-end. Writes concurrency_summary
(peak_concurrent, peak_t_ms, mean_concurrent, total_run_ms) and a
1Hz concurrency_timeline to Tigris meta.json. Lets us tell whether
the ramp actually behaved and where the burst saturates.
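
The interval-overlap sweep can be sketched as a classic event sweep: +1 at each start, -1 at each end, processed in time order. This sketch assumes timestamps are milliseconds relative to the run start, treats each lifetime as half-open `[start, end)` (so ends sort before starts at the same instant), and samples the 1Hz timeline at whole seconds. Field names follow `concurrency_summary`; `concurrencySweep` itself is hypothetical.

```typescript
// Lifetime of one sandbox, in ms since run start (assumed time origin).
interface Interval {
  startMs: number;
  endMs: number;
}

interface ConcurrencySummary {
  peak_concurrent: number;
  peak_t_ms: number;
  mean_concurrent: number;
  total_run_ms: number;
}

function concurrencySweep(intervals: Interval[]): { summary: ConcurrencySummary; timeline: number[] } {
  // +1 at each start, -1 at each end.
  const events: Array<[number, number]> = [];
  let totalRunMs = 0;
  for (const iv of intervals) {
    events.push([iv.startMs, 1], [iv.endMs, -1]);
    totalRunMs = Math.max(totalRunMs, iv.endMs);
  }
  // Sort by time; at equal times apply ends (-1) before starts (+1).
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);

  let current = 0;
  let peak = 0;
  let peakT = 0;
  let area = 0; // integral of concurrency over time, in sandbox-ms
  let lastT = 0;
  let nextTick = 0;
  const timeline: number[] = []; // concurrency at t = 0s, 1s, 2s, ...

  for (const [t, delta] of events) {
    // Concurrency is constant on [lastT, t), so emit any 1Hz samples in that window.
    while (nextTick < t) {
      timeline.push(current);
      nextTick += 1000;
    }
    area += current * (t - lastT);
    lastT = t;
    current += delta;
    if (current > peak) {
      peak = current;
      peakT = t;
    }
  }

  return {
    summary: {
      peak_concurrent: peak,
      peak_t_ms: peakT,
      mean_concurrent: totalRunMs > 0 ? area / totalRunMs : 0,
      total_run_ms: totalRunMs,
    },
    timeline,
  };
}
```

`mean_concurrent` here is the area under the concurrency curve divided by `total_run_ms`, which is the same as the sum of all sandbox lifetimes divided by the run length.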

Runner reflects every primitive prop off the adapter's returned
sandbox object (skipping credential-shaped keys) and stores the
result as a JSONB column on sandbox_results and as a field in
Tigris raw.jsonl. Verified on e2b and runloop — both expose
{ provider, sandboxId }, which lets us cross-reference against
provider dashboards. Schema migration is idempotent.
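
A sketch of the reflection step, assuming "credential-shaped" means a key matching a small regex of secret-ish words (the PR's actual rule isn't shown here). It walks own enumerable properties only, keeps strings, finite numbers, booleans and null, and drops everything else, so the result is safe to store as JSONB. `reflectPrimitives` is a hypothetical name.

```typescript
type Primitive = string | number | boolean | null;

// Assumed definition of a "credential-shaped" key; the runner's real list may differ.
const CREDENTIAL_KEY = /(token|secret|password|passwd|api[_-]?key|auth|credential|cookie)/i;

function reflectPrimitives(sandbox: object): Record<string, Primitive> {
  const out: Record<string, Primitive> = {};
  // Own enumerable properties only; getters defined on a class prototype are not visited.
  for (const [key, value] of Object.entries(sandbox)) {
    if (CREDENTIAL_KEY.test(key)) continue;
    if (value === null || typeof value === "string" || typeof value === "boolean") {
      out[key] = value;
    } else if (typeof value === "number" && Number.isFinite(value)) {
      // NaN and Infinity have no JSON representation, so they are dropped.
      out[key] = value;
    }
    // Objects, arrays, functions, undefined, bigint and symbol values are skipped.
  }
  return out;
}
```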

Coordinator samples every 5s (process CPU, memory, event-loop lag
percentiles, load averages, /proc/self/fd count, /proc/net/sockstat)
into <run_id>/metrics.jsonl. Uploaded on every 30s heartbeat for
partial-result durability plus a final flush at shutdown. Headline
peaks land in meta.json.metrics_summary for at-a-glance review.
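
Two of the sampled metrics lend themselves to small pure helpers: summarizing event-loop lag samples into percentiles, and parsing `/proc/net/sockstat`. This sketch uses nearest-rank percentiles, which is an assumption; the coordinator may compute them differently, or rely on Node's `perf_hooks.monitorEventLoopDelay()` histogram and its own `percentile()` method. The helper names are hypothetical.

```typescript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Summarize event-loop lag samples (ms) into headline percentiles.
function lagSummary(samplesMs: number[]) {
  const s = samplesMs.slice().sort((a, b) => a - b);
  return {
    p50: percentile(s, 50),
    p95: percentile(s, 95),
    p99: percentile(s, 99),
    max: s[s.length - 1],
  };
}

// Parse /proc/net/sockstat, whose lines look like
// "TCP: inuse 5 orphan 0 tw 2 alloc 7 mem 1", into { TCP: { inuse: 5, ... } }.
function parseSockstat(text: string): Record<string, Record<string, number>> {
  const out: Record<string, Record<string, number>> = {};
  for (const line of text.split("\n")) {
    const m = line.match(/^(\w+):\s+(.*)$/);
    if (!m) continue; // skip blank or unexpected lines
    const fields: Record<string, number> = {};
    const parts = m[2].trim().split(/\s+/);
    // Fields come in "name value" pairs.
    for (let i = 0; i + 1 < parts.length; i += 2) {
      fields[parts[i]] = Number(parts[i + 1]);
    }
    out[m[1]] = fields;
  }
  return out;
}
```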

Adds a small logger module (ISO-timestamped, level-tagged lines with
phase markers) and replaces ad-hoc console.* calls throughout the
coordinator and runner. Per-sandbox events are sampled at high N
(pickSamplingPeriod) so coordinator.log stays bounded — every sandbox
at N<=1000, ~100 sampled + every error at higher N. Adds milestone
progress lines with rate/ETA every ~10% of work done.
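
A minimal sketch of the log-line format and the ~10% milestone lines. The exact layout (`<ISO time> [LEVEL] [phase] msg`) and the helper names are assumptions; the point is that the ETA is simply remaining work divided by the completion rate observed so far.

```typescript
type Level = "INFO" | "WARN" | "ERROR";

// One log line: ISO timestamp, level tag, phase marker, message (assumed layout).
function formatLine(level: Level, phase: string, msg: string, now: Date = new Date()): string {
  return `${now.toISOString()} [${level}] [${phase}] ${msg}`;
}

// Returns a callback to invoke after each completion; it emits a progress line
// the first time `done` crosses each ~10% step of `total`.
function makeMilestones(total: number, startMs: number, emit: (line: string) => void) {
  const step = Math.max(1, Math.ceil(total / 10));
  let nextMark = step;
  return (done: number, nowMs: number): void => {
    if (done < nextMark) return;
    while (nextMark <= done) nextMark += step; // skip any steps crossed in one jump
    const elapsedS = Math.max(0.001, (nowMs - startMs) / 1000);
    const rate = done / elapsedS; // completions per second so far
    const etaS = rate > 0 ? (total - done) / rate : Infinity;
    const pct = Math.round((done / total) * 100);
    emit(`progress ${done}/${total} (${pct}%) rate=${rate.toFixed(1)}/s eta=${etaS.toFixed(0)}s`);
  };
}
```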

Removes pickSamplingPeriod() so every sandbox gets a [ok]/[error] line
in coordinator.log regardless of N. At full 100k this produces a
~14 MB log file (still cheap to upload + store via the existing
heartbeat-cadence Tigris flush). Trade-off documented in the data
inventory.

Runner now runs `node -v` after each successful sandbox.create() and
records the two phases separately. SandboxResult gains
first_command_ms; Postgres sandbox_results gets a matching nullable
column (idempotent ALTER). meta.json adds first_command_distribution
and tti_distribution alongside the existing (allocate-only)
latency_distribution. Mirrors the daily benchmark's readiness check
so numbers are directly comparable.

Remove the 60s linear ramp from the 100k burst: all sandbox-create
requests now go out as fast as the event loop dispatches them. The
ramp was hiding the very provider-side overload behaviour we want to
measure. Drops `rampSeconds` from BurstProviderConfig, renames the
meta.json `ramp_segments` bucket to `submission_segments` (idx now
reflects event-loop submission order, not ramp position), and removes
the now-stale `ramp_seconds_configured` field from concurrency_summary.
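
With the ramp gone, `idx` is just submission order, so `submission_segments` can be sketched as fixed-width buckets over that index. This assumes equal-size buckets (10 by default), a per-bucket ok count, and a nearest-rank median latency; the `Result` and `Segment` fields are illustrative, not the real meta.json schema.

```typescript
interface Result {
  idx: number; // submission order (0-based)
  ok: boolean;
  latencyMs: number;
}

interface Segment {
  segment: number;
  from_idx: number;
  to_idx: number; // inclusive
  count: number;
  ok: number;
  p50_ms: number; // nearest-rank median; NaN for an empty bucket
}

function submissionSegments(results: Result[], total: number, segments = 10): Segment[] {
  const n = Math.max(1, Math.min(segments, total));
  const size = Math.ceil(total / n);
  const buckets: Result[][] = [];
  for (let i = 0; i < n; i++) buckets.push([]);
  for (const r of results) {
    // Clamp so an out-of-range idx lands in the last bucket.
    buckets[Math.min(n - 1, Math.floor(r.idx / size))].push(r);
  }
  return buckets.map((b, i) => {
    const lat = b.map(r => r.latencyMs).sort((x, y) => x - y);
    return {
      segment: i,
      from_idx: i * size,
      to_idx: Math.min(total, (i + 1) * size) - 1,
      count: b.length,
      ok: b.filter(r => r.ok).length,
      p50_ms: lat.length > 0 ? lat[Math.ceil(lat.length / 2) - 1] : NaN,
    };
  });
}
```

Comparing early and late segments is what exposes provider-side overload: if later submissions see higher latency or lower ok counts, the burst is saturating something.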

…ncurrency

The old runner destroyed each sandbox as soon as its create+readiness check
returned, so peak concurrency was bounded by per-sandbox lifetime, not by
the provider's actual capacity to hold N sandboxes simultaneously. This
made the headline number a measure of churn, not concurrency.

Reshape the runner into two phases:
  1. parallel create + `node -v` readiness; survivors stay alive
  2. after all phase-1 tasks settle, run a final `node -v` liveness probe
     against every survivor, then destroy

Replace the 'ok | timeout | http_error | network_error' status with a
four-state lifecycle taxonomy:
  - success          created, readiness passed, alive at end-of-test
  - partial          created, readiness passed, died before end-of-test
  - readiness_failed created, but first `node -v` never returned
  - failed           sandbox.create() errored

Move the timeout/http_error/network_error sub-classification into a new
`failure_class` column so it works across any non-success status. Bump
sandboxOptions.timeoutMs to 30 min on providers that support it so they
don't auto-destroy mid-burst. Schema updated idempotently: CHECK
constraints swapped to the new values (NOT VALID for back-compat),
`failure_class` added to sandbox_results, `partials` + `readiness_failures`
added to runs.
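
The four-state taxonomy reduces to a small decision function over what each phase observed. This sketch assumes the runner records three booleans per sandbox; the `failure_class` heuristics (timeout by error name or `ETIMEDOUT`, HTTP error when a numeric status is present, otherwise network error) are assumptions about how adapter errors look, not the PR's actual rules.

```typescript
type Lifecycle = "success" | "partial" | "readiness_failed" | "failed";
type FailureClass = "timeout" | "http_error" | "network_error";

// What the runner observed for one sandbox (assumed shape).
interface Observation {
  created: boolean;     // phase 1: sandbox.create() resolved
  readinessOk: boolean; // phase 1: first `node -v` returned
  livenessOk: boolean;  // phase 2: end-of-test `node -v` returned
  error?: { name?: string; code?: string; status?: number };
}

function classifyLifecycle(o: Observation): Lifecycle {
  if (!o.created) return "failed";
  if (!o.readinessOk) return "readiness_failed";
  return o.livenessOk ? "success" : "partial";
}

// failure_class applies to any non-success status; null when there is nothing to classify.
function classifyFailure(o: Observation): FailureClass | null {
  if (classifyLifecycle(o) === "success" || !o.error) return null;
  const e = o.error;
  if (e.name === "TimeoutError" || e.code === "ETIMEDOUT") return "timeout";
  if (typeof e.status === "number") return "http_error";
  return "network_error";
}
```

Keeping the sub-classification in a separate function mirrors the schema split: `status` answers "how far did it get", `failure_class` answers "why did it stop".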

Verified end-to-end with a 100-sandbox modal smoke run: 100/100 success,
peak_concurrent=100 (vs. the old model where peak depended on destroy
timing), Postgres + Tigris meta.json both write the new shape.

…d watch scripts

Add scripts/burst-100k-launch-multi.ts (provisions one Namespace VM
per provider via launch.sh, defaults to all 5 × 1000) and
scripts/burst-100k-watch.ts (polls Postgres for one-or-more RUN_IDs
until all reach a terminal state). Both wired up as npm scripts.
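
The watch script's loop can be sketched as "poll until nothing is pending". This assumes `running` (the status the launch script inserts) is the only non-terminal state and that a run with no row yet also counts as pending; `fetchRows` is a hypothetical callback standing in for the Postgres query.

```typescript
interface RunRow {
  id: string;
  status: string;
}

// Assumption: 'running' is the only non-terminal status.
function pendingRuns(runIds: string[], rows: RunRow[]): string[] {
  const statusById = new Map(rows.map(r => [r.id, r.status] as [string, string]));
  return runIds.filter(id => {
    const status = statusById.get(id);
    // A run with no row yet still counts as pending.
    return status === undefined || status === "running";
  });
}

// Poll until every requested run has reached a terminal state.
// `fetchRows` might run e.g. SELECT id, status FROM runs WHERE id = ANY($1).
async function watchRuns(
  runIds: string[],
  fetchRows: (ids: string[]) => Promise<RunRow[]>,
  intervalMs = 10000,
): Promise<RunRow[]> {
  for (;;) {
    const rows = await fetchRows(runIds);
    if (pendingRuns(runIds, rows).length === 0) return rows;
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}
```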

Adds an alternate way to run the burst when a single VM can't hold the
target concurrency (file descriptors, NIC queue depth, event-loop lag).
The sharded launcher spawns N namespace VMs in parallel, each firing
total/N sandboxes at t=0, all tagged with a shared group_id. An
aggregator collapses the per-shard rows back into the same metrics
shape an unsharded burst produces.

  npm run bench:burst-100k:sharded -- --provider e2b --total 100000 --vms 20
  npm run bench:burst-100k:aggregate -- --recent
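
The aggregation step can be sketched as summing per-shard counters and taking the earliest start and latest finish. Apart from `partials` and `readiness_failures`, the column names in `ShardRow` are hypothetical. Peak concurrency cannot be reconstructed by summing per-shard peaks: shards need not peak at the same instant, so the sum is only an upper bound, and an exact group peak needs the per-sandbox intervals from every shard.

```typescript
// Hypothetical per-shard row; only partials/readiness_failures are named in this PR.
interface ShardRow {
  shard_index: number;
  shard_count: number;
  successes: number;
  partials: number;
  readiness_failures: number;
  failures: number;
  started_at: Date;
  finished_at: Date;
}

type Counter = "successes" | "partials" | "readiness_failures" | "failures";

function aggregateShards(rows: ShardRow[]) {
  if (rows.length === 0) throw new Error("no shard rows");
  const expected = rows[0].shard_count;
  if (rows.some(r => r.shard_count !== expected)) throw new Error("mixed shard_count in group");
  const indices = new Set(rows.map(r => r.shard_index));
  // Refuse to aggregate an incomplete or duplicated group.
  if (rows.length !== expected || indices.size !== expected) {
    throw new Error(`expected ${expected} distinct shards, got ${rows.length} rows`);
  }
  const sum = (k: Counter) => rows.reduce((acc, r) => acc + r[k], 0);
  const successes = sum("successes");
  const partials = sum("partials");
  const readiness_failures = sum("readiness_failures");
  const failures = sum("failures");
  const startedMs = Math.min(...rows.map(r => r.started_at.getTime()));
  const finishedMs = Math.max(...rows.map(r => r.finished_at.getTime()));
  return {
    shard_count: expected,
    total: successes + partials + readiness_failures + failures,
    successes,
    partials,
    readiness_failures,
    failures,
    total_run_ms: finishedMs - startedMs,
  };
}
```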

Persistence

* runs gains group_id / shard_index / shard_count columns so shards
  in a group are queryable as a unit.
* New run_groups table holds one row per group with the aggregate
  scalars + full meta.json (JSONB). Mirrors runs' columns so dashboards
  can union per-VM and per-group views.
* Aggregator also uploads the full meta.json to
  s3://<bucket>/groups/<group_id>/meta.json. Both writes are on by
  default; opt out with --no-pg / --no-tigris.

Sharding mechanics

* burst-100k-launch-sharded.ts validates total % vms == 0, generates a
  group_id, and spawn()s N burst-100k-launch.sh children in parallel
  with per-child stdout prefixed [sNN]. Schema is applied once up-front
  and children get SKIP_SCHEMA=1 — CREATE TABLE/INDEX IF NOT EXISTS
  isn't race-safe under parallel applies, and Neon's -pooler endpoint
  breaks session-level advisory locks.
* burst-100k-launch.sh forwards GROUP_ID / SHARD_INDEX / SHARD_COUNT to
  the VM startup script and the pre-handoff INSERT.
* Coordinator reads the shard env, threads it through pg.bootstrap, and
  tags its own Tigris meta.json with the group fields.
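
The launcher's up-front planning might look like this minimal sketch: reject totals that don't divide evenly, then derive each child's label and env. It assumes 0-based shard indices and two-digit `[sNN]` labels, and sets only the env vars named above (`GROUP_ID`, `SHARD_INDEX`, `SHARD_COUNT`, `SKIP_SCHEMA`); `planShards` and `prefixLines` are hypothetical.

```typescript
interface ShardPlan {
  index: number;
  count: number;
  perShard: number; // sandboxes this child fires at t=0
  prefix: string;   // label prepended to the child's output lines
  env: Record<string, string>;
}

function planShards(total: number, vms: number, groupId: string): ShardPlan[] {
  if (!Number.isInteger(total) || !Number.isInteger(vms) || total <= 0 || vms <= 0) {
    throw new Error("--total and --vms must be positive integers");
  }
  if (total % vms !== 0) {
    throw new Error(`--total ${total} is not divisible by --vms ${vms}`);
  }
  const perShard = total / vms;
  const plans: ShardPlan[] = [];
  for (let i = 0; i < vms; i++) {
    plans.push({
      index: i,
      count: vms,
      perShard,
      prefix: `[s${String(i).padStart(2, "0")}]`,
      // Schema is applied once by the parent, so every child skips it.
      env: { GROUP_ID: groupId, SHARD_INDEX: String(i), SHARD_COUNT: String(vms), SKIP_SCHEMA: "1" },
    });
  }
  return plans;
}

// Prefix each complete line of a child's output with its shard label.
function prefixLines(prefix: string, chunk: string): string {
  return chunk
    .split("\n")
    .filter(l => l.length > 0)
    .map(l => `${prefix} ${l}`)
    .join("\n");
}
```

`prefixLines` treats each chunk as whole lines; a real launcher reading child stdout would need to buffer partial lines across chunks (for example with Node's `readline`).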

Single-VM runs (bench:burst-100k:local, :multi) are unchanged — every
new field is optional.

@open-cla

open-cla Bot commented Jun 1, 2026

Contributor License Agreement

The following contributors need CLA coverage:

Review and sign the CLA

@superagent-security superagent-security Bot added the pr:flagged PR flagged for review by security analysis. label Jun 1, 2026

@superagent-security superagent-security Bot left a comment


Superagent found 1 security concern(s).

SHARD_IDX_LIT="$( [ -n "${SHARD_INDEX:-}" ] && printf "%d" "$SHARD_INDEX" || echo NULL )"
SHARD_CNT_LIT="$( [ -n "${SHARD_COUNT:-}" ] && printf "%d" "$SHARD_COUNT" || echo NULL )"
psql "$PG_URL" -v ON_ERROR_STOP=1 -q -c "
INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix,

P2: SQL injection via unescaped variable interpolation in psql command

Shell variables are interpolated directly into the psql -c SQL string without escaping or parameterization.

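Pass the values via psql -v flags and reference them as :'variable' in the SQL instead of using shell interpolation. Because psql does not interpolate variables inside a -c string, the SQL then has to be fed through stdin or -f.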

AI prompt
Check if this security scanner issue is valid. If so, understand the root cause and fix it. If appropriate, update or add tests. Keep the change focused and preserve intended behavior.

<file name="scripts/burst-100k-launch.sh">
<violation number="1" location="scripts/burst-100k-launch.sh:89">
<priority>P2</priority>
<title>SQL injection via unescaped variable interpolation in psql command</title>
<evidence>psql "$PG_URL" -v ON_ERROR_STOP=1 -q -c "
  INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix,
                    group_id, shard_index, shard_count)
  VALUES ('$RUN_ID', '$PROVIDER', '$GITHUB_SHA', '$INSTANCE_ID', now(), 'running',
          's3://${TIGRIS_STORAGE_BUCKET}/${RUN_ID}/',
          $GROUP_ID_LIT, $SHARD_IDX_LIT, $SHARD_CNT_LIT)
  ON CONFLICT (id) DO NOTHING;
"</evidence>
<recommendation>Use psql's -v flag to pass variables safely and reference them as :variable in the SQL, or execute the INSERT via a parameterized query through the Node pg client. If shell-level execution is required, at minimum validate and sanitize $PROVIDER and $RUN_ID before interpolation.</recommendation>
</violation>
</file>
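
Following the scanner's second suggestion, here is a sketch of the same INSERT as a parameterized query. The returned `{ text, values }` object has the shape that node-postgres's `client.query()` accepts, so every value, including the `tigris_prefix` built from the bucket and run ID, is bound as a parameter instead of being spliced into SQL. The allow-list regex for `RUN_ID` and `PROVIDER` is an assumed format, included to illustrate the "validate before use" fallback; `buildRunInsert` is a hypothetical helper.

```typescript
interface RunInsert {
  runId: string;
  provider: string;
  commitSha: string;
  instanceId: string;
  bucket: string; // TIGRIS_STORAGE_BUCKET
  groupId?: string;
  shardIndex?: number;
  shardCount?: number;
}

// Assumed format for RUN_ID / PROVIDER; tighten to the real allow-list if one exists.
const SAFE_TOKEN = /^[A-Za-z0-9._-]{1,128}$/;

// Shape accepted by node-postgres: client.query({ text, values }).
interface QueryConfig {
  text: string;
  values: unknown[];
}

function buildRunInsert(r: RunInsert): QueryConfig {
  // Defense in depth: values are bound as parameters anyway, but reject odd IDs early.
  if (!SAFE_TOKEN.test(r.runId)) throw new Error(`invalid RUN_ID: ${JSON.stringify(r.runId)}`);
  if (!SAFE_TOKEN.test(r.provider)) throw new Error(`invalid PROVIDER: ${JSON.stringify(r.provider)}`);
  return {
    text:
      "INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix, " +
      "group_id, shard_index, shard_count) " +
      "VALUES ($1, $2, $3, $4, now(), 'running', $5, $6, $7, $8) " +
      "ON CONFLICT (id) DO NOTHING",
    values: [
      r.runId,
      r.provider,
      r.commitSha,
      r.instanceId,
      `s3://${r.bucket}/${r.runId}/`,
      // Optional shard fields become SQL NULL, replacing the *_LIT shell literals.
      r.groupId ?? null,
      r.shardIndex ?? null,
      r.shardCount ?? null,
    ],
  };
}
```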
