Local run by calavera · Pull Request #123 · computesdk/benchmarks

calavera · 2026-06-01T21:49:33Z

No description provided.

Adds opt-in 100k-sandbox burst benchmark module alongside the daily ~100-burst path. Includes the design plan + implementation checklist, idempotent Postgres schema, and a coordinator (types, e2b provider, pg/R2 sinks, p-limit ramp runner) bundled via esbuild. Local smoke validated against e2b/Neon/R2; launch script + workflow are next.

scripts/burst-100k-launch.sh provisions a Namespace VM, applies the schema, uploads the bundled coordinator, records the run, and starts the coordinator detached. Uses --bare/--cidfile for nsc, installs nodejs via apk, passes env via an uploaded chmod-600 startup script (printf %q-quoted) that self-destructs after detaching node, and pgrep-verifies the hand-off succeeded. Validated end-to-end at N=10 against e2b/Neon/R2.

Same S3-compatible API, different provider. Renames sinks/r2.ts → sinks/tigris.ts (R2Sink → TigrisSink), env vars R2_* → TIGRIS_STORAGE_*, and the runs.r2_prefix column → tigris_prefix. Also fixes launch.sh's pgrep false-negative (now retries up to 10s and matches against coordinator.cjs) and updates the plan doc to reflect Tigris and the current --bare/--cidfile nsc flags. Validated end-to-end at N=10: 10/10 sandboxes ok, all three Tigris objects (raw.jsonl, heartbeat.json, meta.json) written.

Coordinator now reads $COORDINATOR_LOG_PATH (set to /root/run.log by launch.sh) and pushes its own stdout/stderr to Tigris on every heartbeat and at shutdown. Closes the "logs die with the VM" gap. Local runs skip silently when the env var is unset.

Coordinator tallies per-status counts during the burst and writes them to new columns on runs (timeouts, http_errors, network_errors) plus an error_histogram object in Tigris meta.json. Schema migration is idempotent (ALTER TABLE ADD COLUMN IF NOT EXISTS), so re-running the launch script catches up existing DBs.

Coordinator now tracks every sandbox's start/end timestamps and builds an interval-overlap sweep at run-end. Writes concurrency_summary (peak_concurrent, peak_t_ms, mean_concurrent, total_run_ms) and a 1Hz concurrency_timeline to Tigris meta.json. Lets us tell whether the ramp actually behaved and where the burst saturates.
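The interval-overlap sweep described above can be sketched roughly as follows. This is a minimal illustration, not the coordinator's actual code: the `Interval` shape and the `summarizeConcurrency` name are hypothetical, and the 1Hz `concurrency_timeline` is left out. Only the four output field names come from the description.

```typescript
// Hypothetical input shape: one entry per sandbox, timestamps in epoch ms.
interface Interval {
  startMs: number;
  endMs: number;
}

// Field names follow the concurrency_summary keys named in the description.
interface ConcurrencySummary {
  peak_concurrent: number;
  peak_t_ms: number; // offset of the peak from the earliest start
  mean_concurrent: number; // time-weighted average over the whole run
  total_run_ms: number;
}

function summarizeConcurrency(intervals: Interval[]): ConcurrencySummary {
  const events: Array<{ t: number; delta: number }> = [];
  let t0 = Infinity;
  let tEnd = -Infinity;
  for (const iv of intervals) {
    events.push({ t: iv.startMs, delta: 1 });
    events.push({ t: iv.endMs, delta: -1 });
    t0 = Math.min(t0, iv.startMs);
    tEnd = Math.max(tEnd, iv.endMs);
  }
  if (events.length === 0) {
    return { peak_concurrent: 0, peak_t_ms: 0, mean_concurrent: 0, total_run_ms: 0 };
  }
  // Sort by time; at equal timestamps process ends (-1) before starts (+1)
  // so back-to-back sandboxes are not counted as overlapping.
  events.sort((a, b) => a.t - b.t || a.delta - b.delta);

  let active = 0;
  let peak = 0;
  let peakT = t0;
  let area = 0; // integral of active sandboxes over time
  let prevT = t0;
  for (const ev of events) {
    area += active * (ev.t - prevT);
    prevT = ev.t;
    active += ev.delta;
    if (active > peak) {
      peak = active;
      peakT = ev.t;
    }
  }
  const total = tEnd - t0;
  return {
    peak_concurrent: peak,
    peak_t_ms: peakT - t0,
    mean_concurrent: total > 0 ? area / total : 0,
    total_run_ms: total,
  };
}
```

Sorting 2N events keeps the sweep at O(N log N), which should stay cheap even at N = 100k.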

Runner reflects every primitive prop off the adapter's returned sandbox object (skipping credential-shaped keys) and stores the result as a JSONB column on sandbox_results and as a field in Tigris raw.jsonl. Verified on e2b and runloop — both expose { provider, sandboxId }, which lets us cross-reference against provider dashboards. Schema migration is idempotent.
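A rough sketch of what reflecting primitive props while skipping credential-shaped keys could look like. The function name and the regular expression are assumptions made for illustration; the real runner may use a different key filter.

```typescript
type Primitive = string | number | boolean | null;

// Assumed heuristic for "credential-shaped" keys; the actual pattern is not
// given in the description.
const CREDENTIAL_KEY = /(token|secret|password|api[_-]?key|credential)/i;

// Copies only top-level primitive properties so the result is safe to store
// as JSONB: nested objects, functions, and undefined values are dropped.
function reflectPrimitiveProps(sandbox: unknown): Record<string, Primitive> {
  const out: Record<string, Primitive> = {};
  if (sandbox === null || typeof sandbox !== "object") return out;
  for (const [key, value] of Object.entries(sandbox as Record<string, unknown>)) {
    if (CREDENTIAL_KEY.test(key)) continue;
    if (value === null || typeof value === "string" || typeof value === "boolean") {
      out[key] = value;
    } else if (typeof value === "number" && Number.isFinite(value)) {
      out[key] = value; // NaN and Infinity do not survive JSON, so skip them
    }
  }
  return out;
}
```

For an adapter object such as `{ provider: "e2b", sandboxId: "sb_1", apiKey: "...", kill() {} }` this keeps only `provider` and `sandboxId`.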

Coordinator samples every 5s (process CPU, memory, event-loop lag percentiles, load averages, /proc/self/fd count, /proc/net/sockstat) into <run_id>/metrics.jsonl. Uploaded on every 30s heartbeat for partial-result durability plus a final flush at shutdown. Headline peaks land in meta.json.metrics_summary for at-a-glance review.

Adds a small logger module (ISO-timestamped, level-tagged lines with phase markers) and replaces ad-hoc console.* calls throughout the coordinator and runner. Per-sandbox events are sampled at high N (pickSamplingPeriod) so coordinator.log stays bounded — every sandbox at N<=1000, ~100 sampled + every error at higher N. Adds milestone progress lines with rate/ETA every ~10% of work done.

Removes pickSamplingPeriod() so every sandbox gets a [ok]/[error] line in coordinator.log regardless of N. At full 100k this produces a ~14 MB log file (still cheap to upload + store via the existing heartbeat-cadence Tigris flush). Trade-off documented in the data inventory.

Runner now runs `node -v` after each successful sandbox.create() and records the two phases separately. SandboxResult gains first_command_ms; Postgres sandbox_results gets a matching nullable column (idempotent ALTER). meta.json adds first_command_distribution and tti_distribution alongside the existing (allocate-only) latency_distribution. Mirrors the daily benchmark's readiness check so numbers are directly comparable.
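To illustrate how the three distributions relate, here is a small sketch under stated assumptions: that TTI is the sum of the allocate phase and the first-command phase, that a null `first_command_ms` means the readiness command never completed, and that percentiles use the nearest-rank method. The `create_ms` field, the `Distribution` shape, and the helper names are hypothetical; only `first_command_ms` and the three `*_distribution` keys come from the description.

```typescript
interface PhaseTiming {
  create_ms: number; // hypothetical name for the allocate-only latency
  first_command_ms: number | null; // null when `node -v` never returned
}

interface Distribution {
  count: number;
  p50: number;
  p95: number;
  p99: number;
  max: number;
}

// Nearest-rank percentile over an ascending, non-empty array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  const idx = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[idx];
}

function summarize(values: number[]): Distribution | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  return {
    count: sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    max: sorted[sorted.length - 1],
  };
}

function buildDistributions(timings: PhaseTiming[]) {
  const create = timings.map((t) => t.create_ms);
  const firstCommand: number[] = [];
  const tti: number[] = [];
  for (const t of timings) {
    if (t.first_command_ms === null) continue;
    firstCommand.push(t.first_command_ms);
    tti.push(t.create_ms + t.first_command_ms); // assumption: TTI = both phases
  }
  return {
    latency_distribution: summarize(create),
    first_command_distribution: summarize(firstCommand),
    tti_distribution: summarize(tti),
  };
}
```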

Remove the 60s linear ramp from the 100k burst — all sandbox-create requests now go out as fast as the event loop dispatches them. The ramp was hiding the very provider-side overload behaviour we want to measure. Drops `rampSeconds` from BurstProviderConfig, renames the meta.json `ramp_segments` bucket to `submission_segments` (idx now reflects event-loop submission order, not ramp position), and removes the now-stale `ramp_seconds_configured` field from concurrency_summary.

…ncurrency

The old runner destroyed each sandbox as soon as its create+readiness check returned, so peak concurrency was bounded by per-sandbox lifetime, not by the provider's actual capacity to hold N sandboxes simultaneously. This made the headline number a measure of churn, not concurrency.

Reshape the runner into two phases:

1. parallel create + `node -v` readiness; survivors stay alive
2. after all phase-1 tasks settle, run a final `node -v` liveness probe against every survivor, then destroy

Replace the 'ok | timeout | http_error | network_error' status with a four-state lifecycle taxonomy:

- success           created, readiness passed, alive at end-of-test
- partial           created, readiness passed, died before end-of-test
- readiness_failed  created, but first `node -v` never returned
- failed            sandbox.create() errored

Move the timeout/http_error/network_error sub-classification into a new `failure_class` column so it works across any non-success status.

Bump sandboxOptions.timeoutMs to 30 min on providers that support it so they don't auto-destroy mid-burst.

Schema updated idempotently: CHECK constraints swapped to the new values (NOT VALID for back-compat), `failure_class` added to sandbox_results, `partials` + `readiness_failures` added to runs.

Verified end-to-end with a 100-sandbox modal smoke run: 100/100 success, peak_concurrent=100 (vs. the old model where peak depended on destroy timing), Postgres + Tigris meta.json both write the new shape.
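The four-state taxonomy maps onto a small decision function. The sketch below is illustrative only: the `SandboxObservation` shape and the function names are hypothetical, while the status and failure-class values are the ones listed in the commit message.

```typescript
type LifecycleStatus = "success" | "partial" | "readiness_failed" | "failed";
type FailureClass = "timeout" | "http_error" | "network_error";

// Hypothetical per-sandbox record collected across the two phases.
interface SandboxObservation {
  created: boolean; // sandbox.create() returned without error
  readinessPassed: boolean; // first `node -v` returned
  aliveAtEnd: boolean; // final `node -v` liveness probe returned
  failureClass?: FailureClass; // independent of status, set on any non-success
}

function classifyLifecycle(o: SandboxObservation): LifecycleStatus {
  if (!o.created) return "failed";
  if (!o.readinessPassed) return "readiness_failed";
  return o.aliveAtEnd ? "success" : "partial";
}

// Tallies statuses the way a per-run counter might before writing to `runs`.
function tallyStatuses(obs: SandboxObservation[]): Record<LifecycleStatus, number> {
  const counts: Record<LifecycleStatus, number> = {
    success: 0,
    partial: 0,
    readiness_failed: 0,
    failed: 0,
  };
  for (const o of obs) counts[classifyLifecycle(o)] += 1;
  return counts;
}
```

Keeping `failureClass` separate from the status is what lets a `partial` or `readiness_failed` sandbox still carry a timeout or network error label.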

…d watch scripts

Add scripts/burst-100k-launch-multi.ts (provisions one Namespace VM per provider via launch.sh, defaults to all 5 × 1000) and scripts/burst-100k-watch.ts (polls Postgres for one-or-more RUN_IDs until all reach a terminal state). Both wired up as npm scripts.

Adds an alternate way to run the burst when a single VM can't hold the target concurrency (file descriptors, NIC queue depth, event-loop lag). The sharded launcher spawns N namespace VMs in parallel, each firing total/N sandboxes at t=0, all tagged with a shared group_id. An aggregator collapses the per-shard rows back into the same metrics shape an unsharded burst produces.

    npm run bench:burst-100k:sharded -- --provider e2b --total 100000 --vms 20
    npm run bench:burst-100k:aggregate -- --recent

Persistence

* runs gains group_id / shard_index / shard_count columns so shards in a group are queryable as a unit.
* New run_groups table holds one row per group with the aggregate scalars + full meta.json (JSONB). Mirrors runs' columns so dashboards can union per-VM and per-group views.
* Aggregator also uploads the full meta.json to s3://<bucket>/groups/<group_id>/meta.json. Both writes are on by default; opt out with --no-pg / --no-tigris.

Sharding mechanics

* burst-100k-launch-sharded.ts validates total % vms == 0, generates a group_id, and spawn()s N burst-100k-launch.sh children in parallel with per-child stdout prefixed [sNN]. Schema is applied once up-front and children get SKIP_SCHEMA=1, because CREATE TABLE/INDEX IF NOT EXISTS isn't race-safe under parallel applies and Neon's -pooler endpoint breaks session-level advisory locks.
* burst-100k-launch.sh forwards GROUP_ID / SHARD_INDEX / SHARD_COUNT to the VM startup script and the pre-handoff INSERT.
* Coordinator reads the shard env, threads it through pg.bootstrap, and tags its own Tigris meta.json with the group fields.

Single-VM runs (bench:burst-100k:local, :multi) are unchanged; every new field is optional.
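The shard split and the collapse back into one set of counters can be sketched as below. This is an assumption-laden illustration: `planShards`, `sumShardCounts`, and the `ShardCounts` shape are hypothetical names, and the real aggregator also merges latency and concurrency data that this sketch ignores.

```typescript
interface ShardAssignment {
  group_id: string;
  shard_index: number;
  shard_count: number;
  sandboxes: number; // how many sandboxes this VM fires at t=0
}

// Mirrors the described validation: total must divide evenly across VMs.
function planShards(total: number, vms: number, groupId: string): ShardAssignment[] {
  if (!Number.isInteger(total) || !Number.isInteger(vms) || total <= 0 || vms <= 0) {
    throw new Error("total and vms must be positive integers");
  }
  if (total % vms !== 0) {
    throw new Error(`total (${total}) must be divisible by vms (${vms})`);
  }
  const perShard = total / vms;
  return Array.from({ length: vms }, (_, i) => ({
    group_id: groupId,
    shard_index: i,
    shard_count: vms,
    sandboxes: perShard,
  }));
}

// Hypothetical per-shard counters keyed by lifecycle status.
interface ShardCounts {
  success: number;
  partial: number;
  readiness_failed: number;
  failed: number;
}

// Additive scalars can simply be summed across shards in a group.
function sumShardCounts(shards: ShardCounts[]): ShardCounts {
  return shards.reduce<ShardCounts>(
    (acc, s) => ({
      success: acc.success + s.success,
      partial: acc.partial + s.partial,
      readiness_failed: acc.readiness_failed + s.readiness_failed,
      failed: acc.failed + s.failed,
    }),
    { success: 0, partial: 0, readiness_failed: 0, failed: 0 },
  );
}
```

With `--total 100000 --vms 20`, each of the 20 shards would be assigned 5000 sandboxes. Non-additive metrics such as percentiles or peak concurrency cannot be summed this way and need the per-sandbox rows.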

open-cla · 2026-06-01T21:49:41Z

Contributor License Agreement

The following contributors need CLA coverage:

@calavera

Review and sign the CLA

superagent-security

Superagent found 1 security concern(s).

superagent-security · 2026-06-01T21:59:51Z

+SHARD_IDX_LIT="$( [ -n "${SHARD_INDEX:-}" ] && printf "%d"   "$SHARD_INDEX" || echo NULL )"
+SHARD_CNT_LIT="$( [ -n "${SHARD_COUNT:-}" ] && printf "%d"   "$SHARD_COUNT" || echo NULL )"
+psql "$PG_URL" -v ON_ERROR_STOP=1 -q -c "
+  INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix,


P2: SQL injection via unescaped variable interpolation in psql command

SQL variables interpolated directly into psql -c string without escaping or parameterization.

Pass SQL variables via psql -v flags and use :variable syntax instead of shell interpolation.

AI prompt

Check if this security scanner issue is valid. If so, understand the root cause and fix it. If appropriate, update or add tests. Keep the change focused and preserve intended behavior. <file name="scripts/burst-100k-launch.sh"> <violation number="1" location="scripts/burst-100k-launch.sh:89"> <priority>P2</priority> <title>SQL injection via unescaped variable interpolation in psql command</title> <evidence>psql "$PG_URL" -v ON_ERROR_STOP=1 -q -c " INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, tigris_prefix, group_id, shard_index, shard_count) VALUES ('$RUN_ID', '$PROVIDER', '$GITHUB_SHA', '$INSTANCE_ID', now(), 'running', 's3://${TIGRIS_STORAGE_BUCKET}/${RUN_ID}/', $GROUP_ID_LIT, $SHARD_IDX_LIT, $SHARD_CNT_LIT) ON CONFLICT (id) DO NOTHING; "</evidence> <recommendation>Use psql's -v flag to pass variables safely and reference them as :variable in the SQL, or execute the INSERT via a parameterized query through the Node pg client. If shell-level execution is required, at minimum validate and sanitize $PROVIDER and $RUN_ID before interpolation.</recommendation> </violation> </file>
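One of the recommended fixes is to run the INSERT as a parameterized query through the Node pg client. A minimal sketch of building such a query follows, assuming the column list from the flagged statement. The `RunInsert` shape and `buildRunInsert` name are hypothetical. The returned `{ text, values }` object matches the query-config form that node-postgres `client.query()` accepts, but no pg import is included here so the snippet stays self-contained.

```typescript
// Hypothetical input; in launch.sh these come from RUN_ID, PROVIDER,
// GITHUB_SHA, INSTANCE_ID, TIGRIS_STORAGE_BUCKET and the shard env vars.
interface RunInsert {
  runId: string;
  provider: string;
  commitSha: string;
  instanceId: string;
  bucket: string;
  groupId?: string | null;
  shardIndex?: number | null;
  shardCount?: number | null;
}

interface QueryConfig {
  text: string;
  values: Array<string | number | null>;
}

// Every caller-supplied value travels in `values`; the SQL text is constant,
// so a hostile provider or run id cannot change the statement's structure.
function buildRunInsert(r: RunInsert): QueryConfig {
  const text =
    "INSERT INTO runs (id, provider, commit_sha, instance_id, started_at, status, " +
    "tigris_prefix, group_id, shard_index, shard_count) " +
    "VALUES ($1, $2, $3, $4, now(), 'running', $5, $6, $7, $8) " +
    "ON CONFLICT (id) DO NOTHING";
  const values = [
    r.runId,
    r.provider,
    r.commitSha,
    r.instanceId,
    `s3://${r.bucket}/${r.runId}/`,
    r.groupId ?? null,
    r.shardIndex ?? null,
    r.shardCount ?? null,
  ];
  return { text, values };
}
```

If the INSERT has to stay in the shell script, the other recommended route is passing values with psql's `-v` flag and referencing them as quoted `:'name'` variables. Note that psql does not expand such variables in a `-c` command string, so the SQL would need to be fed through stdin or `-f`.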

kisernl and others added 27 commits May 13, 2026 09:59

update tigris for storage

d80fcd1

add modal + data inventory md

7456b0a

add runloop to burst-100k

6c8b31e

add latency histogram data to tigris

63c19b3

add tensorlake as burst-100k provider

3f4f617

add declaw as burst-100k provider

dcff249

add json data to log per sandbox

9c08610

update pkg json for modal

4147cf8

update modal provider

6fe9d92

update package-lock.json

6d301bb

update e2b pkg

f955348

Run 100k locally.

0100854

superagent-security Bot added the pr:flagged label (PR flagged for review by security analysis.) Jun 1, 2026

superagent-security Bot reviewed Jun 1, 2026

calavera wants to merge 27 commits into computesdk:master from tensorlakeai:local-run
