diff --git a/.gitignore b/.gitignore index 8fd34cc2d5..24c0cf6e2e 100644 --- a/.gitignore +++ b/.gitignore @@ -33,3 +33,5 @@ target zkvm-prover/*.json .work/ rollup/tests.test +local-secrets.md +tmp/ diff --git a/AGENTS.md b/AGENTS.md index 449aecbdc8..03e036174b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -20,6 +20,22 @@ Follow the structured testing guide in [`docs/testing/openvm-upgrade-testing-gui 4. End-to-end proving 5. Docker image builds +## Shadow Coordinator + Prover Testing (Production Task Replay) + +For testing proof generation against **real mainnet production tasks** without interfering with the live system, use the **Shadow Coordinator** approach. This is significantly faster than a full shadow fork: + +- **Architecture**: Local coordinator (`:8390`) + local prover (GPU), fed by imported production task data. +- **Docs**: [`docs/shadow-testing/README.md`](docs/shadow-testing/README.md) — full setup guide, troubleshooting, config reference. +- **Quick Start**: [`scripts/shadow-testing/QUICKSTART.md`](scripts/shadow-testing/QUICKSTART.md) +- **Automation**: [`scripts/shadow-testing/setup.sh`](scripts/shadow-testing/setup.sh) — one-command setup for postgres, coordinator, and prover. + +Key hard-won rules: +- **L2 RPC**: Must support `debug_executionWitness`. `https://mainnet-rpc.scroll.io` works; `https://rpc.scroll.io` does not. +- **S3 circuit URLs**: v0.8.0 uses `v0.8.0/` prefix (no `/releases/`). +- **l2_block table**: Coordinator needs this for block hash lookups. Must be populated and linked via `chunk_hash`. +- **Blocks**: Must be post-fork (GalileoV2 / codec V10 = blocks ≥ 33,750,000 on mainnet). +- **L1 messages**: If chunks contain L1 messages, prover needs `scroll_getL1MessagesInBlock` RPC support. Most chunks at current mainnet height do NOT contain L1 messages, so this is usually non-blocking. 
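Two of the rules above are mechanical enough to encode as preflight checks. The following is a minimal sketch, not code from this repository: the function names are hypothetical, the S3 root is the one quoted in the README troubleshooting section, and the assumption that releases after v0.8.0 keep the prefix-free layout is unverified.

```python
# Hypothetical preflight helpers for the shadow-testing rules; not part of the repo.

GALILEO_V2_FIRST_BLOCK = 33_750_000  # mainnet threshold quoted in the rules above
S3_ROOT = "https://circuit-release.s3.us-west-2.amazonaws.com/scroll-zkvm"


def parse_version(tag):
    """Turn a tag such as 'v0.8.0' into a comparable tuple (0, 8, 0)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def circuit_base_url(tag):
    """Return the S3 base URL for a circuit release tag.

    v0.7.1 and earlier live under 'releases/'; v0.8.0 does not.
    Assumption: tags newer than v0.8.0 follow the v0.8.0 layout.
    """
    if parse_version(tag) >= (0, 8, 0):
        return f"{S3_ROOT}/{tag}/"
    return f"{S3_ROOT}/releases/{tag}/"


def is_post_fork(block_number):
    """True if the block is at or after the GalileoV2 threshold."""
    return block_number >= GALILEO_V2_FIRST_BLOCK
```

Keeping the threshold and the URL layout in one place makes it harder for a config file and a script to disagree about them.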
+ ## Useful Commands ```bash @@ -116,4 +132,5 @@ make coordinator_setup | [`docs/prover-coordinator-overview.md`](docs/prover-coordinator-overview.md) | Architecture, data flow, component relationships, common operations | | [`docs/testing/openvm-upgrade-testing-guide.md`](docs/testing/openvm-upgrade-testing-guide.md) | Step-by-step testing checklist after OpenVM / zkvm-prover upgrades | | [`docs/testing/docker-compose-e2e-guide.md`](docs/testing/docker-compose-e2e-guide.md) | Production-like E2E testing with Docker Compose + Coordinator Proxy | +| [`docs/shadow-testing/README.md`](docs/shadow-testing/README.md) | Shadow coordinator + local prover setup for production task replay | | [`docs/testing_reports/openvm-v1.6.0-guest-v0.8.0-May19.md`](docs/testing_reports/openvm-v1.6.0-guest-v0.8.0-May19.md) | Test report for PR #1783 (OpenVM 1.6.0, guest v0.8.0) | diff --git a/common/version/version.go b/common/version/version.go index 1703340026..fadf9e9533 100644 --- a/common/version/version.go +++ b/common/version/version.go @@ -5,7 +5,7 @@ import ( "runtime/debug" ) -var tag = "v4.7.13" +var tag = "v4.7.13-openvm16" var commit = func() string { if info, ok := debug.ReadBuildInfo(); ok { diff --git a/docs/shadow-testing/README.md b/docs/shadow-testing/README.md new file mode 100644 index 0000000000..05e2a8819d --- /dev/null +++ b/docs/shadow-testing/README.md @@ -0,0 +1,543 @@ +# Shadow Coordinator + Prover Testing Guide + +This guide documents how to set up a **shadow coordinator** + **local prover** environment for testing proof generation without interfering with production. This approach is significantly simpler than a full shadow fork — we use a local coordinator with imported production task data and a local prover that fetches tasks from it. 
+ +## Architecture + +``` +┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ +│ Production RDS │ │ Shadow DB │ │ Shadow │ +│ (read-only via │────▶│ (local :5433) │────▶│ Coordinator │ +│ port-forward) │ │ │ │ (localhost:8390)│ +└──────────────────┘ └──────────────────┘ └────────┬─────────┘ + │ + │ assigns tasks + ▼ +┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ +│ L2 RPC │ │ Local Prover │ │ Verifier Assets │ +│ (mainnet-rpc. │◀────│ (GPU/CPU) │ │ (/tmp/shadow- │ +│ scroll.io) │ │ │ │ verifier-assets)│ +└──────────────────┘ └──────────────────┘ └──────────────────┘ +``` + +## Prerequisites + +### Hardware +- GPU with CUDA support (tested on RTX 3090) +- ~50GB disk space for Docker images + verifier assets + circuit downloads +- 16GB+ RAM + +### Software +- Docker + docker-compose +- PostgreSQL client (`psql`) +- Rust toolchain (for local prover binary) +- `kubectl` or SSH access to IDC for port-forwarding to production RDS + +### Network +- Access to IDC machine with port-forward to mainnet RDS (e.g., `idc-us-1-19`) +- Internet access for L2 RPC and S3 circuit downloads + +## Quick Start + +If you just want to get running, use the provided script: + +```bash +# 1. Set up shadow PostgreSQL +cd scripts/shadow-testing +./setup.sh --postgres + +# 2. Import production task data (requires RDS port-forward) +./import-production-data.sh + +# 3. Start shadow coordinator +./setup.sh --coordinator + +# 4. Start prover (in another terminal) +./setup.sh --prover +``` + +## Step-by-Step Setup + +### Step 1: Set up IDC Port-Forward to Production RDS + +On the IDC machine (e.g., `idc-us-1-19`), ensure the port-forward is active: + +```bash +# Mainnet RDS should be accessible on localhost:15432 +# Credentials are loaded from .env (see .env.example) +psql -h localhost -p 15432 -U "$PROD_DB_USER" -d rollup -c "SELECT version();" +``` + +If not already set up, configure SSH tunnel or kubectl port-forward from your workstation. 
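Before running the import, it can help to confirm that something is actually listening on the forwarded port. The sketch below only checks TCP reachability; it does not authenticate against PostgreSQL, and the helper name `port_open` is hypothetical. Port 15432 is the forwarded RDS port used in Step 1.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if port_open("localhost", 15432):
        print("port-forward reachable on localhost:15432")
    else:
        print("nothing listening on localhost:15432")
```

A successful TCP connect only shows that the tunnel endpoint exists; the `psql` command in Step 1 remains the real test of credentials and database access.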
+ +### Step 2: Start Local PostgreSQL (Shadow DB) + +```bash +docker run -d \ + --name shadow-coordinator-postgres \ + -e POSTGRES_USER=postgres \ + -e POSTGRES_PASSWORD="${SHADOW_DB_PASSWORD}" \ + -e POSTGRES_DB=shadow_rollup \ + -p 5433:5432 \ + -v shadow-coordinator-postgres-data:/var/lib/postgresql/data \ + postgres:15 + +# Wait for DB to be ready +sleep 5 +docker exec shadow-coordinator-postgres pg_isready -U postgres +``` + +### Step 3: Download Verifier Assets + +The coordinator needs verifier assets for each supported fork: + +```bash +VERIFIER_DIR="/tmp/shadow-verifier-assets" +mkdir -p "$VERIFIER_DIR" + +# feynman (OpenVM 0.5.6) +mkdir -p "$VERIFIER_DIR/openvm-0.5.6" +# Download or copy verifier assets for feynman + +# galileo (v0.7.1) +mkdir -p "$VERIFIER_DIR/openvm-v0.7.1" +# Download or copy verifier assets for galileo + +# galileoV2 (v0.8.0) — NOTE: v0.8.0 does NOT use /releases/ prefix in S3 URLs +mkdir -p "$VERIFIER_DIR/openvm-v0.8.0" +# Download or copy verifier assets for galileoV2 +``` + +> ⚠️ **Important**: v0.8.0 assets use `v0.8.0/` path prefix, NOT `releases/v0.8.0/`. Using the wrong prefix causes HTTP 403 errors. + +### Step 4: Initialize Shadow DB Schema + +Use the coordinator's built-in migration or apply schema manually. The coordinator container will auto-migrate on first start. 
+ +### Step 5: Import Production Task Data + +Export the latest N batches + their chunks + bundles from production RDS and import into shadow DB: + +```bash +# Edit these variables as needed +# Credentials loaded from .env (see scripts/shadow-testing/.env.example) +PROD_DB="postgresql://${PROD_DB_USER}:${PROD_DB_PASSWORD}@${PROD_DB_HOST}:${PROD_DB_PORT}/${PROD_DB_NAME}" +SHADOW_DB="postgresql://${SHADOW_DB_USER}:${SHADOW_DB_PASSWORD}@${SHADOW_DB_HOST}:${SHADOW_DB_PORT}/${SHADOW_DB_NAME}" +BATCH_LIMIT=50 + +# Export batches +psql "$PROD_DB" -c " + COPY ( + SELECT * FROM batch + ORDER BY index DESC + LIMIT $BATCH_LIMIT + ) TO STDOUT WITH CSV HEADER; +" > /tmp/batches.csv + +# Export chunks in those batches +psql "$PROD_DB" -c " + COPY ( + SELECT c.* FROM chunk c + JOIN batch b ON b.start_chunk_index <= c.index AND c.index <= b.end_chunk_index + WHERE b.index IN (SELECT index FROM batch ORDER BY index DESC LIMIT $BATCH_LIMIT) + ORDER BY c.index + ) TO STDOUT WITH CSV HEADER; +" > /tmp/chunks.csv + +# Export bundles (all or limited) +psql "$PROD_DB" -c " + COPY ( + SELECT * FROM bundle + ORDER BY index DESC + LIMIT 20000 + ) TO STDOUT WITH CSV HEADER; +" > /tmp/bundles.csv + +# Import into shadow DB (truncate first) +psql "$SHADOW_DB" -c "TRUNCATE batch, chunk, bundle CASCADE;" + +# Use \copy for local import +psql "$SHADOW_DB" -c "\\copy batch FROM '/tmp/batches.csv' WITH CSV HEADER;" +psql "$SHADOW_DB" -c "\\copy chunk FROM '/tmp/chunks.csv' WITH CSV HEADER;" +psql "$SHADOW_DB" -c "\\copy bundle FROM '/tmp/bundles.csv' WITH CSV HEADER;" + +# Reset proving status to unassigned (1) +psql "$SHADOW_DB" -c "UPDATE chunk SET proving_status = 1, total_attempts = 0, active_attempts = 0;" +psql "$SHADOW_DB" -c "UPDATE batch SET proving_status = 1, total_attempts = 0, active_attempts = 0, chunk_proofs_status = 0;" +psql "$SHADOW_DB" -c "UPDATE bundle SET proving_status = 1, total_attempts = 0, active_attempts = 0;" +``` + +### Step 6: Populate l2_block Table + +The coordinator 
needs `l2_block` records to format chunk tasks (for block hashes and hardfork name resolution). + +Use the provided Python script or fetch blocks via L2 RPC: + +```bash +python3 scripts/shadow-testing/fetch-l2-blocks.py \ + --rpc https://mainnet-rpc.scroll.io \ + --db "postgresql://$SHADOW_DB_USER:$SHADOW_DB_PASSWORD@$SHADOW_DB_HOST:$SHADOW_DB_PORT/$SHADOW_DB_NAME" \ + --start-block 26000000 \ + --end-block 27000000 +``` + +After inserting blocks, link them to chunks: + +```bash +psql "$SHADOW_DB" -c " + UPDATE l2_block lb + SET chunk_hash = c.hash + FROM chunk c + WHERE lb.number >= c.start_block_number + AND lb.number <= c.end_block_number; +" +``` + +### Step 7: Start Shadow Coordinator + +Use Docker (recommended) or run locally: + +```bash +# Via Docker +docker run -d \ + --name shadow-coordinator-api-test \ + --network host \ + -v /tmp/shadow-coordinator-config.json:/app/conf/config.json \ + -v /tmp/shadow-verifier-assets:/verifier \ + zhuoatscroll/coordinator-api:v4.7.13-openvm16 + +# Wait for startup (takes 2-3 min for OpenVM keygen) +docker logs -f shadow-coordinator-api-test | grep -m1 "Start coordinator api successfully" +``` + +### Step 8: Start Prover + +Build or use prebuilt binary: + +```bash +# Build locally +cd /path/to/scroll-repo +cargo build --release -p prover-bin + +# Or use Docker image +docker run -d \ + --name shadow-prover \ + --network host \ + --gpus all \ + -v /tmp/prover-local.json:/app/config.json \ + -v ~/.openvm/params:/root/.openvm/params:ro \ + zhuoatscroll/prover:v4.7.13-openvm16 + +# Or run binary directly +./target/release/prover --config /tmp/prover-local.json +``` + +> ℹ️ **Note**: Prover will download circuit assets from S3 on first run (several GB). Subsequent runs use cached assets in `.work/galileo/`. 
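Because the coordinator needs a few minutes for keygen and the prover downloads assets on first run, scripts that chain these steps usually need to wait for readiness rather than sleep for a fixed time. The helper below is a generic sketch under that assumption; `wait_until` and its parameters are hypothetical names, and the probe is supplied by the caller.

```python
import time


def wait_until(probe, timeout_s=300.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or timeout_s elapses.

    Returns True on success and False on timeout. The probe is always
    called at least once, even when timeout_s is 0.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A probe could be a TCP check against the coordinator port (`8390`) or an HTTP request to the prover health listener; which signal best reflects readiness is left to the operator.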
+ +## Monitoring + +### Check coordinator health +```bash +curl -s http://localhost:8390/ | head +``` + +### Check prover health +```bash +curl -s http://localhost:10080/health +``` + +### Watch coordinator logs +```bash +docker logs -f shadow-coordinator-api-test --tail 100 +``` + +### Watch prover logs +```bash +# If running via docker +docker logs -f shadow-prover --tail 100 + +# If running binary directly, logs go to stdout +``` + +### Check DB task status +```bash +psql "$SHADOW_DB" -c " + SELECT proving_status, COUNT(*) FROM chunk GROUP BY proving_status; +" +``` + +Proving status values: +- `1` = Unassigned +- `2` = Assigned +- `3` = Proving +- `4` = Proven (success) +- `5` = Failed + +## Troubleshooting + +### Coordinator says "Start coordinator api successfully" but prover gets no tasks +- Verify `l2_block` table has records for the chunk's block range +- Check `proving_status = 1` on chunks +- Check `codec_version != 5` (chunks with codec_version = 5 are skipped) +- Ensure chunk's `end_block_number <= coordinator's block height` + +### "mismatched post-state root" or codec errors +- Verify you're using blocks after the hardfork. For GalileoV2 (codec V10), use blocks ≥ 33,750,000 on mainnet. +- Ensure `SCROLL_FORK_NAME` and verifier assets match the block's fork. + +### "Failed to execute witness" or "Method not found" +- The L2 RPC must support `debug_executionWitness` and `debug_dbGet`. +- `https://mainnet-rpc.scroll.io` supports these; `https://rpc.scroll.io` does not. + +### "Failed to get l1 messages in block" (-32601) +- Your RPC does not support `scroll_getL1MessagesInBlock`. This is non-fatal if the block contains no L1 messages. +- If L1 messages exist, you need an RPC that supports this method. 
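The two RPC failures above both come down to whether the endpoint implements a given JSON-RPC method. The sketch below shows one way to probe for that using only the standard library. It relies on the standard JSON-RPC 2.0 error code `-32601` ("Method not found"), the same code shown in the heading above; the function names are hypothetical, and endpoints that answer with HTTP errors or non-standard codes are not handled here.

```python
import json
import urllib.request

METHOD_NOT_FOUND = -32601  # JSON-RPC 2.0 "Method not found"


def build_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as a dict."""
    return {"jsonrpc": "2.0", "id": request_id,
            "method": method, "params": params or []}


def method_missing(response):
    """True only if the response carries the 'Method not found' error."""
    err = response.get("error")
    return bool(err) and err.get("code") == METHOD_NOT_FOUND


def probe(rpc_url, method, params=None, timeout=10.0):
    """Return True if rpc_url appears to implement method.

    Any reply other than -32601 (including 'invalid params') is treated
    as evidence that the method exists.
    """
    body = json.dumps(build_request(method, params)).encode()
    req = urllib.request.Request(
        rpc_url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return not method_missing(json.load(resp))
```

Calling `probe(url, "debug_executionWitness")` with empty parameters is deliberate: a supporting node is expected to complain about the parameters rather than the method, which is enough to tell the two cases apart without fetching a witness.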
+ +### S3 403 errors when downloading circuit assets +- v0.8.0 assets: `https://circuit-release.s3.us-west-2.amazonaws.com/scroll-zkvm/v0.8.0/` +- v0.7.1 and earlier: `https://circuit-release.s3.us-west-2.amazonaws.com/scroll-zkvm/releases/v0.7.1/` +- Verify the asset URL with `curl -sI` before running. + +### "bind: address already in use" (port 8390) +- Kill old coordinator: `pkill -f coordinator_api` or `docker rm -f shadow-coordinator-api-test` + +### Port conflicts with local PostgreSQL +- If you have system PostgreSQL on 5432, use 5433 for shadow DB (already configured). +- Ensure all configs use the correct port. + +### Multi-GPU prover cache conflicts +When running multiple prover instances on the same machine, the shared `.work/galileo` cache directory can cause `File exists (os error 17)` conflicts if two provers write the same temp file simultaneously. + +**Mitigation**: Ensure each prover has its own work directory, or symlink `.work/galileo` to a shared read-only cache while giving each instance a distinct write directory. Example launch script: +```bash +for i in 0 1 2 3; do + mkdir -p /tmp/prover-gpu${i}/work + # -sfn replaces an existing link, so the script can be rerun safely + ln -sfn /shared/cache/galileo /tmp/prover-gpu${i}/work/galileo + CUDA_VISIBLE_DEVICES=$i ./prover --config /tmp/prover-gpu${i}/config.json & +done +``` + +### Bundle proving never starts +If the coordinator is actively assigning chunk/batch tasks but never assigns bundle tasks, the most likely cause is **orphan bundles**: bundle records whose corresponding batch data no longer exists in the shadow DB. + +**Diagnosis**: +```sql +-- Count bundles with no linked batches +SELECT COUNT(*) FROM bundle b +WHERE NOT EXISTS ( + SELECT 1 FROM batch bat + WHERE bat.index BETWEEN b.start_batch_index AND b.end_batch_index +); +``` + +**Root cause**: The bundle table often retains historical records from production (e.g., batch 308516+) while the batch table only holds recently imported batches (e.g., 517760+). 
Coordinator's `GetUnassignedBundle` picks the lowest-index bundle with `batch_proofs_status = 2`, finds it has no batches, and fails silently in a loop. + +**Fix**: +```sql +UPDATE bundle +SET batch_proofs_status = 1 +WHERE index NOT IN ( + SELECT DISTINCT b.index + FROM bundle b + JOIN batch bat ON bat.index BETWEEN b.start_batch_index AND b.end_batch_index +); +``` + +### DB data inconsistency after import +If imported chunks have `proving_status = 2` (assigned) but `proof = NULL`, coordinator may incorrectly set `batch.chunk_proofs_status = 2` and then fail when formatting batch tasks. + +**Fix**: +```sql +UPDATE chunk SET proving_status = 1, total_attempts = 0, active_attempts = 0 +WHERE proving_status = 2 AND proof IS NULL; + +UPDATE batch SET chunk_proofs_status = 0 +WHERE chunk_proofs_status != 0 + AND EXISTS ( + SELECT 1 FROM chunk c + WHERE c.batch_hash = batch.hash AND c.proving_status != 4 + ); +``` + +## Configuration Reference + +### Shadow Coordinator Config + +See `configs/shadow-coordinator-config.json` in this directory. + +Key fields: +- `db.dsn`: Points to shadow PostgreSQL +- `l2.l2geth.endpoint`: L2 RPC with `debug_executionWitness` support +- `prover_manager.verifier.verifiers`: List of verifier asset paths and fork names + +### Prover Config + +See `configs/prover-local.json` in this directory. + +Key fields: +- `sdk_config.coordinator.base_url`: Shadow coordinator API (`http://localhost:8390`) +- `circuits.galileoV2.base_url`: S3 path for circuit assets (no `/releases/` for v0.8.0) +- `sdk_config.prover.supported_proof_types`: `[1, 2, 3]` for chunk, batch, bundle + +## Rollup Relayer Dry-Run Mode + +For testing the **rollup-relayer's transaction construction logic** (e.g., `finalizeBundle` calldata) without spending real gas or modifying chain state, the sender module supports a **dry-run mode**. 
+ +When `"dry_run": true` is set in the sender config: +- Transactions are **simulated** via `eth_call` instead of being broadcast +- `pending_transaction` table is **not** populated (avoids DB pollution) +- Nonce is still incremented to simulate real behavior +- If the `eth_call` fails (e.g., contract revert), the error is propagated just like a real send failure + +### Usage + +1. Build the rollup-relayer binary: +```bash +cd rollup && go build -o rollup_relayer ./cmd/rollup_relayer/app +``` + +2. Configure `dry_run: true` in the sender config (see `scripts/shadow-testing/configs/rollup-relayer-dryrun.json`) + +3. Start the relayer: +```bash +./rollup_relayer --config /path/to/rollup-relayer-dryrun.json +``` + +### What Dry-Run Verifies + +| Aspect | Verified? | Notes | +|--------|-----------|-------| +| Calldata encoding (ABI pack) | ✅ | `constructFinalizeBundlePayloadCodecV7` etc. | +| Gas estimation | ✅ | Full `EstimateGas` + `CreateAccessList` path | +| Contract revert | ✅ | `eth_call` returns revert reason | +| Signature / nonce | ⚠️ | Nonce incremented but tx not broadcast | +| Pending tx lifecycle | ❌ | Skipped to avoid DB pollution | +| Receipt confirmation | ❌ | No real tx = no receipt | + +For **full end-to-end** validation (including signature + receipt), use **Anvil** with `evm_snapshot`/`evm_revert` instead. + +### Anvil + Mock ScrollChain Setup (Recommended for Dry-Run) + +For the most realistic dry-run testing, deploy a minimal mock ScrollChain contract on a local Anvil node: + +```bash +# 1. Start Anvil forked from mainnet (or standalone), listening on port 18545 as the later commands expect (Anvil defaults to 8545) +anvil --port 18545 --fork-url https://eth-mainnet.g.alchemy.com/v2/YOUR_KEY --fork-block-number 33878313 + +# 2. 
Deploy mock contract (minimal Solidity with no-op commitBatches / finalizeBundle) +cat > MockScrollChain.sol << 'EOF' +// SPDX-License-Identifier: MIT +pragma solidity ^0.8.0; +contract MockScrollChain { + mapping(address => bool) public isProver; + address public owner; + constructor() { owner = msg.sender; } + function addProver(address _prover) external { + require(msg.sender == owner, "Not owner"); + isProver[_prover] = true; + } + function commitBatches(uint8 version, bytes32 parentBatchHash, bytes32 batchHash) external {} + function finalizeBundlePostEuclidV2NoProof(bytes calldata, uint256, bytes32, bytes32) external {} + function finalizeBundlePostEuclidV2(bytes calldata, uint256, bytes32, bytes32, bytes calldata) external {} +} +EOF + +# Compile and deploy +solc --bin MockScrollChain.sol -o /tmp/mock +BYTECODE=$(cat /tmp/mock/MockScrollChain.bin) +cast send --rpc-url http://localhost:18545 \ + --private-key 0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80 \ + --create "0x$BYTECODE" +# → contractAddress: 0x1fA02b2d6A771842690194Cf62D91bdd92BfE28d +MOCK_ADDR="0x1fA02b2d6A771842690194Cf62D91bdd92BfE28d" # use the contractAddress printed by the deploy step + +# 3. Fund sender accounts and add prover +COMMIT_ADDR="0x1e32ABcfE6db15c1570709E3fC02725335f50A47" +FINALIZE_ADDR="0x33e0F539E31B35170FAaA062af703b76a8282bf7" +cast rpc anvil_setBalance "$COMMIT_ADDR" "0x3635c9adc5dea00000" --rpc-url http://localhost:18545 +cast rpc anvil_setBalance "$FINALIZE_ADDR" "0x3635c9adc5dea00000" --rpc-url http://localhost:18545 +# cast send takes the target contract address before the function signature +cast send "$MOCK_ADDR" "addProver(address)" "$FINALIZE_ADDR" --rpc-url http://localhost:18545 \ + --private-key 0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80 +``` + +**Key sender config changes**: +```json +{ + "sender_config": { + "endpoint": "http://localhost:18545", + "dry_run": true + } +} +``` + +**Dry-run gas estimation skip**: Anvil may fail `EstimateGas` on blob transactions or missing functions. 
A small patch to `rollup/internal/controller/sender/estimategas.go` skips gas estimation in dry-run mode: +```go +func (s *Sender) estimateGasLimit(...) (uint64, *types.AccessList, error) { + if s.config.DryRun { + return 10000000, nil, nil // skip estimation + } + // ... original logic +} +``` + +### What We Verified in Practice + +| Transaction | Status | Notes | +|-------------|--------|-------| +| `commitBatches` | ✅ `eth_call` succeeded | Selector `0x9bbaa2ba` via mock `commitBatches(uint8,bytes32,bytes32)` | +| `finalizeBundlePostEuclidV2NoProof` | ✅ `eth_call` succeeded | Selector `0xbd6f916b` via mock no-op | +| `finalizeBundlePostEuclidV2` (with proof) | ✅ `eth_call` succeeded | Bundle 17301 with valid `OpenVMBundleProof` | + +## Known Limitations + +1. **L1 messages**: If chunks contain L1 messages, the prover needs `scroll_getL1MessagesInBlock` RPC support. Most public RPCs don't expose this. Workaround: select chunks/blocks with no L1 messages, or use an internal RPC. In non-validium mode, the prover does not call this RPC at all. + +2. **Full batch proving**: Batch tasks require `chunk_proofs_status = 2` (all chunks proven). For quick chunk-only testing, you don't need to prove full batches. + +3. **Coordinator startup time**: First startup performs OpenVM keygen (~2-3 min). Be patient. + +4. **Circuit download**: First prover run downloads ~5-10GB of circuit assets. Ensure good internet. + +5. **Bundle vs batch count mismatch**: The shadow DB's `bundle` table may contain 10,000+ historical records while `batch` only holds ~500 recent ones. This is expected when importing production data — the bundle table retains full history but batches are truncated. **Crucially**, orphan bundles (those with no matching batches) must have `batch_proofs_status = 1` or coordinator will deadlock trying to prove them. See "Bundle proving never starts" in Troubleshooting. 
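The orphan-bundle condition in item 5 is a range-overlap test: a bundle is an orphan when no imported batch index falls inside its `[start_batch_index, end_batch_index]` range. The sketch below restates that check in plain Python for illustration. It is not the coordinator's logic; the function name and the tuple layout are hypothetical, and the sample numbers only echo the magnitudes mentioned in the troubleshooting text.

```python
import bisect


def find_orphan_bundles(bundles, batch_indexes):
    """Return indexes of bundles whose batch range contains no known batch.

    bundles: iterable of (bundle_index, start_batch_index, end_batch_index)
    batch_indexes: iterable of batch indexes present in the shadow DB
    """
    present = sorted(set(batch_indexes))
    orphans = []
    for bundle_index, start, end in bundles:
        # Position of the first known batch index that is >= start.
        pos = bisect.bisect_left(present, start)
        has_batch = pos < len(present) and present[pos] <= end
        if not has_batch:
            orphans.append(bundle_index)
    return orphans


if __name__ == "__main__":
    batches = range(517760, 517765)  # recently imported batches
    bundles = [
        (1, 308516, 308520),  # historical bundle, batches not imported
        (2, 517759, 517761),  # partially covered, so not an orphan
        (3, 517762, 517764),  # fully covered
        (4, 517765, 517770),  # newer than any imported batch
    ]
    print(find_orphan_bundles(bundles, batches))
```

Note that a partially covered bundle is not counted as an orphan here, which matches the `BETWEEN` join in the SQL; whether such a bundle can actually be proven is a separate question.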
+ +## Common DB Fixes + +After importing production data or running for extended periods, these SQL fixes resolve common coordinator deadlocks: + +### 1. Reset proving status after import +```sql +UPDATE chunk SET proving_status = 1, total_attempts = 0, active_attempts = 0; +UPDATE batch SET proving_status = 1, total_attempts = 0, active_attempts = 0, chunk_proofs_status = 0; +UPDATE bundle SET proving_status = 1, total_attempts = 0, active_attempts = 0; +``` + +### 2. Mark orphan bundles (no linked batches) +```sql +UPDATE bundle +SET batch_proofs_status = 1 +WHERE index NOT IN ( + SELECT DISTINCT b.index + FROM bundle b + JOIN batch bat ON bat.index BETWEEN b.start_batch_index AND b.end_batch_index +); +``` + +### 3. Fix stale assigned chunks without proofs +```sql +UPDATE chunk SET proving_status = 1, total_attempts = 0, active_attempts = 0 +WHERE proving_status = 2 AND proof IS NULL; + +UPDATE batch SET chunk_proofs_status = 0 +WHERE chunk_proofs_status != 0 + AND EXISTS ( + SELECT 1 FROM chunk c + WHERE c.batch_hash = batch.hash AND c.proving_status != 4 + ); +``` + +## Scripts Reference + +| Script | Purpose | +|--------|---------| +| `setup.sh` | One-command setup for PostgreSQL, coordinator, or prover | +| `import-production-data.sh` | Export from production RDS and import to shadow DB | +| `fetch-l2-blocks.py` | Fetch block headers from L2 RPC and populate `l2_block` table | diff --git a/rollup/internal/config/relayer.go b/rollup/internal/config/relayer.go index 2e50969ada..ea831179b9 100644 --- a/rollup/internal/config/relayer.go +++ b/rollup/internal/config/relayer.go @@ -37,6 +37,9 @@ type SenderConfig struct { MaxPendingBlobTxs int64 `json:"max_pending_blob_txs"` // The timestamp of the Ethereum Fusaka upgrade in seconds since epoch. FusakaTimestamp uint64 `json:"fusaka_timestamp"` + // If true, transactions will be simulated via eth_call instead of being sent to the chain. 
+ // This is useful for testing the transaction construction logic without spending gas. + DryRun bool `json:"dry_run"` } type BatchSubmission struct { diff --git a/rollup/internal/controller/sender/sender.go b/rollup/internal/controller/sender/sender.go index 5b37473596..ece95dbebd 100644 --- a/rollup/internal/controller/sender/sender.go +++ b/rollup/internal/controller/sender/sender.go @@ -12,6 +12,7 @@ import ( "github.com/holiman/uint256" "github.com/prometheus/client_golang/prometheus" + "github.com/scroll-tech/go-ethereum" "github.com/scroll-tech/go-ethereum/common" "github.com/scroll-tech/go-ethereum/common/hexutil" gethTypes "github.com/scroll-tech/go-ethereum/core/types" @@ -205,11 +206,44 @@ func (s *Sender) getFeeData(target *common.Address, data []byte, sidecar *gethTy } // sendTransactionToMultipleClients sends a transaction to all write clients in parallel -// and returns success if at least one client succeeds +// and returns success if at least one client succeeds. +// In dry-run mode, it uses eth_call to simulate the transaction instead. func (s *Sender) sendTransactionToMultipleClients(signedTx *gethTypes.Transaction) error { ctx, cancel := context.WithTimeout(s.ctx, 15*time.Second) defer cancel() + // Dry-run mode: simulate the transaction via eth_call instead of sending it. 
+ if s.config.DryRun { + msg := ethereum.CallMsg{ + From: s.transactionSigner.GetAddr(), + To: signedTx.To(), + Gas: signedTx.Gas(), + GasPrice: signedTx.GasPrice(), + GasTipCap: signedTx.GasTipCap(), + GasFeeCap: signedTx.GasFeeCap(), + Value: signedTx.Value(), + Data: signedTx.Data(), + } + if signedTx.Type() == gethTypes.BlobTxType { + msg.BlobHashes = signedTx.BlobHashes() + msg.BlobGasFeeCap = signedTx.BlobGasFeeCap() + } + _, err := s.client.CallContract(ctx, msg, nil) + if err != nil { + log.Warn("dry-run eth_call failed", + "txHash", signedTx.Hash().Hex(), + "nonce", signedTx.Nonce(), + "from", s.transactionSigner.GetAddr().String(), + "error", err) + return fmt.Errorf("dry-run eth_call failed: %w", err) + } + log.Info("dry-run eth_call succeeded", + "txHash", signedTx.Hash().Hex(), + "nonce", signedTx.Nonce(), + "from", s.transactionSigner.GetAddr().String()) + return nil + } + if len(s.writeClients) == 1 { // Single client - use direct approach return s.writeClients[0].SendTransaction(ctx, signedTx) @@ -342,19 +376,25 @@ func (s *Sender) SendTransaction(contextID string, target *common.Address, data return common.Hash{}, 0, fmt.Errorf("failed to create signed transaction, err: %w", err) } - // Insert the transaction into the pending transaction table. - // A corner case is that the transaction is inserted into the table but not sent to the chain, because the server is stopped in the middle. - // This case will be handled by the checkPendingTransaction function. - if err = s.pendingTransactionOrm.InsertPendingTransaction(s.ctx, contextID, s.getSenderMeta(), signedTx, blockNumber); err != nil { - log.Error("failed to insert transaction", "from", s.transactionSigner.GetAddr().String(), "nonce", s.transactionSigner.GetNonce(), "err", err) - return common.Hash{}, 0, fmt.Errorf("failed to insert transaction, err: %w", err) + // In dry-run mode, skip pending transaction tracking to avoid polluting the DB. 
+ if !s.config.DryRun { + // Insert the transaction into the pending transaction table. + // A corner case is that the transaction is inserted into the table but not sent to the chain, because the server is stopped in the middle. + // This case will be handled by the checkPendingTransaction function. + if err = s.pendingTransactionOrm.InsertPendingTransaction(s.ctx, contextID, s.getSenderMeta(), signedTx, blockNumber); err != nil { + log.Error("failed to insert transaction", "from", s.transactionSigner.GetAddr().String(), "nonce", s.transactionSigner.GetNonce(), "err", err) + return common.Hash{}, 0, fmt.Errorf("failed to insert transaction, err: %w", err) + } } if err := s.sendTransactionToMultipleClients(signedTx); err != nil { - // Delete the transaction from the pending transaction table if it fails to send. - if updateErr := s.pendingTransactionOrm.DeleteTransactionByTxHash(s.ctx, signedTx.Hash()); updateErr != nil { - log.Error("failed to delete transaction", "tx hash", signedTx.Hash().String(), "from", s.transactionSigner.GetAddr().String(), "nonce", signedTx.Nonce(), "err", updateErr) - return common.Hash{}, 0, fmt.Errorf("failed to delete transaction, err: %w", updateErr) + // In dry-run mode, skip pending transaction cleanup. + if !s.config.DryRun { + // Delete the transaction from the pending transaction table if it fails to send. 
+ if updateErr := s.pendingTransactionOrm.DeleteTransactionByTxHash(s.ctx, signedTx.Hash()); updateErr != nil { + log.Error("failed to delete transaction", "tx hash", signedTx.Hash().String(), "from", s.transactionSigner.GetAddr().String(), "nonce", signedTx.Nonce(), "err", updateErr) + return common.Hash{}, 0, fmt.Errorf("failed to delete transaction, err: %w", updateErr) + } } log.Error("failed to send tx", "tx hash", signedTx.Hash().String(), "from", s.transactionSigner.GetAddr().String(), "nonce", signedTx.Nonce(), "err", err) diff --git a/scripts/shadow-testing/.env.example b/scripts/shadow-testing/.env.example new file mode 100644 index 0000000000..3a44df2cc2 --- /dev/null +++ b/scripts/shadow-testing/.env.example @@ -0,0 +1,57 @@ +# Shadow Coordinator + Prover Environment Variables +# Copy this file to .env and fill in real values + +# ============================================================================ +# PRODUCTION RDS (read-only, via IDC port-forward) +# ============================================================================ +PROD_DB_HOST=localhost +PROD_DB_PORT=15432 +PROD_DB_NAME=rollup +PROD_DB_USER=YOUR_PROD_USER_HERE +PROD_DB_PASSWORD=YOUR_PROD_PASSWORD_HERE + +# Full DSN (constructed from above, or override directly) +# PROD_DB=postgresql://YOUR_PROD_USER_HERE:YOUR_PROD_PASSWORD_HERE@localhost:15432/rollup + +# ============================================================================ +# SHADOW DATABASE (local PostgreSQL in Docker) +# ============================================================================ +SHADOW_DB_HOST=localhost +SHADOW_DB_PORT=5433 +SHADOW_DB_NAME=shadow_rollup +SHADOW_DB_USER=postgres +SHADOW_DB_PASSWORD=YOUR_SHADOW_PASSWORD_HERE + +# Full DSN (constructed from above, or override directly) +# SHADOW_DB=postgresql://postgres:YOUR_SHADOW_PASSWORD_HERE@localhost:5433/shadow_rollup + +# ============================================================================ +# COORDINATOR AUTH +# 
============================================================================ +# JWT secret for prover login challenge-response. +# MUST match between coordinator config and prover expectations. +COORDINATOR_AUTH_SECRET=YOUR_RANDOM_SECRET_HERE + +# ============================================================================ +# DOCKER IMAGE TAG +# ============================================================================ +IMAGE_TAG=v4.7.13-openvm16 + +# ============================================================================ +# L2 RPC ENDPOINT +# Must support debug_executionWitness and debug_dbGet. +# https://mainnet-rpc.scroll.io works; https://rpc.scroll.io does NOT. +# ============================================================================ +L2_RPC=https://mainnet-rpc.scroll.io + +# ============================================================================ +# VERIFIER ASSETS PATH +# Directory containing subdirectories: openvm-0.5.6, openvm-v0.7.1, openvm-v0.8.0 +# ============================================================================ +VERIFIER_DIR=/tmp/shadow-verifier-assets + +# ============================================================================ +# DATA IMPORT LIMITS +# ============================================================================ +BATCH_LIMIT=50 +BUNDLE_LIMIT=20000 diff --git a/scripts/shadow-testing/QUICKSTART.md b/scripts/shadow-testing/QUICKSTART.md new file mode 100644 index 0000000000..a0c3b057ef --- /dev/null +++ b/scripts/shadow-testing/QUICKSTART.md @@ -0,0 +1,84 @@ +# Quick Start: Shadow Coordinator + Prover + +For full details, see `docs/shadow-testing/README.md`. + +## Prerequisites + +1. **IDC port-forward active**: Mainnet RDS on `localhost:15432` +2. **Docker installed** with GPU support (for prover) +3. **Verifier assets** at `/tmp/shadow-verifier-assets/` (feynman, galileo, galileoV2) +4. 
**SRS params** at `~/.openvm/params/` (kzg_bn254_22.srs, kzg_bn254_23.srs, kzg_bn254_24.srs) + +## One-Command Setup + +```bash +cd scripts/shadow-testing + +# Step 1: Start PostgreSQL +./setup.sh --postgres + +# Step 2: Import production tasks (requires RDS port-forward) +./import-production-data.sh + +# Step 3: Fetch L2 block headers +python3 fetch-l2-blocks.py \ + --rpc https://mainnet-rpc.scroll.io \ + --db "postgresql://:@localhost:5433/shadow_rollup" \ + --start-block 33750000 --end-block 33770000 + +# Step 4: Link blocks to chunks +psql "postgresql://:@localhost:5433/shadow_rollup" -c " + UPDATE l2_block lb SET chunk_hash = c.hash + FROM chunk c + WHERE lb.number >= c.start_block_number AND lb.number <= c.end_block_number; +" + +# Step 5: Start coordinator (takes 2-3 min) +./setup.sh --coordinator + +# Step 6: Start prover (in another terminal) +./setup.sh --prover +``` + +## Monitoring + +```bash +# Check everything is running +./setup.sh --status + +# Watch coordinator logs +docker logs -f shadow-coordinator-api-test --tail 100 + +# Watch prover logs (if using docker) +docker logs -f shadow-prover --tail 100 + +# Check task assignment +psql "postgresql://:@localhost:5433/shadow_rollup" -c " + SELECT proving_status, COUNT(*) FROM chunk GROUP BY proving_status; +" +``` + +## Stop Everything + +```bash +./setup.sh --stop +``` + +## Key Configuration Files + +| File | Purpose | +|------|---------| +| `configs/shadow-coordinator-config.json` | Coordinator config template | +| `configs/prover-local.json` | Prover config template | +| `/tmp/shadow-coordinator-config.json` | Generated coordinator config (with L2 RPC) | +| `/tmp/prover-local.json` | Generated prover config | + +## Environment Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `PROD_DB` | `postgresql://...localhost:15432/rollup` | Production RDS connection | +| `SHADOW_DB` | `postgresql://...localhost:5433/shadow_rollup` | Shadow DB connection | +| 
`VERIFIER_DIR` | `/tmp/shadow-verifier-assets` | Verifier asset path | +| `IMAGE_TAG` | `v4.7.13-openvm16` | Docker image tag | +| `L2_RPC` | `https://mainnet-rpc.scroll.io` | L2 RPC endpoint | diff --git a/scripts/shadow-testing/configs/prover-local.json b/scripts/shadow-testing/configs/prover-local.json new file mode 100644 index 0000000000..db301609e7 --- /dev/null +++ b/scripts/shadow-testing/configs/prover-local.json @@ -0,0 +1,24 @@ +{ + "sdk_config": { + "prover_name_prefix": "galileo6-shadowfork-prover", + "keys_dir": ".work", + "coordinator": { + "base_url": "http://localhost:8390", + "retry_count": 10, + "retry_wait_time_sec": 10, + "connection_timeout_sec": 1800 + }, + "prover": { + "supported_proof_types": [1, 2, 3], + "circuit_version": "v0.13.1" + }, + "health_listener_addr": "127.0.0.1:10080", + "db_path": ".work/db" + }, + "circuits": { + "galileoV2": { + "base_url": "https://circuit-release.s3.us-west-2.amazonaws.com/scroll-zkvm/galileov2/", + "workspace_path": ".work/galileo" + } + } +} diff --git a/scripts/shadow-testing/configs/rollup-relayer-dryrun.json b/scripts/shadow-testing/configs/rollup-relayer-dryrun.json new file mode 100644 index 0000000000..9971fc68b7 --- /dev/null +++ b/scripts/shadow-testing/configs/rollup-relayer-dryrun.json @@ -0,0 +1,75 @@ +{ + "l2_config": { + "confirmations": "0x10", + "endpoint": "http://10.6.13.141:8545", + "l2_message_queue_address": "0x5300000000000000000000000000000000000000", + "relayer_config": { + "rollup_contract_address": "0xa13BAF47339d63B743e7Da8741db5456DAc1E556", + "sender_config": { + "endpoint": "https://rpc.scroll.io", + "write_endpoints": [ + "https://rpc.scroll.io" + ], + "escalate_blocks": 6, + "escalate_multiple_num": 12, + "escalate_multiple_den": 10, + "min_gas_tip": 2000000000, + "max_gas_price": 2000000000000, + "max_blob_gas_price": 500000000000, + "tx_type": "DynamicFeeTx", + "check_pending_time": 10, + "confirmations": "0x6", + "max_pending_blob_txs": 6, + "fusaka_timestamp": 
1764798551, + "dry_run": true + }, + "batch_submission": { + "min_batches": 3, + "max_batches": 3, + "timeout": 8400, + "backlog_max": 75, + "blob_fee_tolerance": 50 + }, + "chain_monitor": { + "enabled": false, + "timeout": 3, + "try_times": 5, + "base_url": "http://localhost:8080" + }, + "commit_sender_signer_config": { + "signer_type": "PrivateKey", + "private_key_signer_config": { + "private_key": "0x0000000000000000000000000000000000000000000000000000000000000001" + } + }, + "finalize_sender_signer_config": { + "signer_type": "PrivateKey", + "private_key_signer_config": { + "private_key": "0x0000000000000000000000000000000000000000000000000000000000000002" + } + } + }, + "chunk_proposer_config": { + "propose_interval_milliseconds": 100, + "max_l2_gas_per_chunk": 24000000, + "chunk_timeout_sec": 3600, + "max_uncompressed_batch_bytes_size": 8388608 + }, + "batch_proposer_config": { + "propose_interval_milliseconds": 1000, + "batch_timeout_sec": 720000, + "max_chunks_per_batch": 45, + "max_uncompressed_batch_bytes_size": 8388608 + }, + "bundle_proposer_config": { + "max_batch_num_per_bundle": 45, + "bundle_timeout_sec": 5400 + } + }, + "db_config": { + "driver_name": "postgres", + "dsn": "postgresql://:@localhost:5433/shadow_rollup?sslmode=disable", + "maxOpenNum": 50, + "maxIdleNum": 20 + } +} diff --git a/scripts/shadow-testing/configs/shadow-coordinator-config.json b/scripts/shadow-testing/configs/shadow-coordinator-config.json new file mode 100644 index 0000000000..9cc08f37a0 --- /dev/null +++ b/scripts/shadow-testing/configs/shadow-coordinator-config.json @@ -0,0 +1,45 @@ +{ + "prover_manager": { + "provers_per_session": 1, + "session_attempts": 100, + "external_prover_threshold": 10, + "chunk_collection_time_sec": 3600, + "batch_collection_time_sec": 2700, + "bundle_collection_time_sec": 3600, + "verifier": { + "min_prover_version": "v4.5.32", + "verifiers": [ + { + "features": "legacy_witness:openvm_13", + "assets_path": "/verifier/openvm-0.5.6", + 
"fork_name": "feynman" + }, + { + "assets_path": "/verifier/openvm-v0.7.1", + "fork_name": "galileo" + }, + { + "assets_path": "/verifier/openvm-v0.8.0", + "fork_name": "galileoV2" + } + ] + } + }, + "db": { + "driver_name": "postgres", + "dsn": "postgresql://:@localhost:5433/shadow_rollup?sslmode=disable", + "maxOpenNum": 200, + "maxIdleNum": 20 + }, + "l2": { + "chain_id": 534352, + "l2geth": { + "endpoint": "https://mainnet-rpc.scroll.io" + } + }, + "auth": { + "secret": "", + "challenge_expire_duration_sec": 10, + "login_expire_duration_sec": 3600 + } +} diff --git a/scripts/shadow-testing/fetch-l2-blocks.py b/scripts/shadow-testing/fetch-l2-blocks.py new file mode 100755 index 0000000000..9948de66ea --- /dev/null +++ b/scripts/shadow-testing/fetch-l2-blocks.py @@ -0,0 +1,191 @@ +#!/usr/bin/env python3 +""" +Fetch L2 block headers from RPC and populate l2_block table in shadow DB. + +The coordinator needs l2_block records to format chunk tasks (for block hashes +and hardfork name resolution). This script fetches blocks in batches and +inserts them into the shadow database. 
+ +Usage: + python3 fetch-l2-blocks.py --rpc https://mainnet-rpc.scroll.io \ + --db "postgresql://:@localhost:5433/shadow_rollup" \ + --start-block 26000000 --end-block 27000000 + +After running, link blocks to chunks: + UPDATE l2_block lb + SET chunk_hash = c.hash + FROM chunk c + WHERE lb.number >= c.start_block_number + AND lb.number <= c.end_block_number; +""" + +import argparse +import sys +import time +import concurrent.futures +from typing import Optional + +import requests +import psycopg2 +from psycopg2.extras import execute_values + + +def fetch_block_batch(rpc_url: str, block_numbers: list[int]) -> list[dict]: + """Fetch multiple blocks via batch JSON-RPC request.""" + payload = [ + { + "jsonrpc": "2.0", + "method": "eth_getBlockByNumber", + "params": [hex(num), False], + "id": i, + } + for i, num in enumerate(block_numbers) + ] + + try: + resp = requests.post(rpc_url, json=payload, headers={"Content-Type": "application/json"}, timeout=60) + resp.raise_for_status() + results = resp.json() + + blocks = [] + for result in results: + if "error" in result: + print(f" Error fetching block: {result['error']}", file=sys.stderr) + continue + block = result.get("result") + if block is None: + continue + blocks.append(block) + return blocks + except Exception as e: + print(f" Request failed: {e}", file=sys.stderr) + return [] + + +def insert_blocks(db_url: str, blocks: list[dict]) -> int: + """Insert blocks into l2_block table.""" + if not blocks: + return 0 + + rows = [] + for block in blocks: + try: + number = int(block["number"], 16) + hash_val = block["hash"] + parent_hash = block["parentHash"] + timestamp = int(block["timestamp"], 16) + gas_used = int(block["gasUsed"], 16) + rows.append((number, hash_val, parent_hash, timestamp, gas_used)) + except (KeyError, ValueError) as e: + print(f" Skipping malformed block: {e}", file=sys.stderr) + continue + + if not rows: + return 0 + + conn = psycopg2.connect(db_url) + try: + with conn.cursor() as cur: + 
execute_values( + cur, + """ + INSERT INTO l2_block (number, hash, parent_hash, timestamp, gas_used) + VALUES %s + ON CONFLICT (number) DO UPDATE SET + hash = EXCLUDED.hash, + parent_hash = EXCLUDED.parent_hash, + timestamp = EXCLUDED.timestamp, + gas_used = EXCLUDED.gas_used + """, + rows, + ) + conn.commit() + return len(rows) + finally: + conn.close() + + +def get_existing_block_range(db_url: str) -> tuple[Optional[int], Optional[int]]: + """Get min/max block numbers already in the DB.""" + conn = psycopg2.connect(db_url) + try: + with conn.cursor() as cur: + cur.execute("SELECT MIN(number), MAX(number) FROM l2_block") + return cur.fetchone() + finally: + conn.close() + + +def main(): + parser = argparse.ArgumentParser(description="Fetch L2 blocks into shadow DB") + parser.add_argument("--rpc", required=True, help="L2 RPC endpoint URL") + parser.add_argument("--db", required=True, help="Shadow DB connection string") + parser.add_argument("--start-block", type=int, required=True, help="First block to fetch") + parser.add_argument("--end-block", type=int, required=True, help="Last block to fetch") + parser.add_argument("--batch-size", type=int, default=100, help="RPC batch size (default: 100)") + parser.add_argument("--workers", type=int, default=4, help="Concurrent workers (default: 4)") + parser.add_argument("--delay", type=float, default=0.1, help="Delay between batches in seconds (default: 0.1)") + parser.add_argument("--skip-existing", action="store_true", help="Skip blocks already in DB") + args = parser.parse_args() + + existing_min, existing_max = get_existing_block_range(args.db) + print(f"Existing blocks in DB: {existing_min or 'none'} to {existing_max or 'none'}") + + start = args.start_block + end = args.end_block + + if args.skip_existing and existing_min is not None: + # Only fetch gaps or new blocks + # Simple approach: just fetch the requested range, ON CONFLICT will handle it + pass + + total_blocks = end - start + 1 + print(f"Fetching 
{total_blocks} blocks from {start} to {end} via {args.rpc}") + + fetched = 0 + failed = 0 + + # Generate batch ranges + ranges = [] + current = start + while current <= end: + batch_end = min(current + args.batch_size - 1, end) + ranges.append(list(range(current, batch_end + 1))) + current = batch_end + 1 + + with concurrent.futures.ThreadPoolExecutor(max_workers=args.workers) as executor: + futures = { + executor.submit(fetch_block_batch, args.rpc, block_nums): block_nums + for block_nums in ranges + } + + for future in concurrent.futures.as_completed(futures): + block_nums = futures[future] + try: + blocks = future.result() + if blocks: + inserted = insert_blocks(args.db, blocks) + fetched += inserted + print(f" Blocks {block_nums[0]}-{block_nums[-1]}: inserted {inserted}/{len(blocks)}") + else: + failed += len(block_nums) + print(f" Blocks {block_nums[0]}-{block_nums[-1]}: FAILED") + except Exception as e: + failed += len(block_nums) + print(f" Blocks {block_nums[0]}-{block_nums[-1]}: ERROR {e}") + + time.sleep(args.delay) + + print(f"\nDone! Fetched: {fetched}, Failed: {failed}") + print("\nNext step: link blocks to chunks:") + print(""" + UPDATE l2_block lb + SET chunk_hash = c.hash + FROM chunk c + WHERE lb.number >= c.start_block_number + AND lb.number <= c.end_block_number; + """) + + +if __name__ == "__main__": + main() diff --git a/scripts/shadow-testing/import-production-data.sh b/scripts/shadow-testing/import-production-data.sh new file mode 100755 index 0000000000..708d6fb9fa --- /dev/null +++ b/scripts/shadow-testing/import-production-data.sh @@ -0,0 +1,134 @@ +#!/bin/bash +set -euo pipefail + +# Import Production Task Data into Shadow DB +# This script exports recent batches/chunks/bundles from production RDS +# and imports them into the local shadow database. 
+ +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Load .env if present +if [ -f "$SCRIPT_DIR/.env" ]; then + export $(grep -v '^#' "$SCRIPT_DIR/.env" | xargs) +fi + +# Build DSNs from components if not already set +PROD_DB_PASSWORD="${PROD_DB_PASSWORD:-}" +SHADOW_DB_PASSWORD="${SHADOW_DB_PASSWORD:-}" +PROD_DB="${PROD_DB:-postgresql://$PROD_DB_USER:$PROD_DB_PASSWORD@$PROD_DB_HOST:$PROD_DB_PORT/$PROD_DB_NAME}" +SHADOW_DB="${SHADOW_DB:-postgresql://$SHADOW_DB_USER:$SHADOW_DB_PASSWORD@$SHADOW_DB_HOST:$SHADOW_DB_PORT/$SHADOW_DB_NAME}" + +BATCH_LIMIT="${BATCH_LIMIT:-50}" +BUNDLE_LIMIT="${BUNDLE_LIMIT:-20000}" + +# Derived paths +EXPORT_DIR="${EXPORT_DIR:-/tmp/shadow-export}" +mkdir -p "$EXPORT_DIR" + +log_info() { + echo "[INFO] $1" +} + +log_error() { + echo "[ERROR] $1" >&2 +} + +# Verify connectivity +log_info "Checking production RDS connectivity..." +if ! psql "$PROD_DB" -c "SELECT 1;" >/dev/null 2>&1; then + log_error "Cannot connect to production RDS at $PROD_DB" + log_error "Ensure IDC port-forward is active (e.g., ssh -L 15432:...)" + exit 1 +fi + +log_info "Checking shadow DB connectivity..." +if ! psql "$SHADOW_DB" -c "SELECT 1;" >/dev/null 2>&1; then + log_error "Cannot connect to shadow DB at $SHADOW_DB" + log_error "Run: ./setup.sh --postgres" + exit 1 +fi + +# Get export timestamp +TIMESTAMP=$(date +%Y%m%d_%H%M%S) +log_info "Starting export at $TIMESTAMP" + +# Export batches +log_info "Exporting latest $BATCH_LIMIT batches from production..." 
+psql "$PROD_DB" -c " + COPY ( + SELECT * FROM batch + ORDER BY index DESC + LIMIT $BATCH_LIMIT + ) TO STDOUT WITH CSV HEADER; +" > "$EXPORT_DIR/batches_$TIMESTAMP.csv" + +BATCH_COUNT=$(tail -n +2 "$EXPORT_DIR/batches_$TIMESTAMP.csv" | wc -l) +log_info "Exported $BATCH_COUNT batches" + +# Get batch index range for chunk export (unaligned, space-separated output so read can split the two columns) +read -r MIN_BATCH_INDEX MAX_BATCH_INDEX <<< $(psql "$PROD_DB" -At -F ' ' -c " + SELECT MIN(index), MAX(index) FROM ( + SELECT index FROM batch ORDER BY index DESC LIMIT $BATCH_LIMIT + ) t; +" | xargs) + +# Export chunks belonging to these batches +log_info "Exporting chunks for batches $MIN_BATCH_INDEX to $MAX_BATCH_INDEX..." +psql "$PROD_DB" -c " + COPY ( + SELECT c.* FROM chunk c + JOIN batch b ON b.start_chunk_index <= c.index AND c.index <= b.end_chunk_index + WHERE b.index >= $MIN_BATCH_INDEX AND b.index <= $MAX_BATCH_INDEX + ORDER BY c.index + ) TO STDOUT WITH CSV HEADER; +" > "$EXPORT_DIR/chunks_$TIMESTAMP.csv" + +CHUNK_COUNT=$(tail -n +2 "$EXPORT_DIR/chunks_$TIMESTAMP.csv" | wc -l) +log_info "Exported $CHUNK_COUNT chunks" + +# Export bundles +log_info "Exporting latest $BUNDLE_LIMIT bundles..." +psql "$PROD_DB" -c " + COPY ( + SELECT * FROM bundle + ORDER BY index DESC + LIMIT $BUNDLE_LIMIT + ) TO STDOUT WITH CSV HEADER; +" > "$EXPORT_DIR/bundles_$TIMESTAMP.csv" + +BUNDLE_COUNT=$(tail -n +2 "$EXPORT_DIR/bundles_$TIMESTAMP.csv" | wc -l) +log_info "Exported $BUNDLE_COUNT bundles" + +# Truncate shadow tables +log_info "Clearing shadow tables..." +psql "$SHADOW_DB" -c "TRUNCATE batch, chunk, bundle CASCADE;" + +# Import into shadow DB +log_info "Importing batches..." +psql "$SHADOW_DB" -c "\\copy batch FROM '$EXPORT_DIR/batches_$TIMESTAMP.csv' WITH CSV HEADER;" + +log_info "Importing chunks..." +psql "$SHADOW_DB" -c "\\copy chunk FROM '$EXPORT_DIR/chunks_$TIMESTAMP.csv' WITH CSV HEADER;" + +log_info "Importing bundles..."
+psql "$SHADOW_DB" -c "\\copy bundle FROM '$EXPORT_DIR/bundles_$TIMESTAMP.csv' WITH CSV HEADER;" + +# Reset proving status +log_info "Resetting proving status to unassigned..." +psql "$SHADOW_DB" -c " + UPDATE chunk SET proving_status = 1, total_attempts = 0, active_attempts = 0; + UPDATE batch SET proving_status = 1, total_attempts = 0, active_attempts = 0, chunk_proofs_status = 0; + UPDATE bundle SET proving_status = 1, total_attempts = 0, active_attempts = 0; +" + +# Summary +log_info "Import complete!" +psql "$SHADOW_DB" -c " + SELECT 'batch' as table, COUNT(*) as cnt FROM batch + UNION ALL SELECT 'chunk', COUNT(*) FROM chunk + UNION ALL SELECT 'bundle', COUNT(*) FROM bundle; +" + +log_info "Export files saved to: $EXPORT_DIR" diff --git a/scripts/shadow-testing/setup.sh b/scripts/shadow-testing/setup.sh new file mode 100755 index 0000000000..d2e035a74e --- /dev/null +++ b/scripts/shadow-testing/setup.sh @@ -0,0 +1,294 @@ +#!/bin/bash +set -euo pipefail + +# Shadow Coordinator + Prover Setup Script +# Usage: ./setup.sh [--postgres] [--coordinator] [--prover] [--all] + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +CONFIG_DIR="$SCRIPT_DIR/configs" + +# Load .env if present +if [ -f "$SCRIPT_DIR/.env" ]; then + export $(grep -v '^#' "$SCRIPT_DIR/.env" | xargs) +fi + +# Build DSNs from components if not already set +PROD_DB_PASSWORD="${PROD_DB_PASSWORD:-}" +SHADOW_DB_PASSWORD="${SHADOW_DB_PASSWORD:-}" +PROD_DB="${PROD_DB:-postgresql://$PROD_DB_USER:$PROD_DB_PASSWORD@$PROD_DB_HOST:$PROD_DB_PORT/$PROD_DB_NAME}" +SHADOW_DB="${SHADOW_DB:-postgresql://$SHADOW_DB_USER:$SHADOW_DB_PASSWORD@$SHADOW_DB_HOST:$SHADOW_DB_PORT/$SHADOW_DB_NAME}" + +VERIFIER_DIR="${VERIFIER_DIR:-/tmp/shadow-verifier-assets}" +IMAGE_TAG="${IMAGE_TAG:-v4.7.13-openvm16}" +L2_RPC="${L2_RPC:-https://mainnet-rpc.scroll.io}" + +show_help() { + cat <<EOF +Usage: ./setup.sh [--postgres] [--coordinator] [--prover] [--all] [--stop] [--status] [-h|--help] +EOF +} + +log_info() { + echo "[INFO] $1" +} + +log_error() { + echo "[ERROR] $1" >&2 +} + +wait_for_postgres() { + local max_attempts=30 + local attempt=1 + while [ $attempt -le $max_attempts ]; do + if docker exec
shadow-coordinator-postgres pg_isready -U postgres >/dev/null 2>&1; then + log_info "PostgreSQL is ready" + return 0 + fi + log_info "Waiting for PostgreSQL... ($attempt/$max_attempts)" + sleep 2 + ((attempt++)) + done + log_error "PostgreSQL failed to start" + return 1 +} + +setup_postgres() { + log_info "Setting up shadow PostgreSQL..." + + # Stop and remove existing container + if docker ps -a --format '{{.Names}}' | grep -q '^shadow-coordinator-postgres$'; then + log_info "Removing existing PostgreSQL container..." + docker rm -f shadow-coordinator-postgres >/dev/null + fi + + docker run -d \ + --name shadow-coordinator-postgres \ + -e POSTGRES_USER=postgres \ + -e POSTGRES_PASSWORD="${SHADOW_DB_PASSWORD:?SHADOW_DB_PASSWORD must be set}" \ + -e POSTGRES_DB=shadow_rollup \ + -p 5433:5432 \ + -v shadow-coordinator-postgres-data:/var/lib/postgresql/data \ + postgres:15 >/dev/null + + wait_for_postgres + + # Apply migrations if db_cli image is available + if docker images --format '{{.Repository}}:{{.Tag}}' | grep -q "zhuoatscroll/db_cli:$IMAGE_TAG"; then + log_info "Running database migrations..." + docker run --rm \ + --network host \ + -e DATABASE_URL="$SHADOW_DB?sslmode=disable" \ + zhuoatscroll/db_cli:$IMAGE_TAG \ + migrate up || log_info "Migration may have failed or already applied" + fi + + log_info "PostgreSQL setup complete at $SHADOW_DB" +} + +setup_coordinator() { + log_info "Setting up shadow coordinator..." + + # Check prerequisites + if [ ! -d "$VERIFIER_DIR/openvm-0.5.6" ] || [ ! -d "$VERIFIER_DIR/openvm-v0.8.0" ]; then + log_error "Verifier assets not found at $VERIFIER_DIR" + log_error "Please download verifier assets first." 
+ exit 1 + fi + + # Generate config with correct L2 RPC + local config_file="/tmp/shadow-coordinator-config.json" + cp "$CONFIG_DIR/shadow-coordinator-config.json" "$config_file" + # Update L2 RPC if different from default + if [ "$L2_RPC" != "https://mainnet-rpc.scroll.io" ]; then + sed -i "s|https://mainnet-rpc.scroll.io|$L2_RPC|g" "$config_file" + fi + + # Stop existing container + if docker ps -a --format '{{.Names}}' | grep -q '^shadow-coordinator-api-test$'; then + log_info "Removing existing coordinator container..." + docker rm -f shadow-coordinator-api-test >/dev/null + fi + + # Kill any stale coordinator processes on host + pkill -f "coordinator_api" 2>/dev/null || true + + log_info "Starting coordinator container (this will take 2-3 min for OpenVM keygen)..." + docker run -d \ + --name shadow-coordinator-api-test \ + --network host \ + -v "$config_file":/app/conf/config.json \ + -v "$VERIFIER_DIR":/verifier:ro \ + zhuoatscroll/coordinator-api:$IMAGE_TAG >/dev/null + + log_info "Waiting for coordinator to start..." + local attempt=1 + local max_attempts=60 + while [ $attempt -le $max_attempts ]; do + if docker logs shadow-coordinator-api-test 2>&1 | grep -q "Start coordinator api successfully"; then + log_info "Coordinator is ready at http://localhost:8390" + return 0 + fi + if ! docker ps --format '{{.Names}}' | grep -q '^shadow-coordinator-api-test$'; then + log_error "Coordinator container exited unexpectedly" + docker logs shadow-coordinator-api-test --tail 50 + exit 1 + fi + echo -n "." + sleep 5 + ((attempt++)) + done + log_error "Coordinator failed to start within timeout" + docker logs shadow-coordinator-api-test --tail 100 + exit 1 +} + +setup_prover() { + log_info "Setting up prover..." 
+ + # Check for prover binary or use docker + local prover_binary="" + if [ -f "$SCRIPT_DIR/../../target/release/prover" ]; then + prover_binary="$SCRIPT_DIR/../../target/release/prover" + elif [ -f "$(pwd)/target/release/prover" ]; then + prover_binary="$(pwd)/target/release/prover" + fi + + local config_file="/tmp/prover-local.json" + cp "$CONFIG_DIR/prover-local.json" "$config_file" + + if [ -n "$prover_binary" ]; then + log_info "Using local prover binary: $prover_binary" + log_info "Starting prover..." + "$prover_binary" --config "$config_file" >/tmp/prover.log 2>&1 & + log_info "Prover started in background (PID: $!)" + log_info "Monitor with: tail -f /tmp/prover.log" + else + log_info "Prover binary not found, using Docker..." + if docker ps -a --format '{{.Names}}' | grep -q '^shadow-prover$'; then + docker rm -f shadow-prover >/dev/null + fi + docker run -d \ + --name shadow-prover \ + --network host \ + --gpus all \ + -v "$config_file":/app/config.json \ + -v "$HOME/.openvm/params":/root/.openvm/params:ro \ + zhuoatscroll/prover:$IMAGE_TAG >/dev/null + log_info "Prover container started" + fi + + log_info "Prover health check: curl http://localhost:10080/health" +} + +stop_all() { + log_info "Stopping all shadow services..."
+ docker rm -f shadow-coordinator-api-test 2>/dev/null || true + docker rm -f shadow-prover 2>/dev/null || true + docker rm -f shadow-coordinator-postgres 2>/dev/null || true + pkill -f "coordinator_api" 2>/dev/null || true + pkill -f "prover " 2>/dev/null || true + log_info "All shadow services stopped" +} + +show_status() { + echo "=== Shadow Services Status ===" + echo "" + echo "Containers:" + docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}' | grep -E 'shadow|NAMES' || echo " No shadow containers running" + echo "" + echo "Port usage:" + ss -tlnp 2>/dev/null | grep -E '8390|5433|10080' || echo " No shadow ports in use" + echo "" + echo "Database:" + if docker exec shadow-coordinator-postgres pg_isready -U postgres >/dev/null 2>&1; then + echo " PostgreSQL: RUNNING on :5433" + psql "$SHADOW_DB" -c "SELECT 'batch' as table, COUNT(*) as cnt FROM batch UNION ALL SELECT 'chunk', COUNT(*) FROM chunk UNION ALL SELECT 'bundle', COUNT(*) FROM bundle UNION ALL SELECT 'l2_block', COUNT(*) FROM l2_block;" 2>/dev/null || echo " (Unable to query)" + else + echo " PostgreSQL: NOT RUNNING" + fi + echo "" + echo "Coordinator API:" + if curl -s http://localhost:8390/ >/dev/null 2>&1; then + echo " Coordinator: RESPONDING on :8390" + else + echo " Coordinator: NOT RESPONDING" + fi + echo "" + echo "Prover:" + if curl -s http://localhost:10080/health >/dev/null 2>&1; then + echo " Prover: RESPONDING on :10080" + else + echo " Prover: NOT RESPONDING" + fi +} + +# Main +if [ $# -eq 0 ]; then + show_help + exit 0 +fi + +while [ $# -gt 0 ]; do + case "$1" in + --postgres) + setup_postgres + ;; + --coordinator) + setup_coordinator + ;; + --prover) + setup_prover + ;; + --all) + setup_postgres + setup_coordinator + setup_prover + ;; + --stop) + stop_all + ;; + --status) + show_status + ;; + -h|--help) + show_help + exit 0 + ;; + *) + log_error "Unknown option: $1" + show_help + exit 1 + ;; + esac + shift +done