Manual end-to-end tests verify the full CodeForge lifecycle against a real GitHub repository. These tests use real CLI execution (Claude Code / Codex) and real GitHub API calls, and they validate the complete flow from task creation to PR cleanup.
- Dev environment running: task dev (or task dev:detach)
- GitHub access key registered as my-github (or any name; adjust provider_key below)
- gh CLI authenticated (for PR verification and cleanup)
- Test repository: https://github.com/freema/fb-pilot.git (or any repo accessible with the registered key)
# Verify environment
curl -s http://localhost:8080/health | jq .
curl -s -H "Authorization: Bearer dev-token" http://localhost:8080/api/v1/auth/verify | jq .
curl -s -H "Authorization: Bearer dev-token" http://localhost:8080/api/v1/cli | jq .
curl -s -H "Authorization: Bearer dev-token" http://localhost:8080/api/v1/keys | jq .

All tests use these values; adjust as needed:
export BASE=http://localhost:8080
export TOKEN="dev-token"
export AUTH="Authorization: Bearer $TOKEN"
export REPO="https://github.com/freema/fb-pilot.git"
export PROVIDER_KEY="my-github"
export GH_REPO="freema/fb-pilot"

Full lifecycle: clone → Claude Code task → follow-up instruction → create PR → verify → cleanup.
TASK=$(curl -s -X POST $BASE/api/v1/tasks \
-H "$AUTH" -H "Content-Type: application/json" \
-d "{
\"repo_url\": \"$REPO\",
\"provider_key\": \"$PROVIDER_KEY\",
\"prompt\": \"Add a comment at the top of README.md: // E2E Test 1\",
\"config\": {\"cli\": \"claude-code\", \"timeout_seconds\": 300}
}")
TASK_ID=$(echo $TASK | jq -r .id)
echo "Task: $TASK_ID"

while true; do
STATUS=$(curl -s -H "$AUTH" "$BASE/api/v1/tasks/$TASK_ID" | jq -r .status)
echo "Status: $STATUS"
[ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ] && break
sleep 5
done

Expected: status: completed, result contains a description of the changes, iteration: 1.
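The same polling loop is reused by the later tests, so it can help to wrap it in a function with an upper bound on attempts. The following is only a sketch: `task_status`, `wait_for_task`, and the attempt and interval parameters are hypothetical names that are not part of CodeForge, and the code assumes `BASE` and `AUTH` are exported as shown earlier and that `jq` is installed.

```shell
# Hypothetical helper: print the current status of a task.
# Assumes BASE and AUTH are exported as shown earlier.
task_status() {
  curl -s -H "$AUTH" "$BASE/api/v1/tasks/$1" | jq -r .status
}

# Hypothetical helper: poll until the task is completed or failed.
# Usage: wait_for_task TASK_ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 on completed, 1 on failed, 2 if the attempts run out.
wait_for_task() {
  task_id=$1
  max_attempts=${2:-60}
  interval=${3:-5}
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    status=$(task_status "$task_id")
    echo "Status: $status"
    case "$status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "Gave up after $max_attempts attempts" >&2
  return 2
}

# Example (not run here):
# wait_for_task "$TASK_ID" || echo "task did not complete"
```

Bounding the loop keeps a stuck task from hanging the test run; the defaults of 60 attempts at 5 seconds roughly match the 300-second task timeout used above.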
curl -s -X POST "$BASE/api/v1/tasks/$TASK_ID/instruct" \
-H "$AUTH" -H "Content-Type: application/json" \
-d '{"prompt": "Add a second comment line: // E2E Test 1 follow-up"}' | jq .

Expected: iteration: 2, status: awaiting_instruction.
Wait for completion again (same polling loop). Then verify iterations:
curl -s -H "$AUTH" "$BASE/api/v1/tasks/$TASK_ID?include=iterations" | jq '.iterations | length'

Expected: 2 iterations.
curl -s -X POST "$BASE/api/v1/tasks/$TASK_ID/create-pr" \
-H "$AUTH" -H "Content-Type: application/json" -d '{}' | jq .

Expected: pr_url, pr_number, branch starting with codeforge/.
# Verify PR exists on GitHub
gh pr view <PR_NUMBER> --repo $GH_REPO --json title,state,headRefName
# Cleanup
gh pr close <PR_NUMBER> --repo $GH_REPO --delete-branch

Expected: PR is OPEN, title starts with "CodeForge:". After close: branch deleted.
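The PR checks above are done by eye. As a rough sketch, they could also be scripted: `check_pr_fields` and `verify_pr` below are hypothetical helper names, and the expected values (state OPEN, title prefix "CodeForge:", branch prefix codeforge/) are taken from the expectations stated in this test. The `gh` calls assume `GH_REPO` is exported as shown earlier.

```shell
# Hypothetical helper: check the PR fields this test expects.
# Usage: check_pr_fields STATE TITLE BRANCH
check_pr_fields() {
  state=$1
  title=$2
  branch=$3
  [ "$state" = "OPEN" ] || { echo "FAIL: state is $state"; return 1; }
  case "$title" in
    "CodeForge:"*) ;;
    *) echo "FAIL: unexpected title: $title"; return 1 ;;
  esac
  case "$branch" in
    codeforge/*) ;;
    *) echo "FAIL: unexpected branch: $branch"; return 1 ;;
  esac
  echo "PASS"
}

# Hypothetical helper: fetch the three fields with gh and check them.
# Usage: verify_pr PR_NUMBER
verify_pr() {
  pr_state=$(gh pr view "$1" --repo "$GH_REPO" --json state --jq .state)
  pr_title=$(gh pr view "$1" --repo "$GH_REPO" --json title --jq .title)
  pr_branch=$(gh pr view "$1" --repo "$GH_REPO" --json headRefName --jq .headRefName)
  check_pr_fields "$pr_state" "$pr_title" "$pr_branch"
}
```

Run `verify_pr` with the pr_number returned by create-pr, before closing the PR.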
Flow: clone → Claude task → code review (Codex) → create PR → post review comments → verify → cleanup.
Important: Review must happen BEFORE create-pr. The state machine does not allow the pr_created → reviewing transition.
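Because of this ordering constraint, a script driving the test can check the task status before requesting a review. The sketch below is illustrative only: `can_request_review` is a hypothetical helper, and treating completed as the only reviewable status is an assumption based on the steps in this test. The real state machine may accept other states.

```shell
# Hypothetical guard: succeed only when a review request is expected to work.
# Assumption: only "completed" tasks are reviewable; the server may allow more.
can_request_review() {
  case "$1" in
    completed)
      return 0 ;;
    pr_created)
      echo "review not allowed: PR already created (pr_created -> reviewing is rejected)" >&2
      return 1 ;;
    *)
      echo "review not attempted: task status is $1" >&2
      return 1 ;;
  esac
}

# Example (not run here):
# STATUS=$(curl -s -H "$AUTH" "$BASE/api/v1/tasks/$TASK_ID" | jq -r .status)
# can_request_review "$STATUS" && echo "safe to call /review"
```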
TASK=$(curl -s -X POST $BASE/api/v1/tasks \
-H "$AUTH" -H "Content-Type: application/json" \
-d "{
\"repo_url\": \"$REPO\",
\"provider_key\": \"$PROVIDER_KEY\",
\"prompt\": \"Add a file CODEFORGE_TEST.md with: # E2E Test 2\",
\"config\": {\"cli\": \"claude-code\", \"timeout_seconds\": 300}
}")
TASK_ID=$(echo $TASK | jq -r .id)

Wait for completed.
curl -s -X POST "$BASE/api/v1/tasks/$TASK_ID/review" \
-H "$AUTH" -H "Content-Type: application/json" \
-d '{"cli": "codex"}' | jq .

Expected: verdict, score, summary, reviewed_by: "codex:...".
Verify review_result on task:
curl -s -H "$AUTH" "$BASE/api/v1/tasks/$TASK_ID" | jq .review_result

curl -s -X POST "$BASE/api/v1/tasks/$TASK_ID/create-pr" \
-H "$AUTH" -H "Content-Type: application/json" -d '{}' | jq .

curl -s -X POST "$BASE/api/v1/tasks/$TASK_ID/post-review" \
-H "$AUTH" -H "Content-Type: application/json" -d '{}' | jq .

Expected: review_url (GitHub URL), comments_posted (number), pr_number.
# Check review exists on PR
gh pr view <PR_NUMBER> --repo $GH_REPO --json reviews
# Cleanup
gh pr close <PR_NUMBER> --repo $GH_REPO --delete-branch

Expected: Review comment with CodeForge summary posted on PR.
Reversed CLI combination: Codex writes code, Claude Code reviews.
TASK=$(curl -s -X POST $BASE/api/v1/tasks \
-H "$AUTH" -H "Content-Type: application/json" \
-d "{
\"repo_url\": \"$REPO\",
\"provider_key\": \"$PROVIDER_KEY\",
\"prompt\": \"Create CODEX_TEST.md with: # Codex Test\",
\"config\": {\"cli\": \"codex\", \"timeout_seconds\": 300}
}")
TASK_ID=$(echo $TASK | jq -r .id)

Wait for completed.
curl -s -X POST "$BASE/api/v1/tasks/$TASK_ID/review" \
-H "$AUTH" -H "Content-Type: application/json" \
-d '{"cli": "claude-code"}' | jq .

Expected: verdict, score, reviewed_by: "claude-code:...".
Same as Test 2 steps 3-5.
Known limitation: If Claude Code gives verdict approve, post-review will fail on GitHub with "Can not approve your own pull request" when the same token owns the PR. Workaround: use a different token for review posting, or accept a COMMENT verdict.
Verify that canceling a running/cloning task transitions it to failed.
# Create a long-running task
TASK=$(curl -s -X POST $BASE/api/v1/tasks \
-H "$AUTH" -H "Content-Type: application/json" \
-d "{
\"repo_url\": \"$REPO\",
\"provider_key\": \"$PROVIDER_KEY\",
\"prompt\": \"Write a detailed analysis of every file in the repo\",
\"config\": {\"cli\": \"claude-code\", \"timeout_seconds\": 300}
}")
TASK_ID=$(echo $TASK | jq -r .id)
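# Wait until the task is cloning or running (stop early if it has already finished)
while true; do
  STATUS=$(curl -s -H "$AUTH" "$BASE/api/v1/tasks/$TASK_ID" | jq -r .status)
  case "$STATUS" in cloning|running|completed|failed) break ;; esac
  sleep 1
done
# Cancel
curl -s -X POST "$BASE/api/v1/tasks/$TASK_ID/cancel" -H "$AUTH" | jq .

Expected: the response has status: canceling. After a few seconds, the task status becomes failed with error canceled by user.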
curl -s -X POST $BASE/api/v1/tasks \
-H "$AUTH" -H "Content-Type: application/json" \
-d '{"repo_url": "https://github.com/freema/fb-pilot.git", "prompt": "Review", "task_type": "pr_review"}' | jq .

Expected: 400 with fields.pr_number: "pr_number is required for pr_review tasks".
curl -s -H "$AUTH" "$BASE/api/v1/tasks/nonexistent-id" | jq .

Expected: 404 with "task nonexistent-id not found".
curl -s -X POST $BASE/api/v1/tasks \
-H "$AUTH" -H "Content-Type: application/json" \
-d '{"repo_url": "not-a-url", "prompt": "test"}' | jq .

Expected: 400 validation error.
curl -s -X POST $BASE/api/v1/tasks \
-H "$AUTH" -H "Content-Type: application/json" \
-d '{"repo_url": "https://github.com/freema/fb-pilot.git"}' | jq .

Expected: 400 with fields.Prompt: "field is required".
curl -s $BASE/api/v1/tasks | jq .

Expected: 401 "missing or invalid Bearer token".
curl -s -X POST "$BASE/api/v1/tasks/fake-id/review" \
-H "$AUTH" -H "Content-Type: application/json" -d '{}' | jq .

Expected: 404 "task fake-id not found".
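The error-case commands pipe only the response body through jq, so the HTTP status code itself is never shown. One way to check it directly is curl's `-w '%{http_code}'` write-out option. This is a sketch under that approach, and `http_status` and `expect_status` are hypothetical helper names.

```shell
# Hypothetical helper: print only the HTTP status code of a request.
# All arguments are passed through to curl unchanged.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Hypothetical helper: compare the status code with the expected one.
# Usage: expect_status EXPECTED_CODE [curl arguments...]
expect_status() {
  want=$1
  shift
  got=$(http_status "$@")
  if [ "$got" = "$want" ]; then
    echo "PASS: $got"
  else
    echo "FAIL: expected $want, got $got"
    return 1
  fi
}

# Examples (not run here), using the variables exported earlier:
# expect_status 404 -H "$AUTH" "$BASE/api/v1/tasks/nonexistent-id"
# expect_status 401 "$BASE/api/v1/tasks"
```

Since the body is discarded with `-o /dev/null`, the error message still has to be checked with the original jq commands.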
| Issue | Description |
|---|---|
| Review before PR | State machine requires review BEFORE create-pr. Order: task → review → create-pr → post-review |
| Approve own PR | GitHub API rejects APPROVE review on own PR. Use different token or expect COMMENT |
| Codex diff issue | Codex review may report HEAD~1 not available — review prompt could provide diff explicitly |
Record test runs here for tracking:
| Date | Tester | Tests Run | Pass | Fail | Bugs Found | Notes |
|---|---|---|---|---|---|---|
| 2026-03-12 | Claude Code | 1-5 | 4 | 0 | 2 fixed | nil comments in post-review, cancel stuck in cloning |