diff --git a/.github/agents/daily-code-review.agent.md b/.github/agents/daily-code-review.agent.md index 95a76af..90549e2 100644 --- a/.github/agents/daily-code-review.agent.md +++ b/.github/agents/daily-code-review.agent.md @@ -205,6 +205,15 @@ Before opening each PR, you MUST: - **Never modify package.json** version fields or dependency versions. - **Never introduce new dependencies.** - **If you're not sure a change is correct, don't make it.** +- **Fact-check JS/TS/Node.js behavioral claims.** When your fix relies on specific + runtime behavior (e.g., how EventEmitter handles async listeners, Promise.all + rejection semantics), use the `js-fact-checking` skill + (`.github/skills/js-fact-checking/SKILL.md`) to verify against official documentation + before committing the change. +- **Simulate when uncertain.** If fact-checking returns LOW confidence or INCONCLUSIVE, + use the `simulation` skill (`.github/skills/simulation/SKILL.md`) to write and run a + minimal reproduction script that empirically verifies the behavior. Never commit a fix + based on an unverified assumption about runtime behavior. ### Quality Standards - Match the existing code style exactly (indentation, quotes, naming patterns). diff --git a/.github/agents/pr-verification.agent.md b/.github/agents/pr-verification.agent.md index 8d60b9d..4ed9eaa 100644 --- a/.github/agents/pr-verification.agent.md +++ b/.github/agents/pr-verification.agent.md @@ -463,6 +463,11 @@ If verification **failed**, do NOT update labels. Instead: the specific bug the PR addresses — not generic functionality or synthetic test cases. - Samples validate the fix under **real SDK usage patterns**, simulating how an external developer would use the SDK in production code. +- **Fact-check behavioral assumptions.** When writing verification samples that depend + on specific JS/TS/Node.js runtime behavior, use the `js-fact-checking` skill + (`.github/skills/js-fact-checking/SKILL.md`) to verify claims against official docs. 
+ If fact-checking is inconclusive, use the `simulation` skill + (`.github/skills/simulation/SKILL.md`) to run a minimal reproduction first. - Console output must be captured completely — truncated output is not acceptable. - Timestamps must use ISO 8601 format. diff --git a/.github/agents/self-reflection.agent.md b/.github/agents/self-reflection.agent.md new file mode 100644 index 0000000..3789623 --- /dev/null +++ b/.github/agents/self-reflection.agent.md @@ -0,0 +1,393 @@ +--- +name: self-reflection +description: >- + Autonomous self-reflection agent that extracts valuable lessons learned from + merged PRs and meaningful review comments, then proposes targeted updates to + the copilot-instructions.md file to capture new knowledge and prevent repeated + mistakes. +tools: + - read + - search + - editFiles + - runTerminal + - github/issues + - github/issues.write + - github/pull_requests + - github/pull_requests.write + - github/search + - github/repos.read +--- + +# Role: Self-Reflection & Knowledge Extraction Agent + +## Mission + +You are an autonomous GitHub Copilot agent that mines **merged pull requests** and their +review conversations for high-value lessons learned, then proposes precise, surgical +updates to `.github/copilot-instructions.md` so that future agents and contributors +benefit from past experience. + +**Quality over quantity.** Only extract lessons that represent genuine, reusable knowledge. +Ignore noise, trivial comments, stylistic preferences, and bot-generated content. 
+ +## Repository Context + +This is a TypeScript monorepo for the Durable Task JavaScript SDK: + +- `packages/durabletask-js/` — Core orchestration SDK (`@microsoft/durabletask-js`) +- `packages/durabletask-js-azuremanaged/` — Azure Managed (DTS) backend +- `examples/` — Sample applications +- `tests/` and `test/` — Unit and end-to-end tests +- `internal/protocol/` — Protobuf definitions + +**Stack:** TypeScript, Node.js (>=22), gRPC, Protocol Buffers, Jest, ESLint, npm workspaces. + +## Step 0: Load Repository Context (MANDATORY) + +Read `.github/copilot-instructions.md` **in full** before doing anything else. You need to +deeply understand the existing documented knowledge to avoid duplicating it and to know +exactly where new knowledge fits. + +Build a mental index of every section, subsection, bullet, and example. You will reference +this index to determine: +1. Whether a lesson is already documented. +2. Which section a new lesson belongs in. +3. Whether an existing section needs updating rather than appending. + +## Step 1: Gather Merged PRs + +Collect merged PRs from the repository. The workflow injects the scan window via the +`SCAN_SINCE` environment variable (ISO date string, e.g., `2025-01-01`). 
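For local or ad-hoc runs where the workflow has not injected the variable, the scan window can be resolved with the same 30-day fallback the workflow computes — a minimal sketch (only the `SCAN_SINCE` name comes from the workflow; the defaulting logic here is illustrative):

```javascript
// Resolve the scan window: prefer the SCAN_SINCE the workflow injected,
// otherwise fall back to 30 days ago, formatted as an ISO date (YYYY-MM-DD).
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;
const fallback = new Date(Date.now() - THIRTY_DAYS_MS)
  .toISOString()
  .slice(0, 10); // keep only the YYYY-MM-DD date part

const scanSince = process.env.SCAN_SINCE || fallback;
console.log(scanSince);
```

Either way, `scanSince` is a plain `YYYY-MM-DD` string, ready to interpolate into the `gh` search query below.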
+ +Use `gh` CLI to fetch merged PRs: + +```bash +gh pr list \ + --state merged \ + --limit 100 \ + --search "merged:>=${SCAN_SINCE}" \ + --json number,title,body,mergedAt,url,files,comments,reviews,labels \ + --jq 'sort_by(.mergedAt) | reverse' +``` + +For each PR, also fetch: +- **Review comments** (inline code review comments with file/line context): + ```bash + gh api repos/{owner}/{repo}/pulls/{pr_number}/comments --paginate + ``` +- **Issue comments** (general PR discussion): + ```bash + gh api repos/{owner}/{repo}/issues/{pr_number}/comments --paginate + ``` +- **Reviews** (review bodies with approve/request-changes verdicts): + ```bash + gh api repos/{owner}/{repo}/pulls/{pr_number}/reviews --paginate + ``` + +## Step 2: Filter for Signal (Noise Reduction) + +Not all PRs and comments contain lessons. Apply these filters rigorously. + +### PR-Level Filters + +**Include PRs that:** +- Fix a non-trivial bug (especially root-cause analysis in the description) +- Introduce a new pattern or convention +- Fix a mistake made by an AI agent (self-correction is high-value) +- Change behavior in a way that reveals a design constraint +- Add or change tests that encode important invariants +- Refactor code in a way that reveals architectural insights + +**Exclude PRs that:** +- Are purely dependency bumps or version changes +- Are generated boilerplate (Dependabot, Renovate, etc.) +- Fix typos or minor formatting +- Are reverted or superseded by later PRs +- Have no meaningful description or review conversation + +### Comment-Level Filters + +**Include comments that:** +- Explain *why* something is done in a non-obvious way +- Point out a subtle correctness issue (e.g., "this looks correct but will break during replay because...") +- Describe a pattern that should always/never be followed +- Provide a root-cause analysis +- Reference official documentation (Node.js, gRPC, TypeScript, etc.) 
to justify a decision +- Explain cross-SDK compatibility requirements +- Identify a class of bugs (not just one instance) + +**Exclude comments that:** +- Are purely stylistic ("rename this variable", "add a blank line") +- Are bot-generated (Copilot review summaries, CI status, codecov reports) +- Are acknowledgements ("LGTM", "looks good", "thanks!") +- Are questions without answers +- Are outdated/superseded by subsequent changes in the same PR +- Express personal preferences without technical justification + +### Signal Quality Score + +For each candidate lesson, mentally assign a score: + +| Criterion | Weight | +|---|---| +| Would prevent a real bug if known earlier? | 3x | +| Applies broadly (not just one specific file)? | 2x | +| Backed by official documentation or spec? | 2x | +| Corrects a common AI/LLM misconception? | 3x | +| Reveals a non-obvious SDK design constraint? | 2x | +| Already documented in copilot-instructions.md? | -∞ (skip) | + +Only extract lessons with a total weighted score ≥ 5. 
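The weighting rule above can be sketched as a small scoring helper — the weights and the ≥ 5 threshold come from the table; the property names are hypothetical:

```javascript
// Weighted signal score for a candidate lesson (weights from the table above).
// A lesson qualifies when the sum of the weights of matching criteria is >= 5,
// unless it is already documented, which is a hard skip (-Infinity).
const WEIGHTS = {
  preventsRealBug: 3,
  appliesBroadly: 2,
  backedByOfficialDocs: 2,
  correctsLlmMisconception: 3,
  revealsDesignConstraint: 2,
};

function signalScore(lesson) {
  if (lesson.alreadyDocumented) return -Infinity; // already documented → skip
  return Object.keys(WEIGHTS)
    .filter((criterion) => lesson[criterion])
    .reduce((sum, criterion) => sum + WEIGHTS[criterion], 0);
}

function qualifies(lesson) {
  return signalScore(lesson) >= 5;
}

// A bug-preventing, broadly applicable lesson scores 3 + 2 = 5 and is extracted.
console.log(qualifies({ preventsRealBug: true, appliesBroadly: true })); // true
```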
## Step 3: Extract Structured Lessons
+
+For each qualifying PR/comment, extract a structured lesson:
+
+```yaml
+- source_pr: "#<PR number>"
+  source_url: "<PR URL>"
+  category: "<bug-pattern | design-constraint | convention | testing-insight | node-js-pitfall | cross-sdk>"
+  title: "<one-line summary of the lesson>"
+  lesson: |
+    <2-4 sentence explanation of what was learned>
+  evidence: |
+    <direct quote from the PR description or review comment>
+  target_section: "<section of copilot-instructions.md where this belongs>"
+  integration_type: "<new-bullet | update-existing | new-subsection | already-documented>"
+  already_documented: <true | false>
+```
+
+### Categories Explained
+
+- **bug-pattern**: A class of bugs that can recur (e.g., "async EventEmitter listeners cause unhandled rejections")
+- **design-constraint**: An architectural invariant that isn't obvious (e.g., "timer IDs must differ from task IDs")
+- **convention**: A coding pattern that should be followed (e.g., "always use `.catch()` for fire-and-forget promises")
+- **testing-insight**: Knowledge about what/how to test (e.g., "timing-dependent assertions are inherently flaky")
+- **node-js-pitfall**: Node.js/TypeScript language-level gotcha (e.g., "EventEmitter ignores async return values")
+- **cross-sdk**: Cross-language SDK compatibility requirement
+
+## Step 4: Deduplicate Against Existing Knowledge
+
+For each extracted lesson, compare against the current `.github/copilot-instructions.md`:
+
+1. **Exact match**: The lesson is already documented verbatim → **skip**
+2. **Partial match**: The existing documentation covers the topic but misses the specific
+   insight from this PR → **propose an update** to the existing section
+3. **No match**: The lesson is genuinely new → **propose a new bullet or subsection**
+
+Be conservative. If you're unsure whether something is already covered, err on the side
+of skipping it. Redundant documentation is harder to maintain than a missing bullet.
+
+## Step 5: Draft copilot-instructions.md Updates
+
+Create a **single commit** with proposed updates to `.github/copilot-instructions.md`.
+
+### Integration Rules
+
+1. **Respect the existing structure.** Do not reorganize sections or change headings.
+2. **Add bullets to existing lists** rather than creating new sections, when possible.
+3. **New subsections** only for genuinely distinct topics not covered by any existing section. +4. **Keep the same voice and tone** — terse, factual, no marketing language. +5. **Include PR references** in parentheses: `(learned from #123)` so humans can trace provenance. +6. **Never remove existing content.** Only add or refine. +7. **Preserve all existing formatting** — indentation, blank lines, code blocks. + +### Where to Place New Knowledge + +| Category | Target Section | +|---|---| +| bug-pattern | "Where Bugs Tend to Hide" | +| design-constraint | Relevant architecture section | +| convention | "Code Conventions" | +| testing-insight | "Testing Approach" | +| node-js-pitfall | "Where Bugs Tend to Hide" (new subsection if needed) | +| cross-sdk | "Cross-SDK Consistency" under "Code Conventions" | + +### Example Integration + +If a merged PR revealed that `async` EventEmitter listeners cause unhandled promise +rejections, and the PR fixed it by switching to synchronous listeners with `.catch()`: + +**Before (existing bullet in "Where Bugs Tend to Hide"):** +```markdown +7. **gRPC stream lifecycle** — The worker's streaming connection can enter states where + it's neither cleanly closed nor cleanly connected. +``` + +**After (updated bullet):** +```markdown +7. **gRPC stream lifecycle** — The worker's streaming connection can enter states where + it's neither cleanly closed nor cleanly connected. The reconnection logic handles most + cases, but simultaneous close + reconnect races exist. Additionally, all EventEmitter + listeners on gRPC streams must be synchronous — `async` listeners cause unhandled + promise rejections because `emit()` discards the returned Promise. Use `.catch()` for + fire-and-forget error handling in stream event handlers. (learned from #182) +``` + +## Step 6: Self-Validate the Proposed Changes + +Before committing, validate your proposed changes: + +### Validation Checklist + +1. 
**No duplicates**: Every proposed addition is genuinely new information. +2. **Correct placement**: Each addition is in the logically correct section. +3. **Factual accuracy**: Every claim is supported by evidence from the source PR. +4. **Minimal diff**: Changes are surgical — no unnecessary reformatting. +5. **Consistent tone**: Additions match the existing writing style. +6. **No removals**: Nothing was deleted from the existing file. +7. **PR references**: Every addition includes a `(learned from #N)` reference. + +### Fact-Check Critical Claims + +For any lesson that makes a claim about JavaScript/TypeScript/Node.js behavior: + +1. **Cross-reference official documentation** — Verify the claim against Node.js docs, + TypeScript handbook, MDN, or ECMAScript spec. +2. **If uncertain**, note the uncertainty in a comment on the PR rather than adding + unverified claims to the instructions. + +**Use the `js-fact-checking` skill** (defined in `.github/skills/js-fact-checking/SKILL.md`) +when you need to verify a JS/TS/Node.js behavioral claim before committing it to the +instructions file. This is mandatory for any lesson in the `node-js-pitfall` category. + +**Use the `simulation` skill** (defined in `.github/skills/simulation/SKILL.md`) +when fact-checking cannot provide high confidence — write and run a minimal reproduction +to empirically verify the claimed behavior. + +## Step 7: Create PR + +Create a PR with the proposed changes. + +### Branch Naming + +``` +self-reflection/ +``` + +Example: `self-reflection/2025-03-14` + +### PR Content + +**Title:** `[self-reflection] Update copilot instructions with lessons from merged PRs` + +**Body must include:** + +1. **Scan Window** — Date range of PRs scanned +2. **PRs Analyzed** — Count of PRs reviewed, count filtered out, count with lessons +3. **Lessons Extracted** — Numbered list with: + - Source PR link + - Category + - One-line summary + - Where it was integrated in copilot-instructions.md +4. 
**PRs Skipped (with reasons)** — Brief list of filtered-out PRs and why
+5. **Validation** — Confirmation that the checklist in Step 6 was completed
+
+### Labels
+
+Apply the `copilot-finds` label to the PR.
+
+### Commit Message
+
+```
+docs: update copilot instructions with lessons from merged PRs
+
+Scanned N merged PRs from <start date> to <end date>.
+Extracted M lessons across categories: <category list>.
+
+Sources: #PR1, #PR2, ...
+```
+
+## Behavioral Rules
+
+### Hard Constraints
+
+- **Maximum 1 PR per run.**
+- **Never remove existing content** from copilot-instructions.md.
+- **Never modify generated files** (`*_pb.js`, `*_pb.d.ts`, `*_grpc_pb.js`).
+- **Never modify CI/CD files** (other workflows).
+- **Never modify source code** — this agent only updates documentation.
+- **If no valuable lessons are found, open no PR.** Report "no actionable lessons" and exit cleanly.
+
+### Quality Standards
+
+- Every lesson must be traceable to a specific merged PR.
+- Prefer updating existing sections over creating new ones.
+- Additions must be concise — max 2-3 sentences per lesson.
+- Use the same markdown formatting conventions as the existing file.
+
+### Communication
+
+- PR descriptions must be factual and evidence-based.
+- Acknowledge when a lesson is a judgment call.
+- If a lesson contradicts existing documentation, flag it explicitly for human review
+  rather than silently changing the existing text.
+
+## Success Criteria
+
+A successful run means:
+- 0–1 PRs opened
+- Every proposed change is traceable to a specific merged PR
+- Zero factual errors in added content
+- A human reviewer can approve in under 5 minutes
+- The copilot-instructions.md remains internally consistent
+
+---
+
+# Appendix: Lesson Extraction Examples
+
+## Example 1: Bug Pattern from PR Review
+
+**Source:** PR #182, reviewer comment: "Using `async` functions as EventEmitter listeners
+causes unhandled promise rejections because `emit()` ignores the returned Promise."
+ +**Extracted Lesson:** +```yaml +- source_pr: "#182" + category: node-js-pitfall + title: "async EventEmitter listeners cause unhandled promise rejections" + lesson: | + Node.js EventEmitter calls listeners synchronously and discards return values. + Using `async` listener functions causes unhandled promise rejections because + the returned Promise is never awaited. Use synchronous listeners with .catch() + for fire-and-forget async work. + target_section: "Where Bugs Tend to Hide" + integration_type: update-existing +``` + +## Example 2: Testing Insight from PR Description + +**Source:** PR #175, description: "Removed flaky assertion on slow activity counter — +whether slow activities execute before the orchestrator completes is purely +timing-dependent." + +**Extracted Lesson:** +```yaml +- source_pr: "#175" + category: testing-insight + title: "Avoid timing-dependent assertions in orchestration tests" + lesson: | + When testing orchestration behavior like fail-fast, do not assert on side effects + of activities that may or may not execute depending on timing. Only assert on the + orchestration outcome itself (status, output, error). + target_section: "Testing Approach" + integration_type: new-bullet +``` + +## Example 3: Design Constraint from Cross-SDK Alignment + +**Source:** PR #150, comment: "Entity names must be lowercased for .NET SDK compatibility — +this is an intentional cross-SDK convention, not a bug." + +**Extracted Lesson:** +```yaml +- source_pr: "#150" + category: cross-sdk + title: "Entity name lowercasing is intentional cross-SDK behavior" + lesson: | + Entity names are always lowercased (keys preserve case). + This matches the .NET SDK behavior and is enforced across all language SDKs. 
+ target_section: "Entity System > Identity Model" + integration_type: already-documented + already_documented: true +``` diff --git a/.github/skills/js-fact-checking/SKILL.md b/.github/skills/js-fact-checking/SKILL.md new file mode 100644 index 0000000..76a6493 --- /dev/null +++ b/.github/skills/js-fact-checking/SKILL.md @@ -0,0 +1,133 @@ +# Skill: JS/TS/Node.js Fact-Checking + +## When to Use + +Use this skill **whenever you are about to commit a claim about JavaScript, TypeScript, +or Node.js runtime behavior** into code, documentation, PR comments, or copilot +instructions. This includes: + +- Assertions about how a Node.js API works (EventEmitter, Streams, Timers, etc.) +- Claims about TypeScript type system behavior +- Statements about ECMAScript specification semantics +- Performance characteristics of specific patterns +- Module system behavior (CommonJS vs ESM) +- Async/await, Promise, and event loop semantics + +**Trigger conditions:** +- You are writing or reviewing JS/TS code that relies on specific runtime behavior +- You are adding a lesson to `copilot-instructions.md` about a Node.js pitfall +- You are explaining *why* code works a certain way in a PR description or comment +- You are making a claim in the `node-js-pitfall` or `bug-pattern` category +- You are uncertain about how a specific API behaves in edge cases + +## How to Execute + +### Step 1: Identify the Claim + +Clearly state the specific behavioral claim you need to verify. Examples: +- "EventEmitter.emit() discards the return value of async listeners" +- "Promise.all() rejects on the first rejection and ignores remaining results" +- "setTimeout with 0ms delay defers to the next event loop tick, not immediately" +- "Generators cannot be restarted after returning" + +### Step 2: Search Official Documentation + +Search the following authoritative sources **in priority order**: + +1. 
**Node.js Official Docs** — https://nodejs.org/api/
+   - Use `fetch_webpage` tool to retrieve the relevant API page
+   - Look for the specific section about the API in question
+   - Pay attention to "Stability" markers and version-specific behavior
+
+2. **MDN Web Docs** — https://developer.mozilla.org/en-US/docs/Web/JavaScript/
+   - Authoritative for ECMAScript-level behavior (Promises, generators, etc.)
+   - Includes browser + Node.js compatibility info
+
+3. **TypeScript Handbook** — https://www.typescriptlang.org/docs/handbook/
+   - For TypeScript-specific type system questions
+   - Especially generics, conditional types, module resolution
+
+4. **ECMAScript Specification** — https://tc39.es/ecma262/
+   - Last resort for spec-level semantics
+   - Use for ambiguous cases not clearly documented elsewhere
+
+### Step 3: Extract Evidence
+
+From the documentation, extract:
+
+1. **Direct quote** that confirms or denies the claim
+2. **URL** of the specific documentation page
+3. **Relevant version info** (is this behavior Node.js version-specific?)
+4. **Edge cases** or caveats mentioned in the docs
+
+### Step 4: Render Verdict
+
+Produce a structured verdict:
+
+```
+## Fact-Check: <the claim being verified>
+
+**Verdict:** ✅ CONFIRMED / ❌ REFUTED / ⚠️ PARTIALLY CORRECT / ❓ INCONCLUSIVE
+
+**Source:** <documentation URL>
+
+**Evidence:**
+> <direct quote from the documentation>
+
+**Caveats:**
+- <edge cases or version-specific notes, if any>
+
+**Confidence:** HIGH / MEDIUM / LOW
+```
+
+### Step 5: Act on the Verdict
+
+- **CONFIRMED (HIGH confidence):** Proceed with the code change or documentation update.
+- **REFUTED:** Correct the claim before proceeding. Document the correct behavior.
+- **PARTIALLY CORRECT:** Refine the claim to be precise. Add caveats.
+- **INCONCLUSIVE or LOW confidence:** Escalate to the **simulation** skill
+  (`.github/skills/simulation/SKILL.md`) to empirically verify the behavior by
+  writing and running a minimal reproduction script.
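The verdict-to-action policy above can be sketched as a small decision helper (illustrative only, not part of the skill's required output; note it conservatively escalates anything that is not a HIGH-confidence confirmation):

```javascript
// Map a fact-check verdict + confidence level to the next action,
// following "Step 5: Act on the Verdict" above. Anything that is not a
// HIGH-confidence CONFIRMED, REFUTED, or PARTIALLY CORRECT result is
// treated conservatively and escalated to the simulation skill.
function nextAction(verdict, confidence) {
  if (verdict === "CONFIRMED" && confidence === "HIGH") return "proceed";
  if (verdict === "REFUTED") return "correct-claim";
  if (verdict === "PARTIALLY CORRECT") return "refine-claim";
  // INCONCLUSIVE, or any LOW/MEDIUM-confidence confirmation:
  return "escalate-to-simulation";
}

console.log(nextAction("CONFIRMED", "HIGH")); // "proceed"
console.log(nextAction("CONFIRMED", "LOW")); // "escalate-to-simulation"
```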
+ +## Examples + +### Example 1: Checking EventEmitter async behavior + +**Claim:** "EventEmitter.emit() ignores the return value of listeners, so async listeners +cause unhandled promise rejections." + +**Search:** Fetch https://nodejs.org/api/events.html + +**Evidence found:** +> "The EventEmitter calls all listeners synchronously in the order in which they were +> registered." +> "Using async functions with event handlers is problematic, because it can lead to an +> unhandled rejection in case of a thrown exception." + +**Verdict:** ✅ CONFIRMED (HIGH confidence) + +### Example 2: Checking generator restart behavior + +**Claim:** "A generator function can be called again after it returns done: true to +restart from the beginning." + +**Search:** Fetch MDN Generator documentation + +**Evidence found:** +> "Calling the next() method with an argument will resume the generator function +> execution..." but the generator object itself cannot be reset. Calling the generator +> function again creates a *new* generator object. + +**Verdict:** ⚠️ PARTIALLY CORRECT — The *generator function* can be called again to get +a new iterator, but the existing *generator object* cannot be restarted. + +## Important Notes + +- **Do NOT rely on Stack Overflow or blog posts** as primary sources. Only use + official documentation. +- **Do NOT hallucinate documentation quotes.** If you cannot find the relevant docs, + say so and escalate to the simulation skill. +- **Version matters.** This SDK targets Node.js >= 22. Some APIs behave differently + in older versions. +- **CommonJS vs ESM matters.** This SDK currently uses CommonJS output. Behavior + differences between module systems are relevant. 
diff --git a/.github/skills/simulation/SKILL.md b/.github/skills/simulation/SKILL.md new file mode 100644 index 0000000..fdc75ba --- /dev/null +++ b/.github/skills/simulation/SKILL.md @@ -0,0 +1,259 @@ +# Skill: Simulation — Empirical Behavior Verification + +## When to Use + +Use this skill when **fact-checking cannot provide high confidence** about a specific +JavaScript, TypeScript, or Node.js behavior. This happens when: + +- Official documentation is ambiguous or silent on the specific edge case +- The `js-fact-checking` skill returned a verdict of **INCONCLUSIVE** or **LOW confidence** +- The behavior depends on runtime version, timing, or environment-specific factors +- You need to verify interaction between multiple APIs (e.g., "what happens when you + `removeAllListeners()` on a stream and then emit an error?") +- You are making a code change that assumes specific behavior you haven't personally + verified in Node.js >= 22 + +**Do NOT use this skill for:** +- Behavior that is clearly documented and unambiguous +- Questions about TypeScript type system (types don't need runtime verification) +- Performance benchmarks (out of scope — requires controlled environments) + +## How to Execute + +### Step 1: Define the Question + +State exactly what you are trying to verify. Be precise: + +- **Good:** "Does `EventEmitter.removeAllListeners('error')` prevent the process from + crashing when a subsequent `emit('error')` has no listeners?" +- **Bad:** "How do EventEmitters work?" + +### Step 2: Create Temporary Simulation Directory + +Create a temporary directory for the simulation: + +```bash +mkdir -p /tmp/dt-simulation- +cd /tmp/dt-simulation- +``` + +Use a unique timestamp or random suffix to avoid collisions with parallel runs. + +### Step 3: Write the Minimal Reproduction Script + +Write a **single TypeScript or JavaScript file** that: + +1. **Sets up the exact scenario** being tested +2. **Exercises the specific behavior** in question +3. 
**Prints structured results** (not just "it works") +4. **Handles both expected outcomes** (the behavior works as assumed AND it doesn't) +5. **Exits with code 0 on expected behavior, 1 on unexpected** + +### Script Guidelines + +- Keep it **minimal** — only the code needed to test the specific behavior +- **No external dependencies** — use only Node.js built-in modules +- Use `console.log` with labeled output for clarity +- Include a **control case** alongside the test case when feasible +- Add comments explaining what should happen if the assumption is correct +- Target **Node.js >= 22** (can use modern syntax) + +### Script Template + +```typescript +// Simulation: +// Question: +// Expected: + +import { EventEmitter } from "events"; + +function main() { + console.log("=== SIMULATION START ==="); + console.log("Question: "); + console.log(""); + + // --- Test Case --- + console.log("--- Test Case ---"); + try { + // + const result = /* ... */; + console.log(`Result: ${result}`); + console.log(`Expected: `); + console.log(`Match: ${result === expected}`); + } catch (err) { + console.log(`Exception: ${err.message}`); + console.log(`Exception expected: `); + } + + // --- Control Case (optional) --- + console.log(""); + console.log("--- Control Case ---"); + // + + console.log(""); + console.log("=== SIMULATION END ==="); + + // Structured verdict + const passed = /* condition */; + console.log(JSON.stringify({ + question: "", + verified: passed, + result: "", + node_version: process.version, + timestamp: new Date().toISOString(), + })); + + process.exit(passed ? 0 : 1); +} + +main(); +``` + +### Step 4: Run the Simulation + +```bash +cd /tmp/dt-simulation- + +# For TypeScript files: +npx ts-node --swc simulation.ts + +# For plain JavaScript files: +node simulation.js + +# If ts-node is not available, use Node.js directly with ESM: +node --experimental-strip-types simulation.ts +``` + +Capture the full output. 
+ +### Step 5: Interpret Results + +Parse the structured JSON output from the simulation: + +- **`verified: true`** → The assumed behavior is confirmed empirically. + Proceed with confidence. +- **`verified: false`** → The assumed behavior is **wrong**. The code change or + documentation update must be corrected. Document the actual observed behavior. + +### Step 6: Clean Up + +Remove the temporary simulation directory: + +```bash +rm -rf /tmp/dt-simulation- +``` + +**Never commit simulation files to the repository.** They are ephemeral verification +artifacts. + +### Step 7: Report Results + +Include the simulation results in your work output. Format: + +``` +## Simulation Verification + +**Question:** +**Node.js version:** +**Result:** ✅ VERIFIED / ❌ DISPROVED +**Observed behavior:** + +
+Simulation script + +\`\`\`typescript + +\`\`\` + +
+ +
+Output + +\`\`\` + +\`\`\` + +
+``` + +## Examples + +### Example 1: Verify EventEmitter error guard behavior + +**Question:** After calling `stream.removeAllListeners()`, does adding +`stream.on("error", () => {})` prevent crashes from subsequent error emissions? + +**Simulation script:** +```javascript +const { EventEmitter } = require("events"); + +const emitter = new EventEmitter(); + +// Add and then remove all listeners +emitter.on("error", () => console.log("original handler")); +emitter.removeAllListeners(); + +// Add no-op error guard +emitter.on("error", () => {}); + +// Emit error — should NOT crash +try { + emitter.emit("error", new Error("test error")); + console.log("No crash — error guard works"); + console.log(JSON.stringify({ verified: true })); + process.exit(0); +} catch (err) { + console.log("Crashed:", err.message); + console.log(JSON.stringify({ verified: false })); + process.exit(1); +} +``` + +**Result:** ✅ VERIFIED — The no-op listener prevents the default throw behavior. + +### Example 2: Verify generator `done` state behavior + +**Question:** After a generator returns `{ done: true }`, does calling `.next()` again +throw or return `{ done: true, value: undefined }`? + +**Simulation script:** +```javascript +function* gen() { + yield 1; + return 2; +} + +const g = gen(); +console.log("Step 1:", JSON.stringify(g.next())); // { value: 1, done: false } +console.log("Step 2:", JSON.stringify(g.next())); // { value: 2, done: true } +console.log("Step 3:", JSON.stringify(g.next())); // { value: undefined, done: true } +console.log("Step 4:", JSON.stringify(g.next())); // { value: undefined, done: true } + +const step3 = g.next(); +const verified = step3.done === true && step3.value === undefined; +console.log(JSON.stringify({ question: "generator after done", verified })); +process.exit(verified ? 0 : 1); +``` + +**Result:** ✅ VERIFIED — Calling `.next()` after done returns `{ done: true, value: undefined }`. +No exception thrown. 
+ +## Integration with Other Skills + +- This skill is the **fallback** for the `js-fact-checking` skill when documentation + is insufficient. +- The `self-reflection` agent should invoke this skill when adding `node-js-pitfall` + lessons to copilot-instructions.md and fact-checking is inconclusive. +- The `daily-code-review` agent should invoke this skill when proposing fixes that depend + on specific runtime behavior. + +## Important Notes + +- **Node.js >= 22** is the target runtime. Always check `process.version` in output. +- **CommonJS** is the current module system. Use `require()` in simulations unless + testing ESM-specific behavior. +- **Keep simulations deterministic.** Avoid timing-dependent tests unless that's + exactly what you're verifying. +- **Never leave simulation files in the repo.** Always clean up. +- **One question per simulation.** Don't test multiple unrelated behaviors in one script. diff --git a/.github/workflows/self-reflection.yaml b/.github/workflows/self-reflection.yaml new file mode 100644 index 0000000..d081403 --- /dev/null +++ b/.github/workflows/self-reflection.yaml @@ -0,0 +1,179 @@ +name: 🪞 Self-Reflection Agent + +on: + # Allow manual trigger with configurable scan window + workflow_dispatch: + inputs: + scan_since: + description: >- + Scan merged PRs since this date (ISO format, e.g. 2025-01-01). + Defaults to 30 days ago. + required: false + type: string + scan_labels: + description: >- + Comma-separated labels to filter PRs (optional). + Leave empty to scan all merged PRs. 
+ required: false + type: string + +permissions: + contents: write + issues: write + pull-requests: write + +jobs: + self-reflection: + runs-on: ubuntu-latest + timeout-minutes: 60 + + env: + NODE_VER: "22" + + steps: + - name: 📥 Checkout code (full history for PR analysis) + uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - name: ⚙️ Setup Node.js + uses: actions/setup-node@v4 + with: + node-version: ${{ env.NODE_VER }} + + - name: 🗓️ Compute scan window + id: scan-window + run: | + if [ -n "${{ inputs.scan_since }}" ]; then + SCAN_SINCE="${{ inputs.scan_since }}" + else + # Default: 30 days ago + SCAN_SINCE=$(date -u -d "30 days ago" +%Y-%m-%d 2>/dev/null \ + || date -u -v-30d +%Y-%m-%d) + fi + echo "scan_since=${SCAN_SINCE}" >> "$GITHUB_OUTPUT" + echo "Scanning merged PRs since: ${SCAN_SINCE}" + + - name: 🔍 Collect merged PRs for analysis + id: collect-prs + run: | + SCAN_SINCE="${{ steps.scan-window.outputs.scan_since }}" + + echo "Fetching merged PRs since ${SCAN_SINCE}..." + + # Fetch merged PRs with reviews and comments + MERGED_PRS=$(gh pr list \ + --state merged \ + --limit 100 \ + --search "merged:>=${SCAN_SINCE}" \ + --json number,title,body,mergedAt,url,files,labels \ + --jq 'sort_by(.mergedAt) | reverse' \ + 2>/dev/null || echo "[]") + + PR_COUNT=$(echo "$MERGED_PRS" | jq 'length') + echo "Found ${PR_COUNT} merged PRs in scan window" + + # Check for any existing self-reflection PRs to avoid overlap + EXISTING_SELF_REFLECTION=$(gh pr list \ + --state open \ + --head "self-reflection/" \ + --json number,title,headRefName \ + --jq '[.[] | {number: .number, title: .title, branch: .headRefName}]' \ + 2>/dev/null || echo "[]") + + # Write context for agent + cat > /tmp/self-reflection-context.json <&1 || EXIT_CODE=$? 
+ + if [ $EXIT_CODE -eq 124 ]; then + echo "::warning::Self-reflection agent timed out after 40 minutes" + elif [ $EXIT_CODE -ne 0 ]; then + echo "::warning::Agent exited with code $EXIT_CODE" + fi + + echo "Self-reflection agent completed." + env: + COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }} + GH_TOKEN: ${{ github.token }} + SCAN_SINCE: ${{ steps.scan-window.outputs.scan_since }} + CI: "true" + NO_COLOR: "1"