Forge production-grade AI agents.
28 battle-tested skills. 5 specialist personas. 5 reference checklists. One mission: make AI agents build software like senior engineers.
AI coding agents are fast. They're also reckless.
They skip specs. They "forget" tests. They ship without review. They treat "it works on my machine" as a success criterion. In short, they build prototypes, not production software.
AgentForge fixes this.
We don't give agents vague suggestions. We give them structured, battle-tested workflows that encode how senior engineers actually build software, drawing on practices published by engineering teams at Google, Netflix, and Stripe. Every skill has steps, checkpoints, anti-rationalization defenses, and evidence-based verification. When an agent follows them, it ships code you can trust.
```
 DEFINE         PLAN         BUILD         VERIFY        REVIEW         SHIP          OPS
┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐
│ Idea │ ───▶ │ Spec │ ───▶ │ Code │ ───▶ │ Test │ ───▶ │  QA  │ ───▶ │  Go  │ ───▶ │ Run  │
│Refine│      │ PRD  │      │ Impl │      │Debug │      │ Gate │      │ Live │      │ Ops  │
└──────┘      └──────┘      └──────┘      └──────┘      └──────┘      └──────┘      └──────┘
 /spec         /plan         /build        /test        /review        /ship
```
7 slash commands. 28 skills. 7 phases. Zero excuses.
| | Other Prompt Packs | AgentForge |
|---|---|---|
| Structure | Vague advice | Step-by-step workflows with checkpoints |
| Verification | "Make sure it works" | Evidence-based exit criteria (tests, builds, runtime data) |
| Anti-cheating | None | Rationalization tables that call out excuses agents use to skip steps |
| Scope | Generic coding tips | Full lifecycle: spec → plan → build → verify → review → ship → ops |
| Quality gates | None | Built-in CI pipeline with 8 automated checks |
| Cross-reference | Silos | Every skill references related skills; no duplication |
7 slash commands that map to the development lifecycle. Each one activates the right skills automatically.
| What you're doing | Command | Key principle |
|---|---|---|
| Define what to build | /spec | Spec before code |
| Plan how to build it | /plan | Small, atomic tasks |
| Build incrementally | /build | One slice at a time |
| Prove it works | /test | Tests are proof |
| Review before merge | /review | Improve code health |
| Simplify the code | /code-simplify | Clarity over cleverness |
| Ship to production | /ship | Faster is safer |
Want fewer manual steps once the spec exists? /build auto generates the plan and implements every task in a single approved pass. You approve the plan once, then it runs autonomously. It removes the manual hand-off between tasks, not the verification: every task is still test-driven and committed individually, and the run pauses on failures or risky steps.
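
The control flow described for /build auto can be pictured as a loop with a single up-front approval. The sketch below is a minimal Python illustration under stated assumptions, not the plugin's implementation: `Task`, its `risky` flag, `run_auto`, and the injected `implement_and_test` and `commit` callables are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    risky: bool = False  # hypothetical marker for steps that need a human (e.g. a migration)


def run_auto(
    tasks: list[Task],
    approved: bool,
    implement_and_test: Callable[[Task], bool],
    commit: Callable[[Task], None],
) -> list[str]:
    """Run every task after one up-front approval; pause on risky steps or failures."""
    log: list[str] = []
    if not approved:
        log.append("waiting for plan approval")
        return log
    for task in tasks:
        if task.risky:
            log.append(f"paused before risky step: {task.name}")
            return log
        if not implement_and_test(task):  # each task must pass its own tests
            log.append(f"paused on failure: {task.name}")
            return log
        commit(task)  # one commit per task
        log.append(f"committed: {task.name}")
    log.append("done")
    return log


commits: list[str] = []
log = run_auto(
    [Task("add model"), Task("add endpoint"), Task("drop old table", risky=True)],
    approved=True,
    implement_and_test=lambda task: True,
    commit=lambda task: commits.append(task.name),
)
print(log)  # ['committed: add model', 'committed: add endpoint', 'paused before risky step: drop old table']
```

Injecting the test and commit steps as callables keeps the loop easy to test. The point it illustrates is that autonomy only removes the approval between tasks: each task still has to pass its own check before it is committed.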
Skills also activate automatically based on what you're doing — designing an API triggers api-and-interface-design, building UI triggers frontend-ui-engineering, and so on.
Claude Code (recommended)
Marketplace install:
```
/plugin marketplace add borhen68/SkillEngine
/plugin install agentforge@borhen-agentforge
```
SSH errors? The marketplace clones repos via SSH. If you don't have SSH keys set up on GitHub, either add your SSH key or use the full HTTPS URL to force HTTPS cloning:
```
/plugin marketplace add https://github.com/borhen68/SkillEngine.git
/plugin install agentforge@borhen-agentforge
```
Local / development:
```bash
git clone https://github.com/borhen68/SkillEngine.git agentforge
claude --plugin-dir /path/to/agentforge
```
Cursor
Copy any SKILL.md into .cursor/rules/, or reference the full skills/ directory. See docs/cursor-setup.md.
Antigravity CLI
Install as a native plugin for skills, subagents, and slash commands. See docs/antigravity-setup.md.
Install from the repo:
```bash
agy plugin install https://github.com/borhen68/SkillEngine.git
```
Install from a local clone:
```bash
git clone https://github.com/borhen68/SkillEngine.git agentforge
agy plugin install ./agentforge
```
Gemini CLI
Install as native skills for auto-discovery, or add to GEMINI.md for persistent context. See docs/gemini-cli-setup.md.
Install from the repo:
```bash
gemini skills install https://github.com/borhen68/SkillEngine.git --path skills
```
Install from a local clone:
```bash
gemini skills install ./agentforge/skills/
```
Windsurf
Add skill contents to your Windsurf rules configuration. See docs/windsurf-setup.md.
OpenCode
Uses agent-driven skill execution via AGENTS.md and the skill tool.
GitHub Copilot
Use agent definitions from agents/ as Copilot personas and skill content in .github/copilot-instructions.md. See docs/copilot-setup.md.
Kiro IDE & CLI
Skills for Kiro live under .kiro/skills/ and can be stored at the project or global level. Kiro also supports AGENTS.md. See the Kiro docs at https://kiro.dev/docs/skills/
Codex / Other Agents
Skills are plain Markdown - they work with any agent that accepts system prompts or instruction files. See docs/getting-started.md.
The commands above are entry points. The pack includes 28 skills total: 23 lifecycle skills, 4 operations skills, and the using-agentforge meta-skill. Each skill is a structured workflow with steps, verification gates, and anti-rationalization tables. You can also reference any skill directly.

Meta
| Skill | What It Does | Use When |
|---|---|---|
| using-agentforge | Maps incoming work to the right skill workflow and defines shared operating rules | Starting a session or deciding which skill applies |

Define
| Skill | What It Does | Use When |
|---|---|---|
| interview-me | One-question-at-a-time interview that extracts what the user actually wants instead of what they think they should want, until ~95% confidence | The ask is underspecified, or the user invokes "interview me" / "grill me" |
| idea-refine | Structured divergent/convergent thinking to turn vague ideas into concrete proposals | You have a rough concept that needs exploration |
| spec-driven-development | Write a PRD covering objectives, commands, structure, code style, testing, and boundaries before any code | Starting a new project, feature, or significant change |

Plan
| Skill | What It Does | Use When |
|---|---|---|
| planning-and-task-breakdown | Decompose specs into small, verifiable tasks with acceptance criteria and dependency ordering | You have a spec and need implementable units |

Build
| Skill | What It Does | Use When |
|---|---|---|
| incremental-implementation | Thin vertical slices - implement, test, verify, commit. Feature flags, safe defaults, rollback-friendly changes | Any change touching more than one file |
| test-driven-development | Red-Green-Refactor, test pyramid (80/15/5), test sizes, DAMP over DRY, Beyonce Rule, browser testing | Implementing logic, fixing bugs, or changing behavior |
| context-engineering | Feed agents the right information at the right time - rules files, context packing, MCP integrations | Starting a session, switching tasks, or when output quality drops |
| source-driven-development | Ground every framework decision in official documentation - verify, cite sources, flag what's unverified | You want authoritative, source-cited code for any framework or library |
| doubt-driven-development | Adversarial fresh-context review of every non-trivial decision in-flight - CLAIM → EXTRACT → DOUBT → RECONCILE → STOP, with optional user-authorized cross-model escalation | Stakes are high (production, security, irreversible), working in unfamiliar code, or a confident output is cheaper to verify now than to debug later |
| frontend-ui-engineering | Component architecture, design systems, state management, responsive design, WCAG 2.1 AA accessibility | Building or modifying user-facing interfaces |
| api-and-interface-design | Contract-first design, Hyrum's Law, One-Version Rule, error semantics, boundary validation | Designing APIs, module boundaries, or public interfaces |
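
To make the 80/15/5 test pyramid in test-driven-development concrete: assuming the three numbers are the unit, integration, and end-to-end shares of a suite, a 200-test suite works out to 160 unit, 30 integration, and 10 end-to-end tests. A small helper, with the hypothetical name `pyramid_split`, that does this arithmetic:

```python
def pyramid_split(total: int, ratios: tuple[int, int, int] = (80, 15, 5)) -> dict[str, int]:
    """Split a test budget into unit/integration/e2e counts by percentage."""
    names = ("unit", "integration", "e2e")
    counts = {name: total * pct // 100 for name, pct in zip(names, ratios)}
    # Integer division can drop a few tests; give the remainder to the unit tier
    # so the counts always add back up to the requested total.
    counts["unit"] += total - sum(counts.values())
    return counts


print(pyramid_split(200))  # {'unit': 160, 'integration': 30, 'e2e': 10}
```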

Verify
| Skill | What It Does | Use When |
|---|---|---|
| browser-testing-with-devtools | Chrome DevTools MCP for live runtime data - DOM inspection, console logs, network traces, performance profiling | Building or debugging anything that runs in a browser |
| debugging-and-error-recovery | Five-step triage: reproduce, localize, reduce, fix, guard. Stop-the-line rule, safe fallbacks | Tests fail, builds break, or behavior is unexpected |

Review
| Skill | What It Does | Use When |
|---|---|---|
| code-review-and-quality | Five-axis review, change sizing (~100 lines), severity labels (Nit/Optional/FYI), review speed norms, splitting strategies | Before merging any change |
| code-simplification | Chesterton's Fence, Rule of 500, reduce complexity while preserving exact behavior | Code works but is harder to read or maintain than it should be |
| security-and-hardening | OWASP Top 10 prevention, auth patterns, secrets management, dependency auditing, three-tier boundary system | Handling user input, auth, data storage, or external integrations |
| performance-optimization | Measure-first approach - Core Web Vitals targets, profiling workflows, bundle analysis, anti-pattern detection | Performance requirements exist or you suspect regressions |
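
The ~100-line change-sizing guideline in code-review-and-quality is easy to check mechanically. The sketch below parses the output of `git diff --numstat` (one `added<TAB>deleted<TAB>path` line per file, with `-` in place of counts for binary files) and flags changes over the limit. The function names, the `main` base branch, and skipping binary files are assumptions for illustration, not part of the pack.

```python
import subprocess


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":  # binary files have no line counts
            continue
        total += int(added) + int(deleted)
    return total


def size_verdict(numstat: str, limit: int = 100) -> str:
    n = changed_lines(numstat)
    return f"ok ({n} lines)" if n <= limit else f"consider splitting ({n} lines)"


def current_diff_numstat(base: str = "main") -> str:
    """Run git in the current repository; assumes a branch named `base` exists."""
    return subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout


sample = "10\t2\tsrc/a.py\n-\t-\tlogo.png\n95\t0\tsrc/b.py\n"
print(size_verdict(sample))  # consider splitting (107 lines)
```

In a real repository, `size_verdict(current_diff_numstat())` gives the same verdict for the working tree against the base branch.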

Ship
| Skill | What It Does | Use When |
|---|---|---|
| git-workflow-and-versioning | Trunk-based development, atomic commits, change sizing (~100 lines), the commit-as-save-point pattern | Making any code change (always) |
| ci-cd-and-automation | Shift Left, Faster is Safer, feature flags, quality gate pipelines, failure feedback loops | Setting up or modifying build and deploy pipelines |
| deprecation-and-migration | Code-as-liability mindset, compulsory vs advisory deprecation, migration patterns, zombie code removal | Removing old systems, migrating users, or sunsetting features |
| documentation-and-adrs | Architecture Decision Records, API docs, inline documentation standards - document the why | Making architectural decisions, changing APIs, or shipping features |
| observability-and-instrumentation | Structured logging, RED metrics, OpenTelemetry tracing, symptom-based alerting - instrument as you build | Adding telemetry, or shipping anything that runs in production |
| shipping-and-launch | Pre-launch checklists, feature flag lifecycle, staged rollouts, rollback procedures, monitoring setup | Preparing to deploy to production |
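
Staged rollouts in shipping-and-launch usually rely on a feature flag that enables a growing percentage of users. One common way to do that, sketched here as an assumption rather than the skill's prescribed method, is to hash each user ID into a stable bucket from 0 to 99 and enable users whose bucket falls below the rollout percentage. The flag name `new-checkout` and the stage list are hypothetical.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map a user to a stable bucket in [0, 100) for one flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, flag: str, rollout_percent: int) -> bool:
    return bucket(user_id, flag) < rollout_percent


STAGES = [1, 5, 25, 50, 100]  # hypothetical rollout schedule, in percent

users = [f"user-{i}" for i in range(10_000)]
for pct in STAGES:
    enabled = sum(is_enabled(u, "new-checkout", pct) for u in users)
    print(f"{pct:>3}% stage -> {enabled} users enabled")
```

Because the bucket depends only on the flag and the user ID, every user enabled at 5% stays enabled at 25%, and rolling back means lowering the percentage. Including the flag name in the hash keeps different flags from always reaching the same users first.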

Ops
| Skill | What It Does | Use When |
|---|---|---|
| chaos-engineering | Systematic fault injection and resilience testing | Designing for high availability, verifying disaster recovery |
| cost-optimization | Cloud spend reduction without sacrificing reliability | Bills growing unpredictably, rightsizing resources |
| data-engineering | Data pipelines, ETL/ELT, schema evolution, data quality | Building data pipelines, migrating schemas |
| ai-ops | ML model deployment, monitoring, drift detection, retraining | Deploying models, managing inference infrastructure |
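
At its smallest scale, the fault injection behind chaos-engineering means wrapping a dependency so that it fails on purpose at a known rate, then checking that the caller degrades gracefully instead of crashing. The sketch below assumes a dependency that signals failure with `ConnectionError`; `with_faults`, `InjectedFault`, and `fetch_with_fallback` are hypothetical names, and a seeded random generator keeps the experiment reproducible.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(ConnectionError):
    """Simulated dependency failure; subclasses the error the caller already handles."""


def with_faults(fn: Callable[[], T], failure_rate: float, rng: random.Random) -> Callable[[], T]:
    """Return a wrapper that raises InjectedFault with probability `failure_rate`."""
    def wrapped() -> T:
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return fn()
    return wrapped


def fetch_with_fallback(fetch: Callable[[], str], fallback: str) -> str:
    """The code under test: serve a fallback when the dependency is unreachable."""
    try:
        return fetch()
    except ConnectionError:
        return fallback


rng = random.Random(42)
flaky_fetch = with_faults(lambda: "live price", failure_rate=0.3, rng=rng)
results = [fetch_with_fallback(flaky_fetch, "cached price") for _ in range(1000)]
print(results.count("live price"), results.count("cached price"))
```

The experiment passes if every call returns either the live or the fallback value; an unhandled exception would mean the degradation path does not exist.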
Pre-configured specialist personas for targeted reviews:
| Agent | Role | Perspective |
|---|---|---|
| code-reviewer | Senior Staff Engineer | Five-axis code review with "would a staff engineer approve this?" standard |
| test-engineer | QA Specialist | Test strategy, coverage analysis, and the Prove-It pattern |
| security-auditor | Security Engineer | Vulnerability detection, threat modeling, OWASP assessment |
| web-performance-auditor | Web Performance Engineer | Core Web Vitals audit with Quick/Deep modes and a metric-honesty rule; run it via /webperf |
| site-reliability-engineer | Site Reliability Engineer | Availability, observability, capacity planning, and incident readiness audits |
Quick-reference material that skills pull in when needed:
| Reference | Covers |
|---|---|
| testing-patterns.md | Test structure, naming, mocking, React/API/E2E examples, anti-patterns |
| security-checklist.md | Pre-commit checks, auth, input validation, headers, CORS, OWASP Top 10 |
| performance-checklist.md | Core Web Vitals targets, frontend/backend checklists, measurement commands |
| accessibility-checklist.md | Keyboard nav, screen readers, visual design, ARIA, testing tools |
| reliability-checklist.md | Availability, observability, capacity, incident response, data integrity |
Every skill follows a consistent anatomy:
```
┌───────────────────────────────────────────────┐
│ SKILL.md                                      │
│                                               │
│ ┌─ Frontmatter ─────────────────────────────┐ │
│ │ name: lowercase-hyphen-name               │ │
│ │ description: Guides agents through [task].│ │
│ │              Use when…                    │ │
│ └───────────────────────────────────────────┘ │
│ Overview         → What this skill does       │
│ When to Use      → Triggering conditions      │
│ Process          → Step-by-step workflow      │
│ Rationalizations → Excuses + rebuttals        │
│ Red Flags        → Signs something's wrong    │
│ Verification     → Evidence requirements      │
└───────────────────────────────────────────────┘
```
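
The frontmatter half of this anatomy can be checked with a few lines of code. The sketch below is a simplified stand-in for the repository's validator, not its actual code: it reads the `key: value` block between `---` lines, joins indented continuation lines, and applies two assumed rules, a lowercase-hyphen `name` and a `description` containing a "Use when" trigger. A real validator would more likely use a full YAML parser.

```python
import re

NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # lowercase-hyphen-name


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a minimal `key: value` block delimited by '---' lines (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    fields: dict[str, str] = {}
    last_key = None
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line[:1].isspace() and last_key is not None:
            # Indented continuation of the previous value (e.g. a wrapped description).
            fields[last_key] += " " + line.strip()
            continue
        key, sep, value = line.partition(":")
        if sep:
            last_key = key.strip()
            fields[last_key] = value.strip()
    raise ValueError("unterminated frontmatter")


def check_frontmatter(text: str) -> list[str]:
    """Return a list of problems; an empty list means the frontmatter passes."""
    fields = parse_frontmatter(text)
    problems = []
    name = fields.get("name", "")
    if not NAME_RE.match(name):
        problems.append(f"name is not lowercase-hyphen: {name!r}")
    if "use when" not in fields.get("description", "").lower():
        problems.append("description has no 'Use when' trigger")
    return problems


skill = """---
name: code-simplification
description: Guides agents through simplifying working code.
  Use when code is harder to read than it should be.
---
# Overview
"""
print(check_frontmatter(skill))  # []
```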
Key design choices:
- Process, not prose. Skills are workflows agents follow, not reference docs they read. Each has steps, checkpoints, and exit criteria.
- Anti-rationalization. Every skill includes a table of common excuses agents use to skip steps (e.g., "I'll add tests later") with documented counter-arguments.
- Verification is non-negotiable. Every skill ends with evidence requirements - tests passing, build output, runtime data. "Seems right" is never sufficient.
- Progressive disclosure. The SKILL.md is the entry point. Supporting references load only when needed, keeping token usage minimal.
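
Progressive disclosure is ultimately a token budget. As a rough sketch, assuming about four characters per token (a common rule of thumb for English text, not a real tokenizer), the helper below always loads the SKILL.md entry point and adds a reference file only when the task calls for it. `estimate_tokens` and `build_context` are hypothetical names; the file contents are placeholders.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def build_context(skill_md: str, references: dict[str, str], needed: set[str]) -> tuple[str, int]:
    """Always include the entry point; include a reference only if it is needed."""
    parts = [skill_md] + [references[name] for name in sorted(needed) if name in references]
    text = "\n\n".join(parts)
    return text, estimate_tokens(text)


skill_md = "x" * 2_000  # placeholder for a SKILL.md body
references = {
    "security-checklist.md": "s" * 12_000,  # placeholder contents
    "testing-patterns.md": "t" * 12_000,
}
_, lean = build_context(skill_md, references, needed=set())
_, full = build_context(skill_md, references, needed=set(references))
print(lean, full)  # 500 6501
```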
```
agent-skills/
├── skills/ # 28 skills (23 lifecycle + 4 ops + 1 meta)
│ ├── interview-me/ # Define
│ ├── idea-refine/ # Define
│ ├── spec-driven-development/ # Define
│ ├── planning-and-task-breakdown/ # Plan
│ ├── incremental-implementation/ # Build
│ ├── context-engineering/ # Build
│ ├── source-driven-development/ # Build
│ ├── doubt-driven-development/ # Build
│ ├── frontend-ui-engineering/ # Build
│ ├── test-driven-development/ # Build
│ ├── api-and-interface-design/ # Build
│ ├── browser-testing-with-devtools/ # Verify
│ ├── debugging-and-error-recovery/ # Verify
│ ├── code-review-and-quality/ # Review
│ ├── code-simplification/ # Review
│ ├── security-and-hardening/ # Review
│ ├── performance-optimization/ # Review
│ ├── git-workflow-and-versioning/ # Ship
│ ├── ci-cd-and-automation/ # Ship
│ ├── deprecation-and-migration/ # Ship
│ ├── documentation-and-adrs/ # Ship
│ ├── observability-and-instrumentation/ # Ship
│ ├── shipping-and-launch/ # Ship
│ ├── chaos-engineering/ # Ops
│ ├── cost-optimization/ # Ops
│ ├── data-engineering/ # Ops
│ ├── ai-ops/ # Ops
│ └── using-agentforge/ # Meta: how to use this pack
├── agents/ # 5 specialist personas
├── references/ # 5 supplementary checklists
├── hooks/ # Session lifecycle hooks
├── scripts/ # Validation & build automation
├── .claude/commands/ # 7 slash commands (Claude Code)
├── .gemini/commands/ # 7 slash commands (Gemini CLI)
├── commands/ # 8 slash commands (Antigravity CLI)
├── plugin.json # Antigravity plugin manifest
├── package.json # Node.js tooling & scripts
├── Makefile # Local development workflows
└── docs/ # Setup guides per tool
```
This repository includes a comprehensive validation and quality pipeline:
```bash
# Install dependencies
npm install
# Run full validation suite
npm test
# Or use Make
make ci
```
Available commands:
| Command | What It Does |
|---|---|
| npm run validate | Validate all skill files for anatomy compliance |
| npm run validate:strict | Same, but warnings block CI |
| npm run quality:cross-skill | Check cross-skill consistency and references |
| npm run quality:agents | Validate agent persona files |
| npm run test:hooks | Test session lifecycle hooks |
| npm run build:packages | Build .zip packages for distribution |
| npm run stats | Show project statistics dashboard |
Quality gates enforced:
- YAML frontmatter validation (name, description, max length)
- Required sections: Overview, When to Use, Common Rationalizations, Red Flags, Verification
- Cross-skill reference integrity (no dead links)
- Internal markdown link validation
- Description quality (must contain both "what" and "when" signals)
- Token estimation and size warnings
- Code block language specifier checks
- Agent persona consistency
- Lifecycle coverage completeness
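
Two of the gates above, required sections and cross-skill reference integrity, reduce to simple text checks. The sketch below is illustrative only: it assumes sections are Markdown headings with exactly the names listed above, and that cross-references are written as `skills/<name>/SKILL.md` paths. The pack's real reference format and validator may differ.

```python
import re

REQUIRED_SECTIONS = ["Overview", "When to Use", "Common Rationalizations", "Red Flags", "Verification"]
HEADING_RE = re.compile(r"^#{1,6}\s+(.+?)\s*$", re.MULTILINE)
# Assumed cross-reference convention: skills/<skill-name>/SKILL.md
SKILL_REF_RE = re.compile(r"skills/([a-z0-9-]+)/SKILL\.md")


def missing_sections(skill_md: str) -> list[str]:
    """Required section headings that do not appear anywhere in the file."""
    headings = {m.group(1) for m in HEADING_RE.finditer(skill_md)}
    return [s for s in REQUIRED_SECTIONS if s not in headings]


def dead_skill_refs(skill_md: str, known_skills: set[str]) -> list[str]:
    """Referenced skill names that do not exist in the skills/ directory."""
    referenced = set(SKILL_REF_RE.findall(skill_md))
    return sorted(referenced - known_skills)


doc = """# Overview
Refactor safely.
## When to Use
## Common Rationalizations
## Red Flags
See skills/test-driven-development/SKILL.md and skills/made-up-skill/SKILL.md.
"""
print(missing_sections(doc))  # ['Verification']
print(dead_skill_refs(doc, {"test-driven-development", "code-simplification"}))  # ['made-up-skill']
```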
"I watched an AI agent ship a 'working' feature that had no tests, no error handling, and a SQL injection vulnerability. It was 'done' in 20 minutes. It would have taken 2 days to fix in production." — Every engineering lead, 2024-2025
AI agents are incredible accelerators. They're also incredible liability generators — because they optimize for speed, not correctness. They don't know what they don't know, and they don't know that they don't know it.
AgentForge is the guardrail.
Every skill in this pack encodes hard-won judgment from production engineering:
- When to write a spec (always, for anything non-trivial)
- What to test (behavior, not implementation; edge cases, not just happy path)
- How to review (five axes, not just "does it compile")
- When to ship (when rollback is faster than fix-forward)
These aren't theoretical ideals. They're the workflows that separate teams that sleep through launches from teams that don't.
This pack draws from the best engineering cultures in the world:
- Google: Hyrum's Law, Beyonce Rule, test pyramid, change sizing, trunk-based development, code as liability
- Netflix: Chaos engineering, circuit breakers, graceful degradation
- Stripe: API design, backward compatibility, developer experience
- Amazon: Two-pizza teams, service boundaries, operational readiness
Every principle is embedded directly into the step-by-step workflows agents follow — not as footnotes, but as non-negotiable steps.
We accept contributions that make agents more reliable, not more clever.
See docs/skill-anatomy.md for the format specification and CONTRIBUTING.md for guidelines. Every PR goes through the same quality gates the skills enforce — eat your own dog food.
MIT — use these skills in your projects, teams, and tools. Build something great.
