Merged
50 commits
6318d5d
add dataclasses and docstrings
jdsheehan Feb 4, 2026
ffc2581
add tests
jdsheehan Feb 4, 2026
29d4ff4
bump to version 1.0.0
jdsheehan Feb 4, 2026
7948f07
chore(deps): update couchdb docker tag to v3.5
renovate[bot] Mar 2, 2026
2aafad6
Update README.md
DhavalRepo18 Mar 3, 2026
77d17e2
move mlflow.start_run out of event loop
jdsheehan Mar 4, 2026
733e00d
use tolerence in graders.numeric_match
jdsheehan Mar 4, 2026
4f34b87
switch from mlflow to mlflow.client in grading
jdsheehan Mar 4, 2026
45dee58
tidy up return type
jdsheehan Mar 4, 2026
a0913bb
refactor: implement Work Order Agent MCP server and agent_hive integr…
ShuxinLin Mar 4, 2026
01526d4
refactor(wo): replace stub with full MCP server from api_implementati…
ShuxinLin Mar 4, 2026
607be34
refactor(wo): split main.py into models, data, tools, and entry point
ShuxinLin Mar 4, 2026
8b5de8e
feat: replace CSV data access with CouchDB in WO server
ShuxinLin Mar 4, 2026
d833454
feat: add init_asset_data.py and wire both init scripts into docker-c…
ShuxinLin Mar 4, 2026
fbc363b
refactor: consolidate db init into couchdb_setup.sh, remove db-init c…
ShuxinLin Mar 4, 2026
c90bb4d
refactor: rename COUCHDB_DBNAME→IOT_DBNAME, WO_COUCHDB_DBNAME→WO_DBNAME
ShuxinLin Mar 4, 2026
b478548
feat: add WO integration tests and update INSTRUCTIONS.md with WorkOr…
ShuxinLin Mar 4, 2026
3f50e2b
fix: downgrade doc reference to Python 3.12, fix requires_couchdb to …
ShuxinLin Mar 4, 2026
5cea530
fix: standardize COUCHDB_USERNAME/PASSWORD local var names in iot ser…
ShuxinLin Mar 4, 2026
62fd7ac
feat: add scripts/start_servers.sh to smoke-test all MCP servers
ShuxinLin Mar 4, 2026
0bec54d
fix: rename _dataset to dataset in WO CouchDB init and data layer
ShuxinLin Mar 4, 2026
cad5e6f
feat: register WorkOrderAgent in plan-execute default servers
ShuxinLin Mar 4, 2026
8f83b1c
docs: add WO plan-execute examples and on-demand server note to INSTR…
ShuxinLin Mar 4, 2026
ed93479
chore: remove scripts/ directory and update INSTRUCTIONS.md
ShuxinLin Mar 4, 2026
20ab48f
Removing special characters
ChathurangiShyalika Mar 6, 2026
d8c6ab9
Removing special characters
ChathurangiShyalika Mar 6, 2026
6d4ff0b
Merge branch 'main' into issue133
jdsheehan Mar 6, 2026
728e67e
switch from mlflow.langchain.autolog to mlflow.autolog
jdsheehan Mar 6, 2026
341b1bb
remove stray mlflow.start
jdsheehan Mar 6, 2026
d5f305c
add timeout for scenario-client
jdsheehan Mar 6, 2026
00f8dc5
add closing mlflow.end
jdsheehan Mar 6, 2026
22ee7c3
Merge pull request #134 from IBM/issue133
jdsheehan Mar 6, 2026
3be7344
Merge branch 'main' into sserver-refactor
jdsheehan Mar 6, 2026
b41ffba
replace mlflow with mlflow.MLflowClient
jdsheehan Mar 9, 2026
a1d4e6d
Merge pull request #194 from IBM/issue193
bradleyjeck Mar 9, 2026
43652b1
Merge pull request #192 from ChathurangiShyalika/main
DhavalRepo18 Mar 9, 2026
cea9302
adding templates
DhavalRepo18 Mar 9, 2026
74b56c6
Adding contribution template also
DhavalRepo18 Mar 10, 2026
9a22806
add build-date endpoint to scenario-server
jdsheehan Mar 11, 2026
953f083
Merge pull request #202 from IBM/issue201
bradleyjeck Mar 11, 2026
acb021b
Merge pull request #199 from IBM/issue_198
DhavalRepo18 Mar 11, 2026
99dbd5b
Merge pull request #155 from IBM/renovate/couchdb-3.x
DhavalRepo18 Mar 16, 2026
3df0e81
fix: add --break-system-packages to pip3 install in couchdb_setup.sh
ShuxinLin Mar 17, 2026
0f25844
Merge pull request #191 from IBM/refactor/reorg-wo-agent
DhavalRepo18 Mar 17, 2026
ea717f5
refactor: rename "agent" to "server" throughout workflow/ (#170)
ShuxinLin Mar 19, 2026
16673ea
refactor: rename MCP server display names to lowercase short IDs
ShuxinLin Mar 19, 2026
7c88797
Agent Oriented Planning
DhavalRepo18 Mar 19, 2026
d70257a
fix: no-tool steps with dependencies now derive value via LLM
ShuxinLin Mar 19, 2026
09c1c93
Merge pull request #220 from IBM/issue_219
DhavalRepo18 Mar 19, 2026
54696dd
Merge pull request #218 from IBM/refactor/issue-170-rename-agent-to-s…
DhavalRepo18 Mar 19, 2026
3 changes: 2 additions & 1 deletion .env.public
@@ -1,8 +1,9 @@
# ── CouchDB (IoTAgent server) ────────────────────────────────────────────────
COUCHDB_URL=http://localhost:5984
-COUCHDB_DBNAME=chiller
COUCHDB_USERNAME=admin
COUCHDB_PASSWORD=password
+IOT_DBNAME=chiller
+WO_DBNAME=workorder

# ── IBM WatsonX (plan-execute runner) ────────────────────────────────────────
WATSONX_APIKEY=
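
A minimal sketch of how a server could read the renamed variables, assuming the defaults shown in `.env.public`. The names `CouchDBSettings` and `load_couchdb_settings` are hypothetical and are not taken from this PR; the repository's actual loading code may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CouchDBSettings:
    """Connection settings shared by the IoT and work-order servers (hypothetical)."""

    url: str
    username: str
    password: str
    iot_dbname: str
    wo_dbname: str


def load_couchdb_settings(env=None) -> CouchDBSettings:
    """Read CouchDB settings from a mapping; defaults to os.environ."""
    env = os.environ if env is None else env
    return CouchDBSettings(
        url=env.get("COUCHDB_URL", "http://localhost:5984"),
        username=env.get("COUCHDB_USERNAME", "admin"),
        password=env.get("COUCHDB_PASSWORD", "password"),
        # IOT_DBNAME replaces the old COUCHDB_DBNAME; WO_DBNAME is new.
        iot_dbname=env.get("IOT_DBNAME", "chiller"),
        wo_dbname=env.get("WO_DBNAME", "workorder"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the process environment.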
17 changes: 17 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/bugfix.md
@@ -0,0 +1,17 @@
## Description
## Fix Details
## Impact on Benchmarking
- [ ] **No change to baselines**: This fix only improves stability/performance.
- [ ] **Baseline change**: This fix corrects a scoring error. (Please provide "Before vs. After" results).

## Related Issues
- Fixes: #

## Verification Steps
1. Run the following command: `uv run pytest tests/integration`
2. Describe any manual verification performed:

## Checklist
- [ ] I have added tests that prove my fix is effective.
- [ ] My code follows the project's Ruff formatting and linting rules.
- [ ] I have signed off my commits (DCO).
8 changes: 8 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/chore.md
@@ -0,0 +1,8 @@
## Description
## Changes
- [ ] Dependency update (`uv lock`)
- [ ] Documentation / Tutorial update
- [ ] Refactoring (no logic change)

## Checklist
- [ ] I have signed off my commits (DCO).
21 changes: 21 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/feature.md
@@ -0,0 +1,21 @@
## Description
## Type of Change
- [ ] New Benchmark Scenario (Industry/Asset type)
- [ ] Evaluation Metric / Scorer
- [ ] Agentic Orchestration Logic (ReAct, Plan-Execute, etc.)
- [ ] Infrastructure / Tooling Improvement

## Industry Relevance
## Related Issues
- Refs: #

## Testing & Validation
- [ ] **Unit Tests**: `uv run pytest tests/unit` passed.
- [ ] **Scenario Validation**: Verified that the agent can execute the trajectory.
- [ ] **Data Integrity**: Checked that no PII or sensitive industrial data is included.

## Checklist
- [ ] My code follows the project's Ruff formatting and linting rules.
- [ ] I have performed a self-review of my code.
- [ ] I have updated the documentation (README or /docs) accordingly.
- [ ] I have signed off my commits (DCO).
3 changes: 2 additions & 1 deletion .gitignore
@@ -199,4 +199,5 @@ benchmark/cods_track2/.env.local
CLAUDE.md
mcp/couchdb/sample_data/bulk_docs.json
.env
-mcp/servers/tsfm/artifacts/tsfm_models/
+mcp/servers/tsfm/artifacts/tsfm_models/
+src/tmp/
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
-3.14
+3.12
129 changes: 129 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,129 @@
# Contributing to AssetOpsBench

Thank you for your interest in contributing to **AssetOpsBench**! This project aims to advance the state of Industrial AI by providing a rigorous benchmarking framework for autonomous asset operations.

## How to Contribute

1. **Fork the repository** to your own GitHub account.
2. **Create a feature branch** from `main` in your fork: `git checkout -b feature/<short-topic>`.
3. **Keep PRs small and focused**: We prefer PRs with fewer than 300 changed lines to ensure high-quality reviews.
4. **Follow Conventional Commits** for all commits and PR titles.
5. **Run formatting and tests** locally before opening a pull request.
6. **Open a Pull Request** from your fork to `main` with a clear description of the benchmarking impact.

> **Note:** All PRs are merged using **Squash and merge**. The PR title will become the final commit message. Please write it carefully using the Conventional Commits format.
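
One way to check a branch against the 300-line guideline is to sum the counts reported by `git diff --numstat main...HEAD`. The sketch below parses that output; the function name `count_changed_lines` and the sample file `docs/figure.png` are illustrative assumptions, not part of the repository.

```python
MAX_CHANGED_LINES = 300  # guideline from the list above


def count_changed_lines(numstat_output: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of counts
        total += int(added) + int(deleted)
    return total


# Illustrative input: two text files and one binary file.
sample = "2\t1\t.env.public\n129\t0\tCONTRIBUTING.md\n-\t-\tdocs/figure.png\n"
changed = count_changed_lines(sample)
print(changed, "changed lines; within guideline:", changed < MAX_CHANGED_LINES)
```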

---

## DCO: Developer Certificate of Origin

This repository requires a **DCO 1.1 signoff** on every commit. This is a legal statement asserting that you have the right to submit the code. You can sign off by adding the `-s` or `--signoff` flag:

```bash
git commit -s -m 'feat(eval): add predictive maintenance scoring for pumps'
```

If you have already made commits without a signoff, you can fix them:

* **Last commit only:** `git commit --amend --no-edit --signoff`
* **Multiple commits:** `git rebase --signoff HEAD~<n>` (where `<n>` is the number of commits).

Then force-push the rewritten commits to your fork with `git push -f`.
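
As a rough local check before pushing, a commit message can be scanned for the `Signed-off-by:` trailer that `git commit -s` appends. This is a sketch only: `has_signoff` is a hypothetical helper, and a hosted DCO check may apply stricter rules, such as comparing the trailer against the commit author.

```python
import re

# Matches a trailer line such as "Signed-off-by: Jane Doe <jane@example.com>".
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$",
    re.MULTILINE,
)


def has_signoff(message: str) -> bool:
    """Return True if the commit message contains a Signed-off-by trailer."""
    return bool(SIGNOFF_RE.search(message))


signed = (
    "feat(eval): add predictive maintenance scoring for pumps\n\n"
    "Signed-off-by: Jane Doe <jane@example.com>\n"
)
unsigned = "feat(eval): add predictive maintenance scoring for pumps\n"
print(has_signoff(signed), has_signoff(unsigned))
```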

---

## Commit & Branching Standards

### Conventional Commits

We follow the [Conventional Commits](https://www.conventionalcommits.org/) specification.

**Structure:** `<type>[optional scope]: <description>`

* `feat`: New benchmark scenario, asset model, or agentic logic (e.g., ReAct).
* `fix`: Bug fix in evaluation scripts or data loaders.
* `docs`: Documentation improvements.
* `refactor`: Code changes that neither fix a bug nor add a feature.
* `perf`: Improvements to evaluation speed or data processing.
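
The structure above can be checked with a small regular expression. This is a minimal sketch, assuming only the five types listed here are accepted; `parse_title` and `ALLOWED_TYPES` are hypothetical names, and the tuple would need extending if the project also accepts other types (for example `chore`, which appears in this PR's commit history).

```python
import re

# Types listed in this guide; extend if the project accepts more.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "perf")

# <type>[optional scope]: <description>, with an optional "!" for breaking changes.
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"!?: "
    r"(?P<description>\S.*)$"
)


def parse_title(title: str):
    """Return (type, scope, description), or None if the title does not conform."""
    match = TITLE_RE.match(title)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return match.group("type"), match.group("scope"), match.group("description")


print(parse_title("feat(eval): add predictive maintenance scoring for pumps"))
print(parse_title("Update README.md"))
```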

### Branch Naming

Use the structure: `<type>/<description>`

* **Good:** `feature/hvac-chiller-scenario`, `bugfix/fix-jsonl-loader`
* **Bad:** `update1` (vague name), `feature_new_stuff` (uses underscores)
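
The naming rule can be sketched as a pattern check. The guide only states the `<type>/<description>` structure and rules out underscores; the lowercase, hyphen-separated form enforced below is an assumption inferred from the good examples, and `is_valid_branch` is a hypothetical helper.

```python
import re

# One lowercase prefix, a slash, then hyphen-separated lowercase words or digits.
BRANCH_RE = re.compile(r"[a-z]+/[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_branch(name: str) -> bool:
    """Return True if the branch name follows <type>/<description>."""
    return BRANCH_RE.fullmatch(name) is not None


for branch in (
    "feature/hvac-chiller-scenario",
    "bugfix/fix-jsonl-loader",
    "update1",
    "feature_new_stuff",
):
    print(branch, is_valid_branch(branch))
```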

---

## Local Development Setup

We use `uv` for Python dependency management.

### 1. Install Dependencies

```bash
uv sync --dev
source .venv/bin/activate
```

### 2. Code Quality & Formatting

We use `ruff` for both linting and formatting. Run these before every commit:

```bash
uv run ruff format .
uv run ruff check --fix .
```

### 3. Security Scanning

To protect industrial metadata and API keys, run the IBM `detect-secrets` scan:

```bash
uv pip install --upgrade "git+https://github.com/ibm/detect-secrets.git@master#egg=detect-secrets"
detect-secrets scan --update .secrets.baseline
detect-secrets audit .secrets.baseline
```

---

## Running Tests & Validation

### Unit Tests

Validate core logic for metrics and data parsing:

```bash
uv run pytest tests/unit
```

### Integration & Benchmark Validation

Verify that agent trajectories and environment simulations run correctly:

```bash
chmod +x ./scripts/run_tests.sh
./scripts/run_tests.sh
```

This script validates:

* **Linting**: Ruff validation.
* **Agentic Logic**: Verification of ReAct and Plan-Execute orchestration.
* **Asset Consistency**: Ensuring industrial asset IDs (e.g., FailureSensorIQ) match registry definitions.

---

## Pull Request Guidelines

* **Benchmark Integrity**: If your change modifies existing scoring logic, please include a "Before vs. After" comparison in the PR description.
* **Asset Privacy**: Ensure no real-world sensitive telemetry data is included in scenarios without anonymization.
* **Documentation**: Update the relevant dataset cards (e.g., for FailureSensorIQ) if you modify the underlying data structures.
* **PR Templates**: Use the provided templates for Features, Bug Fixes, or Chores to ensure consistent review cycles.
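
For the "Before vs. After" comparison, a small helper can render two sets of scores as a Markdown table ready to paste into the PR description. This is only a sketch: `before_after_table` is a hypothetical function, and the metric name and values in the example are placeholders, not real benchmark results.

```python
def before_after_table(before: dict[str, float], after: dict[str, float]) -> str:
    """Render metrics present in both runs as a Markdown table with deltas."""
    rows = [
        "| Metric | Before | After | Delta |",
        "| --- | --- | --- | --- |",
    ]
    for name in sorted(before.keys() & after.keys()):
        delta = after[name] - before[name]
        rows.append(
            f"| {name} | {before[name]:.3f} | {after[name]:.3f} | {delta:+.3f} |"
        )
    return "\n".join(rows)


# Placeholder values for illustration only.
table = before_after_table({"accuracy": 0.5}, {"accuracy": 0.75})
print(table)
```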
