feat: add VibrationAgent MCP server for vibration analysis benchmarks #190
LGDiMaggio wants to merge 7 commits into IBM:main
|
@LGDiMaggio I've seen the same issue with Pydantic in #187. That PR downgrades the Python version. @ShuxinLin, given this is a repo-wide issue, do you want a separate PR to fix versioning between Python and Pydantic (for visibility), or are you happy for any of the incoming PRs to fix that? |
|
Thanks for the heads-up on #187 — good to know the 3.12 downgrade fixes it. Happy to align this PR with whatever versioning approach you settle on. Just let me know if any changes are needed on my side. |
|
I downgraded the py version in this commit 8ef012c. It should be in main branch, no? |
|
Yes, 8ef012c is already in main — this PR branch is based on top of it. I also just verified: after installing Python 3.12 via I'll push a small follow-up commit shortly to fix a minor test helper compatibility issue with fastmcp 2.14.5's |
|
@LGDiMaggio, we will review this PR progressively. |
    "id": 306,
    "type": "Vibration",
    "text": "What is the vibration severity classification for a machine with an RMS velocity of 4.5 mm/s? It is a medium-sized machine on rigid foundations.",
    "category": "ISO Assessment",
ISO might be too specific. How about Condition Assessment?
Makes sense. Renamed the category from 'ISO Assessment' to 'Condition Assessment' in both utterances (306, 307).
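For context on utterance 306, a severity lookup of this kind can be sketched as follows. This is a minimal sketch and not the code in fault_detection.py: the function name `classify_severity` is hypothetical, the zone boundaries are the values commonly tabulated for ISO 10816-1 and should be checked against the standard, and mapping "medium-sized machine on rigid foundations" to Class II is an assumption.

```python
from bisect import bisect_left

# Zone boundaries in mm/s RMS velocity, as commonly tabulated for ISO 10816-1.
# Assumption: verify these numbers against the standard before relying on them.
ZONE_LIMITS_MM_S = {
    "I": (0.71, 1.8, 4.5),     # small machines
    "II": (1.12, 2.8, 7.1),    # medium machines (assumed class for utterance 306)
    "III": (1.8, 4.5, 11.2),   # large machines, rigid foundations
    "IV": (2.8, 7.1, 18.0),    # large machines, flexible foundations
}
ZONES = ("A", "B", "C", "D")


def classify_severity(rms_velocity_mm_s: float, machine_class: str = "II") -> str:
    """Return the zone letter (A best, D worst) for an RMS velocity in mm/s."""
    if rms_velocity_mm_s < 0:
        raise ValueError("RMS velocity must be non-negative")
    limits = ZONE_LIMITS_MM_S[machine_class]
    # A value exactly on a boundary is assigned to the lower (better) zone.
    return ZONES[bisect_left(limits, rms_velocity_mm_s)]


print(classify_severity(4.5, "II"))
```

Under these assumed boundaries, 4.5 mm/s on a Class II machine falls between 2.8 and 7.1 mm/s, which gives a grader one unambiguous expected zone to check against.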
    {
    "id": 308,
    "type": "Vibration",
    "text": "Fetch vibration sensor data from Chiller 6, sensor Current, starting from 2020-06-01.",
The ending time is a little fuzzy. If the vibration is measured at high frequency, say every couple of seconds, the data volume could be too large. It is better to specify an ending time.
Good point. Updated the utterance to include an explicit end time:
Fetch vibration sensor data from Chiller 6, sensor Current, from 2020-06-01 to 2020-06-07 at site MAIN.
Additionally, the CouchDB client already enforces a server-side limit=10000 documents per query as a safety net against unbounded fetches.
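As an illustration of how an explicit time window and the document cap combine, here is a minimal sketch of a bounded CouchDB Mango query body (the JSON sent to the `_find` endpoint). It is not the code in couchdb_client.py: the helper name `build_sensor_query` is hypothetical, the `asset_id` and `timestamp` field names follow the document structure described later in this thread, and ISO-8601 string timestamps are an assumption.

```python
from datetime import datetime

MAX_DOCS = 10_000  # mirrors the limit=10000 safety net mentioned above


def build_sensor_query(asset_id, sensor_field, start, end, limit=MAX_DOCS):
    """Build a Mango query body for one sensor field in [start, end)."""
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "selector": {
            "asset_id": asset_id,
            # Assumption: timestamps are stored as ISO-8601 strings,
            # so lexicographic comparison matches chronological order.
            "timestamp": {"$gte": start.isoformat(), "$lt": end.isoformat()},
            sensor_field: {"$exists": True},
        },
        "fields": ["asset_id", "timestamp", sensor_field],
        # Sorting in Mango requires an index covering "timestamp".
        "sort": [{"timestamp": "asc"}],
        # Never request more than the cap, even if the caller asks for more.
        "limit": min(limit, MAX_DOCS),
    }


query = build_sensor_query(
    "Chiller 6", "Current", datetime(2020, 6, 1), datetime(2020, 6, 7)
)
print(query["selector"]["timestamp"], query["limit"])
```

The window bounds the result semantically, while the clamped limit bounds it mechanically if the window turns out to contain more samples than expected.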
    {
    "id": 311,
    "type": "Vibration",
    "text": "Generate the FFT spectrum using a Blackman window for the loaded signal.",
This is a little fuzzy. We need to know which signal in the loaded dataset is meant, for example by sensor name or signal ID. The utterance may also need to specify that the return value should be the peak frequencies (say the top 1, 2, ...).
Agreed. Updated all signal analysis and diagnosis utterances (IDs 310-320) to reference explicit signal IDs (e.g. 'vib_001') and specify the expected output format. For example:
Compute the FFT spectrum for signal 'vib_001' and return the top 5 peak frequencies with amplitudes.
Compute the FFT spectrum of signal 'vib_001' using a Blackman window and return the top 5 peaks.
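To show what a verifiable "top 5 peaks" answer could look like, here is a minimal sketch of a Blackman-windowed FFT followed by peak picking. It is not the implementation in fft_analysis.py (which, per this thread, wraps scipy.signal calls such as welch and find_peaks): the function name `top_fft_peaks` and the amplitude normalization are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import find_peaks


def top_fft_peaks(signal, fs, n_peaks=5):
    """Return [(frequency_hz, amplitude), ...] for the strongest spectral peaks."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove DC so it cannot dominate
    win = np.blackman(len(x))
    spectrum = np.fft.rfft(x * win)
    # Single-sided amplitude, corrected for the window's coherent gain.
    amplitude = 2.0 * np.abs(spectrum) / win.sum()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    idx, _ = find_peaks(amplitude)        # indices of local maxima
    strongest = idx[np.argsort(amplitude[idx])[::-1][:n_peaks]]
    return [(float(freqs[i]), float(amplitude[i])) for i in strongest]


# Synthetic check signal: 50 Hz at amplitude 1.0 plus 120 Hz at amplitude 0.5.
fs = 1000.0
t = np.arange(2000) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
for freq, amp in top_fft_peaks(x, fs, n_peaks=2):
    print(f"{freq:.1f} Hz  amplitude {amp:.2f}")
```

Returning peaks sorted by amplitude, with the window fixed by the utterance, leaves little room for the agent to pick a different but equally plausible interpretation.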
    {
    "id": 312,
    "type": "Vibration",
    "text": "Perform envelope analysis on the vibration signal to look for bearing defect frequencies.",
Again, this is a little fuzzy: the implemented system might not know that a bandpass filter is needed (or it might, given certain prompt examples). Ideally, we can be more specific:
- Add the signal ID.
- Specify the bandpass filter.
- Use a specific algorithm for the envelope.
The current utterance could work, but it increases the chance of hallucination.
Updated. The envelope utterance now specifies all three points:
- Signal ID: 'vib_001'
- Bandpass filter: 500 Hz to 1500 Hz
- Algorithm: Hilbert transform
Compute the envelope spectrum of signal 'vib_001' using a bandpass filter from 500 Hz to 1500 Hz (Hilbert transform) and return the top 5 peaks.
This removes ambiguity and lets the grader verify exact parameters.
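The three parameters pinned down above map onto a short pipeline: bandpass filter, Hilbert envelope, spectrum of the envelope. The sketch below is an assumption-laden illustration rather than the code in envelope.py: the function name `envelope_top_peaks`, the filter order, and the use of zero-phase `sosfiltfilt` (the thread mentions `sosfilt`) are choices made here for the example.

```python
import numpy as np
from scipy.signal import butter, find_peaks, hilbert, sosfiltfilt


def envelope_top_peaks(signal, fs, band=(500.0, 1500.0), order=4, n_peaks=5):
    """Return [(frequency_hz, amplitude), ...] of the envelope spectrum."""
    x = np.asarray(signal, dtype=float)
    # 1. Bandpass around the assumed structural resonance (fs must exceed 2 * band[1]).
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)
    # 2. Envelope = magnitude of the analytic signal (Hilbert transform).
    envelope = np.abs(hilbert(filtered))
    envelope = envelope - envelope.mean()
    # 3. Single-sided amplitude spectrum of the envelope.
    amplitude = 2.0 * np.abs(np.fft.rfft(envelope)) / len(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    idx, _ = find_peaks(amplitude)
    strongest = idx[np.argsort(amplitude[idx])[::-1][:n_peaks]]
    return [(float(freqs[i]), float(amplitude[i])) for i in strongest]


# Check signal: a 1000 Hz carrier amplitude-modulated at 100 Hz.
fs = 8000.0
t = np.arange(8000) / fs
x = (1.0 + 0.8 * np.cos(2 * np.pi * 100 * t)) * np.sin(2 * np.pi * 1000 * t)
print(envelope_top_peaks(x, fs, n_peaks=1))
```

With the band and algorithm fixed in the utterance, the strongest envelope line (the modulation frequency in this test signal) becomes a concrete value a grader can compare against.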
The current implementation is based on the scipy package. An alternative is to use existing packages for FFT, time series analysis, and other tasks. That could help expand the availability of analytic functions and keep the focus on scenario development.
Thanks for the feedback @nianjunz.
The DSP modules are thin wrappers around scipy.signal, not reimplementations. The actual scipy-specific code is about 25 lines across fft_analysis.py and envelope.py (calls to welch, hilbert, butter, sosfilt, find_peaks). The remaining roughly 950 lines are domain knowledge that no existing package provides:
- bearing_freqs.py is rolling-element bearing kinematics (BPFO/BPFI/BSF/FTF formulas and bearing database). Zero scipy.
- fault_detection.py is ISO 10816 thresholds, shaft feature extraction, rule-based fault classification. Zero scipy.
scipy.signal is the de facto standard Python DSP library. I evaluated alternatives, but dedicated vibration packages in the Python ecosystem are either unmaintained or internally depend on scipy anyway.
That said, I'm very open to integrating a specific package if you have one in mind, and I'm happy to discuss. The modular dsp layer makes it straightforward to swap implementations without touching the MCP tool layer.
Regarding the focus on scenario development: fully agreed, and that's the direction for the next iteration (addressing your other comments on utterance specificity and missing categories).
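As an example of the domain logic referred to above, the standard rolling-element bearing kinematic formulas can be written in a few lines of plain Python. This is a sketch under stated assumptions, not the contents of bearing_freqs.py: the `BearingGeometry` class and `bearing_frequencies` function are hypothetical names, the outer race is assumed stationary, and the geometry in the usage lines is made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BearingGeometry:
    n_balls: int
    ball_diameter_mm: float
    pitch_diameter_mm: float
    contact_angle_deg: float = 0.0


def bearing_frequencies(geom: BearingGeometry, shaft_rpm: float) -> dict:
    """Characteristic defect frequencies in Hz (outer race assumed stationary)."""
    fr = shaft_rpm / 60.0  # shaft rotation frequency in Hz
    ratio = (geom.ball_diameter_mm / geom.pitch_diameter_mm) * math.cos(
        math.radians(geom.contact_angle_deg)
    )
    return {
        "FTF": 0.5 * fr * (1.0 - ratio),                 # cage (fundamental train)
        "BPFO": 0.5 * geom.n_balls * fr * (1.0 - ratio),  # outer race defect
        "BPFI": 0.5 * geom.n_balls * fr * (1.0 + ratio),  # inner race defect
        "BSF": geom.pitch_diameter_mm / (2.0 * geom.ball_diameter_mm)
        * fr * (1.0 - ratio ** 2),                        # ball spin
    }


# Illustrative geometry only; not taken from the PR's bearing database.
geom = BearingGeometry(n_balls=9, ball_diameter_mm=8.0, pitch_diameter_mm=40.0)
freqs = bearing_frequencies(geom, shaft_rpm=1800.0)
print({name: round(value, 2) for name, value in freqs.items()})
```

Two identities make handy sanity checks for any implementation of these formulas: BPFO + BPFI equals the number of rolling elements times the shaft frequency, and BPFO equals the number of rolling elements times FTF.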
The knowledge query utterances are good enough for now. We can incrementally add the other types of predictive and decision support utterances as time goes on.
Agreed — the current 24 utterances cover knowledge extraction, condition assessment, diagnostic and decision support well enough as a first iteration. Happy to add predictive utterances (e.g. RUL estimation, trend extrapolation) incrementally once the TSFM integration is in place. Thanks for the thorough review!
nianjunz left a comment
I made some initial comments on the fork: feat: add VibrationAgent MCP server for vibration analysis benchmarks #190.
The work is great. It offers an opportunity to integrate a new asset management application based on vibration signals, which could bring AssetOpsBench closer to real engineering practice.
The main comments are: 1. Should we use existing packages to implement the MCP tools? 2. The utterances in general are a little fuzzy. 3. The utterances mainly focus on knowledge extraction; we do not have utterances for prediction, diagnostics, and decision support.
|
Thanks @nianjunz for the thorough review. Here is a summary of all changes addressing your feedback:
All changes are committed locally and will be pushed shortly. |
b21bf19 to 86efdf2
|
One additional note on multi-server orchestration and hallucination mitigation: I have been exploring the use of SKILL.md files as a coordination layer for complex MCP workflows — essentially structured instructions that guide the LLM through multi-tool pipelines with explicit constraints and verification steps. Early results in my project (https://github.com/LGDiMaggio/claude-stwinbox-diagnostics) show significant improvements in both task completion accuracy and hallucination reduction. However, this approach is currently tied to a specific LLM and is not model-agnostic, so it would not be appropriate for AssetOpsBench at this stage. Mentioning it here as a direction worth discussing for future iterations, especially for cross-server scenarios (e.g., VibrationAgent + TSFMAgent prognostic workflows). |
|
@ShuxinLin, I would like to request that you review this PR now. Please run all cross-checks on the implementation from your side, and pay special attention to my open question: how data is loaded into CouchDB and how it is accessed. |
I acknowledge that our agreed approach was to start slow and then grow faster. |
|
@LGDiMaggio I have one question about
---> Question and answers are a bit disconnected. Shall we list the available sensor fields? |
|
@DhavalRepo18 good questions, two answers:
Regarding the broader question of how data is loaded into CouchDB and accessed: the couchdb_client.py is a generic client that queries any asset and sensor field present in the database. It reuses the same environment variables as IoTAgent (COUCHDB_URL, COUCHDB_DBNAME, COUCHDB_USERNAME, COUCHDB_PASSWORD). The current sample data in src/couchdb/sample_data/ contains only thermal and energy sensors for the Chillers, which are not suitable for vibration analysis. For this reason I also updated the data retrieval scenarios (308, 309, 321, 322) to reference a dedicated vibration asset (Motor_01, sensor Vibration_X) rather than Chiller 6. Populating CouchDB with vibration time-series data (acceleration in g, sampled at >= 1 kHz) is a prerequisite for the data retrieval scenarios to execute end-to-end. The characteristic_form for those scenarios now makes this dependency explicit. |
Did you add the dataset?
|
@DhavalRepo18 not yet — let me explain the options and the tradeoff. The couchdb_client.py is intentionally generic: it queries any asset and sensor field stored with the structure {asset_id, timestamp, sensor_field: value}. This is the same structure used by the existing IoT sample data, so VibrationAgent is fully compatible with the existing CouchDB schema. Two options for the dataset:
My recommendation is option 1 for this PR (synthetic, reproducible, schema-compatible), with the CouchDB loader script serving as the template for anyone who wants to adapt a real dataset later. Happy to proceed with that if you agree. |
Yes, I agree with option 1. Synthetic data also reduces our data licensing needs. We can also point to a real dataset so that users can take the extra steps to benchmark on real data. We can provide a Python script for any processing the data needs and avoid copying any data. A pointer to one real dataset hosted elsewhere that benefits from this work would be highly appreciated. We intend to merge this PR by early next week. |
|
@ShuxinLin can you please start a code review and share any other observations you have from a code perspective? If you can provide your full review by Monday, @LGDiMaggio can make the necessary changes and we can close the PR by early next week. |
Synthetic vibration data & generation script
@DhavalRepo18 here's an update on the data side. This commit adds:
Regarding real datasets: I am currently working on making vibration datasets from my research and teaching activities available in an open format, but they are not yet publicly released under a compatible license. We can evaluate including them in a future iteration once the licensing is resolved.
Regarding testing with the new vibration data: It would be valuable if someone from the IBM team could run an end-to-end smoke test (CouchDB load -> scenario execution) to validate that the data pipeline works correctly in the containerized environment.
Also fixed a minor inconsistency: data_store.py was using ddof=0 for kurtosis while main.py and fault_detection.py both use ddof=1; this is now aligned everywhere.
Added myself (LGDiMaggio) to .all-contributorsrc. |
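To make the shape of the synthetic data concrete, here is a minimal sketch of an impulsive bearing-fault signal (periodic impacts exciting a decaying resonance, in the spirit of the impulsive model cited in the commit message) converted to documents with the `{asset_id, timestamp, sensor_field: value}` structure described earlier in this thread. It is not generate_synthetic_vibration.py: the function names, sampling rate, fault frequency, resonance, decay rate, noise level, and start date are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

import numpy as np


def synthetic_bearing_signal(n_samples=4096, fs=4096.0, fault_hz=107.5,
                             resonance_hz=1000.0, decay=800.0,
                             noise_std=0.05, seed=0):
    """Noise plus one exponentially decaying resonance burst per fault period."""
    rng = np.random.default_rng(seed)     # fixed seed keeps the data reproducible
    t = np.arange(n_samples) / fs
    signal = rng.normal(0.0, noise_std, n_samples)
    for t0 in np.arange(0.0, n_samples / fs, 1.0 / fault_hz):
        tau = t - t0
        active = tau >= 0                 # the burst only exists after its impact
        signal[active] += np.exp(-decay * tau[active]) * np.sin(
            2 * np.pi * resonance_hz * tau[active]
        )
    return signal


def to_couchdb_docs(signal, fs, asset_id="Motor_01", sensor_field="Vibration_X",
                    start=datetime(2020, 6, 1, tzinfo=timezone.utc)):
    """One document per sample, matching the schema described in this thread."""
    return [
        {
            "asset_id": asset_id,
            "timestamp": (start + timedelta(seconds=i / fs)).isoformat(),
            sensor_field: float(value),
        }
        for i, value in enumerate(signal)
    ]


signal = synthetic_bearing_signal()
docs = to_couchdb_docs(signal, fs=4096.0)
print(len(docs), docs[0])
```

For loading, CouchDB's `_bulk_docs` endpoint expects the list wrapped as `{"docs": [...]}`; whether the repository's loader script builds that wrapper itself is not something this sketch assumes.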
Regarding "Added myself (LGDiMaggio) to .all-contributorsrc": the moment your PR is merged, the automated hook will update that file. We have an automated policy for this.
|
|
@LGDiMaggio final question from me.
|
|
@DhavalRepo18 great question. The 24 scenarios naturally support a tool-augmented vs. baseline leaderboard using the existing 6-dimensional evaluation framework (Task Completion, Data Retrieval Accuracy, Result Verification, Agent Sequence, Clarity & Justification, Hallucination Check). Experimental design:
Expected results by category:
At least 11 of 24 scenarios (Data Retrieval, Signal Analysis, multi-step Diagnostic) are structurally impossible without the tools because they require real DSP computation or CouchDB access. This alone guarantees a significant delta on the leaderboard. The remaining 13 test whether tools also improve accuracy where LLMs have parametric knowledge (bearing math, ISO thresholds), where the Hallucination dimension should show the clearest improvement (exact computed values vs. approximations). Leaderboard format (consistent with the existing Kaggle benchmark):
This directly demonstrates the benchmark's core thesis: domain-specific tools measurably improve agent performance on industrial tasks. |
|
@nianjunz If all your questions are answered, please approve the PR. We will also wait for @ShuxinLin to give her view. |
Add a new VibrationAgent MCP server that provides 8 tools for industrial vibration diagnostics:
- get_vibration_data / list_vibration_sensors (CouchDB integration)
- compute_fft_spectrum / compute_envelope_spectrum (DSP analysis)
- assess_vibration_severity (ISO 10816 classification)
- calculate_bearing_frequencies / list_known_bearings (bearing analysis)
- diagnose_vibration (full automated diagnostic pipeline)
DSP core adapted from vibration-analysis-mcp (Apache-2.0): https://github.com/LGDiMaggio/claude-stwinbox-diagnostics/tree/main/mcp-servers/vibration-analysis-mcp
Also includes:
- 20 benchmark scenarios (vibration_utterance.json, IDs 301-320)
- Registration in workflow executor (DEFAULT_SERVER_PATHS)
- scipy>=1.10.0 dependency and vibration-mcp-server entry point
- 26 unit tests (DSP) + 17 MCP tool tests
- Full documentation in INSTRUCTIONS.md
Ref IBM#178 -- ready for review and testing
Signed-off-by: Luigi Di Maggio <luigi.dimaggio@polito.it>
- conftest.py: handle both tuple and direct return from FastMCP.call_tool()
- test_tools.py: fix bearing name assertion (includes description suffix)
- test_tools.py: fix key name 'bearing' (was 'bearing_name') in to_dict()
All 42 unit tests pass on Python 3.12; 2 integration tests skipped (no CouchDB).
Signed-off-by: Luigi Di Maggio <luigi.dimaggio@polito.it>
- Add explicit end time to data retrieval scenario (308)
- Add signal IDs and output format to all analysis/diagnosis/fault utterances (310-320)
- Specify bandpass filter params and algorithm for envelope analysis (312)
- Rename 'ISO Assessment' category to 'Condition Assessment' (306, 307)
- Add Diagnostic scenarios (321-322): symptom-driven multi-tool orchestration
- Add Decision Support scenarios (323-324): maintenance recommendations
- Total scenarios: 20 -> 24 (IDs 301-324)
Signed-off-by: Luigi Di Maggio <luigi.dimaggio@polito.it>
- Replace 'Chiller 6 / sensor Current' with 'Motor_01 / sensor Vibration_X' in scenarios 308, 309, 321, 322
- Add note in characteristic_form that CouchDB must be populated with vibration time-series data (acceleration in g, >= 1 kHz) for data retrieval scenarios to execute
Signed-off-by: Luigi Di Maggio <luigi.dimaggio@polito.it>
- Add generate_synthetic_vibration.py (McFadden & Smith 1984 impulsive model)
- Add bulk_docs_vibration.json (4096 docs, Motor_01/Vibration_X, BPFO fault)
- Update couchdb_setup.sh to load vibration sample data
- Fix kurtosis ddof inconsistency in data_store.py (ddof=0 -> ddof=1)
- Add LGDiMaggio to .all-contributorsrc
Signed-off-by: Luigi Di Maggio <luigi.dimaggio@polito.it>
Signed-off-by: Luigi Di Maggio <luigi.dimaggio@polito.it>
a7deebe to ae33722
|
Rebased on latest main (includes WorkOrderAgent from #191). All 4 conflicts resolved:
All 42 tests pass, 2 integration tests correctly skipped (no CouchDB). |
|
@ShuxinLin, can you please prioritize this PR now? |
|
Conflicts resolved, ready to merge. |
|
We typically run the PR before merging. We plan to get this into the main branch within a week. |
tmp/ is going to be removed soon. Consolidate the scenarios to HF @DhavalRepo18
ShuxinLin left a comment
I have not run the vibration server locally because the CouchDB setup with vibration data does not appear to be working.
    --db "${WO_DBNAME:-workorder}"

    # Load vibration sample data (Motor_01 bearing fault) into the IoT database
    VIBRATION_FILE="/sample_data/bulk_docs_vibration.json"
I found that bulk_docs_vibration.json is under servers/vibration/sample_data/. Did I miss a file move operation?
    "fmsr": "fmsr-mcp-server",
    "tsfm": "tsfm-mcp-server",
    "wo": "wo-mcp-server",
    "IoTAgent": "iot-mcp-server",
We are removing "agent" from the MCP server names. This merge conflict resolution is not correct.
    - [FMSRAgent](#fmsragent)
    - [TSFMAgent](#tsfmagent)
    - [WorkOrderAgent](#workorderagent)
    - [VibrationAgent](#vibrationagent)
Similarly, we renamed the MCP servers. You can change "VibrationAgent" to just "vibration".
    if [ -f "$VIBRATION_FILE" ]; then
    echo "Loading vibration data..."
    COUCHDB_URL="http://localhost:5984" \
    python3 /couchdb/init_asset_data.py \
You are running init_asset_data.py to load the vibration data, but init_asset_data.py itself was not modified. Is the change complete?
Summary
This PR adds a VibrationAgent MCP server to AssetOpsBench, introducing industrial vibration diagnostics capabilities as described in Issue #178.
What's included
- src/servers/vibration/main.py
- src/servers/vibration/dsp/
- data_store.py, couchdb_client.py
- src/servers/vibration/tests/
- vibration_utterance.json
- pyproject.toml, executor.py, INSTRUCTIONS.md
Tools provided
- get_vibration_data
- list_vibration_sensors
- compute_fft_spectrum
- compute_envelope_spectrum
- assess_vibration_severity
- calculate_bearing_frequencies
- list_known_bearings
- diagnose_vibration
Origin
DSP core adapted from vibration-analysis-mcp (Apache-2.0), with reliability fixes (kurtosis standardisation, velocity vectorisation, ddof consistency).
Testing
uv run pytest src/servers/vibration/tests/test_dsp.py -v)
test_tools.py): blocked by Pydantic 2.12.5 + Python 3.14 incompatibility (repo-wide issue affecting all servers, not specific to this PR)
Notes
Ref #178