feat (mcp server): Add sandbox MCP server with secure Python code execution #187

Open: florenzi002 wants to merge 9 commits into main from code-sandbox

@florenzi002
Member

@florenzi002 florenzi002 commented Mar 3, 2026

Adds a sandbox for executing code in a secure Python container via MCP (#166).
The main contribution is in mcp/servers/sandbox.

Downgrades the required Python version to >=3.12 for pydantic compatibility.

Conceptually, what this PR proposes is largely the same as what the mcp-sandbox utility provides. It exposes a couple of tools that run Python code, passed either as a string or as a file, in a dedicated, lightweight, and secure container.

The PR works by registering the sandbox tools as a top-level MCP server that can be reached by all the other tools, servers, or agents, including the planner executor if needed.
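
To make the idea concrete, here is a minimal sketch of how a tool could run a code string in a throwaway container. It is not the code in mcp/servers/sandbox: the function names, the `python:3.12-slim` image, and the resource limits are assumptions chosen for illustration. The flags used are accepted by both the Docker and Podman CLIs.

```python
import subprocess


def build_run_command(
    code: str,
    runtime: str = "docker",
    image: str = "python:3.12-slim",  # assumed image, not taken from the PR
) -> list[str]:
    """Build the argv that runs `code` in an ephemeral, locked-down container."""
    return [
        runtime, "run",
        "--rm",               # delete the container when the process exits
        "--network", "none",  # no network access from inside the sandbox
        "--read-only",        # read-only root filesystem
        "--memory", "256m",   # illustrative memory cap
        "--cap-drop", "ALL",  # drop all Linux capabilities
        image,
        "python", "-c", code,
    ]


def run_python(code: str, runtime: str = "docker", timeout: float = 30.0) -> str:
    """Run `code` in the container; return stdout on success, stderr otherwise."""
    result = subprocess.run(
        build_run_command(code, runtime),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout if result.returncode == 0 else result.stderr
```

Keeping the command construction separate from the `subprocess` call lets the container flags be unit-tested without a container runtime installed.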

@florenzi002 florenzi002 self-assigned this Mar 3, 2026
@DhavalRepo18
Collaborator

@florenzi002, as I understand it, if an MCP server already exists, do I need to write the code again, or is there an economical way to simply register it in the current ecosystem?

I was originally under the impression that existing MCP servers stay where they are and the MCP client gets registered, so we do not need to write code.

@florenzi002
Member Author

florenzi002 commented Mar 4, 2026

> @florenzi002, as I understand it, if an MCP server already exists, do I need to write the code again, or is there an economical way to simply register it in the current ecosystem?
>
> I was originally under the impression that existing MCP servers stay where they are and the MCP client gets registered, so we do not need to write code.

My understanding is that #166 was about registering a new MCP server to run code. This PR addresses the following request from #166:

> Given that we are rebasing AssetOpsBench in the MCP-compliant environment, we would like to provide a Sandbox as part of the AssetOpsBench MCP ecosystem.

All the other servers stay the same; this is just an additional one registered alongside the others, e.g., the utility server.

@DhavalRepo18
Collaborator

@florenzi002 I discussed this with @ShuxinLin, and I will be primarily reviewing this PR.

@DhavalRepo18 DhavalRepo18 removed the request for review from ShuxinLin March 5, 2026 03:20
@DhavalRepo18
Collaborator

@florenzi002 - Any further comments on what makes the existing MCP-Sandbox difficult to use would be highly valuable.

#166 (comment)

@florenzi002
Member Author

> @florenzi002 - Any further comments on what makes the existing MCP-Sandbox difficult to use would be highly valuable.
>
> #166 (comment)

@DhavalRepo18

I've found that sandbox-mcp is primarily a Go utility, so we would need to install an entire Go toolchain for a single dependency. It also offers no way to install only a subset of its sandboxes: it always installs about 6 GB of sandboxes, some of which we would probably never use. I think a Python-only sandbox is all we need for a start. Furthermore, AssetOpsBench currently works with both Docker and Podman backends, whereas the proposed library works with Docker only because of hardcoded paths in its source code, so we would lose that flexibility. This matters because Docker, although it offers a free tier, is a commercial product, and part of the open-source community is moving to open-source alternatives such as Podman, either by choice or because their institutions require it. Finally, sandbox MCP development appears to have stalled since May 2025, so I think this small in-house alternative may prove more stable.
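
As an illustration of keeping both backends usable, the sketch below picks a container runtime at startup instead of hardcoding one. The `SANDBOX_RUNTIME` environment variable and the preference order are hypothetical and not taken from this PR; `shutil.which` is the standard-library lookup for executables on `PATH`.

```python
import os
import shutil
from collections.abc import Callable, Mapping
from typing import Optional


def pick_runtime(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the name of the first available container runtime.

    An explicit override (hypothetical SANDBOX_RUNTIME variable) wins;
    otherwise Podman is tried before Docker.
    """
    env = os.environ if env is None else env
    override = env.get("SANDBOX_RUNTIME")
    candidates = [override] if override else ["podman", "docker"]
    for name in candidates:
        if which(name):  # non-None means the executable is on PATH
            return name
    raise RuntimeError(f"no container runtime found (tried: {', '.join(candidates)})")
```

Passing `env` and `which` as parameters is only there to make the selection logic testable on machines with neither runtime installed.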

Conceptually, what this PR proposes is largely the same as what the library provides. Ultimately, it exposes a couple of tools that run Python code, passed either as a string or as a file, in a dedicated, lightweight, and secure container.

The PR works by registering the sandbox tools as a top-level MCP server that can be reached by all the other tools, servers, or agents, including the planner executor if needed.

Collaborator

@DhavalRepo18 DhavalRepo18 left a comment

  • This PR needs tool-level test cases.
  • This PR also needs some example scenarios to be tested, where we typically download data from IoT and then perform data aggregation (first-order statistics) in the Python code sandbox. An example query: "Give me the mean and max value of temperature for Chiller 6."
  • The Docker image should also expose the available libraries in the MCP tool docstring, which will enable efficient coding on the LLM side.
  • Pin versions (we typically fix library versions to avoid API mismatches, etc.).
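
One way to cover the last two bullets together is to generate the tool docstring from the same pinned list that is used to build the image. The sketch below assumes this approach; the package pins and the `execute_python` name are illustrative, not the versions or tool name used in this PR.

```python
# Illustrative pins only; the real list would come from the image's requirements.
PINNED_LIBRARIES = {"numpy": "1.26.4", "pandas": "2.2.2"}


def describe_tool(pins: dict[str, str]) -> str:
    """Build a tool description that tells the LLM which libraries it can import."""
    libs = ", ".join(f"{name}=={version}" for name, version in sorted(pins.items()))
    return (
        "Run Python code in an isolated container and return its output.\n\n"
        f"Available libraries: {libs}."
    )


def requirements_lines(pins: dict[str, str]) -> list[str]:
    """Render the same pins as requirements.txt lines for the image build."""
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


def execute_python(code: str) -> str:  # hypothetical tool function
    raise NotImplementedError("container execution is out of scope for this sketch")


# Attach the generated description so docstring and image cannot drift apart.
execute_python.__doc__ = describe_tool(PINNED_LIBRARIES)
```

Deriving both the docstring and the requirements from one mapping means a version bump shows up in the tool description automatically.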

@DhavalRepo18
Collaborator

DhavalRepo18 commented Mar 11, 2026

  • One thing that came to mind is stateless versus stateful execution. Does this environment maintain state?
  • Storage mounting: data exchange will mostly happen via files. How is this storage maintained during agent execution?
  • Currently, the script or program will probably be passed as a string, but what happens if the LLM decides to execute an existing .py file? (A sandbox receives Python code and data.) Some design thinking is needed here.
  • As a shortcut, the docstring of the MCP tool can embed an example, but MCP also offers two additional primitives, Resources and Prompts.

@florenzi002
Member Author

florenzi002 commented Mar 18, 2026

> • One thing that came to mind is stateless versus stateful execution. Does this environment maintain state?

No, it doesn't. Containers are ephemeral and stateless, and the alternative libraries are stateless as well. If state becomes important at some point, the container could return the result of the script plus a dump of the environment (e.g., variables).
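
A rough sketch of that "result plus environment dump" idea is shown below. It is an assumption about how such a feature might look, not something this PR implements, and it is meant to run inside the container: `exec` on its own provides no isolation. Only JSON-serializable top-level variables are kept.

```python
import contextlib
import io
import json


def run_and_dump(code: str) -> dict:
    """Execute `code`, capturing stdout and the JSON-serializable variables it defines."""
    namespace: dict = {}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)  # intended to run inside the sandbox container only

    state = {}
    for name, value in namespace.items():
        if name.startswith("__"):  # skip __builtins__ and other dunder entries
            continue
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue  # functions, modules, and other non-serializable objects
        state[name] = value
    return {"stdout": buffer.getvalue(), "state": state}
```

The caller could pass the returned `state` back into a later call to emulate a session while each container remains ephemeral.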

> • Storage mounting: data exchange will mostly happen via files. How is this storage maintained during agent execution?

I think files produced by the agent during execution and needed for a particular coding round can be mounted dynamically in the container before the code runs, as part of the MCP call, for example as base64-encoded strings or via another network protocol. An alternative is to mount persistent storage on the MCP server and let the agent upload files there for long-term storage, although that is more complex. Currently, when AssetOpsBench runs locally with the MCP server and the agent on the same machine, the agent workspace can be mounted directly into the sandbox container, which covers this use case.
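
The base64 option could look roughly like the sketch below: the caller encodes files into the tool arguments, and the server writes them into a scratch directory that is then mounted into the container. The function names and payload shape are hypothetical and not part of this PR.

```python
import base64
from pathlib import Path


def encode_files(paths: list[Path]) -> dict[str, str]:
    """Caller side: map each file name to its base64-encoded content."""
    return {
        Path(p).name: base64.b64encode(Path(p).read_bytes()).decode("ascii")
        for p in paths
    }


def materialize(files: dict[str, str], workdir: Path) -> list[Path]:
    """Server side: write the decoded files into `workdir` before mounting it."""
    written = []
    for name, payload in files.items():
        safe_name = Path(name).name  # drop directory parts to block path traversal
        if safe_name in ("", ".", ".."):
            raise ValueError(f"invalid file name: {name!r}")
        target = workdir / safe_name
        target.write_bytes(base64.b64decode(payload))
        written.append(target)
    return written
```

The directory passed as `workdir` would then be bind-mounted into the container (for example with the runtime's `-v` option), ideally read-only when the script only needs to read its inputs.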

> • Currently, the script or program will probably be passed as a string, but what happens if the LLM decides to execute an existing .py file? (A sandbox receives Python code and data.) Some design thinking is needed here.

It is very similar to the use case above, if I understand correctly.

> • As a shortcut, the docstring of the MCP tool can embed an example, but MCP also offers two additional primitives, Resources and Prompts.

Is the suggestion here to add a tool-call/MCP-call example to the docstring?

@DhavalRepo18
Collaborator

@florenzi002 we would like to merge this PR early next week. At present we are running all the submitted code to reduce future work.

@DhavalRepo18
Collaborator

This one is now actively being reviewed. Please address the conflicts and the naming (we do not use the word "Agent", if it appears anywhere).

@DhavalRepo18 DhavalRepo18 marked this pull request as ready for review March 19, 2026 22:02
@DhavalRepo18
Collaborator

> • This PR needs tool-level test cases.
> • This PR also needs some example scenarios to be tested, where we typically download data from IoT and then perform data aggregation (first-order statistics) in the Python code sandbox. An example query: "Give me the mean and max value of temperature for Chiller 6."
> • The Docker image should also expose the available libraries in the MCP tool docstring, which will enable efficient coding on the LLM side.
> • Pin versions (we typically fix library versions to avoid API mismatches, etc.).

Please address the second bullet. We need one example that runs on the existing dataset (connecting IoT and Sandbox), and then add an instruction.MD just like the other examples.
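
For the "mean and max value of temperature for Chiller 6" query, the code sent to the sandbox might look like the sketch below. The CSV layout, the column names, and the sample readings are made up for illustration and are not taken from the AssetOpsBench dataset; in a real scenario the rows would come from the IoT server.

```python
import csv
import io
import statistics


def summarize(csv_text: str, asset: str, column: str) -> dict[str, float]:
    """Return first-order statistics of `column` for the rows matching `asset`."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows if row["asset"] == asset]
    if not values:
        raise ValueError(f"no rows found for asset {asset!r}")
    return {"mean": statistics.fmean(values), "max": max(values)}


# Hypothetical readings, standing in for data downloaded from the IoT server.
SAMPLE = """asset,timestamp,temperature
Chiller 6,2020-01-01T00:00:00,44.0
Chiller 6,2020-01-01T00:15:00,46.0
Chiller 9,2020-01-01T00:00:00,50.0
"""

print(summarize(SAMPLE, "Chiller 6", "temperature"))  # {'mean': 45.0, 'max': 46.0}
```

The sketch uses only the standard library so it runs in a bare Python image; with pandas available in the sandbox, the same aggregation is a filter followed by `agg(["mean", "max"])`.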

@florenzi002 florenzi002 changed the title from "Add sandbox MCP server with secure Python code execution" to "feat (mcp server): Add sandbox MCP server with secure Python code execution" Mar 20, 2026