Skip to content

mcp: validate data product names before interpolating into SQL#35780

Merged
bobbyiliev merged 1 commit intoMaterializeInc:mainfrom
bobbyiliev:mcp-validate-data-product-name
Apr 1, 2026
Merged

mcp: validate data product names before interpolating into SQL#35780
bobbyiliev merged 1 commit intoMaterializeInc:mainfrom
bobbyiliev:mcp-validate-data-product-name

Conversation

@bobbyiliev
Copy link
Copy Markdown
Contributor

Validates that data product names passed to read_data_product are syntactically valid SQL identifiers by parsing a probe query locally, preventing malformed or injected names from reaching the database as opaque SQL errors.

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@bobbyiliev bobbyiliev force-pushed the mcp-validate-data-product-name branch 2 times, most recently from 6ba5a4c to f1995cf Compare March 30, 2026 11:19
@bobbyiliev bobbyiliev marked this pull request as ready for review March 30, 2026 12:29
@bobbyiliev bobbyiliev requested a review from a team as a code owner March 30, 2026 12:29
Copy link
Copy Markdown
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering, couldn't we instead use something like StrExt::quoted? The quotation should make any SQL keywords not be parsed as keywords, but parsed as part of the name. Also, StrExt::quoted makes sure to escape quotation marks, so the user can't break out of the quotation.


let make_err = || {
McpRequestError::QueryValidationFailed(format!(
"Invalid data product name: {}. Expected a fully qualified name like '\"database\".\"schema\".\"name\"'",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not allow an unqualified name? Actually, the test

assert!(validate_data_product_name("my_view").is_ok());

seems to indicate that we do allow unqualified names.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ops yes, the error message was misleading! We do accept unqualified names. Updated the message to show both forms as examples.

@bobbyiliev bobbyiliev force-pushed the mcp-validate-data-product-name branch 2 times, most recently from b07130d to d56cb23 Compare March 31, 2026 13:21
@bobbyiliev bobbyiliev requested a review from ggevay March 31, 2026 13:38
@bobbyiliev
Copy link
Copy Markdown
Contributor Author

I'm wondering, couldn't we instead use something like StrExt::quoted? The quotation should make any SQL keywords not be parsed as keywords, but parsed as part of the name. Also, StrExt::quoted makes sure to escape quotation marks, so the user can't break out of the quotation.

@ggevay I think StrExt::quoted might not work here because it escapes double quotes with backslash (\") which is for display/logging, whereas SQL identifiers use doubled quotes (""). So I'm not sure it would be safe to use directly in SQL.

The other thing is the name arrives from mz_mcp_data_products as a pre-formatted multi-part identifier like "materialize"."schema"."view", so we can't really wrap the whole thing in one set of quotes since it needs to be parsed as a qualified name.

The AST validation approach makes sure the name is safe to interpolate as-is without transforming it. But I might be missing something, do you have a different approach in mind?

@ggevay
Copy link
Copy Markdown
Contributor

ggevay commented Mar 31, 2026

Ah yes, StrExt::quoted is not good.

My problem with the PR's approach is that it is a lot of brittle code to maintain. E.g., when someone later adds to the AST, they'll have to not forget to update this code as well. This could be enforced somewhat by relying more on pattern matching, with no wildcards, so that one would get compile errors in validate_data_product_name if they forget to attend to validate_data_product_name when adding AST stuff.

But another alternative approach would be to call parse_item_name (making it pub) to get an UnresolvedItemName (which takes care of having or not having qualifications), and then when interpolating into the final SQL query (not a probe, but the real query) print it with an AstFormatter that has FormatMode::Stable, so that everything will be quoted (see AstDisplay for Ident), so that nasty things can't escape from it. This would be just a few lines of code, unless I'm missing something. Do you think this would be feasible?

@bobbyiliev bobbyiliev force-pushed the mcp-validate-data-product-name branch from d56cb23 to b28e655 Compare March 31, 2026 16:04
@bobbyiliev
Copy link
Copy Markdown
Contributor Author

@ggevay ah nice suggestion, just implemented it, much cleaner! Added a public parse_item_name to mz-sql-parser (following the same pattern as parse_expr/parse_data_type), and replaced the AST inspection with:

fn safe_data_product_name(name: &str) -> Result<String, McpRequestError> {
    let parsed = parse_item_name(name).map_err(|_| ...)?;
    Ok(parsed.to_ast_string_stable())
}

Thanks for the suggestion and let me know what you think!

Copy link
Copy Markdown
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for iterating on it!

/// Validates that a data product name is a safe SQL identifier.
///
/// The name is interpolated into `SELECT * FROM {name}`, so we parse a
/// synthetic query and inspect the AST to ensure nothing beyond a bare
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(stale comment)

@bobbyiliev bobbyiliev force-pushed the mcp-validate-data-product-name branch from b28e655 to 64be1e0 Compare April 1, 2026 11:48
@bobbyiliev bobbyiliev merged commit d9b6a94 into MaterializeInc:main Apr 1, 2026
118 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants