Harden BigQuery ML tools against SQL injection#5251
Harden BigQuery ML tools against SQL injection#5251petrmarinec wants to merge 3 commits intogoogle:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
89c2b7c to
86db909
Compare
86db909 to
f928c91
Compare
|
Response from ADK Triaging Agent - Security Review Complete. |
|
Hi @petrmarinec , Thank you for your contribution! We appreciate you taking the time to submit this pull request. |
Summary
This change hardens the BigQuery ML helper tools exposed by
BigQueryToolsetagainst SQL injection.forecast,analyze_contribution, anddetect_anomaliespreviously constructed SQL with direct string interpolation of tool inputs. Because these helpers are exposed through the toolset, attacker-influenced inputs could be propagated into generated SQL, including inORDER BYclauses and model option values.What changed
forecast,analyze_contribution, anddetect_anomaliesinstead of interpolating them into SQL strings.history_data,input_data, andtarget_datato valid BigQuery table IDs instead of accepting raw SQL statements.detect_anomaliesbefore placing them inORDER BY.Why this approach
BigQuery supports parameterized queries natively, which is the correct defense for literal values controlled by tool inputs.
Table references and
ORDER BYidentifiers cannot be passed as query parameters, so those inputs are now validated as BigQuery table IDs or field paths before use.Compatibility
This is an intentional hardening change: these ML helper tools no longer accept raw SQL statements as data sources.
Callers that need arbitrary SQL should use
execute_sqldirectly or materialize a view or table first, then pass that table ID to the ML helper.Validation
Verified in a fresh Linux environment with test dependencies installed via
pip install -e '.[test]':pytest tests/unittests/tools/bigquery/test_bigquery_query_tool.py -q95 passedAlso re-ran the offline regression scenario used to confirm the original issue and verified that the generated SQL no longer contains the previously attacker-controlled fragments.
This PR also adds unit regressions for those cases so future changes do not reintroduce the same issue.