Skip to content

Conversation

@ankitajhanwar2001
Copy link

@ankitajhanwar2001 ankitajhanwar2001 commented Dec 18, 2025

User description

SUMMARY

This PR updates the SQLGlot dialect mapping for AWS Athena from PRESTO to ATHENA.
Athena-specific SQL syntax (e.g., USING EXTERNAL FUNCTION) now parses correctly in Superset, preventing “Unable to parse SQL” errors for queries that are valid in Athena but not in Presto.

Related Issue

Fixes #36717

Changes

  • Updated SQLGlot dialect mapping for awsathena in superset/sql/parse.py
  • Added unit tests in tests/unit_tests/sql/parse_tests.py covering Athena-specific SQL parsing
  • Verified parsing does not fail for USING EXTERNAL FUNCTION constructs
  • Tested with local Superset Athena database connection

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

  1. Unit tests

    pytest tests/unit_tests/sql/parse_tests.py
    
  2. Local Superset setup with Athena

  • Configure Athena database connection locally in Superset.

  • Execute a query using USING EXTERNAL FUNCTION, e.g.:

    USING EXTERNAL FUNCTION decrypt(data varbinary) RETURNS VARCHAR LAMBDA 'arn:aws:lambda:ap-south-1:123456789111:function:lambda-test' SELECT 1
    
  • Verify parsing succeeds and no Superset errors occur.

ADDITIONAL INFORMATION

  • Has associated issue: Athena SQL parsing fails due to Presto dialect mapping in SQLGlot #36717
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

CodeAnt-AI Description

Use Athena dialect for awsathena so Athena-specific SQL parses correctly

What Changed

  • Switched the SQL parser mapping for engine "awsathena" from Presto to Athena so Athena-only syntax parses correctly
  • Added a unit test that verifies queries using "USING EXTERNAL FUNCTION" parse and format without errors

Impact

✅ Fewer Athena parse failures for valid queries
✅ Clearer parsing for queries using Athena-specific syntax
✅ Prevents "Unable to parse SQL" errors for awsathena connections

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

@codeant-ai-for-open-source
Copy link
Contributor

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@bito-code-review
Copy link
Contributor

bito-code-review bot commented Dec 18, 2025

Code Review Agent Run #0b9f57

Actionable Suggestions - 0
Additional Suggestions - 1
  • tests/unit_tests/sql/parse_tests.py - 1
    • Missing type annotation · Line 2894-2894
      The test function `test_awsathena_engine_mapping` is missing the return type annotation `-> None`, which is inconsistent with other test functions in the file (e.g., `test_remove_quotes() -> None`). This violates the project's typing standards requiring type hints for all new Python code.
Review Details
  • Files reviewed - 2 · Commit Range: 0c417be..0c417be
    • superset/sql/parse.py
    • tests/unit_tests/sql/parse_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@codeant-ai-for-open-source codeant-ai-for-open-source bot added the size:S This PR changes 10-29 lines, ignoring generated files label Dec 18, 2025
@dosubot dosubot bot added the change:backend Requires changing the backend label Dec 18, 2025
@codeant-ai-for-open-source
Copy link
Contributor

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Dialect availability
    Mapping awsathena to Dialects.ATHENA assumes the installed sqlglot
    version exposes Dialects.ATHENA. If a runtime environment uses an older
    sqlglot release that doesn't provide ATHENA, importing/using this
    mapping may cause runtime failures or unexpected fallbacks. Validate
    CI/test environments and consider a safe fallback.

  • Parsing behavior changes
    Switching from PRESTO to ATHENA will change sqlglot parsing/AST for some
    queries. Ensure the unit tests cover edge cases (e.g., constructs that
    were previously accepted under PRESTO but not under ATHENA) and that
    downstream codepaths (formatting, transpile, optimize) behave correctly.

  • Missing Assertion
    The new test test_awsathena_engine_mapping calls statement.format() but does not assert any outcome. This makes the test weaker — it will only fail on exceptions during parsing, and won't catch regressions where the wrong dialect is used but parsing still succeeds. Consider adding explicit assertions that validate the mapping or the formatted output.

  • Dialect Mapping Verification
    The goal of the change is to ensure awsathena maps to the ATHENA SQLGlot dialect. The test should explicitly verify the mapping (e.g., SQLGLOT_DIALECTS["awsathena"] == Dialects.ATHENA) or assert properties of the formatted SQL that demonstrate Athena-specific syntax was handled. Otherwise the test may pass without actually validating the intended configuration.

  • Engine name variants
    The key used is awsathena. Database engine identifiers sometimes vary
    (e.g. athena, aws_athena, aws-athena). If other parts of the
    codebase or user configs use a different identifier the mapping won't
    apply and parsing will revert to base behavior. Consider adding common
    aliases or normalizing engine names before lookup.

@@ -58,7 +58,7 @@
SQLGLOT_DIALECTS = {
"base": Dialects.DIALECT,
"ascend": Dialects.HIVE,
"awsathena": Dialects.PRESTO,
"awsathena": Dialects.ATHENA,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Compatibility with older sqlglot versions: directly referencing Dialects.ATHENA may raise an AttributeError if the installed sqlglot version lacks the ATHENA dialect; use a safe fallback (e.g. PRESTO) via getattr to avoid runtime attribute errors and ensure parsing still proceeds. [possible bug]

Severity Level: Critical 🚨

Suggested change
"awsathena": Dialects.ATHENA,
"awsathena": getattr(Dialects, "ATHENA", Dialects.PRESTO),
Why it matters? ⭐

Using getattr(Dialects, "ATHENA", Dialects.PRESTO) guards against environments with older sqlglot that lack an ATHENA entry and avoids an import-time AttributeError. The fallback to PRESTO is a reasonable defensive choice when ATHENA isn't available.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/sql/parse.py
**Line:** 61:61
**Comment:**
	*Possible Bug: Compatibility with older sqlglot versions: directly referencing `Dialects.ATHENA` may raise an AttributeError if the installed sqlglot version lacks the ATHENA dialect; use a safe fallback (e.g. PRESTO) via getattr to avoid runtime attribute errors and ensure parsing still proceeds.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

)
statement = SQLStatement(sql, engine="awsathena")

# Should parse without errors using Athena dialect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The test calls statement.format() but contains no assertion that the parsed statement is a SELECT (or otherwise valid); add an explicit assertion like assert statement.is_select() to ensure the SQL is parsed as the expected statement type rather than silently passing without verification. [logic error]

Severity Level: Minor ⚠️

Suggested change
# Should parse without errors using Athena dialect
# Ensure the parsed statement is a SELECT (verifies parsing result)
assert statement.is_select()
Why it matters? ⭐

Reasonable and useful: asserting statement.is_select() makes the test explicit
about the expected kind of parsed AST instead of silently passing because
format() didn't raise. It's a low-risk, high-signal assertion that improves test quality.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/sql/parse_tests.py
**Line:** 2904:2904
**Comment:**
	*Logic Error: The test calls `statement.format()` but contains no assertion that the parsed statement is a SELECT (or otherwise valid); add an explicit assertion like `assert statement.is_select()` to ensure the SQL is parsed as the expected statement type rather than silently passing without verification.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

@codeant-ai-for-open-source
Copy link
Contributor

CodeAnt AI finished reviewing your PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:backend Requires changing the backend size/S size:S This PR changes 10-29 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Athena SQL parsing fails due to Presto dialect mapping in SQLGlot

1 participant