
feat: support DROP INDEX DDL #371

Merged
hamersaw merged 6 commits into lance-format:main from LuciferYang:feat/drop-index-v2 on Apr 2, 2026

Conversation


@LuciferYang LuciferYang commented Mar 31, 2026

Summary

Add support for dropping indexes on Lance tables through the Spark SQL interface:

ALTER TABLE catalog.db.table DROP INDEX index_name;

This is the counterpart to the existing CREATE INDEX and SHOW INDEXES commands, completing the index lifecycle management in lance-spark.

Design

DROP INDEX is a metadata-only operation — it removes the index entry from the dataset manifest via lance-core's dataset.dropIndex(name) API. Physical index files are not deleted; they are cleaned up by VACUUM during garbage collection.
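
To make the metadata-only behaviour concrete, here is a toy Python model. It is not lance-core code: `ToyDataset`, its methods, and the `_indices/` path layout are hypothetical names invented for illustration, and real cleanup rules (for example, version retention) are not modelled. It only shows the split described above: dropping removes the manifest entry, and a later cleanup pass removes the unreferenced file.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset: a manifest plus files on disk."""

    manifest_indexes: dict = field(default_factory=dict)  # index name -> file path
    files: set = field(default_factory=set)               # physical index files

    def create_index(self, name: str) -> None:
        path = f"_indices/{name}.idx"  # made-up layout, for illustration only
        self.files.add(path)
        self.manifest_indexes[name] = path

    def drop_index(self, name: str) -> None:
        # Metadata-only: remove the manifest entry, leave the file in place.
        del self.manifest_indexes[name]

    def vacuum(self) -> set:
        # Garbage collection: delete index files that no manifest entry references.
        referenced = set(self.manifest_indexes.values())
        orphaned = {f for f in self.files if f not in referenced}
        self.files -= orphaned
        return orphaned


ds = ToyDataset()
ds.create_index("id_idx")
ds.drop_index("id_idx")
files_after_drop = set(ds.files)  # the index file is still present here
removed = ds.vacuum()             # the cleanup pass removes it
```

After `drop_index`, the name is gone from the manifest while the file remains; only `vacuum` shrinks `files`.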

The implementation follows the standard lance-spark SQL extension pipeline:

  1. ANTLR grammar: ALTER TABLE ... DROP INDEX indexName rule + DROP token
  2. AST builder: visitDropIndex in all version-specific builders (3.4, 3.5, 4.0, 4.1)
  3. Logical plan: LanceDropIndex(table, indexName) with output schema (index_name, status)
  4. Physical exec: LanceDropIndexExec calls dataset.dropIndex(indexName) on the driver
  5. Strategy: LanceDropIndex → LanceDropIndexExec mapping with indexName.toLowerCase for consistency with CREATE INDEX
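
The five steps above can be sketched end to end in a few lines of Python. This is a rough analog, not the Scala implementation: the regex stands in for the ANTLR rule, `DropIndexPlan` for the logical plan, and `execute` for the physical exec. All three names are hypothetical, and the `"dropped"` status string is an assumed placeholder, since the thread does not say what values the status column holds.

```python
import re
from typing import NamedTuple


class DropIndexPlan(NamedTuple):
    """Hypothetical analog of LanceDropIndex(table, indexName)."""

    table: str
    index_name: str


# Simplified pattern; the real ANTLR grammar rule is not reproduced here.
_DROP_INDEX = re.compile(
    r"^\s*ALTER\s+TABLE\s+(?P<table>[\w.]+)\s+DROP\s+INDEX\s+(?P<index>\w+)\s*;?\s*$",
    re.IGNORECASE,
)


def parse_drop_index(sql: str) -> DropIndexPlan:
    """Parse the statement into a plan (grammar + AST builder steps)."""
    match = _DROP_INDEX.match(sql)
    if match is None:
        raise ValueError(f"not a DROP INDEX statement: {sql!r}")
    # Lower-case the index name, mirroring the strategy step.
    return DropIndexPlan(match["table"], match["index"].lower())


def execute(plan: DropIndexPlan, indexes: set) -> list:
    """Toy exec step: drop from an in-memory set, return (index_name, status) rows."""
    indexes.remove(plan.index_name)
    return [(plan.index_name, "dropped")]  # status value is an assumption


plan = parse_drop_index("ALTER TABLE catalog.db.table DROP INDEX My_Idx;")
existing = {"my_idx"}
rows = execute(plan, existing)
```

Lower-casing at plan time is what lets `My_Idx` in the statement match an index stored as `my_idx`.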

Note: The logical plan and physical exec are named LanceDropIndex / LanceDropIndexExec (not DropIndex / DropIndexExec) to avoid classpath collisions with Spark's built-in classes of the same name in spark-catalyst and spark-sql, which have different constructor signatures.

Changes

  • LanceSqlExtensions.g4 — Added dropIndex grammar rule and DROP token
  • LanceSqlExtensionsAstBuilder.scala (3.4, 3.5, 4.0, 4.1) — Added visitDropIndex visitor
  • DropIndex.scala — New LanceDropIndex logical plan
  • DropIndexExec.scala — New LanceDropIndexExec physical execution node
  • LanceDataSourceV2Strategy.scala — Added LanceDropIndex → LanceDropIndexExec dispatch
  • BaseAddIndexTest.java — Added testDropIndex and testDropIndexThenRecreate
  • test_lance_spark.py — Added test_drop_index and test_drop_index_then_recreate integration tests
  • drop-index.md — New documentation page

Test plan

Unit tests (BaseAddIndexTest.java)

  • testDropIndex — Creates index, drops it, verifies schema/output, confirms index is gone
  • testDropIndexThenRecreate — Full lifecycle: create → drop → recreate → verify query works
  • All 9 AddIndexTest tests pass (7 existing + 2 new) across all modules (3.4, 3.5, 4.0, 4.1)
  • Full test suite passes, 0 failures

Integration tests (test_lance_spark.py)

  • test_drop_index — Creates BTree index, drops it via SQL, verifies output schema (index_name, status), confirms SHOW INDEXES no longer lists it
  • test_drop_index_then_recreate — Full lifecycle: create → drop → recreate, verifies recreated index appears in SHOW INDEXES and queries still return correct results
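
As a rough sketch of the kind of check test_drop_index performs (not the actual test code from this PR), the helper below drops an index and verifies both result shapes. It assumes `spark` is a SparkSession with the lance-spark extensions loaded. The helper name is hypothetical, and the SHOW INDEXES statement is passed in by the caller because its exact syntax is not quoted in this thread. The column names come from the discussion: DROP INDEX returns (index_name, status), while SHOW INDEXES uses `name`.

```python
def assert_index_dropped(spark, table: str, index_name: str, show_indexes_sql: str) -> None:
    """Drop an index, then check the DROP INDEX and SHOW INDEXES results.

    `show_indexes_sql` is supplied by the caller; this sketch makes no
    assumption about the exact SHOW INDEXES syntax.
    """
    dropped = spark.sql(f"ALTER TABLE {table} DROP INDEX {index_name}")
    dropped.collect()  # make sure the command has actually run

    # DROP INDEX reports its result under (index_name, status).
    assert dropped.columns == ["index_name", "status"]

    # SHOW INDEXES uses the column `name`, not `index_name`.
    remaining = [row["name"] for row in spark.sql(show_indexes_sql).collect()]
    assert index_name not in remaining
```

Keeping the two column names explicit guards against the mix-up fixed later in this thread, where `index_name` was used to read SHOW INDEXES output.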

@github-actions github-actions bot added the enhancement New feature or request label Mar 31, 2026

@hamersaw hamersaw left a comment

Looks great thanks! Can you just add an integration test?

Add two integration tests to TestDDLIndex:
- test_drop_index: creates index, drops it, verifies output schema and
  that SHOW INDEXES no longer lists it
- test_drop_index_then_recreate: full lifecycle create -> drop -> recreate,
  verifies the recreated index works with queries

@LuciferYang LuciferYang left a comment

Thanks for the review! Added two integration tests in docker/tests/test_lance_spark.py under TestDDLIndex:

  • test_drop_index — creates a BTree index, drops it via ALTER TABLE ... DROP INDEX, verifies the output schema (index_name, status), and confirms the index no longer appears in SHOW INDEXES.
  • test_drop_index_then_recreate — full lifecycle: create → drop → recreate, then verifies the recreated index exists in SHOW INDEXES and queries still work correctly.

SHOW INDEXES output schema uses column 'name', not 'index_name'.
The DROP INDEX output uses 'index_name' — these are different schemas.
@LuciferYang
Contributor Author

need to further investigate the Docker testing

Spark's spark-catalyst JAR contains its own
org.apache.spark.sql.catalyst.plans.logical.DropIndex with a
3-parameter constructor (LogicalPlan, String, boolean). At runtime
Spark's class takes precedence on the classpath, causing
NoSuchMethodError when the AST builder tries to call lance-spark's
2-parameter constructor.

Rename to LanceDropIndex / LanceDropIndexOutputType to avoid the
collision, consistent with how other lance-spark classes (AddIndex,
ShowIndexes, etc.) already use names that don't conflict with Spark
built-ins.
…llision

Spark's spark-sql JAR also contains DropIndexExec in the same package
(org.apache.spark.sql.execution.datasources.v2) with a different
constructor signature. Rename to LanceDropIndexExec for the same reason
as the LanceDropIndex rename.
@LuciferYang
Contributor Author

All tests passed.


@hamersaw hamersaw left a comment

Great add! Thanks.

@hamersaw hamersaw merged commit ca54fa9 into lance-format:main Apr 2, 2026
18 checks passed
@LuciferYang
Contributor Author

thanks @hamersaw

@LuciferYang
Contributor Author

By the way, I would like to mention that I will be on a 4-day holiday. During this period, I may not respond promptly to code fix suggestions.

@hamersaw
Collaborator

hamersaw commented Apr 2, 2026

By the way, I would like to mention that I will be on a 4-day holiday. During this period, I may not respond promptly to code fix suggestions.

Enjoy the break!

@hamersaw hamersaw mentioned this pull request Apr 6, 2026