@xbattlax
Contributor

Summary

Implement 10 new metadata table types for the inspect module, addressing the metadata tables epic #823.

Changes

  • Add crates/iceberg/src/inspect/refs.rs:

    • RefsTable showing all branches and tags with retention policies
    • Reads from TableMetadata.refs HashMap
  • Add crates/iceberg/src/inspect/history.rs:

    • HistoryTable showing snapshot log with ancestry tracking
    • Computes is_current_ancestor by walking parent chain from current snapshot
  • Add crates/iceberg/src/inspect/metadata_log_entries.rs:

    • MetadataLogEntriesTable showing metadata file history
    • Appends current metadata entry with snapshot/schema/sequence info
  • Add crates/iceberg/src/inspect/files.rs with shared infrastructure:

    • files_schema() building dynamic schema with partition struct
    • scan_files() for current snapshot manifest scanning
    • scan_all_files() for all-snapshot scanning with manifest path deduplication
    • ContentFilter enum for data-only, deletes-only, or all content
    • FilesTable, DataFilesTable, DeleteFilesTable as thin wrappers
  • Add crates/iceberg/src/inspect/all_manifests.rs:

    • AllManifestsTable iterating manifests across all snapshots
    • Adds reference_snapshot_id field to track which snapshot references each manifest
  • Add crates/iceberg/src/inspect/all_files.rs:

    • AllFilesTable, AllDataFilesTable, AllDeleteFilesTable
    • Thin wrappers around scan_all_files() with different content filters
  • Update crates/iceberg/src/inspect/metadata_table.rs:

    • Add AllManifests, AllFiles, AllDataFiles, AllDeleteFiles, History, MetadataLogEntries, Refs, Files, DataFiles, DeleteFiles to MetadataTableType enum
    • Add accessor methods on MetadataTable
  • Update crates/integrations/datafusion/src/table/metadata_table.rs:

    • Add match arms for all new MetadataTableType variants in schema and scan
  • Update crates/iceberg/src/scan/mod.rs:

    • Add setup_all_snapshot_manifest_files() test helper for all-snapshot tests
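
As a rough illustration of the ancestry check described for HistoryTable, the sketch below walks parent links from the current snapshot and collects every ID it reaches; a history row would then report is_current_ancestor as true when its snapshot ID is in that set. The `Snapshot` struct and the `current_ancestor_ids` function are hypothetical stand-ins written for this example, not the types or helpers used in the PR.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified snapshot record (not the crate's real type):
/// only the two fields needed to follow the parent chain.
struct Snapshot {
    snapshot_id: i64,
    parent_snapshot_id: Option<i64>,
}

/// Collects the IDs of the current snapshot and all of its ancestors by
/// following parent links until the chain ends or a snapshot is missing.
fn current_ancestor_ids(snapshots: &[Snapshot], current: Option<i64>) -> HashSet<i64> {
    // Index parent links by snapshot ID for constant-time lookups.
    let parents: HashMap<i64, Option<i64>> = snapshots
        .iter()
        .map(|s| (s.snapshot_id, s.parent_snapshot_id))
        .collect();

    let mut ancestors = HashSet::new();
    let mut cursor = current;
    while let Some(id) = cursor {
        // Stop if the snapshot is unknown (for example, already expired).
        let Some(parent) = parents.get(&id) else { break };
        // Stop if this ID was already visited, which guards against cycles.
        if !ancestors.insert(id) {
            break;
        }
        cursor = *parent;
    }
    ancestors
}
```

With snapshots 1, 2 (parent 1), and 3 (parent 1) and snapshot 3 as current, the set contains 3 and 1 but not 2, which is the situation a rollback followed by a new commit would produce.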

Notes

This matches the Java Iceberg implementation's metadata table types:

  • Group A (metadata-only, no I/O): refs, history, metadata_log_entries
  • Group B (file-level, shared infra): files, data_files, delete_files
  • Group C (all-snapshot variants): all_manifests, all_files, all_data_files, all_delete_files
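
The shared infrastructure behind Groups B and C can be pictured with a minimal sketch like the one below: a content filter that selects data files, delete files, or both, and a pass that keeps each manifest path once when several snapshots reference the same manifest. The variant names, the `Content` enum, and `dedup_manifest_paths` are assumptions made for this example; the PR's actual `ContentFilter` and `scan_all_files()` may be shaped differently.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the PR's ContentFilter; real variant names may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ContentFilter {
    All,
    DataOnly,
    DeletesOnly,
}

/// Simplified file content kinds, assumed for this example.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Content {
    Data,
    PositionDeletes,
    EqualityDeletes,
}

impl ContentFilter {
    /// Returns true when a file with the given content should be emitted.
    fn accepts(self, content: Content) -> bool {
        match self {
            ContentFilter::All => true,
            ContentFilter::DataOnly => content == Content::Data,
            ContentFilter::DeletesOnly => content != Content::Data,
        }
    }
}

/// Returns each manifest path once, in first-seen order, given the list of
/// manifest paths referenced by each snapshot.
fn dedup_manifest_paths<'a>(per_snapshot: &'a [Vec<String>]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    per_snapshot
        .iter()
        .flatten()
        .map(String::as_str)
        // HashSet::insert returns false for a path that was already seen.
        .filter(|path| seen.insert(*path))
        .collect()
}
```

Deduplicating on the manifest path before reading any manifest means a manifest shared by many snapshots is opened only once in this sketch.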

Remaining tables (all_entries, partitions, position_deletes) are blocked or deferred.

Part of #823

@xbattlax force-pushed the feat/metadata-tables-core branch 2 times, most recently from 1e30982 to 1cbfcdf on January 29, 2026 19:16
…entries, files, all_manifests, all_files)

Implement 10 new metadata table types for the inspect module:
- Group A (metadata-only): RefsTable, HistoryTable, MetadataLogEntriesTable
- Group B (file-level): FilesTable, DataFilesTable, DeleteFilesTable
- Group C (all-snapshot): AllManifestsTable, AllFilesTable, AllDataFilesTable, AllDeleteFilesTable

Shared infrastructure in files.rs handles dynamic partition structs,
content filtering, and manifest deduplication across snapshots.
DataFusion integration updated with all new variants.

Part of apache#823.
@xbattlax force-pushed the feat/metadata-tables-core branch from 1cbfcdf to e1107a5 on January 29, 2026 19:22