feat: Add manifests and files system tables #309

Open: suxiaogang223 wants to merge 10 commits into alibaba:main from suxiaogang223:codex/system-table-files-manifests-pr4

suxiaogang223 commented on May 27, 2026

Background

This PR is part of #141 and continues the system table work after the previously merged options, audit_log, binlog, and pure metadata system table support.

This PR covers the table-level metadata system tables that are backed by manifests and data files. It adds read-only support for table$manifests and table$files, so users can inspect the latest snapshot's manifest and data file metadata through the existing system table query path.

Architecture

  • Keep manifests and files on the existing InMemorySystemTable execution path.

    • Both tables materialize latest-snapshot metadata into GenericRow values.
    • The existing in-memory system table scan/read path handles singleton split planning and Arrow export.
  • Split system table implementations into Java-aligned per-table files.

    • OptionsSystemTable, SnapshotsSystemTable, SchemasSystemTable, TagsSystemTable, BranchesSystemTable, ConsumersSystemTable, ManifestsSystemTable, and FilesSystemTable now live in separate source/header pairs.
    • The previous metadata_system_tables.{h,cpp} aggregation file is removed.
  • Introduce SystemTableUtils for shared system table helpers.

    • It owns shared context creation, option merging, branch loading, timestamp/value conversion, JSON conversion, manifest reading, data file merging, and metadata formatting helpers.
    • This keeps concrete system table classes focused on schema declaration and row materialization.
  • Keep SystemTableLoader registry behavior unchanged.

    • The loader still creates system tables through the registry/factory table.
    • It now includes concrete system table headers directly after the file split.
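The registry-plus-in-memory-table shape described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: SystemTableSketch, ManifestsTableSketch, SystemTableRegistrySketch, and RowSketch are hypothetical names, and plain string vectors stand in for GenericRow.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a materialized row; the PR itself uses GenericRow.
using RowSketch = std::vector<std::string>;

// Hypothetical interface: a system table declares a schema and materializes rows.
class SystemTableSketch {
 public:
    virtual ~SystemTableSketch() = default;
    virtual std::vector<std::string> ColumnNames() const = 0;
    virtual std::vector<RowSketch> MaterializeRows() const = 0;
};

// Hypothetical concrete table, living in its own file in the per-table layout.
class ManifestsTableSketch : public SystemTableSketch {
 public:
    std::vector<std::string> ColumnNames() const override {
        // Only a subset of the real columns, for brevity.
        return {"file_name", "file_size", "num_added_files",
                "num_deleted_files", "schema_id"};
    }
    std::vector<RowSketch> MaterializeRows() const override {
        // A real implementation would read the latest snapshot's manifests here.
        return {};
    }
};

// Hypothetical registry: maps a system table name to a factory.
class SystemTableRegistrySketch {
 public:
    using Factory = std::function<std::unique_ptr<SystemTableSketch>()>;

    void Register(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);
    }

    // Returns nullptr when no factory is registered under the given name.
    std::unique_ptr<SystemTableSketch> Load(const std::string& name) const {
        const auto it = factories_.find(name);
        if (it == factories_.end()) {
            return nullptr;
        }
        return it->second();
    }

 private:
    std::map<std::string, Factory> factories_;
};
```

With this shape, adding a new system table means adding one class and one Register call; the loader itself does not change.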

System Tables Added

This PR follows the Apache Paimon table-scoped system table model, where table metadata is queried with names like table$files. Reference semantics: https://paimon.apache.org/docs/master/concepts/system-tables/
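For illustration, a name such as table$files can be split into a base table name and a system table name along these lines. SplitSystemTableName is a hypothetical helper, not a function from this PR, and splitting at the last '$' is an assumption made for the sketch.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: splits "orders$files" into {"orders", "files"}.
// Returns std::nullopt when there is no '$' or when either side is empty.
// Assumption: the system table name is whatever follows the last '$'.
std::optional<std::pair<std::string, std::string>> SplitSystemTableName(
    const std::string& name) {
    const std::string::size_type pos = name.rfind('$');
    if (pos == std::string::npos || pos == 0 || pos + 1 == name.size()) {
        return std::nullopt;
    }
    return std::make_pair(name.substr(0, pos), name.substr(pos + 1));
}
```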

  • table$manifests

    • Purpose: exposes data manifest metadata for the latest snapshot.
    • Source: latest snapshot data manifests through SnapshotManager, ManifestList, and ManifestFileMeta.
    • Columns: file_name, file_size, num_added_files, num_deleted_files, schema_id, min_partition_stats, max_partition_stats, min_row_id, max_row_id.
    • Note: reads data manifests only in this PR.
  • table$files

    • Purpose: exposes data file metadata for the latest snapshot.
    • Source: latest snapshot data manifest entries merged with FileEntry::MergeEntries.
    • Columns: partition, bucket, file_path, file_format, schema_id, level, record_count, file_size_in_bytes, min_key, max_key, null_value_counts, min_value_stats, max_value_stats, min_sequence_number, max_sequence_number, creation_time, deleteRowCount, file_source, first_row_id, write_cols.
    • Note: file_path uses external path when present, otherwise the table bucket path plus file name.
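The two table$files behaviors above, merging manifest entries and resolving file_path, can be sketched as below. This is a simplified illustration under stated assumptions, not the PR's FileEntry::MergeEntries: EntrySketch, MergeEntriesSketch, and ResolveFilePathSketch are hypothetical names, the merge key is reduced to (partition, bucket, file name), and the table_path/partition/bucket-N/file_name layout is assumed.

```cpp
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

enum class FileKindSketch { kAdd, kDelete };

// Hypothetical, reduced manifest entry.
struct EntrySketch {
    FileKindSketch kind;
    std::string partition;  // e.g. "dt=2026-05-27"; empty if unpartitioned
    int bucket;
    std::string file_name;
    std::optional<std::string> external_path;
};

// Simplified merge: an ADD makes a file live, a DELETE with the same key
// cancels it. Assumption: a DELETE with no matching ADD is simply dropped.
// The result is ordered by (partition, bucket, file_name).
std::vector<EntrySketch> MergeEntriesSketch(const std::vector<EntrySketch>& entries) {
    using Key = std::tuple<std::string, int, std::string>;
    std::map<Key, EntrySketch> live;
    for (const EntrySketch& entry : entries) {
        const Key key{entry.partition, entry.bucket, entry.file_name};
        if (entry.kind == FileKindSketch::kAdd) {
            live.insert_or_assign(key, entry);
        } else {
            live.erase(key);
        }
    }
    std::vector<EntrySketch> result;
    for (const auto& [key, entry] : live) {
        result.push_back(entry);
    }
    return result;
}

// file_path: the external path when present, otherwise the bucket path
// under the table plus the file name.
std::string ResolveFilePathSketch(const std::string& table_path,
                                  const EntrySketch& entry) {
    if (entry.external_path.has_value()) {
        return *entry.external_path;
    }
    std::string path = table_path;
    if (!entry.partition.empty()) {
        path += "/" + entry.partition;
    }
    path += "/bucket-" + std::to_string(entry.bucket) + "/" + entry.file_name;
    return path;
}
```

In this sketch, a file that is added and later deleted within the same snapshot's manifests does not appear in the result, so each remaining entry corresponds to one row of table$files.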

Scope Notes

  • This PR reads the latest snapshot only; scan.snapshot-id, tag, and timestamp time-travel options are left for follow-up work.
  • Predicate pushdown for manifest/file metadata is not implemented in this PR.
  • Existing options, audit_log, binlog, snapshots, schemas, tags, branches, and consumers behavior is preserved.
  • Remaining metadata system tables such as partitions, buckets, statistics, table_indexes, and aggregation_fields are planned for later PRs.

Tests

  • Local: pre-commit run --all-files
  • Fedora: cmake --build build-codex-metadata-pr3 --target paimon-core-test paimon-read-inte-test -j2
  • Fedora: ./build-codex-metadata-pr3/debug/paimon-core-test --gtest_filter=FileSystemCatalogTest.TestMetadataSystemTableCatalog
  • Fedora: ./build-codex-metadata-pr3/debug/paimon-read-inte-test --gtest_filter=SystemTableReadInteTest.TestReadMetadataSystemTables:SystemTableReadInteTest.TestReadFilesSystemTableForPartitionedTable:SystemTableReadInteTest.TestReadManifestAndFilesSystemTablesForEmptyTable

suxiaogang223 changed the title from "[C++] Add manifests and files system tables" to "feat: Add manifests and files system tables" on May 27, 2026
suxiaogang223 force-pushed the codex/system-table-files-manifests-pr4 branch from 52f584d to cf6e2f1 on May 27, 2026 13:04
suxiaogang223 force-pushed the codex/system-table-files-manifests-pr4 branch from 9ff3a86 to ff9be72 on May 28, 2026 03:26