Skip to content

[VL] Fix Iceberg writer aggregate-init build failure on macOS C++20 libc++#12115

Open
jackylee-ch wants to merge 1 commit into
apache:mainfrom
jackylee-ch:fix-iceberg-writer-aggregate-init
Open

[VL] Fix Iceberg writer aggregate-init build failure on macOS C++20 libc++#12115
jackylee-ch wants to merge 1 commit into
apache:mainfrom
jackylee-ch:fix-iceberg-writer-aggregate-init

Conversation

@jackylee-ch
Copy link
Copy Markdown
Contributor

What changes are proposed in this pull request?

The bug

Builds with --enable_enhanced_features=ON (which compiles the Iceberg
writer integration in cpp/velox/compute/iceberg/) fail to compile on
macOS Apple Clang 17 + libc++ with -std=c++20. The build aborts at
cpp/velox/compute/iceberg/IcebergWriter.cc.cpp.o (~156 / 217 in the
gluten cpp ninja graph) with:

error: no matching function for call to 'construct_at'
note: in instantiation of function template specialization
      'std::vector<IcebergPartitionSpec::Field>::emplace_back<...>'
candidate template ignored: substitution failure for construct_at
[with _Tp = IcebergPartitionSpec::Field, _Args = ...]:
no matching constructor for initialization of
'IcebergPartitionSpec::Field'

Root cause

IcebergWriter.cc constructs aggregate types with paren-init at
two call sites:

Line Call Target type Origin
262 return WriteStats(bytes, files, ioNs, wallNs) gluten::WriteStats cpp/velox/compute/iceberg/IcebergWriter.h
311 fields.emplace_back(name, type, transform, parameter) IcebergPartitionSpec::Field (Velox) velox/connectors/hive/iceberg/PartitionSpec.h

Both target types are pure aggregate structs with no user-defined
constructor
:

// Velox
struct IcebergPartitionSpec::Field {
  std::string name;
  TypePtr type;
  TransformType transformType;
  std::optional<int32_t> parameter;
};

// Gluten
struct WriteStats {
  uint64_t numWrittenBytes{0};
  uint32_t numWrittenFiles{0};
  uint64_t writeIOTimeNs{0};
  uint64_t writeWallNs{0};
  ...
};

C++20 P0960R3
made aggregates initializable via paren-init at the language level, but
standard library implementations of std::construct_at (which
std::vector::emplace_back, std::make_shared, std::optional::emplace
all route through for in-place construction) have not all caught up:

Toolchain std::construct_at for aggregates Build result
Linux GCC + libstdc++ Paren-init accepted (P0960 implemented) ✅ compiles
macOS Apple Clang 17 + libc++ Strict — only real constructors accepted ❌ substitution failure

So the same code that passes on Linux blocks the macOS build with
--enable_enhanced_features=ON.

The fix

Two minimal edits — switch paren-init to brace-init at the two call
sites:

- fields.emplace_back(name, type, transform, parameter);
+ fields.push_back({name, type, transform, parameter});

- return WriteStats(numWrittenBytes, numWrittenFiles, ioNs, wallNs);
+ return WriteStats{numWrittenBytes, numWrittenFiles, ioNs, wallNs};

Brace-init invokes aggregate initialization, which is supported on
every C++17/20 standard library. It does not depend on
std::construct_at accepting paren-init for aggregates, so the
result is portable.

Why this matters

  1. Build blocker on macOS in enhanced mode. Anyone running
    dev/builddeps-veloxbe.sh --enable_enhanced_features=ON on macOS
    today hits a hard compile error before any binary is produced.
  2. Naturally complements the macOS unblockers. Once the existing
    macOS build fixes land (gflags load-time abort, Arrow vendored
    datetime, /usr/local isolation), enhanced mode is the next thing
    macOS users will try — this PR removes the last compile-time
    blocker on that path.
  3. Zero runtime impact. Brace-init and paren-init for aggregates
    produce identical field-by-field initialization on all
    toolchains. Linux libstdc++ still compiles cleanly. The change is
    purely build portability.
  4. Style consistency. The rest of the iceberg/ directory and most
    of cpp/velox/ already uses brace-init for aggregate construction.
    These two paren-init call sites are the outliers.
  5. Tiny diff — 2 hunks, 6 lines (3 added, 3 removed).

How was this patch tested?

  • macOS 14 arm64 + Apple Clang 17 + libc++ + -std=c++20 +
    --enable_enhanced_features=ON:
    • Before fix: build aborts at IcebergWriter.cc.cpp.o
      (~156 / 217 ninja steps) with the construct_at substitution
      failure.
    • After fix: build reaches 217 / 217, including the new
      velox_iceberg_test executable produced under the enhanced-features
      gate.
  • macOS cpp full ctest matrix (17 binaries including
    velox_iceberg_test): 5539 / 5539 pass with this fix.
  • Linux x86_64 Ubuntu 22.04 build remains green; brace-init for
    aggregates is the same lowering as the prior paren-init under
    libstdc++'s P0960 implementation, so behavior is identical.

Was this patch authored or co-authored using generative AI tooling?

co-auth: Claude (Sonnet/Opus) via Claude Code 1.x

…0 libc++

Two call sites in cpp/velox/compute/iceberg/IcebergWriter.cc construct
aggregate types via paren-init with multiple arguments:

  fields.emplace_back(name, type, transform, parameter);   // line 311
  return WriteStats(numWrittenBytes,                       // line 262
                    numWrittenFiles, writeIOTimeNs,
                    writeWallNs);

Both target types are pure aggregate structs without a user-defined
constructor:

  // Velox: velox/connectors/hive/iceberg/PartitionSpec.h
  struct IcebergPartitionSpec::Field {
    std::string name;
    TypePtr type;
    TransformType transformType;
    std::optional<int32_t> parameter;
  };

  // Gluten: cpp/velox/compute/iceberg/IcebergWriter.h
  struct WriteStats {
    uint64_t numWrittenBytes{0};
    uint32_t numWrittenFiles{0};
    uint64_t writeIOTimeNs{0};
    uint64_t writeWallNs{0};
    ...
  };

C++20 P0960R3 made aggregates initializable via paren-init at the
language level, but the standard library implementation of
std::construct_at (which std::vector::emplace_back routes through to do
in-place construction) is uneven across stdlibs:

  - libstdc++ (Linux GCC): construct_at accepts aggregate paren-init,
    so emplace_back(a, b, c, d) compiles for aggregate types.
  - libc++ (Apple Clang 17, macOS): construct_at still requires a real
    constructor, so the substitution fails:

      error: no matching function for call to 'construct_at'
      note: candidate template ignored: substitution failure
      [with _Tp = IcebergPartitionSpec::Field, _Args = ...]:
      no matching constructor for initialization of
      'IcebergPartitionSpec::Field'

Result: builds with --enable_enhanced_features=ON on macOS abort
hard at IcebergWriter.cc.cpp.o (~156/217) before producing
libgluten.dylib, blocking any macOS user from running the Iceberg
writer integration that this gate adds.

The fix is a minimal style change at the two call sites — switch
from paren-init to brace-init, which is aggregate initialization on
every C++17/20 toolchain (libstdc++, libc++, MSVC):

  fields.push_back({name, type, transform, parameter});
  return WriteStats{numWrittenBytes, numWrittenFiles, ...};

Linux libstdc++ continues to compile cleanly: brace-init resolves to
the same aggregate initialization the paren-init form selected via
P0960. Zero runtime change, identical field-by-field initialization.

Verification:
- macOS 14 arm64 + Apple Clang 17 + --enable_enhanced_features=ON:
  cpp build now reaches 217/217 (was 156/217 before this fix).
- cpp ctest matrix on the same configuration: 5539 / 5539 pass,
  including the enhanced-only velox_iceberg_test executable.
- Linux x86_64 build remains green; brace-init for aggregates is
  unchanged in libstdc++.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant