
[Python] Expose RecordBatchFileReader::CountRows in Python #49305

@adrien-grl

Description


Describe the enhancement requested

Problem / Motivation

In PyArrow, the only way I have found to get the total number of rows in a Feather file with pyarrow.ipc.RecordBatchFileReader is something along the lines of:

num_rows = sum(reader.get_batch(i).num_rows for i in range(reader.num_record_batches))

This is not very efficient because it reads every record batch, when it seems the rows could be counted directly from the metadata, as RecordBatchFileReader::CountRows does (if I understand the code correctly).

This approach becomes impractical when reading from remote file systems, since each batch has to be fetched just to read its row count.

Proposed solution

Expose RecordBatchFileReader::CountRows in Python.

References

virtual Result<int64_t> CountRows() = 0;

Thank you for reading my suggestion, and for all the amazing work!

Component(s)

Python

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
