GH-48625: [Python] Add temporal unit checking in NumPyDtypeUnifier #48626
base: main
135d52a to
3f3c322
Compare
AlenkaF
left a comment
LGTM, thanks!
@rok, short look, in case you have time.
rok
left a comment
Looks good, just a minor nit.
    case NPY_FR_GENERIC:
      return "generic";
    default:
      return "unknown";
Numpy has other time units. Could we perhaps print which was used in case this is hit instead of unknown? I don't know if this is practical, just asking.
Let me check and update it!
I think that makes sense. Here is an example where it helps:
import pyarrow as pa
import numpy as np
# `ps` unsupported but `s` supported.
pa.array([np.datetime64('2020', 'ps'), np.datetime64('2020', 's')])
Before:
pyarrow.lib.ArrowNotImplementedError: Unsupported datetime64 time unit
After:
pyarrow.lib.ArrowInvalid: Cannot mix NumPy datetime64 units ps and s
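For readers following along, the kind of check behind the new message can be sketched in plain Python. This is only an illustration under stated assumptions: `DatetimeUnitChecker` and `check_units` are hypothetical names, units are passed as plain strings, and `ValueError` stands in for `pyarrow.lib.ArrowInvalid`. The actual change lives in the C++ `NumPyDtypeUnifier` and may differ in detail.

```python
# Illustrative sketch only: hypothetical names, not PyArrow's internals.
class DatetimeUnitChecker:
    """Remembers the first datetime64 unit seen and rejects any other."""

    def __init__(self):
        self.unit = None  # no unit observed yet

    def observe(self, unit):
        if self.unit is None:
            # First value: record its unit as the reference.
            self.unit = unit
        elif unit != self.unit:
            # Name both units so the user can see what was mixed.
            raise ValueError(
                f"Cannot mix NumPy datetime64 units {self.unit} and {unit}"
            )


def check_units(units):
    """Return the common unit of `units`, or raise if they differ."""
    checker = DatetimeUnitChecker()
    for unit in units:
        checker.observe(unit)
    return checker.unit
```

Calling `check_units(["ps", "s"])` raises an error whose text matches the "After" message above, while a list with a single repeated unit passes through and returns that unit.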
adf3e72 to
86aa893
Compare
86aa893 to
1067cab
Compare
Rationale for this change
This addresses a TODO at:
arrow/python/pyarrow/src/arrow/python/inference.cc
Line 258 in de6eb89
When users mix numpy.datetime64 values with different units (e.g., datetime64[s] and datetime64[ms]) in a single array, PyArrow previously produced a confusing error message.
What changes are included in this PR?
- NumPyDtypeUnifier::Observe_DATETIME()
- InvalidDatetimeUnitMix() method
- NumPyDtypeUnifier::Observe() to check units for same-type comparisons
- test_array_from_different_numpy_datetime_units_raises
Are these changes tested?
Manually tested, and unittests were added.
Are there any user-facing changes?
Yes. It produces a better error message. For example,
Before:
After: