fix: accept ISO 8601 'T' separator for naive datetimes#8
Merged
Conversation
`parse_with_preference("2020-01-15T08:00:00", _)` returned Err because
ymd_hms's regex required `\s+` between the date and time, and rfc3339
requires an offset suffix — so the T+no-tz intersection fell into the
gap between them. This is the form Python's `datetime.isoformat()`
emits without `.astimezone()`, so downstream consumers (qsv
`stats --infer-dates` and everything that depends on its type
inference) were misclassifying these columns as String.
Extend ymd_hms to accept either separator: regex changed from `\s+` to
`(?:T|\s+)`, then dispatch the chrono format-string family on
`input.as_bytes()[10]` so the trial-parse chain stays the same length
for the common space-separated case (no perf regression on the hot
path).
Tests:
- `tests::ymd_hms` extended with T-separator cases (no-seconds,
seconds, ms/us/ns fractional) and a T-vs-space equivalence check.
- New `tests::parse_iso_t_no_tz` integration test exercising the
public `parse_with_preference` API, with a regression guard that
existing tz-bearing T-forms (`Z`, `+00:00`) still parse.
README's accepted-formats list updated to document the T-separator
form.
T-separator with named timezone (e.g. `2020-01-15T08:00:00 UTC`) is
still rejected by `ymd_hms_z`; that's a sibling change that can be
done separately if needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace `(?:T|\s+)` with `[T\s]+`. Functionally equivalent for our purposes — any extra inputs the character class accepts at the regex layer (`TT`, `T `, ` T`) get rejected by the chrono format strings anyway — but compiles to a tighter DFA loop than an alternation. Bench-compare against the 0.14.0 baseline shows every codepath that actually touches `ymd_hms` is now stable to improved across three runs: - `2021-04-30 21:14:10` (ymd_hms success): −1.4% - `2017-11-25 13:31:15 PST` (ymd_hms_z): −3.0% - `2019-11-29 08:08:05-08` (ymd_hms_z): −3.1% - `2021-02-21 PST` (ymd_z): −0.5% `memory_usage` dropped from a borderline +3.2% with the alternation form to a noise-band +1.1%. The remaining persistent regressions in the bench (`08/21/71`, `03/19/2012 10:11:59`) are on slash_* codepaths that never enter `ymd_hms`; they appear identically across all three runs regardless of the regex form here, indicating they're system-noise / icache effects unrelated to this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first assertion called `super::parse("31/3/22").unwrap().date()`
and compared against `Utc.ymd(2022, 3, 31)`. `parse()` uses Local
timezone and pads date-only inputs with the current time of day
(`Utc::now().time()`); the resulting UTC date can roll by ±1 day
depending on host TZ and the moment the test runs. The other two
assertions in this test use `parse_with_preference` (Utc + midnight)
and are unaffected.
Reproduction: on a host in PST during the late-evening local window
(early-morning UTC), the assembled `2022-03-31 ${local-time} PST`
converts to `2022-04-01 ${early}Z` and `.date()` returns
`2022-04-01Z`, which the assertion rejects.
Fix: compare the *Local* date (what `parse()` actually models for a
date-only input) by chaining `.with_timezone(&Local).date()` and
comparing against `Local.ymd(2022, 3, 31)`. The library's behavior
is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
parse_with_preference(\"2020-01-15T08:00:00\", _)and other T-separated naive datetimes (with or without fractional seconds) returnedErr. The space-separated form and tz-bearing T-forms (Z,+00:00) already worked — only the T + no-timezone intersection was broken.ymd_hms's regex required\s+between date and time, andrfc3339requires an offset suffix, so T+no-tz fell into the gap between them.ymd_hms's regex to(?:T|\s+)and dispatch the chrono format-string family oninput.as_bytes()[10]. The trial-parse chain stays the same length for the common space-separated case — no extra cost on the hot path.This is the form Python's
datetime.isoformat()emits without.astimezone(), so consumers like qsvstats --infer-dateswere misclassifying these columns asString. Downstream qsv commands (synthesize,schema,describegpt) all depend on that type inference. Origin handoff: opened from a qsv 20.1.0 release-prep session; the qsv side already landed workaround commit4e143476bfor one specific failure mode, but the proper fix lives here.Test plan
cargo test --workspace --all-features— new tests pass; one pre-existingparse_unambiguous_dmyfailure also fails onmainwithout my changes (host-timezone/time-of-day-dependent; unrelated, worth a separate look).cargo fmt --all -- --checkclean.cargo clippy --workspace --tests --all-features -- -D warningsclean.tests::ymd_hmsextended with T-separator cases (no-seconds, seconds, ms/us/ns fractional) and a T-vs-space equivalence assertion.tests::parse_iso_t_no_tzexercises the publicparse_with_preferenceAPI and guards that existing tz-bearing T-forms (Z,+00:00) still parse./bench-compare compareto confirm no perf regression on the space-separated hot path (recommended before merge).Out of scope
T-separator with a named timezone (e.g.
2020-01-15T08:00:00 UTC) is still rejected — that requires a sibling change toymd_hms_z's pre-filter and regex. Easy follow-up if needed; flagged in the commit message.🤖 Generated with Claude Code