Function to parse restart dates out of event files#3828
Function to parse restart dates out of event files#3828infotroph wants to merge 3 commits intoPecanProject:developfrom
Conversation
|
I agree that the events schema should be updated. It makes sense for each planting event to be required to specify the associated crop or pft. |
dlebauer
left a comment
There was a problem hiding this comment.
This looks good. Yes, it makes sense to require a crop identifier.
Before merging, please:
- update or create a new schema that requires crop identifier
- create example and ensure it validates
- enable currently skipped test
- either:
- pull out helper functions for events.json <--> table
- create an issue, or ask me to do so
| #' events_to_crop_cycle_starts(evts) | ||
| #' } | ||
| events_to_crop_cycle_starts <- function(event_json) { | ||
| jsonlite::read_json(event_json) |> |
There was a problem hiding this comment.
I think that it would be helpful to have helper functions that convert events.json to and from tables, e.g. events_json_to_table()
events_table_to_json(). Please either implement or convert this comment to an issue.
There was a problem hiding this comment.
Where else do you expect to want to use these, and for what fraction of event usage? If we'll ~always want to process events in table format, then maybe they should be stored that way instead of unnesting from JSON all the time.
There was a problem hiding this comment.
Each event type is already being generated in a "tidy" tabular format, so staying tabular is easier and more compact for us. But the crux of the problem comes when you have to interleave different event types chronologically as each event type has different variables associated with it. We solved that in SIPNET by not having column headers -- you just have to know from the metadata the position of the different variables in each row. That remains an option here, but we'll loose the ability for the dataframe to play nice with lots of R tools (e.g., tidyverse). Other options are a wide format (all possible event column names, most irrelevant for most events) or a long format (e.g., datetime, site, variable, value) which will result in each event taking up multiple rows.
There was a problem hiding this comment.
This sounds to me like we don't yet have a single target table format, so I think it's premature to try to write the helper for it in this PR. It seems very plausible that the format will be context-specific: Here I unnested first to get a wide (and sparse!) table and it was fine, but in other cases we might want to filter the still-nested events by event type/crop/etc so that we can unnest to a form with fewer NAs.
|
schema update is in #3836 |
|
@dlebauer Can you elaborate on "create example and ensure it validates"? I don't follow -- remember this function consumes JSON, not produces it. |
| dplyr::group_by(.data$site_id, .data$crop_cycle_id) |> | ||
| dplyr::slice_min(.data$date) |> | ||
| dplyr::select("site_id", "date", "crop") |
There was a problem hiding this comment.
Two quick notes (from trying these changes in my restart work):
- This will return a grouped data frame because we never
ungroup. I suggest either adding anungroupcall to the end of the pipe or (my personal preference) replacing thegroup_bywith thebyargument toslice_minto only group that one operation.dplyr::slice_min(.data$date, by = c("site_id", "crop_cycle_id")) |>
- I think in Update events schema: require crop id when planting, remove pft from site #3836 we will start to require
crop_code, right? I'm not sure what the relationship betweencropandcrop_codeis, but assuming we'll be usingcrop_codeinstead ofcrop, this function will need to be adjusted accordingly. Since the two PRs are closely related, I might suggest sequencing this one after Update events schema: require crop id when planting, remove pft from site #3836 and future-proofing this to start usingcrop_coderight away.
There was a problem hiding this comment.
- Good call!
- Yep, this is now waiting on schema finalization in Update events schema: require crop id when planting, remove pft from site #3836. I'll mark it as waiting.
Description
Single-PFTs models like Sipnet need to reinitialize their parameterization when changing the plant type. Here's a function to get the list of dates and plant types from a (possibly multisite) events JSON.
Discussion needed: This version assumes the crop type is specified in each planting event, but this is not enforced by the existing events schema. Should we update the schema or have this function ignore all planting events with no crop type specified (even if this means returning an empty list)?
Motivation and Context
Review Time Estimate
Types of changes
Checklist: