Skip to content

Function to parse restart dates out of event files#3828

Open
infotroph wants to merge 3 commits intoPecanProject:developfrom
infotroph:sipnet-hop
Open

Function to parse restart dates out of event files#3828
infotroph wants to merge 3 commits intoPecanProject:developfrom
infotroph:sipnet-hop

Conversation

@infotroph
Copy link
Copy Markdown
Member

Description

Single-PFTs models like Sipnet need to reinitialize their parameterization when changing the plant type. Here's a function to get the list of dates and plant types from a (possibly multisite) events JSON.

Discussion needed: This version assumes the crop type is specified in each planting event, but this is not enforced by the existing events schema. Should we update the schema or have this function ignore all planting events with no crop type specified (even if this means returning an empty list)?

Motivation and Context

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@infotroph infotroph changed the title New function to parse restart dates out of event files Function to parse restart dates out of event files Feb 23, 2026
@dlebauer
Copy link
Copy Markdown
Member

I agree that the events schema should be updated. It makes sense for each planting event to be required to specify the associated crop or pft.

Copy link
Copy Markdown
Member

@dlebauer dlebauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Yes, it makes sense to require a crop identifier.

Before merging, please:

  • update or create a new schema that requires crop identifier
  • create example and ensure it validates
  • enable currently skipped test
  • either:
    • pull out helper functions for events.json <--> table
    • create an issue, or ask me to do so

#' events_to_crop_cycle_starts(evts)
#' }
events_to_crop_cycle_starts <- function(event_json) {
jsonlite::read_json(event_json) |>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it would be helpful to have helper functions that convert events.json to and from tables, e.g. events_json_to_table()
events_table_to_json(). Please either implement or convert this comment to an issue.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where else do you expect to want to use these, and for what fraction of event usage? If we'll ~always want to process events in table format, then maybe they should be stored that way instead of unnesting from JSON all the time.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each event type is already being generated in a "tidy" tabular format, so staying tabular is easier and more compact for us. But the crux of the problem comes when you have to interleave different event types chronologically as each event type has different variables associated with it. We solved that in SIPNET by not having column headers -- you just have to know from the metadata the position of the different variables in each row. That remains an option here, but we'll loose the ability for the dataframe to play nice with lots of R tools (e.g., tidyverse). Other options are a wide format (all possible event column names, most irrelevant for most events) or a long format (e.g., datetime, site, variable, value) which will result in each event taking up multiple rows.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds to me like we don't yet have a single target table format, so I think it's premature to try to write the helper for it in this PR. It seems very plausible that the format will be context-specific: Here I unnested first to get a wide (and sparse!) table and it was fine, but in other cases we might want to filter the still-nested events by event type/crop/etc so that we can unnest to a form with fewer NAs.

@infotroph
Copy link
Copy Markdown
Member Author

schema update is in #3836

Comment thread modules/data.land/tests/testthat/test-events_to_crop_cycle_starts.R Outdated
@infotroph
Copy link
Copy Markdown
Member Author

@dlebauer Can you elaborate on "create example and ensure it validates"? I don't follow -- remember this function consumes JSON, not produces it.

Comment on lines +49 to +51
dplyr::group_by(.data$site_id, .data$crop_cycle_id) |>
dplyr::slice_min(.data$date) |>
dplyr::select("site_id", "date", "crop")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quick notes (from trying these changes in my restart work):

  1. This will return a grouped data frame because we never ungroup. I suggest either adding an ungroup call to the end of the pipe or (my personal preference) replacing the group_by with the by argument to slice_min to only group that one operation.
    dplyr::slice_min(.data$date, by = c("site_id", "crop_cycle_id")) |>
  2. I think in Update events schema: require crop id when planting, remove pft from site #3836 we will start to require crop_code, right? I'm not sure what the relationship between crop and crop_code is, but assuming we'll be using crop_code instead of crop, this function will need to be adjusted accordingly. Since the two PRs are closely related, I might suggest sequencing this one after Update events schema: require crop id when planting, remove pft from site #3836 and future-proofing this to start using crop_code right away.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Good call!
  2. Yep, this is now waiting on schema finalization in Update events schema: require crop id when planting, remove pft from site #3836. I'll mark it as waiting.

@infotroph infotroph added the status:blocked Waiting for another PR/issue (say which one in comments) label Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

modules status:blocked Waiting for another PR/issue (say which one in comments) tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants