add ca fertilization harmonization and ncc compost support#4002
add ca fertilization harmonization and ncc compost support#4002divine7022 wants to merge 75 commits into
Conversation
|
high level, priors are out now, two ref tables (ca_compost_amendment, ca_n_application_rate) stay as bundled data for now. and new ag/management package vs bety priors vs whatever is separate conv |
@divine7022 Could you please clarify? |
dlebauer
left a comment
There was a problem hiding this comment.
Thank you for these changes.
General changes requested:
Better documentation
(Make sure to review the data packaging guidance from the Data chapter of the "R-Packages" book)
Documentation should cover both the source (TSV) and package (RDA) data:
package datasets should be documented in roxygen, and should include
- definitions of each field
- information about where the data came from:
- references
- how the dataset was compiled, including assumptions
- link to more details in source data README.
Source datasets can be documented in a data-raw/README.md and should include:
- where data came from
- what decisions were made during curation
- note that for package datasets that do not have a manual curation step, the provenance of package data is documented by R scripts.
- how to update both the TSV and .rda files
dataset names
Make sure naming is consistent, e.g. create_n_rate_data.R reads a file n_fertilizaton.tsv and generates a dataset ca_n_application_rate. This can get confusing.
consider renaming compost.tsv to organic_ammendments.tsv to bettr reflect the diversity of materials represented in the dataset.
| @@ -0,0 +1,33 @@ | |||
| Material C_MIN (C:N) C_MAX (C:N) C_Avg (C:N) C_Assumed (%) Total N (%) 4 week PAN (%) LowerN/HigherN RowsMIN_AppRate (tons/acre) RowsMIN_AppRate (lbs/acre) RowsMIN_Total_N (lbs N/acre) RowsMIN_Avail_N (lbs N/acre) RowsMIN_Total_C (lbs C/acre) RowsMAX_AppRate (tons/acre) RowsMAX_AppRate (lbs/acre) RowsMAX_Total_N (lbs N/acre) RowsMAX_Avail_N (lbs N/acre) RowsMAX_Total_C (lbs C/acre) TreesMIN_AppRate (tons/acre) TreesMIN_AppRate (lbs/acre) TreesMIN_Total_N (lbs N/acre) TreesMIN_Avail_N (lbs N/acre) TreesMIN_Total_C (lbs C/acre) TreesMAX_AppRate (tons/acre) TreesMAX_AppRate (lbs/acre) TreesMAX_Total_N (lbs N/acre) TreesMAX_Avail_N (lbs N/acre) TreesMAX_Total_C (lbs C/acre) Source | |||
There was a problem hiding this comment.
This table combines stoichiometric properties with assumed application-rate
scenarios.
- would it make more sense to separate material properties from application-rates?
- See more general requests about dataset documentation in my general review comments. Some specific questions are below
- what columns come from the source, and what information was added during manual curation?
- e.g. are the
Rows*andTrees*application-rate values directly supported by
the cited sources, or did sources provide more granular, species level rates?
- e.g. are the
- Why does the build script expose only the
Rows*values in
ca_compost_amendment? - Is the intended distinction that synthetic N rates are crop-specific, while
compost/organic amendment rates are only available at a broader crop-structure
level? - Given the contents include manure, blood meal, paper, wood chips, straw, and
other organic materials, wouldorganic_amendments.tsvor
compost_amendments.tsvbe a clearer filename thancompost.tsv?
| # map each raw material name to one of the CalRecycle classes | ||
| # (14 CCR section 17852). biosolids is empty in the current table. | ||
| material_to_class <- function(m) { | ||
| s <- tolower(m) |
There was a problem hiding this comment.
where does m come from? I don't see it in compost.tsv
| app_rate_min = .data$`RowsMIN_AppRate (lbs/acre)`, | ||
| app_rate_max = .data$`RowsMAX_AppRate (lbs/acre)`, |
There was a problem hiding this comment.
why are only Rows[MIN|MAX] and not Trees[MIN|MAX] used here? (see also general question about documentation)
| @@ -0,0 +1,90 @@ | |||
| PFT Group Crop PlantStage Season MINN MAXN Unit Source Notes | |||
There was a problem hiding this comment.
Do crop-names come from a controlled vocabulary? Any reason not to use the LandIQ names here?
Where do PFTs come from? On one hand, row/woody is a simple designation, and there may be reasons to include a grouping in this dataset. But caution should be used and rationale/motivation documented because PFT groupings are opinionated, and in this case alfalfa is not typically considered a 'row' crop (it is forage).
There was a problem hiding this comment.
To unblock #4003, which currently depends on this PR, you could
- copy the .rda and .R files from this PR to the
workflows/[fertilization|ncc]-statewidedirectories - update #4003 to use those resources
This is in line with one of the motivations for the workflows directory - as a place to maintain and develop workflows until individual components can be generalized and extracted into PEcAn package functions and data
Description
folds standalone N fertilization harmonization (was living at /projectnb/dietzelab/ccmmf/management/fertilization/harmonization.R) into data.land data-raw pipeline, then layers ncc (compost) sampling support on top and extends the events schema with an ncc event type
events_schema_v0.1.2.json picks up ncc as an allowed event_type with material, ncc_subtype, fert_subtype, and pft properties
separate workflow PR will consume these samplers from workflows/fertilization-statewide and workflows/ncc-statewide to emit ensemble events
cc @sarahkanee @mdietze @infotroph @dlebauer
Motivation and Context
Review Time Estimate
Types of changes
Checklist: