Initial migration of BETYdb to R package #12

Merged
dlebauer merged 68 commits into main from mvp-betydata
Apr 9, 2026

Conversation

divine7022 (Collaborator) commented Feb 11, 2026

Initial release of betydata

An R data package providing offline access to public data from BETYdb.

  • 16 datasets: traitsview (43,532 rows) + 15 reference tables
  • Multiple formats: .rda (lazy-loaded), Parquet, Frictionless datapackage.json
  • Filtered to public data only (access_level = 4, checked >= 0)
  • Complete roxygen2 documentation for all datasets
  • Package-level documentation with BETYdb context
  • Data quality policy in README (checked column, access levels)
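The public-data filter named in the list above can be sketched in a few lines of base R. This is only an illustration on a made-up data frame, not the package's build script: the `access_level` and `checked` column names and their thresholds come from the description, while the `id` and `value` columns and the `keep_public()` helper are hypothetical.

```r
# Toy stand-in for a raw trait table; all values are made up.
raw <- data.frame(
  id           = 1:5,
  value        = c(12.1, 8.4, 15.0, 9.9, 11.2),
  access_level = c(4, 4, 2, 4, 1),
  checked      = c(1, 0, 1, -1, 0)
)

# Hypothetical helper: keep rows that are public (access_level == 4)
# and have checked >= 0, which drops the checked = -1 records.
keep_public <- function(df) {
  df[df$access_level == 4 & df$checked >= 0, , drop = FALSE]
}

public <- keep_public(raw)
print(public$id)
```

Only the rows that satisfy both conditions survive, so a record with `access_level = 4` but `checked = -1` is still excluded.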

Vignettes

  • orientation: Package overview and data relationships
  • sql-analogs: Migrate BETYdb SQL queries to dplyr
  • pfts-priors: Working with PFTs and Bayesian priors
  • manuscript: Reproduce LeBauer et al. (2018) analyses
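To give a flavor of what the sql-analogs vignette covers, here is a minimal sketch of translating one SQL aggregate query into dplyr. It is not taken from the vignette: the `traits` table, its columns, and its values are assumptions made up for illustration.

```r
library(dplyr)

# Toy table; column names and values are illustrative assumptions.
traits <- data.frame(
  species = c("A", "A", "B", "B", "B"),
  trait   = c("SLA", "SLA", "SLA", "Vcmax", "SLA"),
  value   = c(10, 14, 20, 55, 22),
  stringsAsFactors = FALSE
)

# SQL being translated:
#   SELECT species, COUNT(*) AS n, AVG(value) AS avg_value
#   FROM traits
#   WHERE trait = 'SLA'
#   GROUP BY species
#   ORDER BY species;
sla_summary <- traits %>%
  filter(trait == "SLA") %>%                       # WHERE
  group_by(species) %>%                            # GROUP BY
  summarise(n = n(), avg_value = mean(value),      # COUNT(*), AVG()
            .groups = "drop") %>%
  arrange(species)                                 # ORDER BY

print(sla_summary)
```

Each SQL clause maps to one dplyr verb, which is what makes this kind of migration mostly mechanical.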

Datasets

Dataset     Description
traitsview  Primary trait/yield observations (43,532 × 36)
species     Plant taxonomy
sites       Research site locations
variables   Trait definitions and units
citations   Literature references
pfts        Plant functional types
priors      Bayesian prior distributions
+ 9 more    Support and relationship tables

Implements: #1, #2, #3, #4, #5, #6, #7, #8, #9, #10, #11

divine7022 requested a review from dlebauer February 11, 2026 21:13

Copilot AI (Contributor) left a comment

Pull request overview

This PR delivers the initial release (v0.1.0) of betydata, an R data package providing offline access to public data from the BETYdb (Biofuel Ecophysiological Traits and Yields) database. The package enables reproducible analyses of plant traits and crop yields without requiring database connectivity.

Changes:

  • Complete R package structure with 16 datasets (traitsview + 15 support tables) totaling 43,532+ trait and yield records
  • Multiple data formats: lazy-loaded .rda files, Parquet alternatives, and Frictionless metadata (datapackage.json)
  • Comprehensive documentation: roxygen2 docs for all datasets, 4 vignettes (orientation, sql-analogs, pfts-priors, manuscript), and GitHub issue templates
  • Quality controls: excludes checked=-1 records, public data only (access_level = 4), full test coverage
  • CI/CD infrastructure: GitHub Actions R-CMD-check workflow, testthat 3.0 test suite

Reviewed changes

Copilot reviewed 38 out of 71 changed files in this pull request and generated 6 comments.

File                             Description
DESCRIPTION                      Package metadata and dependencies; minor email format issue
CITATION.cff                     Citation metadata; email and missing preferred-citation issues
LICENSE                          BSD-3-Clause license file
README.md                        Comprehensive package documentation; table formatting issue
NEWS.md                          Release notes documenting v0.1.0
R/betydata-package.R             Package-level documentation
R/data.R                         Roxygen2 documentation for all 16 datasets
man/*.Rd                         Generated documentation files for datasets
vignettes/*.Rmd                  Four tutorial vignettes; minor issues in manuscript.Rmd and pfts-priors.Rmd
tests/testthat/*.R               Test suite for data and metadata validation; deprecated context() calls
data-raw/make-data.R             Data build script for generating .rda and Parquet files
inst/metadata/datapackage.json   Frictionless Data package metadata
inst/extdata/parquet/*.parquet   Sample Parquet data files
data/*.rda                       Binary R data files (compressed with xz)
.github/workflows/*.yaml         GitHub Actions CI configuration
.github/ISSUE_TEMPLATE/*.md      Issue templates for data corrections and verifications
.gitignore, .Rbuildignore        Build and version control configuration; CSV exclusion concern

Comments suppressed due to low confidence (2)

tests/testthat/test-metadata.R:3

  • The context() function on line 3 is deprecated in testthat 3.0.0 and later. According to the DESCRIPTION file, this package uses testthat (>= 3.0.0) and has Config/testthat/edition: 3. The context() calls should be removed as they are no longer needed and will generate warnings.

tests/testthat/test-data.R:3

  • The context() function on line 3 is deprecated in testthat 3.0.0 and later. According to the DESCRIPTION file, this package uses testthat (>= 3.0.0) and has Config/testthat/edition: 3. The context() calls should be removed as they are no longer needed and will generate warnings.
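
As a rough illustration of the suggested fix, a test file under the testthat third edition can simply start with its test_that() blocks. The sketch below is not the package's actual test; the `toy` data frame and its columns are assumptions standing in for a packaged dataset.

```r
library(testthat)

# Toy stand-in for a packaged dataset; columns are illustrative assumptions.
toy <- data.frame(id = 1:3, checked = c(0, 1, 1))

# No context() call at the top of the file: in the third edition
# testthat uses the test file name instead.
test_that("toy data contains no rejected records", {
  expect_s3_class(toy, "data.frame")
  expect_true(all(toy$checked >= 0))
})
```

Deleting the context() line is the whole change; the test_that() blocks themselves stay as they are.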

Comment thread vignettes/manuscript.Rmd Outdated
Comment thread CITATION.cff
Comment thread CITATION.cff Outdated
Comment thread DESCRIPTION Outdated
Comment thread README.md Outdated
Comment thread vignettes/pfts-priors.Rmd Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

dlebauer (Member) left a comment

I've done a quick first review. On a future review I will go through all of the vignettes and explore the tables as they exist.

I am now wondering 1) if we should store the data in CSV files to allow text-based version control and 2) if we can reconstruct traitsview on the fly from the component datasets (i.e. traitsview should not be in data-raw).
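
Both ideas in this comment can be sketched with base R. The example below is a minimal illustration under stated assumptions, not a design adopted in this PR: the three component tables, their column names, and their values are made up and do not reflect the actual BETYdb schema.

```r
# Toy component tables; names, columns, and values are illustrative assumptions.
traits <- data.frame(
  id         = 1:3,
  species_id = c(10, 11, 10),
  site_id    = c(100, 100, 101),
  value      = c(12.5, 8.0, 14.2)
)
species <- data.frame(species_id = c(10, 11),
                      species_name = c("Species A", "Species B"))
sites <- data.frame(site_id = c(100, 101),
                    sitename = c("Site A", "Site B"))

# 1) Text-based storage: write each table to CSV, then read it back.
csv_dir <- tempfile("csv_")
dir.create(csv_dir)
tables <- list(traits = traits, species = species, sites = sites)
for (nm in names(tables)) {
  write.csv(tables[[nm]], file.path(csv_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}
restored <- lapply(names(tables), function(nm) {
  read.csv(file.path(csv_dir, paste0(nm, ".csv")), stringsAsFactors = FALSE)
})
names(restored) <- names(tables)

# 2) Rebuild a denormalized view on the fly with left joins,
#    instead of storing the joined table itself.
view <- merge(restored$traits, restored$species, by = "species_id", all.x = TRUE)
view <- merge(view, restored$sites, by = "site_id", all.x = TRUE)
view <- view[order(view$id), ]
print(view)
```

Storing only the component tables keeps diffs small and readable, at the cost of running the joins whenever the combined view is needed.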

Comment thread datapackage.json
Comment thread vignettes/orientation.Rmd Outdated
Comment thread vignettes/orientation.Rmd Outdated
Comment thread vignettes/orientation.Rmd Outdated
Comment thread vignettes/orientation.Rmd Outdated
Comment thread README.md Outdated
Comment thread README.md
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 50 out of 79 changed files in this pull request and generated 2 comments.


Comment thread data-raw/make-data.R
Comment thread data-raw/make-data.R
Copilot AI changed the title from "betydata R data package with BETYdb public data export" to "Move datapackage.json to repo root (Frictionless spec)" Apr 8, 2026
Copilot AI requested a review from dlebauer April 8, 2026 21:49

dlebauer (Member) commented Apr 8, 2026

@divine7022 please take a look at the most recent change (c1e3c90), which moves datapackage.json to the repo root; after that this is ready to merge.

divine7022 (Collaborator, Author) commented

@dlebauer we are good to go now, once checks pass

dlebauer closed this Apr 9, 2026
dlebauer reopened this Apr 9, 2026
dlebauer merged commit 528739c into main Apr 9, 2026
10 checks passed

dlebauer (Member) commented Apr 9, 2026

@copilot please restore original issue description

dlebauer changed the title from "Move datapackage.json to repo root (Frictionless spec)" to "Initial migration of BETYdb to R package" Apr 14, 2026
dlebauer deleted the mvp-betydata branch April 14, 2026 21:04