
Report assumptions #606

Open: DominiqueMakowski wants to merge 11 commits into main from report_assumptions

@DominiqueMakowski
Member

Started integrating assumption checks into the report output:

library(report)

mt2 <- rbind(
  mtcars[, c("mpg", "disp", "hp")],
  data.frame(mpg = c(37, 40), disp = c(300, 400), hp = c(110, 120))
)
model <- lm(disp ~ mpg + hp, data = mt2)
report(model)

We fitted a linear model (estimated using OLS) to predict disp with mpg and hp
(formula: disp ~ mpg + hp). The model's assumptions were checked: 2 influential
observations (5.88%) were detected (Cook's distance), heteroskedasticity was
detected (p = 0.029) and no collinearity was detected.
The model explains a
statistically significant and substantial proportion of variance (R2 = 0.54,
F(2, 31) = 18.40, p < .001, adj. R2 = 0.51). The model's intercept,
corresponding to mpg = 0 and hp = 0, is at 49.34 (95% CI [-148.35, 247.02],
t(31) = 0.51, p = 0.614). Within this model:

  • The effect of mpg is statistically non-significant and negative (beta =
    -0.30, 95% CI [-6.04, 5.45], t(31) = -0.10, p = 0.917; Std. beta = -0.02, 95%
    CI [-0.36, 0.32])
  • The effect of hp is statistically significant and positive (beta = 1.34, 95%
    CI [0.72, 1.97], t(31) = 4.36, p < .001; Std. beta = 0.72, 95% CI [0.39, 1.06])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.

report(model, audience = "ai")
#> ## Model
#> - Call: lm
#> - Formula: disp ~ mpg + hp
#> - Family: gaussian
#> - N: 34
#> - Inference: 95% CI [Wald]
#> 
#> ## Variables
#> - disp: Mean = 237.74, SD = 124.07, range: [71.10, 472]
#> - mpg: Mean = 21.17, SD = 7.32, range: [10.40, 40]
#> - hp: Mean = 144.82, SD = 66.89, range: [52, 335]
#> 
#> ## Assumptions
#> - Influential Observations: 2/34 (5.88%) [Cook's distance, threshold = 0.806]
#> - Homoskedasticity: VIOLATED (Breusch-Pagan, p = 0.029)
#> - Collinearity: OK (all VIF < 5)
#> 
#> ## Parameters
#> |Parameter   | Coefficient|   SE |           95% CI | t(31)|     p |
#> |:-----------|-----------:|:-----|:-----------------|-----:|:------|
#> |(Intercept) |       49.34|96.93 |[-148.35, 247.02] |  0.51|0.614  |
#> |mpg         |       -0.30| 2.82 |[  -6.04,   5.45] | -0.10|0.917  |
#> |hp          |        1.34| 0.31 |[   0.72,   1.97] |  4.36|< .001 |
#> 
#> ## Performance
#> |AIC   | AICc |  BIC |  R2 | R2 (adj.)| RMSE |Sigma |
#> |:-----|:-----|:-----|:----|---------:|:-----|:-----|
#> |404.7 |406.1 |410.8 |0.54 |      0.51|82.65 |86.56 |
#> 
#> ## Highlights
#> - Significant effects (p < 0.05): hp

Created on 2026-05-25 with reprex v2.1.1
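For readers who want to see what the `## Assumptions` lines summarize, here is a base-R sketch of the three checks on the same `mt2` model. It is not the PR's code: the Cook's distance cutoff (median of F(p, n - p)), the non-studentized Breusch-Pagan statistic on Pearson residuals, and VIFs taken from the inverse correlation matrix of the predictors are my assumptions about what `performance` does, so the numbers may not match the report exactly.

```r
# Base-R sketch of the three checks summarised under "## Assumptions".
# Assumptions (not taken from the PR): Cook's threshold = median of F(p, n - p);
# Breusch-Pagan (non-studentised) on Pearson residuals with df = 1;
# VIF = diagonal of the inverse correlation matrix of the predictors.
mt2 <- rbind(
  mtcars[, c("mpg", "disp", "hp")],
  data.frame(mpg = c(37, 40), disp = c(300, 400), hp = c(110, 120))
)
model <- lm(disp ~ mpg + hp, data = mt2)

n <- nobs(model)          # number of observations
p <- length(coef(model))  # number of parameters, intercept included

# 1. Influential observations (Cook's distance)
cooks <- cooks.distance(model)
cook_threshold <- qf(0.5, p, n - p)
n_influential <- sum(cooks > cook_threshold)

# 2. Heteroskedasticity: regress scaled squared residuals on fitted values
r <- residuals(model, type = "pearson")
s2 <- df.residual(model) * sigma(model)^2 / n   # RSS / n
u <- r^2 / s2
aux <- lm(u ~ fitted(model))
ss <- anova(aux)[["Sum Sq"]]
bp_stat <- (sum(ss) - ss[length(ss)]) / 2        # regression SS / 2
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# 3. Collinearity (variance inflation factors)
X <- model.matrix(model)[, -1, drop = FALSE]     # drop the intercept column
vif <- diag(solve(cor(X)))

cat(sprintf("Influential: %d/%d (%.2f%%), threshold = %.3f\n",
            n_influential, n, 100 * n_influential / n, cook_threshold))
cat(sprintf("Breusch-Pagan p = %.3f\n", bp_p))
print(round(vif, 2))
```

With only two predictors, both VIFs equal 1 / (1 - r^2), where r is the correlation between `mpg` and `hp`.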

Copilot AI review requested due to automatic review settings May 25, 2026 10:05
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces assumption-check reporting into the report package, adding a new report_assumptions() helper and integrating its output into both the human narrative (report_text.lm()) and AI-optimized (audience = "ai") report flows.

Changes:

  • Added report_assumptions() to summarize influential observations, homoskedasticity, and collinearity (with human + AI output formats).
  • Integrated assumption summaries into report() outputs for lm-family models and into the AI report block generation.
  • Added a report_text()/report() method for performance::check_outliers() plus new test coverage and documentation/vignette updates.
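The two output formats described above can be illustrated with a hypothetical formatter (`format_assumptions()` is an illustrative name, not a function from the PR). It assumes alpha = .05 for the Breusch-Pagan p-value and a VIF cutoff of 5, and only imitates the wording shown in the example output at the top of this PR.

```r
# Hypothetical formatter (not the PR's implementation): turns three check
# results into either the human sentence or the AI-style bullet lines.
format_assumptions <- function(n_influential, n, bp_p, max_vif,
                               audience = c("humans", "ai")) {
  audience <- match.arg(audience)
  pct <- sprintf("%.2f%%", 100 * n_influential / n)
  p_txt <- sprintf("p = %.3f", bp_p)
  homosked_ok <- bp_p >= 0.05   # assumed alpha = .05
  collin_ok <- max_vif < 5      # assumed cutoff, as in "all VIF < 5"

  if (audience == "ai") {
    return(c(
      sprintf("- Influential Observations: %d/%d (%s) [Cook's distance]",
              n_influential, n, pct),
      sprintf("- Homoskedasticity: %s (Breusch-Pagan, %s)",
              if (homosked_ok) "OK" else "VIOLATED", p_txt),
      paste0("- Collinearity: ",
             if (collin_ok) "OK (all VIF < 5)" else "CONCERN (some VIF >= 5)")
    ))
  }

  noun <- if (n_influential == 1) "observation" else "observations"
  verb <- if (n_influential == 1) "was" else "were"
  paste0(
    "The model's assumptions were checked: ",
    n_influential, " influential ", noun, " (", pct, ") ", verb,
    " detected (Cook's distance), heteroskedasticity was ",
    if (homosked_ok) "not " else "", "detected (", p_txt, ") and ",
    if (collin_ok) "no" else "some", " collinearity was detected."
  )
}

format_assumptions(2, 34, 0.029, 1.5)
format_assumptions(2, 34, 0.029, 1.5, audience = "ai")
```

With these illustrative inputs (the VIF of 1.5 is made up), the human call returns the same assumption sentence as the example report, and `audience = "ai"` returns three bullet lines instead.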

Reviewed changes

Copilot reviewed 10 out of 16 changed files in this pull request and generated 5 comments.

|File |Description |
|:----|:-----------|
|vignettes/report_ai.Rmd |Updates vignette to recommend report(x, audience = "ai") instead of report_ai(). |
|tests/testthat/test-report_assumptions.R |Adds unit tests for report_assumptions() and its integration into report() / AI output. |
|R/report.lm.R |Adds assumptions argument and injects assumption summary sentence into report_text.lm(). |
|R/report.check_outliers.R |Adds reporting methods for check_outliers objects and helper method-name formatting. |
|R/report_assumptions.R |Implements new exported assumption-reporting function with human + AI output. |
|R/report_ai.R |Adds @exportS3Method tags and injects an ## Assumptions block into AI report output. |
|NEWS.md |Adds changelog entries describing the new assumptions reporting and related fixes. |
|NAMESPACE |Registers new S3 methods and exports report_assumptions (but stops exporting report_ai). |
|man/report.Rd |Updates audience documentation to describe AI output as report_ai class markdown. |
|man/report.compare.loo.Rd |Fixes Rd link target for brms::loo_compare. |
|man/report-package.Rd |Updates package authors section (generated doc change). |
|man/report_text.check_outliers.Rd |Adds generated docs for report_text.check_outliers() / report.check_outliers(). |
|man/report_assumptions.Rd |Adds generated docs for report_assumptions(). |
|man/reexports.Rd |Updates generated “reexports” help page formatting/links. |
|DESCRIPTION |Updates Collate order and adds RoxygenNote. |
|man/report_ai.Rd |Removes generated docs for report_ai() (no longer exported). |
Files not reviewed (6)
  • man/reexports.Rd: Language not supported
  • man/report-package.Rd: Language not supported
  • man/report.Rd: Language not supported
  • man/report.compare.loo.Rd: Language not supported
  • man/report_assumptions.Rd: Language not supported
  • man/report_text.check_outliers.Rd: Language not supported

Comment thread DESCRIPTION
Comment on lines 117 to 121
'report.stanreg.R'
'report.brmsfit.R'
'report.character.R'
'report.check_outliers.R'
'report.compare.loo.R'
Comment thread NEWS.md Outdated
Comment thread NAMESPACE
Comment thread R/report_assumptions.R
Comment on lines +41 to +49
#' @export
report_assumptions <- function(
  x,
  ...,
  audience = getOption("report_audience", "humans")
) {
  insight::check_if_installed("performance")
  audience <- match.arg(audience, c("humans", "ai"))
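The `audience` default in this excerpt reads a global option and is then validated with `match.arg()`. A hypothetical helper (`resolve_audience()` is an illustrative name, not part of the PR) showing how that pattern behaves in a session:

```r
# Hypothetical helper mirroring the excerpt: the default comes from the
# "report_audience" option, and match.arg() restricts it to two values.
resolve_audience <- function(audience = getOption("report_audience", "humans")) {
  match.arg(audience, c("humans", "ai"))
}

resolve_audience()                       # "humans" unless the option is set
old <- options(report_audience = "ai")   # set a session-wide default
resolve_audience()                       # "ai"
options(old)                             # restore the previous value
resolve_audience("hum")                  # "humans" (partial matching)
```

Because `match.arg()` does partial matching, `"hum"` resolves to `"humans"`, and any value outside the two choices raises an error.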

Comment thread NEWS.md Outdated

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces report_assumptions(), a new function for summarizing model assumption checks like influential observations and heteroskedasticity, with support for both human and AI audiences. These checks are now integrated into linear model reports, and several S3 methods have been updated with proper export tags. Reviewers suggested improving code robustness by explicitly extracting p-values to avoid logical errors in conditional checks, ensuring that optional arguments are passed to underlying functions, and using datawizard::text_concatenate() for better natural language formatting of lists.

I am having trouble creating individual review comments. My feedback is below.

R/report_assumptions.R (lines 64-66), priority: high

Calling as.numeric() on the result of performance::check_heteroskedasticity() (which is typically a data frame) may fail or produce unexpected results. Furthermore, if multiple p-values are returned, homosked_ok will be a vector, which will cause a warning or error in if (homosked_ok) (lines 122 and 178) in recent R versions. It is safer to extract the p-value column explicitly and use all() to ensure a single logical value.

    p_val <- as.numeric(heterosk$p)
    p_fmt <- insight::format_p(p_val)
    homosked_ok <- all(p_val >= 0.05)
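The suggestion assumes the check result has a `p` element. If, as I understand current `performance` versions, `check_heteroskedasticity()` instead returns a single classed numeric p-value, `heterosk$p` would error (`$` is invalid for atomic vectors). A shape-agnostic sketch; `homosked_flag()` is a hypothetical name and the two input shapes are assumptions to verify against the installed version:

```r
# Hypothetical helper: returns one logical "homoskedasticity OK" flag whether
# `res` is a bare (possibly classed) numeric p-value or a data frame / list
# with a `p` element. Which shape performance returns is an assumption.
homosked_flag <- function(res, alpha = 0.05) {
  p <- if (is.list(res) && !is.null(res[["p"]])) res[["p"]] else unclass(res)
  p <- as.numeric(p)
  p <- p[!is.na(p)]
  if (length(p) == 0) return(NA)   # no usable p-value
  all(p >= alpha)                  # length 1; wrap in isTRUE() to cover NA
}

homosked_ok <- homosked_flag(0.029)
if (isTRUE(homosked_ok)) "no heteroskedasticity detected" else "heteroskedasticity detected (or unknown)"
```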

R/report.check_outliers.R (line 57), priority: medium

Using paste(..., collapse = " and ") can result in grammatically awkward strings when more than two methods are used (e.g., "A and B and C"). It is recommended to use datawizard::text_concatenate() which correctly handles lists of any length (e.g., "A, B, and C").

  method_str <- datawizard::text_concatenate(method_parts)
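To see the behaviour being asked for without the extra dependency, here is a base-R stand-in (`concat_terms()` is hypothetical, not datawizard's code). As far as I know, `datawizard::text_concatenate()` defaults to `last = " and "`, which gives "A, B and C" rather than the serial-comma form quoted above; the sketch exposes the same two knobs.

```r
# Hypothetical base-R stand-in for joining a character vector into an
# English list (method names, flagged terms, ...).
concat_terms <- function(x, sep = ", ", last = " and ") {
  x <- as.character(x)
  n <- length(x)
  if (n == 0) return("")
  if (n == 1) return(x)
  paste0(paste(x[-n], collapse = sep), last, x[n])
}

concat_terms(c("Cook's distance", "Mahalanobis"))  # "Cook's distance and Mahalanobis"
concat_terms(c("A", "B", "C"))                     # "A, B and C"
concat_terms(c("A", "B", "C"), last = ", and ")    # "A, B, and C"
```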

R/report.lm.R (line 760), priority: medium

The ... arguments are not passed to report_assumptions(). This prevents users from passing custom parameters (such as specific outlier detection methods or thresholds) through the main report() call to the underlying performance functions.

      report_assumptions(x, ...),
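What forwarding `...` changes, in a toy setting: `check_influence()` and `summarise_model()` are hypothetical stand-ins for the assumption checker and the report function, and the median-F default cutoff is an assumption.

```r
# Hypothetical stand-in for the assumption checker: counts observations whose
# Cook's distance exceeds `threshold` (default: assumed median-F cutoff).
check_influence <- function(model, threshold = NULL, ...) {
  d <- cooks.distance(model)
  if (is.null(threshold)) {
    threshold <- qf(0.5, length(coef(model)), df.residual(model))
  }
  sum(d > threshold)
}

# Forwards `...`, so arguments such as `threshold` reach check_influence().
summarise_model <- function(model, ...) {
  list(n_influential = check_influence(model, ...))
}

# Does not forward `...`: extra arguments are silently swallowed.
summarise_model_no_dots <- function(model, ...) {
  list(n_influential = check_influence(model))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
summarise_model(fit)$n_influential                          # default cutoff
summarise_model(fit, threshold = Inf)$n_influential         # 0: nothing exceeds Inf
summarise_model_no_dots(fit, threshold = Inf)$n_influential # threshold ignored
```

The last call shows the failure mode described in the comment: the user's argument is accepted without error but has no effect.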

R/report_assumptions.R (line 228), priority: medium

Consider using datawizard::text_concatenate() here to ensure the list of terms is grammatically correct (e.g., using "and" for the last item instead of just a comma).

        datawizard::text_concatenate(collin_info$flagged_terms),
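A sketch of how a list of flagged terms could be produced before it is concatenated. `flag_collinear()` is hypothetical, the cutoff of 5 mirrors the "all VIF < 5" wording in the AI output, and the correlation-matrix shortcut only holds for numeric, single-degree-of-freedom predictors in a model with an intercept (factors or interactions would need generalized VIFs).

```r
# Hypothetical sketch: VIFs from the inverse correlation matrix of the
# predictors, returning the names of terms at or above an assumed cutoff.
flag_collinear <- function(model, cutoff = 5) {
  X <- model.matrix(model)[, -1, drop = FALSE]   # drop the intercept column
  vif <- diag(solve(cor(X)))
  names(vif)[vif >= cutoff]
}

fit <- lm(mpg ~ disp + cyl + hp + wt, data = mtcars)
flagged <- flag_collinear(fit)
if (length(flagged)) {
  cat("High collinearity for:", paste(flagged, collapse = ", "), "\n")
} else {
  cat("No collinearity detected (all VIF < 5)\n")
}
```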

R/report_assumptions.R (line 266), priority: medium

Using paste(..., collapse = " and ") is suboptimal for lists with more than two items. datawizard::text_concatenate() is preferred for generating natural language lists in easystats reports.

  method_str <- datawizard::text_concatenate(method_labels)

DominiqueMakowski and others added 5 commits May 25, 2026 11:13
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@strengejacke
Member

Nice! Do we want this to be the default, though? I'd say the default should be not to include the assumptions, because usually you want to focus on the results.

@DominiqueMakowski
Member Author

I would have said so a couple of months ago, but given the now mostly educational value of these reports (i.e., they are useful to show how to phrase and formulate things optimally), I'd be fine with keeping the default on, as it only adds a sentence. But I don't have a strong opinion either way.

@mattansb @bwiernik ?

@mattansb
Member

I like reporting assumptions, but I think that reporting the results of various "tests" of assumptions goes against the "visually checking" philosophy we've been encouraging with functions like check_model(). I understand that this is a limitation of the tool (unless we start shipping {report} with a built-in image AI model...), but ... 🤷‍♂️

@strengejacke
Member

Ironically, the assumption checks probably work better for more complex models, where we rely on DHARMa.

@DominiqueMakowski
Member Author

[image]

@bwiernik
Contributor

I'd say we should not include it by default, but having a report() method for our assumption-check objects would be great.
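A minimal sketch of the kind of method suggested here, using a hypothetical generic `describe_check()` rather than `report()` itself so it runs without the package. It assumes the check object is a single p-value carrying the class `check_heteroscedasticity`, which is my understanding of what `performance` returns but should be verified.

```r
# Hypothetical S3 pair: a generic plus a method for a classed p-value object.
describe_check <- function(x, ...) UseMethod("describe_check")

describe_check.check_heteroscedasticity <- function(x, alpha = 0.05, ...) {
  p <- as.numeric(unclass(x))
  verdict <- if (p < alpha) "Heteroskedasticity was detected" else
    "No heteroskedasticity was detected"
  sprintf("%s (Breusch-Pagan, p = %.3f).", verdict, p)
}

describe_check.default <- function(x, ...) {
  stop("No describe_check() method for class ", class(x)[1], call. = FALSE)
}

res <- structure(0.029, class = c("check_heteroscedasticity", "numeric"))
describe_check(res)
```

Keeping the sentence in a method on the check object means report() could reuse it when assumptions are opted in, while users can also call it directly on a check result.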
