Part 1: isolate unified platform base#264
Conversation
…vious version the parametric algorithms were not actually supported on the same lelevel as the non-parametric ones. I redesigned the crate with solving this issue in mind. The structs are much clearer now, the dispach is smarter. On the negative side some elements might be overcomplicated as I am not 100% sure on how much flexibility some aspects will need, the current API might seem over-flexible. Will be improved over time
|
| Branch | split/unified-platform-base |
| Testbed | mhovd-pgx |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| bimodal_ke_npag | 📈 view plot 🚷 view threshold | 4,898.30 ms(+0.06%)Baseline: 4,895.25 ms | 5,086.30 ms (96.30%) |
| bimodal_ke_npod | 📈 view plot 🚷 view threshold | 1,447.50 ms(+0.16%)Baseline: 1,445.12 ms | 1,727.29 ms (83.80%) |
| bimodal_ke_postprob | 📈 view plot 🚷 view threshold | 296.31 ms(-25.25%)Baseline: 396.40 ms | 576.26 ms (51.42%) |
There was a problem hiding this comment.
Pull request overview
This PR is the first step in splitting the unified-platform work: it preserves the unified modeling/compilation layer and baseline nonparametric algorithms, while removing the parametric stack and newer algorithm families to keep follow-up reviews smaller.
Changes:
- Introduces a new public API surface (
api,model,compile,results,output) and updates tests/examples to useEstimationProblem+ModelDefinition. - Moves/rewires nonparametric estimation components under
estimation::nonparametricand removes legacyroutines/+structs/modules. - Adds shared result/summary/diagnostics/artifact tracking and shared output writers for the new architecture.
Reviewed changes
Copilot reviewed 103 out of 104 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/structs_tests.rs | Updates imports to the new nonparametric module path. |
| tests/settings_tests.rs | Removes legacy Settings-based tests (old API removed). |
| tests/results_summary_tests.rs | Adds smoke test for new summary/diagnostics/predictions surface. |
| tests/output_writer_tests.rs | Adds test coverage for writing shared output artifacts via FitResult::write_outputs. |
| tests/onecomp.rs | Migrates algorithm dispatch tests to EstimationProblem builder API. |
| tests/nonparametric_engine_tests.rs | Adds coverage for NonparametricEngine::fit returning a workspace. |
| tests/ipm_tests.rs | Updates Burke IPM tests to new module path. |
| tests/compile_layer_tests.rs | Adds coverage for compile layer (design context + observation index + covariates). |
| tests/api_smoke_tests.rs | Adds basic smoke coverage for new API entrypoints. |
| tests/acceptance_baseline_tests.rs | Adds an acceptance baseline for NPAG (bimodal_ke). |
| src/structs/mod.rs | Removes legacy structs module (moved to estimation::nonparametric). |
| src/routines/mod.rs | Removes legacy routines module (superseded by new architecture). |
| src/routines/initialization/sobol.rs | Removes legacy Sobol initializer (moved under new nonparametric prior). |
| src/routines/initialization/latin.rs | Removes legacy Latin initializer (moved under new nonparametric prior). |
| src/routines/expansion/mod.rs | Removes legacy expansion module. |
| src/routines/expansion/adaptative_grid.rs | Removes legacy adaptive grid implementation (reintroduced under new nonparametric). |
| src/routines/estimation/qr.rs | Removes legacy QR routine (reintroduced under new nonparametric). |
| src/routines/estimation/mod.rs | Removes legacy estimation routines module. |
| src/routines/condensation/mod.rs | Removes legacy condensation helper. |
| src/results/summary.rs | Adds shared summary DTOs (fit + population + individual). |
| src/results/predictions.rs | Adds shared predictions bundle metadata for results. |
| src/results/mod.rs | Introduces results module exports and internal helpers. |
| src/results/fit_result.rs | Adds FitResult enum and high-level accessors (summary, diagnostics, artifacts, etc.). |
| src/results/diagnostics.rs | Adds diagnostics bundle and nonparametric diagnostics builder. |
| src/results/artifacts.rs | Adds artifact indexing for output files (expected/actual/missing). |
| src/output/writer.rs | Adds unified writer entrypoint for result outputs + shared predictions CSV. |
| src/output/shared.rs | Adds shared output writers (settings/summary/diagnostics/csv rows) and file name list. |
| src/output/nonparametric.rs | Adds method-specific output writers and expected file enumeration. |
| src/output/mod.rs | Introduces output module layout and re-exports. |
| src/output/logging.rs | Refactors logging setup to use new OutputPlan + LoggingOptions. |
| src/output/file.rs | Adds new OutputFile helper with directory creation and context. |
| src/model/variability.rs | Adds variability/random-effects spec types for the new model domain. |
| src/model/parameter_space.rs | Adds parameter-space modeling types and finite-range validation. |
| src/model/observation_spec.rs | Adds observation channel spec and error-model attachment surface. |
| src/model/mod.rs | Introduces ModelDefinition + builder and exports model-domain types. |
| src/model/metadata.rs | Adds model metadata container. |
| src/model/covariates.rs | Adds covariate spec types (in-equation vs structured). |
| src/lib.rs | Replaces legacy public API with new modules + prelude exports. |
| src/estimation/nonparametric/workspace.rs | Adds workspace container and output-writing helpers for nonparametric fit results. |
| src/estimation/nonparametric/weights.rs | Minor cleanup around Weights API docs/structure. |
| src/estimation/nonparametric/summaries.rs | Adds fit/population/individual summary builders. |
| src/estimation/nonparametric/statistics.rs | Adds statistics helpers used by summaries/predictions. |
| src/estimation/nonparametric/qr.rs | Reintroduces QR decomposition helper under new module layout. |
| src/estimation/nonparametric/psi.rs | Adds ndarray conversion helper and keeps CSV helpers/serialization. |
| src/estimation/nonparametric/prior/sobol.rs | Adds Sobol prior generator using ParameterSpace::finite_ranges. |
| src/estimation/nonparametric/prior/latin.rs | Adds Latin hypercube prior generator using finite ranges. |
| src/estimation/nonparametric/prior.rs | Adds Prior enum + prior parsing and sampling entrypoints. |
| src/estimation/nonparametric/predictions.rs | Updates predictions machinery to new module paths. |
| src/estimation/nonparametric/posterior.rs | Routes posterior helper through Posterior::calculate. |
| src/estimation/nonparametric/mod.rs | Defines nonparametric module surface and re-exports. |
| src/estimation/nonparametric/expansion.rs | Reintroduces adaptive grid expansion under new module layout. |
| src/estimation/nonparametric/engine.rs | Adds NonparametricEngine bridging compiled problems to algorithm execution. |
| src/estimation/nonparametric/cycles.rs | Updates cycle logging to write iterations.csv and accept new inputs. |
| src/estimation/mod.rs | Adds estimation module root (currently nonparametric only). |
| src/compile/validation.rs | Adds compile-time validation for problems. |
| src/compile/observation_index.rs | Adds observation index DTO. |
| src/compile/mod.rs | Adds compile pipeline and EstimationProblem::compile/initialize_logs. |
| src/compile/design_context.rs | Adds design context + structured covariate design DTOs. |
| src/compile/compiled_problem.rs | Adds compiled problem container used by engines. |
| src/compile/caches.rs | Adds execution cache flags container. |
| src/bestdose/predictions.rs | Updates BestDose prediction interval handling and imports for new architecture. |
| src/bestdose/posterior.rs | Updates BestDose posterior/refinement to new config/runtime wiring. |
| src/bestdose/optimization.rs | Updates imports to new Weights path. |
| src/bestdose/cost.rs | Updates docs and makes calculate_cost public. |
| src/api/model_definition.rs | Re-exports model definition types into API module. |
| src/api/mod.rs | Introduces API module exports (fit, problem builder/types). |
| src/api/fit.rs | Adds fit(problem) entrypoint with optional logging init + compile + run. |
| src/api/estimation_problem.rs | Adds new unified problem/method/runtime/output/logging option types and builder. |
| src/algorithms/nonparametric/postprob.rs | Rewires POSTPROB to produce NonparametricWorkspace and use native config input. |
| src/algorithms/nonparametric/npod.rs | Rewires NPOD to new config/input plumbing and workspace output. |
| src/algorithms/nonparametric/npag.rs | Rewires NPAG to new config/input plumbing and workspace output. |
| src/algorithms/nonparametric/mod.rs | Adds a dedicated nonparametric algorithms module organization. |
| src/algorithms/mod.rs | Adds algorithm input/config plumbing and replaces old dispatch with run_nonparametric_algorithm. |
| examples/vanco_sde/main.rs | Migrates example to new API/model/problem setup. |
| examples/two_eq_lag/main.rs | Migrates example to new API/model/problem setup. |
| examples/theophylline/main.rs | Migrates example to new API/model/problem setup. |
| examples/new_iov/main.rs | Migrates example to new API/model/problem setup. |
| examples/neely/main.rs | Migrates example to new API/model/problem setup. |
| examples/meta/main.rs | Migrates example to new API/model/problem setup. |
| examples/iov/main.rs | Migrates example to new API/model/problem setup. |
| examples/drusano/main.rs | Migrates example to new API/model/problem setup. |
| examples/drusano/config.toml | Removes legacy config file (old Settings flow removed). |
| examples/bimodal_ke/main.rs | Migrates example to new API/model/problem setup and logging options. |
| examples/bimodal_ke/config.toml | Removes legacy config file (old Settings flow removed). |
| examples/bestdose.rs | Migrates BestDose example to new config + read_prior. |
| examples/bestdose_cov.rs | Adds a new BestDose covariate example using the new API. |
| examples/bestdose_bounds.rs | Migrates BestDose bounds example to new config + read_prior. |
| examples/bestdose_auc.rs | Migrates BestDose AUC example to new config + read_prior. |
| Cargo.toml | Adds dependencies and changes test profile configuration. |
| benches/bimodal_ke.rs | Migrates benchmark to run through EstimationProblem instead of legacy dispatch/settings. |
| .gitignore | Updates ignored output/artifact files for the new output layout. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for result in reader.records() { | ||
| let record = result.unwrap(); | ||
| let values: Vec<f64> = reordered_indices | ||
| .iter() | ||
| .map(|&i| record[i].parse::<f64>().unwrap()) | ||
| .collect(); | ||
| theta_values.push(values); | ||
|
|
||
| if let Some(prob_idx) = prob_index { | ||
| let prob_value: f64 = record[prob_idx].parse::<f64>().unwrap(); | ||
| prob_values.push(prob_value); |
There was a problem hiding this comment.
parse_prior_for_parameters uses multiple unwrap() calls when iterating CSV records and parsing floats. This can panic on malformed CSVs or non-numeric values even though the function returns Result, which makes prior loading crash the caller instead of returning a helpful error. Replace the unwrap()s with ? (and add context including row/column/field name) so invalid priors are reported as regular errors.
| writer.write_field("id")?; | ||
| writer.write_field("point")?; | ||
| self.theta.param_names().iter().for_each(|name| { | ||
| writer.write_field(name).unwrap(); | ||
| }); | ||
| writer.write_field("prob")?; | ||
| writer.write_record(None::<&[u8]>)?; |
There was a problem hiding this comment.
write_posterior mixes ? with several unwrap()s inside iterator closures (e.g., writer.write_field(...).unwrap() and writer.write_record(...).unwrap()). Any I/O/CSV serialization error will panic instead of being returned via anyhow::Result. Refactor these for_each closures into for loops so you can propagate errors with ? throughout.
| let subjects = self.data.subjects(); | ||
| self.posterior | ||
| .matrix() | ||
| .row_iter() | ||
| .enumerate() | ||
| .for_each(|(i, row)| { | ||
| let subject = subjects.get(i).unwrap(); | ||
| let id = subject.id(); | ||
|
|
||
| row.iter().enumerate().for_each(|(spp, prob)| { | ||
| writer.write_field(id.clone()).unwrap(); | ||
| writer.write_field(spp.to_string()).unwrap(); | ||
|
|
||
| self.theta.matrix().row(spp).iter().for_each(|val| { | ||
| writer.write_field(val.to_string()).unwrap(); | ||
| }); | ||
|
|
||
| writer.write_field(prob.to_string()).unwrap(); | ||
| writer.write_record(None::<&[u8]>).unwrap(); | ||
| }); |
There was a problem hiding this comment.
More unwrap()s in the nested loops of write_posterior can panic on write errors and also hide potential mismatches (e.g., subjects.get(i).unwrap()). Prefer explicit loops returning Result and use ? for CSV writes; for indexing, return a descriptive error if the subject/posterior row counts ever diverge.
| pub fn median(data: &[f64]) -> f64 { | ||
| let mut data: Vec<f64> = data.to_vec(); | ||
| data.sort_by(|a, b| a.partial_cmp(b).unwrap()); | ||
|
|
||
| let size = data.len(); | ||
| match size { | ||
| even if even % 2 == 0 => { | ||
| let fst = data.get(even / 2 - 1).unwrap(); | ||
| let snd = data.get(even / 2).unwrap(); | ||
| (fst + snd) / 2.0 | ||
| } | ||
| odd => *data.get(odd / 2_usize).unwrap(), | ||
| } | ||
| } | ||
|
|
||
| pub fn weighted_median(data: &[f64], weights: &[f64]) -> f64 { | ||
| assert_eq!( | ||
| data.len(), | ||
| weights.len(), | ||
| "The length of data and weights must be the same" | ||
| ); | ||
| assert!( | ||
| weights.iter().all(|&x| x >= 0.0), | ||
| "Weights must be non-negative, weights: {:?}", | ||
| weights | ||
| ); | ||
|
|
||
| let mut weighted_data: Vec<(f64, f64)> = data | ||
| .iter() | ||
| .zip(weights.iter()) | ||
| .map(|(&d, &w)| (d, w)) | ||
| .collect(); | ||
|
|
||
| weighted_data.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap()); | ||
|
|
||
| let total_weight: f64 = weights.iter().sum(); | ||
| let mut cumulative_sum = 0.0; | ||
|
|
||
| for (i, &(_, weight)) in weighted_data.iter().enumerate() { | ||
| cumulative_sum += weight; | ||
|
|
||
| if cumulative_sum == total_weight / 2.0 { | ||
| if i + 1 < weighted_data.len() { | ||
| return (weighted_data[i].0 + weighted_data[i + 1].0) / 2.0; | ||
| } else { | ||
| return weighted_data[i].0; | ||
| } | ||
| } else if cumulative_sum > total_weight / 2.0 { | ||
| return weighted_data[i].0; | ||
| } | ||
| } | ||
|
|
||
| unreachable!("The function should have returned a value before reaching this point."); | ||
| } |
There was a problem hiding this comment.
median/weighted_median sort using partial_cmp(...).unwrap() and use assert!/unreachable! for validation. These functions are publicly re-exported (part of the library API) and will panic if inputs contain NaNs, negative weights, mismatched lengths, or total weight is 0. Consider returning Result<f64> (or filtering/ordering NaNs explicitly) and validate total_weight > 0.0 to avoid panics in normal error cases.
| pub fn population_mean_median( | ||
| theta: &Array2<f64>, | ||
| w: &Array1<f64>, | ||
| ) -> Result<(Array1<f64>, Array1<f64>)> { | ||
| let w = if w.is_empty() { | ||
| tracing::warn!("w.len() == 0, setting all weights to 1/n"); | ||
| Array1::from_elem(theta.nrows(), 1.0 / theta.nrows() as f64) | ||
| } else { | ||
| w.clone() | ||
| }; | ||
|
|
||
| if theta.nrows() != w.len() { | ||
| bail!( | ||
| "Number of parameters and number of weights do not match. Theta: {}, w: {}", | ||
| theta.nrows(), | ||
| w.len() | ||
| ); | ||
| } | ||
|
|
||
| let mut mean = Array1::zeros(theta.ncols()); | ||
| let mut median = Array1::zeros(theta.ncols()); | ||
|
|
||
| for (i, (mn, mdn)) in mean.iter_mut().zip(&mut median).enumerate() { | ||
| let col = theta.column(i).to_owned() * w.to_owned(); | ||
| *mn = col.sum(); | ||
|
|
There was a problem hiding this comment.
population_mean_median computes the weighted mean as sum(theta * w) without normalizing w. Weights::from_vec (and priors loaded from CSV) do not guarantee weights sum to 1.0, so means (and downstream SD/CV calculations) can be scaled incorrectly. Either normalize w inside these statistics helpers (e.g., w / w.sum()) or enforce normalization when constructing Weights from external inputs.
| let has_covariates = result.data().subjects().iter().any(|subject| { | ||
| subject | ||
| .occasions() | ||
| .iter() | ||
| .any(|occasion| !occasion.covariates().covariates().is_empty()) | ||
| }); | ||
| if has_covariates { | ||
| files.push("covariates.csv".to_string()); | ||
| } | ||
|
|
||
| files.sort(); | ||
| files.dedup(); | ||
| files | ||
| } | ||
|
|
||
| pub fn write_nonparametric_outputs<E: pharmsol::Equation>( | ||
| result: &mut NonparametricWorkspace<E>, | ||
| ) -> Result<()> { | ||
| let parameter_names = result.get_theta().parameters().names(); | ||
| result | ||
| .cycle_log() | ||
| .write(result.output_folder(), ¶meter_names)?; | ||
| result.write_theta()?; | ||
| result.write_covariates()?; | ||
| result.write_posterior()?; |
There was a problem hiding this comment.
write_nonparametric_outputs always calls result.write_covariates()?, but output_file_names only includes covariates.csv when has_covariates is true. This makes the artifact/expected-files model inconsistent (a file may be written but never tracked as expected). Either gate write_covariates behind the same has_covariates condition or always include covariates.csv in output_file_names.
| cobyla = "0.8.0" | ||
| rand_distr = "0.5.1" | ||
| rand_chacha = "0.9.0" | ||
| statrs = "0.18" |
There was a problem hiding this comment.
New dependencies (cobyla, rand_distr, rand_chacha, statrs) appear unused in the current tree (no references in src/, tests/, examples/, or benches/). Unused deps increase compile time and supply-chain surface area. If they’re intended for a follow-up PR, consider deferring them until they’re needed or gating behind a feature.
| cobyla = "0.8.0" | |
| rand_distr = "0.5.1" | |
| rand_chacha = "0.9.0" | |
| statrs = "0.18" |
| [profile.test] | ||
| inherits = "release" | ||
|
|
There was a problem hiding this comment.
[profile.test] inheriting from release changes test behavior (e.g., disables debug assertions/overflow checks and can make failures harder to diagnose). If the goal is faster tests/bench-like runs, consider keeping the default debug test profile and introducing a separate CI profile/feature for performance runs, or document why release-mode tests are required here.
| [profile.test] | |
| inherits = "release" |
| /// - Model simulation fails | ||
| /// - Prediction length doesn't match observation count | ||
| /// - AUC calculation fails (for AUC targets) | ||
| pub(crate) fn calculate_cost(problem: &BestDoseProblem, candidate_doses: &[f64]) -> Result<f64> { | ||
| pub fn calculate_cost(problem: &BestDoseProblem, candidate_doses: &[f64]) -> Result<f64> { | ||
| // Validate candidate_doses length matches expected optimizable dose count |
There was a problem hiding this comment.
calculate_cost was changed from pub(crate) to pub while bestdose::cost is a public module. This effectively promotes an internal helper to part of the crate’s public API surface. If this isn’t meant to be supported long-term, keep it crate-private (or mark it clearly as internal) to avoid committing to API stability for a low-level routine.
|
@copilot resolve the merge conflicts in this pull request |
Agent-Logs-Url: https://github.com/LAPKB/PMcore/sessions/0dce7017-0f29-4a18-8b3c-c126536fd231 Co-authored-by: Siel <1023006+Siel@users.noreply.github.com>
This is part 1 of the split of #260.
Scope:
Why:
Validation:
Follow-up PRs will layer the parametric and expanded nonparametric algorithm families back on top of this base.