Add wave equation and Klein-Gordon equation benchmark tasks #97
gpartin wants to merge 3 commits into pdebench:main
Conversation
Pull request overview
This PR adds new PDEBench benchmark tasks and supporting artifacts for the 1D/2D wave equation and Klein–Gordon equation, including a NumPy-based simulator, a Hydra-based dataset generator, training configs, and documentation.
Changes:
- Added WaveSimulator (1D/2D) and an FFT-based 1D analytical solution helper.
- Added gen_wave.py + wave.yaml to generate datasets in a PDEBench-style HDF5 layout.
- Added model argument configs and benchmark documentation; updated README to reference the new generator.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pdebench/models/config/args/config_wave.yaml | Adds a wave-equation training config (FNO/UNet params). |
| pdebench/models/config/args/config_klein_gordon.yaml | Adds a Klein–Gordon training config (parameterized by χ). |
| pdebench/data_gen/src/sim_wave.py | Implements leapfrog/Verlet simulator and 1D analytical solution. |
| pdebench/data_gen/gen_wave.py | Implements Hydra-driven dataset generation + tensor-format consolidation. |
| pdebench/data_gen/configs/wave.yaml | Adds default Hydra config for wave/KG generation. |
| WAVE_BENCHMARK.md | Documents equations, generation usage, and baseline results. |
| README.md | Adds gen_wave.py to the data generation section. |
pdebench/data_gen/src/sim_wave.py
Outdated
```python
lap = laplacian(u_curr)
u_next = 2 * u_curr - u_prev + c2dt2 * lap - chi2dt2 * u_curr
u_prev = u_curr
u_curr = u_next

if save_idx < self.Nt and step % save_interval == 0:
    result[save_idx] = u_curr.astype(np.float32)
    save_idx += 1
```
The leapfrog loop saves u_curr after updating it, but u_curr is initialized to the t=dt state (via the Taylor half-step) and then advanced immediately in the first iteration. This makes the saved frame for step=1 correspond to ~t=2*dt (and in general shifts snapshot times by one step), further desynchronizing the saved solution from the t_save coordinate. Saving u_curr before advancing (or adjusting the step/time bookkeeping so u_curr corresponds to step*dt) would fix the off-by-one.
Suggested change:
```diff
-lap = laplacian(u_curr)
-u_next = 2 * u_curr - u_prev + c2dt2 * lap - chi2dt2 * u_curr
-u_prev = u_curr
-u_curr = u_next
-if save_idx < self.Nt and step % save_interval == 0:
-    result[save_idx] = u_curr.astype(np.float32)
-    save_idx += 1
+if save_idx < self.Nt and step % save_interval == 0:
+    result[save_idx] = u_curr.astype(np.float32)
+    save_idx += 1
+lap = laplacian(u_curr)
+u_next = 2 * u_curr - u_prev + c2dt2 * lap - chi2dt2 * u_curr
+u_prev = u_curr
+u_curr = u_next
```
```python
f.create_dataset(
    f"{seed_str}/grid/x",
    data=sim.x.astype(np.float32),
    dtype="float32",
    compression="lzf",
)
f.create_dataset(
    f"{seed_str}/grid/t",
    data=sim.t_save.astype(np.float32),
    dtype="float32",
    compression="lzf",
)
```
For 2D runs (sim.ndim=2), this only writes grid/x and later only exports x-coordinate. PDEBench 2D datasets typically include both x-coordinate and y-coordinate, and the model loaders (e.g. PINN/FNO utilities) expect y-coordinate to exist for 2D problems. Please write grid/y (likely the same 1D coordinate as x for a square domain) when ndim==2 so the generated HDF5 is self-describing for 2D.
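A minimal sketch of what the per-seed grid writer could look like with the suggested `grid/y` dataset added. The helper name `write_grids` and the square-domain assumption (y reuses the x axis) are illustrative, not the PR's actual code:

```python
import numpy as np

def write_grids(f, seed_str, sim):
    """Write coordinate datasets for one seed group.

    Sketch: assumes sim.x holds the 1D axis coordinates and that
    the 2D domain is square, so the y axis reuses sim.x.
    """
    for name, data in [("x", sim.x), ("t", sim.t_save)]:
        f.create_dataset(
            f"{seed_str}/grid/{name}",
            data=np.asarray(data, dtype=np.float32),
            dtype="float32",
            compression="lzf",
        )
    if sim.ndim == 2:
        # Square domain: the y coordinate is the same 1D array as x.
        f.create_dataset(
            f"{seed_str}/grid/y",
            data=np.asarray(sim.x, dtype=np.float32),
            dtype="float32",
            compression="lzf",
        )
```

This keeps the per-seed file self-describing, so downstream loaders can detect dimensionality from the presence of `grid/y`.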
```python
# Get shape from first sample
first_key = str(0).zfill(4)
sample_shape = f_in[f"{first_key}/data"].shape

x_coord = np.array(f_in[f"{first_key}/grid/x"])
t_coord = np.array(f_in[f"{first_key}/grid/t"])

# Allocate combined tensor
full_shape = (n_samples, *sample_shape)

with h5py.File(str(output_path), "w") as f_out:
    tensor = f_out.create_dataset(
        "tensor",
        shape=full_shape,
        dtype="float32",
        compression="lzf",
    )
    for i in range(n_samples):
        key = str(i).zfill(4)
        if key in f_in:
            tensor[i] = f_in[f"{key}/data"]

    f_out.create_dataset("x-coordinate", data=x_coord)
    f_out.create_dataset("t-coordinate", data=t_coord)
```
combine_to_tensor_format() only copies x-coordinate and t-coordinate into the output file. For 2D wave/KG data the output should also include y-coordinate (and optionally z-coordinate for higher dims) to match the conventions used elsewhere in the repo and to be consumable by existing loaders. You can infer whether it is 2D from sample_shape (len==3 for 2D per-sample) and copy grid/y from the raw file when applicable.
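One way to implement this, inferring dimensionality from the per-sample shape (length 3 means time plus two spatial axes). The helper name `copy_coordinates` is hypothetical:

```python
import numpy as np

def copy_coordinates(f_in, f_out, first_key, sample_shape):
    """Copy coordinate arrays from the raw file into the combined file.

    Sketch: a per-sample shape of (Nt, Nx, Ny) (len == 3) indicates 2D
    data, in which case grid/y is copied as y-coordinate as well.
    """
    f_out.create_dataset("x-coordinate", data=np.array(f_in[f"{first_key}/grid/x"]))
    f_out.create_dataset("t-coordinate", data=np.array(f_in[f"{first_key}/grid/t"]))
    if len(sample_shape) == 3:
        # 2D data: also export the y axis so existing loaders can consume it.
        f_out.create_dataset("y-coordinate", data=np.array(f_in[f"{first_key}/grid/y"]))
```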
```python
for i in range(n_samples):
    key = str(i).zfill(4)
    if key in f_in:
        tensor[i] = f_in[f"{key}/data"]
```
If a seed group is missing in the raw HDF5 (e.g. a worker crashed), the code silently leaves the corresponding slice of tensor as all zeros because it skips missing keys. This can produce corrupted datasets without any signal. Consider validating that all expected keys exist (or collecting the present keys and writing a smaller tensor) and raising/logging an error when samples are missing.
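A sketch of the suggested validation; `validate_seed_groups` is a hypothetical helper that fails fast instead of leaving zero-filled slices:

```python
def validate_seed_groups(f_in, n_samples):
    """Raise if any expected seed group is missing from the raw file.

    Sketch: checks all zero-padded keys up front so a crashed worker
    surfaces as an error rather than a silently corrupted dataset.
    """
    expected = [str(i).zfill(4) for i in range(n_samples)]
    missing = [k for k in expected if k not in f_in]
    if missing:
        raise KeyError(
            f"{len(missing)} of {n_samples} seed groups missing from raw file, "
            f"e.g. {missing[:5]}"
        )
    return expected
```

Calling this before allocating `tensor` also makes it easy to switch to the alternative of writing a smaller tensor over only the present keys.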
pdebench/data_gen/configs/wave.yaml
Outdated
```yaml
work_dir: ${hydra:runtime.cwd}
data_dir: data
upload: false
```
upload: false is defined in this config, but gen_wave.py currently never checks config.upload nor performs an upload step (unlike other generators). Either implement the upload path for consistency or remove/rename the config field to avoid suggesting functionality that doesn't exist.
Suggested change:
```diff
-upload: false
```
```python
# Save schedule
if self.ndim == 1:
    result = np.zeros((self.Nt, self.Nx), dtype=np.float32)
else:
    result = np.zeros((self.Nt, self.Nx, self.Nx), dtype=np.float32)

result[0] = u0.astype(np.float32)
save_idx = 1
save_interval = max(1, self.n_steps // (self.Nt - 1))
```
t_save is defined as linspace(0, T, Nt), but the saving logic uses save_interval = n_steps // (Nt - 1) and only saves when step % save_interval == 0. This generally produces snapshots at times that do not match t_save (and may skip the final time T if n_steps is not an exact multiple of Nt-1). Consider computing an explicit monotone list of save_steps that matches t_save (including the final step) and saving exactly at those steps, or derive t_save from the actual saved step indices.
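One possible fix, sketched: round the Nt target times onto integer solver steps (including the final one) and save exactly at those steps. The helper name `compute_save_steps` is illustrative:

```python
import numpy as np

def compute_save_steps(n_steps, nt):
    """Map the Nt save times of linspace(0, T, Nt) onto solver steps.

    Sketch: rounds each target time T * i / (Nt - 1) to the nearest of
    the n_steps dt-steps, so the last snapshot always lands on step
    n_steps (time T) even when n_steps is not a multiple of Nt - 1.
    """
    return np.rint(np.linspace(0, n_steps, nt)).astype(int)

# In the loop, save when the current step is the next entry of
# compute_save_steps(...); alternatively, derive t_save from the
# actual saved indices as t_save = steps * dt.
```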
- Save before advance in leapfrog loop to fix off-by-one snapshot timing
- Precompute save steps from t_save for exact time alignment
- Write grid/y for 2D simulations in per-seed HDF5
- Copy y-coordinate into combined tensor format for 2D
- Raise KeyError for missing seeds instead of silent zero-fill
- Remove unused 'upload' config field from wave.yaml
10 tests covering:
- 1D/2D output shape and dtype (float32)
- Finite output (no NaN/Inf)
- Invalid ndim raises ValueError
- Klein-Gordon chi>0 runs in 1D and 2D
- Leapfrog vs analytical solution nRMSE < 1% (wave and KG)
- analytical_solution_1d returns u0 at t=0

All tests pass in 0.33s.
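For reference, the nRMSE metric behind the <1% leapfrog-vs-analytical check could be as simple as the following (the exact normalization used in the test suite may differ):

```python
import numpy as np

def nrmse(pred, ref):
    """Normalized root-mean-square error: RMSE divided by the RMS of
    the reference signal. Sketch of the metric named in the tests."""
    return np.sqrt(np.mean((pred - ref) ** 2)) / np.sqrt(np.mean(ref ** 2))
```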
Summary
This PR adds two new PDE benchmark tasks to PDEBench: the wave equation and the Klein-Gordon equation in 1D and 2D with periodic boundary conditions.
Equations
Wave equation: $\partial^2 u / \partial t^2 = c^2 \nabla^2 u$

Klein-Gordon: $\partial^2 u / \partial t^2 = c^2 \nabla^2 u - \chi^2 u$
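With periodic boundaries, both equations diagonalize under the discrete Fourier transform: each mode oscillates at $\omega(k) = \sqrt{c^2 k^2 + \chi^2}$, with $\chi = 0$ recovering the plain wave equation. A sketch of an FFT-based 1D reference solution under the assumption of zero initial velocity (the PR's analytical helper may be more general):

```python
import numpy as np

def analytical_1d(u0, L, c, chi, t):
    """Exact periodic 1D wave/Klein-Gordon solution via FFT.

    Sketch: assumes zero initial velocity, so each Fourier mode of u0
    evolves as cos(omega * t) with omega = sqrt(c^2 k^2 + chi^2).
    """
    n = u0.shape[0]
    # Angular wavenumbers for a domain of length L with n grid points.
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    omega = np.sqrt((c * k) ** 2 + chi ** 2)
    u_hat = np.fft.fft(u0) * np.cos(omega * t)
    return np.real(np.fft.ifft(u_hat))
```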
Why these benchmarks?
Baseline results (FNO, 100 epochs, 1D)
Klein-Gordon cross-chi generalization (FNO)
Training on one chi value and testing on another reveals:
FNO extrapolates well for small parameter shifts but catastrophically fails across the propagating-to-evanescent transition (chi=2 to chi=5: nRMSE jumps from 0.095 to 0.789).
Files added
Files modified
See WAVE_BENCHMARK.md for full details.