fix: Support CP2K 2025 output format for energy and forces (fixes #850)#947
fix: Support CP2K 2025 output format for energy and forces (fixes #850)#947newtontech wants to merge 4 commits intodeepmodeling:masterfrom
Conversation
…pmodeling#850) This commit adds support for parsing CP2K 2025 version output files: **Changes in CP2K 2025 format:** 1. Energy line format changed from: 'ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -7.997403996236343' to: 'ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725' 2. Forces output format changed from: 'ATOMIC FORCES in [a.u.]' table with ' Atom Kind Element X Y Z' header to: 'FORCES| Atomic forces [hartree/bohr]' with 'FORCES| Atom x y z |f|' prefix lines **Implementation:** - Detect CP2K 2025 format by checking for 'energy [hartree]' in the content - Parse energy from new '[hartree]' format - Parse forces from new 'FORCES|' prefixed lines - Maintain backward compatibility with CP2K 2023 format **Testing:** - Added test file for CP2K 2025 format (tests/cp2k/cp2k_2025_output/) - Added test case TestCp2k2025Output to verify parsing - Added regression test TestCp2k2023FormatStillWorks to ensure backward compatibility - All existing CP2K tests pass
for more information, see https://pre-commit.ci
📝 WalkthroughWalkthroughAdds CP2K 2025 output support to dpdata by detecting a "energy [hartree]" token and branching parsing logic to extract energies and forces from the new CP2K 2025 line formats while preserving CP2K 2023 behavior. New test fixtures and unit tests validate parsing and edge cases. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dpdata/cp2k/output.py`:
- Around line 438-454: In the is_cp2k_2025 branch the fallback loop may leave
energy uninitialized if no numeric token is found; modify the block around
is_cp2k_2025 / parts / hartree_idx so that if neither the "[hartree]" lookup nor
the reversed parts numeric search sets energy you raise a clear exception or log
an error with context (e.g., include the original line string ii and parts)
instead of continuing silently; ensure the raised error (ValueError or custom)
describes that energy could not be parsed from the CP2K 2025 output so the later
assertion yields a helpful message.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
dpdata/cp2k/output.pytests/cp2k/cp2k_2025_output/cp2k_2025_outputtests/cp2k/cp2k_2025_output/deepmd/set.000/box.npytests/cp2k/cp2k_2025_output/deepmd/set.000/coord.npytests/cp2k/cp2k_2025_output/deepmd/set.000/energy.npytests/cp2k/cp2k_2025_output/deepmd/set.000/force.npytests/cp2k/cp2k_2025_output/deepmd/set.000/virial.npytests/cp2k/cp2k_2025_output/deepmd/type.rawtests/cp2k/cp2k_2025_output/deepmd/type_map.rawtests/test_cp2k_2025_output.py
| if is_cp2k_2025: | ||
| # Find the energy value after "[hartree]" | ||
| parts = ii.split() | ||
| try: | ||
| hartree_idx = parts.index("[hartree]") | ||
| energy = parts[hartree_idx + 1] | ||
| except (ValueError, IndexError): | ||
| # Fallback: try to find energy value in the line | ||
| for part in reversed(parts): | ||
| try: | ||
| float(part) | ||
| energy = part | ||
| break | ||
| except ValueError: | ||
| continue | ||
| else: | ||
| energy = ii.split()[8] |
There was a problem hiding this comment.
Potential uninitialized variable in fallback path.
If "[hartree]" is not found in parts and the fallback loop doesn't find any numeric value, energy will not be set for this line, but the code continues silently. While the assertion on line 491 will eventually catch this, the error message won't be helpful for debugging malformed CP2K 2025 output.
Consider adding explicit error handling:
Proposed fix
if is_cp2k_2025:
# Find the energy value after "[hartree]"
parts = ii.split()
try:
hartree_idx = parts.index("[hartree]")
energy = parts[hartree_idx + 1]
except (ValueError, IndexError):
# Fallback: try to find energy value in the line
for part in reversed(parts):
try:
float(part)
energy = part
break
except ValueError:
continue
+ else:
+ # Loop completed without finding energy
+ raise RuntimeError(f"Cannot parse energy from CP2K 2025 line: {ii}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dpdata/cp2k/output.py` around lines 438 - 454, In the is_cp2k_2025 branch the
fallback loop may leave energy uninitialized if no numeric token is found;
modify the block around is_cp2k_2025 / parts / hartree_idx so that if neither
the "[hartree]" lookup nor the reversed parts numeric search sets energy you
raise a clear exception or log an error with context (e.g., include the original
line string ii and parts) instead of continuing silently; ensure the raised
error (ValueError or custom) describes that energy could not be parsed from the
CP2K 2025 output so the later assertion yields a helpful message.
Merging this PR will improve performance by 18.08%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | WallTime | test_cli |
369.1 ms | 327.9 ms | +12.55% |
| ⚡ | WallTime | test_import |
11.3 ms | 9.6 ms | +18.08% |
Comparing newtontech:fix-cp2k-2025-format-v2 (7478a4c) with master (ae134fe)
|
please also address the review comments of the AI |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #947 +/- ##
==========================================
- Coverage 86.37% 86.30% -0.07%
==========================================
Files 86 86
Lines 8086 8105 +19
==========================================
+ Hits 6984 6995 +11
- Misses 1102 1110 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- Add tests for energy parsing with extra whitespace - Add tests for FORCES| header line filtering (Atom x y z, Atomic forces) - Add integration test for CP2K 2025 format with LabeledSystem - Improve code coverage for CP2K 2025 format support
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/test_cp2k_2025_output.py (2)
175-203: Strengthen force-header edge tests with value assertions.Both tests only assert atom count (
shape[1] == 2). Adding one component check makes header filtering regressions easier to catch.Suggested enhancement
def test_cp2k2025_force_parsing_with_header_lines(self): @@ try: system = dpdata.LabeledSystem(fname, fmt="cp2k/output") self.assertEqual(system.data["forces"].shape[1], 2) + self.assertAlmostEqual(system.data["forces"][0][0][0], -2.94874881, places=5) finally: os.unlink(fname) @@ def test_cp2k2025_force_parsing_with_atomic_forces_line(self): @@ try: system = dpdata.LabeledSystem(fname, fmt="cp2k/output") self.assertEqual(system.data["forces"].shape[1], 2) + self.assertAlmostEqual(system.data["forces"][0][0][0], -2.94874881, places=5) finally: os.unlink(fname)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cp2k_2025_output.py` around lines 175 - 203, Update the two tests test_cp2k2025_force_parsing_with_header_lines and test_cp2k2025_force_parsing_with_atomic_forces_line to also assert actual force values (not just atom count) to catch header-filtering regressions: after creating the LabeledSystem (system = dpdata.LabeledSystem(...)) and before unlinking, fetch forces = system.data["forces"] and add assertions on specific components (e.g., forces[0,0] or forces[0,1] / forces[1,0], comparing to the expected parsed float values from create_cp2k_output_2025's forces_lines) so the tests validate both shape and numeric content.
1-207: Please run the targeted format test module before merge.Given this is a format-specific parser/test addition, run the focused test module as a pre-merge gate.
Based on learnings: For format-specific changes, run relevant test modules:
cd tests && python -m unittest test_<format>*.py; for core system changes runcd tests && python -m unittest test_system*.py test_multisystems.py.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cp2k_2025_output.py` around lines 1 - 207, Summary: Reviewer requests running the format-specific tests for CP2K 2025 before merging. Run the targeted test module tests/test_cp2k_2025_output.py (which contains TestCp2k2025Output, TestCp2k2025EdgeCases, and helper create_cp2k_output_2025) locally with unittest (cd tests && python -m unittest test_cp2k_2025_output.py) and confirm all cases pass (energy/forces extraction and edge-case variants); if failures occur, fix the parser code paths referenced by fmt="cp2k/output" and the energy/force parsing logic, re-run the same command, and add this module to the CI pre-merge test pattern so format-specific tests are always executed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_cp2k_2025_output.py`:
- Around line 114-117: Replace truthiness checks with explicit None checks:
change "if energy_line:" to "if energy_line is not None:" and similarly change
any checks that currently use truthiness for override inputs to "is not None"
(e.g., the override input variables used in the second block around the f.write
call). Update both the energy_line branch and the override input branch (the
latter referenced in the comment as lines 127-130) so empty strings/empty lists
are treated as valid overrides instead of falling back to defaults.
- Around line 163-172: The test
test_cp2k2025_energy_parsing_with_extra_whitespace currently only checks for
presence of energies; add an explicit numeric assertion that the parsed energy
equals the expected value from the input string (-7.364190264587725) to ensure
correct parsing. Locate the call that constructs the system via
dpdata.LabeledSystem and replace or augment the non-null check with an assertion
using the test framework (e.g., self.assertAlmostEqual or self.assertEqual with
a tolerance) against the energy entry in system.data["energies"] (e.g., the
first energy element), referencing the helper create_cp2k_output_2025 for the
expected value. Ensure you account for array shape (use the correct index into
system.data["energies"]) and use a small tolerance like 1e-12 to avoid
floating-point comparison issues.
---
Nitpick comments:
In `@tests/test_cp2k_2025_output.py`:
- Around line 175-203: Update the two tests
test_cp2k2025_force_parsing_with_header_lines and
test_cp2k2025_force_parsing_with_atomic_forces_line to also assert actual force
values (not just atom count) to catch header-filtering regressions: after
creating the LabeledSystem (system = dpdata.LabeledSystem(...)) and before
unlinking, fetch forces = system.data["forces"] and add assertions on specific
components (e.g., forces[0,0] or forces[0,1] / forces[1,0], comparing to the
expected parsed float values from create_cp2k_output_2025's forces_lines) so the
tests validate both shape and numeric content.
- Around line 1-207: Summary: Reviewer requests running the format-specific
tests for CP2K 2025 before merging. Run the targeted test module
tests/test_cp2k_2025_output.py (which contains TestCp2k2025Output,
TestCp2k2025EdgeCases, and helper create_cp2k_output_2025) locally with unittest
(cd tests && python -m unittest test_cp2k_2025_output.py) and confirm all cases
pass (energy/forces extraction and edge-case variants); if failures occur, fix
the parser code paths referenced by fmt="cp2k/output" and the energy/force
parsing logic, re-run the same command, and add this module to the CI pre-merge
test pattern so format-specific tests are always executed.
| if energy_line: | ||
| f.write(energy_line + "\n") | ||
| else: | ||
| f.write( |
There was a problem hiding this comment.
Use explicit None checks for override inputs.
Line 114 and Line 127 use truthiness checks, so empty overrides ("" / []) cannot be tested and silently fall back to defaults.
Suggested fix
- if energy_line:
+ if energy_line is not None:
f.write(energy_line + "\n")
else:
f.write(
" ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725\n"
)
@@
- if forces_lines:
+ if forces_lines is not None:
for line in forces_lines:
f.write(line + "\n")
else:
f.write(
" FORCES| 1 -5.73440344E-02 2.95274914E-02 -1.50988167E-02 6.62433792E-02\n"
)Also applies to: 127-130
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_cp2k_2025_output.py` around lines 114 - 117, Replace truthiness
checks with explicit None checks: change "if energy_line:" to "if energy_line is
not None:" and similarly change any checks that currently use truthiness for
override inputs to "is not None" (e.g., the override input variables used in the
second block around the f.write call). Update both the energy_line branch and
the override input branch (the latter referenced in the comment as lines
127-130) so empty strings/empty lists are treated as valid overrides instead of
falling back to defaults.
| def test_cp2k2025_energy_parsing_with_extra_whitespace(self): | ||
| """Test energy parsing with extra whitespace around value (coverage for parsing robustness).""" | ||
| fname = self.create_cp2k_output_2025( | ||
| energy_line=" ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725 " | ||
| ) | ||
| try: | ||
| system = dpdata.LabeledSystem(fname, fmt="cp2k/output") | ||
| self.assertIsNotNone(system.data["energies"]) | ||
| self.assertEqual(system.data["forces"].shape[1], 2) | ||
| finally: |
There was a problem hiding this comment.
The whitespace-energy test does not validate the parsed energy value.
This test currently only checks non-null fields, so incorrect energy parsing could still pass. Add an explicit numeric assertion for Line 166 input.
Suggested fix
def test_cp2k2025_energy_parsing_with_extra_whitespace(self):
"""Test energy parsing with extra whitespace around value (coverage for parsing robustness)."""
fname = self.create_cp2k_output_2025(
energy_line=" ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725 "
)
try:
system = dpdata.LabeledSystem(fname, fmt="cp2k/output")
- self.assertIsNotNone(system.data["energies"])
+ self.assertAlmostEqual(
+ system.data["energies"][0], -200.3898256786414, places=5
+ )
self.assertEqual(system.data["forces"].shape[1], 2)
finally:
os.unlink(fname)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_cp2k_2025_output.py` around lines 163 - 172, The test
test_cp2k2025_energy_parsing_with_extra_whitespace currently only checks for
presence of energies; add an explicit numeric assertion that the parsed energy
equals the expected value from the input string (-7.364190264587725) to ensure
correct parsing. Locate the call that constructs the system via
dpdata.LabeledSystem and replace or augment the non-null check with an assertion
using the test framework (e.g., self.assertAlmostEqual or self.assertEqual with
a tolerance) against the energy entry in system.data["energies"] (e.g., the
first energy element), referencing the helper create_cp2k_output_2025 for the
expected value. Ensure you account for array shape (use the correct index into
system.data["energies"]) and use a small tolerance like 1e-12 to avoid
floating-point comparison issues.
There was a problem hiding this comment.
Pull request overview
This PR extends dpdata’s CP2K output parser to handle CP2K 2025’s updated ENERGY| ... energy [hartree] ... line and the new FORCES| ...-prefixed force table format, while keeping compatibility with the existing CP2K output format. It also adds a CP2K 2025 fixture and corresponding tests to validate energy/force extraction and backward compatibility.
Changes:
- Add CP2K 2025 format detection and parsing logic for energy and forces in
dpdata/cp2k/output.py. - Add CP2K 2025 test fixtures (CP2K output + DeepMD reference data) under
tests/cp2k/cp2k_2025_output/. - Add new tests covering CP2K 2025 parsing and a regression check for the older CP2K format.
Reviewed changes
Copilot reviewed 5 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
dpdata/cp2k/output.py |
Adds CP2K 2025 detection plus updated energy/force parsing logic. |
tests/test_cp2k_2025_output.py |
Adds integration + edge-case tests for CP2K 2025 output parsing and a regression test class. |
tests/cp2k/cp2k_2025_output/cp2k_2025_output |
Adds a CP2K 2025 sample output fixture used by tests. |
tests/cp2k/cp2k_2025_output/deepmd/type.raw |
Adds DeepMD reference typing for the CP2K 2025 fixture. |
tests/cp2k/cp2k_2025_output/deepmd/type_map.raw |
Adds DeepMD type map for the CP2K 2025 fixture. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if "ENERGY|" in ii: | ||
| energy = ii.split()[8] | ||
| if " Atom Kind " in ii: | ||
| force_flag = True | ||
| force_idx = idx | ||
| if force_flag: | ||
| if idx > force_idx: | ||
| if "SUM OF ATOMIC FORCES" in ii: | ||
| force_flag = False | ||
| else: | ||
| force.append(ii.split()[3:6]) | ||
| # CP2K 2025 format: ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725 | ||
| # CP2K 2023 format: ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -1766.225653832774242 | ||
| if is_cp2k_2025: | ||
| # Find the energy value after "[hartree]" | ||
| parts = ii.split() | ||
| try: | ||
| hartree_idx = parts.index("[hartree]") | ||
| energy = parts[hartree_idx + 1] | ||
| except (ValueError, IndexError): | ||
| # Fallback: try to find energy value in the line | ||
| for part in reversed(parts): | ||
| try: | ||
| float(part) | ||
| energy = part | ||
| break | ||
| except ValueError: | ||
| continue | ||
| else: |
| Z 0.00000000 0.00000000 0.12345678 | ||
|
|
||
| ------------------------------------------------------------------------------- | ||
| - -\n - T I M I N G - |
| # Energy should be -7.364190264587725 hartree | ||
| # Using dpdata's conversion factor: -200.3898256786414 eV | ||
| expected_energy = -200.3898256786414 | ||
| self.assertAlmostEqual( | ||
| self.system_1.data["energies"][0], expected_energy, places=5 | ||
| ) |
njzjz-bot
left a comment
There was a problem hiding this comment.
I reviewed this PR against the reported CP2K 2025 output format in #850.
The nominal parser path looks reasonable: it handles the new ENERGY| ... energy [hartree] ... line, skips the new FORCES| headers/summaries via the numeric atom-id check, and preserves the old CP2K branch. I also ran the focused test module locally in a fresh uv environment:
cd tests && python -m unittest test_cp2k_2025_output.py
# Ran 34 tests in 0.382s -- OKA few things I would still clean up before merging:
dpdata/cp2k/output.py: the CP2K-2025 energy branch can still leaveenergyunset if[hartree]is present but the following token is missing/non-numeric and the fallback scan finds no float. Please initializeenergytoNonebefore parsing and raise a clear parser error when no numeric energy is found. This will turn malformed CP2K output into a helpful error instead of a laterUnboundLocalError/generic assertion.tests/cp2k/cp2k_2025_output/cp2k_2025_output: line 71 contains a literal\nsequence in the fixture. It would be better split into real lines, or removed, so the sample stays representative of actual CP2K output.- Tests: the hard-coded converted energy/force values are okay, but computing them with
EnergyConversion("hartree", "eV")/LengthConversion("bohr", "angstrom")would make the tests focus on parsing rather than duplicating conversion constants.
No functional blocker found for the standard CP2K 2025 sample path, but I’d at least address item 1 for robustness and to close the unresolved AI review comments.
Reviewed by OpenClaw 2026.4.22 (00bd2cf)
Model: custom-chat-jinzhezeng-group/gpt-5.5
This commit adds support for parsing CP2K 2025 version output files:
Changes in CP2K 2025 format:
Energy line format changed from: 'ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -7.997403996236343' to: 'ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725'
Forces output format changed from:
'ATOMIC FORCES in [a.u.]' table with ' Atom Kind Element X Y Z' header
to:
'FORCES| Atomic forces [hartree/bohr]' with 'FORCES| Atom x y z |f|' prefix lines
Implementation:
Testing:
Summary by CodeRabbit
New Features
Tests