Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3381 +/- ##
=======================================
Coverage 99.95% 99.95%
=======================================
Files 387 387
Lines 54315 54315
=======================================
Hits 54293 54293
Misses 22 22 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Not sure why ITs are failing now, unless I didn't push things correctly but they worked for me on Rupert now on the failing case. |
|
Looks like there's a few issues currently with LFRic 3.1 Signal 11 errors for NVHPC nvhpc fails for GPU run - namelist error Looks like there's some other namelist related error(s)(?) with NVHPC. I looked at the code that must be getting to the relevant point, and it shouldn't be possible to get to with the gungho example: The namelist specificies lbc_option = 'none' and use_physics = .false., so it should never call |
|
So the problem is not The problem is In the past we had issues as nvfortran did not interpret booleans in the namelist the same way as gnu was doing, specially the '.True.' vs a 'T' character, vs a '1' has differences between compilers. Can you check if any of the variables in the conditional come from namelist values and what format they have? If they are 0/1, you can try using the |
Yeah. The input has |
|
@sergisiso @arporter ITs are green for this now. I'm not sure if there's still an intermittent signal 11 for Nvfortran but I've not seen it the last few runs, but maybe we need to keep an eye on it going forward. |
sergisiso
left a comment
There was a problem hiding this comment.
@LonelyCat124 The review of the lfric forks are hard because the PRs include all changes coming from the fork, the best way I could find is looking at the resulting diffs with upstream:
MetOffice/lfric_core@vn3.1...stfc:lfric_core:3.1_sync
and
MetOffice/lfric_apps@vn3.1.1...stfc:lfric_apps:3.1_sync
The changes needed to make nvhpc work are quite concerning, but imo the benefit of testing close to upstream make them worth it. So I am in favour of mering all these changes.
The integration test timings are different (as expected: different code, different measurement), but I don't know if the system was busy or not to take these as the new speedup baselines. I am also surprised ${PSYCLONE_LFRIC_DIR}/KGOs/gungho_model-checksums.txt and ${PSYCLONE_LFRIC_DIR}/KGOs/lfric_atm-checksums.txt didn't require changes.
So before merging I will try to run them manually and try to get answers for the questions above.
|
The KGOs/lfric_atm-checksums.txt produces the exact same output, KGOs/gungho_model-checksums.txt does not, but it is inside the rather large tolerance that we have. I will approve this PR and continue looking at this at #3207 The CI re-run has been done with an empty system, so I will take those numbers as the new baseline. |
Will run CI and check no issues are caused.
Corresponding fork PRs:
stfc/lfric_core#3
stfc/lfric_apps#3