Skip to content

Move to LFRic 3.1 support#3381

Merged
sergisiso merged 24 commits intomasterfrom
3.1_sync
Mar 25, 2026
Merged

Move to LFRic 3.1 support#3381
sergisiso merged 24 commits intomasterfrom
3.1_sync

Conversation

@LonelyCat124
Copy link
Copy Markdown
Collaborator

Will run CI and check no issues are caused.

Corresponding fork PRs:
stfc/lfric_core#3
stfc/lfric_apps#3

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.95%. Comparing base (c54c5f1) to head (193ea12).
⚠️ Report is 25 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3381   +/-   ##
=======================================
  Coverage   99.95%   99.95%           
=======================================
  Files         387      387           
  Lines       54315    54315           
=======================================
  Hits        54293    54293           
  Misses         22       22           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LonelyCat124
Copy link
Copy Markdown
Collaborator Author

Not sure why ITs are failing now, unless I didn't push things correctly but they worked for me on Rupert now on the failing case.

@LonelyCat124
Copy link
Copy Markdown
Collaborator Author

Looks like there's a few issues currently with LFRic 3.1

Signal 11 errors for NVHPC
If we get past the signal 11:

nvhpc fails for GPU run - namelist error
gnu passthrough works
gnu openmp works
nvhpc openmp CPU - namelist error with 1 mpi rank, seg fault at more ranks in partition_mod::partitioner_rectangular_panels in partition_mod.F90::739

Looks like there's some other namelist related error(s)(?) with NVHPC. I looked at the code that must be getting to the relevant point, and it shouldn't be possible to get to with the gungho example:

    IF (lbc_option == lbc_option_um2lfric_file .OR. (use_physics .AND. cloud == cloud_um)) THEN
      microphysics_nml => modeldb % configuration % get_namelist('microphysics')
      CALL microphysics_nml % get_value('microphysics_casim', microphysics_casim)
    END IF

The namelist specificies lbc_option = 'none' and use_physics = .false., so it should never call modeldb % configuration % get_namelist('microphysics'), but it clearly is for some reason, either because the namelist is reading the other things wrongly or something else. Not sure what the best path forward here is @sergisiso @arporter

@sergisiso
Copy link
Copy Markdown
Collaborator

So the problem is not modeldb % configuration % get_namelist('microphysics') as this really doesn't exist in the namelist.

The problem is lbc_option == lbc_option_um2lfric_file .OR. (use_physics .AND. cloud == cloud_um)

In the past we had issues as nvfortran did not interpret booleans in the namelist the same way as gnu was doing, specially the '.True.' vs a 'T' character, vs a '1' has differences between compilers. Can you check if any of the variables in the conditional come from namelist values and what format they have? If they are 0/1, you can try using the -Munixlogical flag.

@LonelyCat124
Copy link
Copy Markdown
Collaborator Author

So the problem is not modeldb % configuration % get_namelist('microphysics') as this really doesn't exist in the namelist.

The problem is lbc_option == lbc_option_um2lfric_file .OR. (use_physics .AND. cloud == cloud_um)

In the past we had issues as nvfortran did not interpret booleans in the namelist the same way as gnu was doing, specially the '.True.' vs a 'T' character, vs a '1' has differences between compilers. Can you check if any of the variables in the conditional come from namelist values and what format they have? If they are 0/1, you can try using the -Munixlogical flag.

Yeah. The input has use_physics=.false., cloud isn't specified and lbc_option='none'. All of these feel like they should be ok but is there an easy way to make/do we have a namelist test piece of fortran code I can use to test the input? I've never written code to interact with namelists so not something I know how to do...

@LonelyCat124
Copy link
Copy Markdown
Collaborator Author

@sergisiso @arporter ITs are green for this now. I'm not sure if there's still an intermittent signal 11 for Nvfortran but I've not seen it the last few runs, but maybe we need to keep an eye on it going forward.

Copy link
Copy Markdown
Collaborator

@sergisiso sergisiso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LonelyCat124 The review of the lfric forks are hard because the PRs include all changes coming from the fork, the best way I could find is looking at the resulting diffs with upstream:
MetOffice/lfric_core@vn3.1...stfc:lfric_core:3.1_sync
and
MetOffice/lfric_apps@vn3.1.1...stfc:lfric_apps:3.1_sync

The changes needed to make nvhpc work are quite concerning, but imo the benefit of testing close to upstream make them worth it. So I am in favour of mering all these changes.

The integration test timings are different (as expected: different code, different measurement), but I don't know if the system was busy or not to take these as the new speedup baselines. I am also surprised ${PSYCLONE_LFRIC_DIR}/KGOs/gungho_model-checksums.txt and ${PSYCLONE_LFRIC_DIR}/KGOs/lfric_atm-checksums.txt didn't require changes.

So before merging I will try to run them manually and try to get answers for the questions above.

@sergisiso
Copy link
Copy Markdown
Collaborator

The KGOs/lfric_atm-checksums.txt produces the exact same output, KGOs/gungho_model-checksums.txt does not, but it is inside the rather large tolerance that we have. I will approve this PR and continue looking at this at #3207

The CI re-run has been done with an empty system, so I will take those numbers as the new baseline.

@sergisiso sergisiso merged commit f02c0df into master Mar 25, 2026
17 of 18 checks passed
@sergisiso sergisiso deleted the 3.1_sync branch March 25, 2026 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants