Merge pull request #15 from vub-hpc/load_balancing
Add documentation and results of load balancing tests
wpoely86 committed Oct 4, 2022
2 parents 9553753 + e09a968 commit 28e2ed5
Showing 45 changed files with 8,560 additions and 48 deletions.
54 changes: 6 additions & 48 deletions README.md
@@ -311,7 +311,9 @@ Loads software commonly used to analyse the results of the simulations.

* available in Hydra

## CPRNC
## Extra tools

### CPRNC

There is a small tool called
[cprnc](https://github.com/ESMCI/cime/tree/master/tools/cprnc) that needs to be
@@ -337,7 +339,7 @@ Therefore it has to be built with the minimum CPU optimizations

* the binary for Breniac was built on a Broadwell CPU (available upon request)

## mksurfdata_map
### mksurfdata_map

Compilation instructions for the CLM tool `mksurfdata_map`

Expand All @@ -347,51 +349,7 @@ Compilation instructions for the CLM tool `mksurfdata_map`
```
$ USER_FC=gfortran LIB_NETCDF="$EBROOTNETCDFMINFORTRAN/lib" INC_NETCDF="$EBROOTNETCDFMINFORTRAN/include" USER_FFLAGS="-fno-range-check" make
```
## Testing our CESM installations

## Validation of the CESM installation

Basic functionality of the installation can be checked with the
[*pre-alpha* tests of Cheyenne](https://esmci.github.io/cime/versions/master/html/users_guide/porting-cime.html#validating-a-cesm-port-with-prognostic-components).
This collection of tests can be created and executed with the script
``$CIMEROOT/cime/scripts/create_test``

```
$ ./create_test --xml-category prealpha --xml-machine cheyenne --xml-compiler intel --machine hydra --compiler gnu --parallel-jobs 1 --proc-pool 4 --output-root $VSC_SCRATCH/cesm/output/tests
```

It is also possible to carry out a scientific validation of the CESM
installation to verify its reliability. The procedure is described in
http://www.cesm.ucar.edu/models/cesm2/python-tools/.

1. Create ensemble test case with the script
``$CIMEROOT/tools/statistical_ensemble_test/ensemble.py`` in the CESM source
code

2. ``ensemble.py`` will create, build and submit the validation tests in the
cluster
```
$ python ensemble.py --case $VSC_SCRATCH/cesm/cases/UF-CAM-ECT.cesm_2.1.3_2021b.000 --ect cam --uf --mach hydra --compiler gnu --compset F2000climo --res f19_f19_mg17
$ python ensemble.py --case $VSC_SCRATCH/cesm/cases/POP-ECT.cesm_2.1.3_2021b.000 --ect pop --mach hydra --compiler gnu --compset G --res T62_g17
```

3. Use the [web tool from UCAR](https://www.cesm.ucar.edu/models/cesm2/verification/)
to compare the resulting `.nc` files with the reference data

Results of the validations carried out in the VSC clusters can be found in [cesm-config/validation](validation):

* CESM v2.1.1 passed all tests in the following clusters:
  * Breniac with intel/2018a
    * UF-CAM-ECT test: validated by Steven Johan De Hertog (VUB)
    * POP-ECT: validated by Alex Domingo (VUB)
  * Hydra with foss/2019a
    * UF-CAM-ECT test: validated by Alex Domingo (VUB)
    * POP-ECT: validated by Alex Domingo (VUB)

* CESM v2.2.0 passed all tests in the following clusters:
  * Hydra with foss/2021b
    * UF-CAM-ECT test: validated by Alex Domingo (VUB)
    * POP-ECT: validated by Alex Domingo (VUB)
  * Hortense with foss/2021b
    * UF-CAM-ECT test: validated by Alex Domingo (VUB)
    * POP-ECT: validated by Alex Domingo (VUB)
The folder [cesm-config/tests](tests) contains instructions to carry out different tests on a CESM installation, as well as the results of several of our tests on VSC clusters.

85 changes: 85 additions & 0 deletions tests/README.md
@@ -0,0 +1,85 @@
# Tests of our CESM installations

We test multiple aspects of each new installation of CESM:
* [Basic functionality](#functionality-tests)
* [Scientific correctness](#scientific-validation-tests)
* [Performance and scaling](#performance-and-scaling-tests)

## Functionality Tests

Basic functionality of the installation can be checked with the
[*pre-alpha* tests of Cheyenne](https://esmci.github.io/cime/versions/master/html/users_guide/porting-cime.html#validating-a-cesm-port-with-prognostic-components).
This collection of tests can be created and executed with the script
``$CIMEROOT/cime/scripts/create_test``

```
$ ./create_test --xml-category prealpha --xml-machine cheyenne --xml-compiler intel --machine hydra --compiler gnu --parallel-jobs 1 --proc-pool 4 --output-root $VSC_SCRATCH/cesm/output/tests
```

⚠️ Warning: these tests are computationally demanding. The script
`create_test` not only creates the tests, but also submits them to the job
scheduler. Running all tests needs a lot of storage (more than 1 TB) and some
of the tests need multiple full nodes (more than 8) to run.
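
Given the storage needs mentioned above, it can be useful to check the free space in the output location before launching `create_test`. The following is a minimal sketch, not part of CIME: the helper `has_free_space` is hypothetical, the 1 TiB threshold is only an approximation of the figure above, and `df` reports file system space, which may differ from the quota applied to your account.

```shell
#!/bin/sh
# Sketch: check the free space of a directory before launching the tests.
# has_free_space is a hypothetical helper, not part of CIME.

# Usage: has_free_space DIR MIN_KIB
# Succeeds when the file system holding DIR has at least MIN_KIB KiB available.
has_free_space() {
    dir=$1
    min_kib=$2
    # With -P, df prints one line per file system; column 4 is the available KiB
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge "$min_kib" ]
}

# 1 TiB expressed in KiB (1024 * 1024 * 1024); threshold assumed for this sketch
required_kib=1073741824
# Fall back to the current directory when VSC_SCRATCH is not set
target=${VSC_SCRATCH:-.}

if has_free_space "$target" "$required_kib"; then
    echo "at least 1 TiB available in $target"
else
    echo "less than 1 TiB available in $target"
fi
```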

## Scientific Validation Tests

UCAR provides a toolset to carry out a scientific validation of the CESM
installation and verify the reliability of its results. The procedure is
described in http://www.cesm.ucar.edu/models/cesm2/python-tools/.

1. Create the ensemble test cases with the script
``$CIMEROOT/tools/statistical_ensemble_test/ensemble.py`` in the CESM source
code

2. ``ensemble.py`` will create, build and submit the validation tests on the
cluster
```
$ python ensemble.py --case $VSC_SCRATCH/cesm/cases/UF-CAM-ECT.cesm_2.1.3_2021b.000 --ect cam --uf --mach hydra --compiler gnu --compset F2000climo --res f19_f19_mg17
$ python ensemble.py --case $VSC_SCRATCH/cesm/cases/POP-ECT.cesm_2.1.3_2021b.000 --ect pop --mach hydra --compiler gnu --compset G --res T62_g17
```

3. Use the [web tool from UCAR](https://www.cesm.ucar.edu/models/cesm2/verification/)
to compare the resulting `.nc` files with the reference data
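
Before using the web tool in step 3, the `.nc` files produced by the ensemble runs have to be located. The sketch below lists them under a given directory. The helper `list_nc_files` and the variable `CASE_OUTPUT` are hypothetical names used for illustration; the actual location of the output depends on how the machine is configured.

```shell
#!/bin/sh
# Sketch: list the NetCDF files found under a directory, sorted by path.
# list_nc_files and CASE_OUTPUT are hypothetical names, not part of CESM.

# Usage: list_nc_files DIR
list_nc_files() {
    find "$1" -type f -name '*.nc' | sort
}

# Falls back to the current directory when CASE_OUTPUT is not set
list_nc_files "${CASE_OUTPUT:-.}"
```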

Results of the validations carried out in the VSC clusters can be found in
[cesm-config/tests/validation](validation):

* CESM v2.1.1 passed all tests in the following clusters:
  * Breniac with intel/2018a
    * UF-CAM-ECT test: validated by Steven Johan De Hertog (VUB)
    * POP-ECT: validated by Alex Domingo (VUB)
  * Hydra with foss/2019a
    * UF-CAM-ECT test: validated by Alex Domingo (VUB)
    * POP-ECT: validated by Alex Domingo (VUB)

* CESM v2.2.0 passed all tests in the following clusters:
  * Hydra with foss/2021b
    * UF-CAM-ECT test: validated by Alex Domingo (VUB)
    * POP-ECT: validated by Alex Domingo (VUB)
  * Hortense with foss/2021b
    * UF-CAM-ECT test: validated by Alex Domingo (VUB)
    * POP-ECT: validated by Alex Domingo (VUB)

## Performance and Scaling Tests

CIME provides a load balancing tool to measure the performance of different PE layouts. It makes it possible to tune the parallelization of the simulation at a higher level and, depending on the available computational resources, it can help to minimize downtime and improve performance. More information about PE layouts can be found in the [CESM User's Guide](http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/usersguide/x1927.html).

The [CIME Load Balancing Tool](https://esmci.github.io/cime/versions/cesm2.2/html/misc_tools/load-balancing-tool.html) is located in the source code of CIME at `cime/tools/load_balancing_tool/`. It provides two scripts:

* `load_balancing_submit.py`: parses the PE layout description (XML), creates corresponding cases for a given compset and resolution and submits the cases
```
$ module load CESM-deps
$ export PYTHONPATH="$(realpath ../../scripts/):$(realpath .):$PYTHONPATH"
$ python load_balancing_submit.py --res f09_g17 --compset B1850 --pesfile PES-f09_g17-B1850.xml --project badmin
```

* `load_balancing_solve.py`: optimises the layout based on some model (*e.g.* IceLndWavAtmOcn)
```
$ module load CESM-deps PuLP
$ export PYTHONPATH="$(realpath ../../scripts/):$(realpath .):$PYTHONPATH"
$ python load_balancing_solve.py --total-tasks 512 --blocksize 8
```
Note: adjust `--total-tasks` and `--blocksize` according to your PE layout.
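
When comparing several task counts, the solve command can be generated for each of them. The sketch below only prints the commands and uses no flags other than the two shown above. The helper `print_solve_commands` is hypothetical, and skipping task counts that are not a multiple of the block size is an assumption of this sketch, not a documented requirement of the tool.

```shell
#!/bin/sh
# Sketch: print one load_balancing_solve.py command per candidate task count.
# print_solve_commands is a hypothetical helper; it only echoes the commands.

# Usage: print_solve_commands BLOCKSIZE TOTAL_TASKS...
print_solve_commands() {
    blocksize=$1
    shift
    for total in "$@"; do
        # Assumption of this sketch: only keep whole numbers of blocks
        if [ $((total % blocksize)) -ne 0 ]; then
            echo "# skipping $total: not a multiple of $blocksize" >&2
            continue
        fi
        echo "python load_balancing_solve.py --total-tasks $total --blocksize $blocksize"
    done
}

# Example with a block size of 8 and three candidate task counts
print_solve_commands 8 256 512 1024
```

The printed lines can be reviewed first and then run one by one from `cime/tools/load_balancing_tool/` with the modules and `PYTHONPATH` set as shown above.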

We provide two patches in [cesm-config/tests/load_balancing/patches](load_balancing/patches) that update the load balancing tool in CESM v2.2.0 to make it compatible with Python 3 and PuLP v2.5.1.
