trilinos test failed #39

Open
shahzebsiddiqui opened this issue Sep 30, 2022 · 0 comments

CDASH: https://my.cdash.org/test/63278708

buildspec: https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/E4S-Testsuite/perlmutter/22.05/trilinos.yml

I suspect this test is failing because our startup modulefile gpu, which is loaded by default, sets MPICH_GPU_SUPPORT_ENABLED=1:

e4s:login34> ml show gpu
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   /global/common/software/nersc/pm-2022.08.4/extra_modulefiles/gpu/1.0.lua:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
family("hardware")
load("cudatoolkit")
load("craype-accel-nvidia80")
setenv("MPICH_GPU_SUPPORT_ENABLED","1")

We can unload this module by loading the cpu module instead. In any case, I wanted to bring this up.
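A minimal sketch of that workaround, for use at the top of the test script before srun is called. The function name disable_gpu_mpi is hypothetical, and explicitly exporting MPICH_GPU_SUPPORT_ENABLED=0 is an assumption on my part (a belt-and-braces step in case the module swap is unavailable); the buildspec does not currently do either.

```shell
# Sketch only: make sure a CPU-only binary is not launched with GPU-aware MPI requested.
# disable_gpu_mpi is a hypothetical helper name, not part of buildtest or the E4S testsuite.
disable_gpu_mpi() {
    # Loading cpu swaps out gpu (both belong to the "hardware" family),
    # which reverts the setenv done by the gpu modulefile.
    if command -v module >/dev/null 2>&1; then
        module load cpu 2>/dev/null || true
    fi
    # Assumption: forcing the variable to 0 is acceptable for this test.
    export MPICH_GPU_SUPPORT_ENABLED=0
}

disable_gpu_mpi
echo "MPICH_GPU_SUPPORT_ENABLED=${MPICH_GPU_SUPPORT_ENABLED}"
```

Doing this inside the test script, rather than changing the default gpu module, keeps the change local to tests that are built without GPU support.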

Error:

+ cd -
/global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
Running /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
Skipping load: Environment already setup
+ cd ./build
+ export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
+ CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
+ export OMP_NUM_THREADS=4
+ OMP_NUM_THREADS=4
+ srun -n 8 ./Zoltan
MPICH ERROR [Rank 0] [job id 3289011.0] [Wed Sep 28 19:56:45 2022] [nid003233] - Abort(-1) (rank 0 in comm 0): MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
 (Other MPI error)

aborting job:
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked

srun: error: nid003233: tasks 0-7: Segmentation fault
srun: launch/slurm: _step_signal: Terminating StepId=3289011.0
Run failed
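
To confirm the diagnosis, one option is to check whether the binary was linked against a GTL library at all. The sketch below assumes the GTL shared objects have names starting with libmpi_gtl (for example libmpi_gtl_cuda); I have not verified that against this build, and links_gtl is a hypothetical helper name.

```shell
# Sketch only: succeed if the ldd output read from stdin mentions a GTL library.
# Assumption: GTL libraries are named libmpi_gtl*; adjust the pattern if they differ.
links_gtl() {
    grep -q 'libmpi_gtl'
}

# Intended use from the build directory shown in the log:
#   ldd ./Zoltan | links_gtl && echo "GTL linked" || echo "GTL not linked"
```

If the check reports that GTL is not linked while MPICH_GPU_SUPPORT_ENABLED is 1, that matches the MPIDI_CRAY_init abort shown in the error output.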