I am part of the dev team for a Trilinos-based CFD application. Recently we have begun testing on GPUs on Summit at OLCF. We are able to run 2D cases with tens of millions of elements without issue. However, when moving to 3D, we can only run cases with a few thousand elements before hitting an "allocation failed" error (full message and backtrace below) during the linear solve. The issue seems to be related to the boundary conditions: we can run large cases in 3D as long as all boundaries are periodic, but adding even a single set of non-periodic boundaries in three dimensions produces this error:
We do not see this issue with the version of the code compiled for CPUs. I am not sure where to begin debugging, so any help would be appreciated, and I am happy to provide whatever other information is needed to diagnose the problem.
Thanks,
Kellis
kincaidkc changed the title from "Kokkos/Phalanx: Allocation error in 3D on Summit (OLCF)" to "Kokkos/Phalanx: Allocation error in 3D on Summit GPUs (OLCF)" on Aug 19, 2024.
@kincaidkc - that looks like an allocation failure on device. The only time we allocate on device is for DFad objects when evaluating the Jacobian, so it could be a bug in an evaluator, or you may simply be running out of device memory due to the AD storage requirements. Unfortunately, the stack trace gives no information about which evaluator is failing. I would start by using kokkos-tools to look at the high-water memory mark on the GPUs; running on more nodes to reduce the per-node memory requirement would also be a quick way to check this. To figure out which evaluator the code is failing in, you can export the environment variable TEUCHOS_ENABLE_VERBOSE_TIMERS=1. This dumps every timer during runtime. It is a ton of data, and the output will have to be separated per MPI process, but it should print the last evaluator called.
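A minimal sketch of a Summit (LSF/jsrun) launch wiring up both suggestions above. The kokkos-tools install path, the profiling library name (`kp_hwm.so`), and the application name/arguments are assumptions; check your own kokkos-tools build for the exact library filename, and note that older Kokkos versions read `KOKKOS_PROFILE_LIBRARY` instead of `KOKKOS_TOOLS_LIBS`.

```shell
#!/bin/bash
# Load the kokkos-tools memory high-water-mark profiler.
# Path and library name are hypothetical -- adjust to your build.
export KOKKOS_TOOLS_LIBS=$HOME/kokkos-tools/profiling/memory-hwm/kp_hwm.so

# Dump every Teuchos timer at runtime so the last evaluator
# entered before the failure is visible in the log.
export TEUCHOS_ENABLE_VERBOSE_TIMERS=1

# Example Summit launch: 6 resource sets, 1 rank + 1 GPU each.
# "my_cfd_app" and its input file are placeholders.
jsrun -n 6 -a 1 -c 7 -g 1 ./my_cfd_app --input case3d.xml > run.log 2>&1
```

Because the verbose-timer output is interleaved across ranks, redirecting each rank's output to its own file (or grepping the combined log by rank) makes it much easier to find the last evaluator called before the allocation failure.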