-
It looks like you hit an out-of-memory error, so the GPU may not actually be in use.
By the way, it seems strange that your two cards report different amounts of memory.
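If another process is holding most of GPU 0 (the out-of-memory retries in the log suggest it is largely occupied, while GPU 1 reports far more free memory), one workaround is to pin the run to the freer card and let TensorFlow allocate memory on demand. A minimal sketch using standard CUDA/TensorFlow environment variables — the device index `1` is only an example; check `nvidia-smi` first:

```shell
# Pin the run to the card with more free memory (index is illustrative; verify with nvidia-smi)
export CUDA_VISIBLE_DEVICES=1
# Ask TensorFlow to grow GPU allocations on demand instead of grabbing a large block up front
export TF_FORCE_GPU_ALLOW_GROWTH=true
echo "${CUDA_VISIBLE_DEVICES} ${TF_FORCE_GPU_ALLOW_GROWTH}"
```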
-
Dear developers,
I compiled deepmd-kit and LAMMPS with the commands below, but the molecular dynamics (MD) speed is only about 25% of what I get from a direct conda installation. Because I use a modified PLUMED, I have to compile everything myself. I would therefore appreciate your help in identifying the underlying issue.
Installation commands:
conda create -n cuda11
conda activate cuda11
conda install python==3.11.5
conda install cuda-nvcc
pip install --upgrade pip
pip install nvidia-cudnn-cu11==8.6.0.163 protobuf==4.23.4 tensorflow==2.13.*
#open a new terminal
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/usergpu/soft/anaconda/install/envs/cuda11/lib/:$CUDNN_PATH/lib
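A note on the path lookup above: in a fresh shell the same cuDNN directory can be resolved with quoting that survives spaces in paths. This sketch assumes the pip package `nvidia-cudnn-cu11` is importable in the active environment:

```shell
# Resolve the directory of the pip-installed cuDNN package and prepend its lib/ to the loader path
CUDNN_PATH="$(python -c 'import os, nvidia.cudnn; print(os.path.dirname(nvidia.cudnn.__file__))')"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${CUDNN_PATH}/lib"
```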
## deepmd-kit
tar
cd source
mkdir build
cd build
export PATH=/home/usergpu/soft/cmake/cmake-3.30.0-rc2-linux-x86_64/bin:$PATH
cmake -DUSE_TF_PYTHON_LIBS=TRUE -DCMAKE_INSTALL_PREFIX=/home/usergpu/soft/deepmd-kit/install/ -DTENSORFLOW_ROOT=/home/usergpu/soft/anaconda/install/envs/cuda11/lib/python3.11/site-packages/tensorflow/ ..
make -j12
make install -j12
make lammps
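One thing worth double-checking: the cmake line above does not enable the CUDA toolkit, so the DeePMD-kit custom operators may have been built CPU-only, which could explain a large slowdown relative to the conda package. A hedged sketch of the configure step with GPU support enabled — the flag name is taken from the DeePMD-kit 2.x build documentation, and the paths are those from the original commands:

```shell
cmake -DUSE_TF_PYTHON_LIBS=TRUE \
      -DUSE_CUDA_TOOLKIT=TRUE \
      -DCMAKE_INSTALL_PREFIX=/home/usergpu/soft/deepmd-kit/install/ \
      -DTENSORFLOW_ROOT=/home/usergpu/soft/anaconda/install/envs/cuda11/lib/python3.11/site-packages/tensorflow/ \
      ..
```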
## lammps
cd lammps-stable_2Aug2023_update2/
cd src/
cp -r /home/usergpu/soft/deepmd-kit/deepmd-kit-2.2.7/source/build/USER-DEEPMD/ .
make yes-kspace
make yes-extra-fix
make yes-user-deepmd
source /home/usergpu/soft/plumed-2.8.1/sourceme.sh
make lib-plumed args='-p /home/usergpu/xyliu/soft/plumed-2.8.1/build/ -m shared'
make mpi -j 12
The on-screen output when I submit a task:
2024-06-18 20:02:33.307208: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-06-18 20:02:33.344105: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-06-18 20:02:34.312498: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-06-18 20:02:34.333658: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-18 20:02:35.269270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:35.276366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:35.289328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:35.291170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:35.325452: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
2024-06-18 20:02:35.358886: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
2024-06-18 20:02:35.508553: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 36.02GiB (38673055744 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.512437: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 32.42GiB (34805747712 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.516274: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 29.17GiB (31325171712 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.520040: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 26.26GiB (28192653312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.524024: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 23.63GiB (25373386752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.528826: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 21.27GiB (22836047872 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.534870: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 19.14GiB (20552441856 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.540422: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 17.23GiB (18497198080 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.544968: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 15.50GiB (16647478272 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.548758: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 13.95GiB (14982729728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.552603: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 12.56GiB (13484455936 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.556746: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 11.30GiB (12136009728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.562263: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 10.17GiB (10922408960 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.567612: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 9.15GiB (9830167552 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.571641: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 8.24GiB (8847150080 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.576848: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 7.42GiB (7962435072 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.580606: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 6.67GiB (7166191616 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.584758: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 6.01GiB (6449572352 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.588573: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 5.41GiB (5804615168 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.594112: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 4.87GiB (5224153600 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.599475: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 4.38GiB (4701737984 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.603644: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 3.94GiB (4231564032 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.607448: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 3.55GiB (3808407552 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.611319: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 3.19GiB (3427566592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.615339: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 2.87GiB (3084809728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.620897: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 2.58GiB (2776328704 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.626355: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 2.33GiB (2498695680 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.631273: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 2.09GiB (2248826112 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.635073: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 1.88GiB (2023943424 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.638871: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 1.70GiB (1821549056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-06-18 20:02:35.643169: I tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:753] failed to allocate 1.53GiB (1639394048 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-06-18 20:02:36.492733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:36.494282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:36.497800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:36.499288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:36.532222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:36.549077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
2024-06-18 20:02:36.558621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 36881 MB memory: -> device: 0, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:18:00.0, compute capability: 8.0
2024-06-18 20:02:36.560058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77825 MB memory: -> device: 1, name: NVIDIA A800 80GB PCIe, pci bus id: 0000:86:00.0, compute capability: 8.0
log file
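Regarding the repeated warnings in the log above: the thread-count variables can be set before launching LAMMPS. The values below are purely illustrative; tune them for your hardware per the DeePMD-kit parallelism guide linked in the warnings:

```shell
# Example thread settings for a single-process GPU run (values are illustrative)
export OMP_NUM_THREADS=4
export TF_INTRA_OP_PARALLELISM_THREADS=4
export TF_INTER_OP_PARALLELISM_THREADS=2
echo "${OMP_NUM_THREADS} ${TF_INTRA_OP_PARALLELISM_THREADS} ${TF_INTER_OP_PARALLELISM_THREADS}"
```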