v4.1.5 UCX_NET_DEVICES not selecting TCP devices correctly #12785

Open
bertiethorpe opened this issue Aug 30, 2024 · 0 comments
Details of the problem

  • OS version (e.g. Linux distro)
    • Rocky Linux release 9.4 (Blue Onyx)
  • Driver version:
    • rdma-core-2404mlnx51-1.2404066.x86_64
    • MLNX_OFED_LINUX-24.04-0.6.6.0

Setting UCX_NET_DEVICES to target only TCP devices when RoCE is available seems to be ignored in favour of some fallback.

I'm running a 2-node IMB-MPI1 PingPong to benchmark RoCE against regular TCP Ethernet.

Setting UCX_NET_DEVICES=all or mlx5_0:1 gives the optimal performance and uses RDMA as expected.
Setting UCX_NET_DEVICES=eth0, eth1, or anything else still appears to use RoCE, with only slightly higher latency.
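
For reference, this is roughly how the two cases compare (a sketch; the full batch script is at the end, and the only thing changed between runs is the exported device list):

export UCX_NET_DEVICES=mlx5_0:1   # RoCE run - performs as expected
mpirun IMB-MPI1 pingpong -iter_policy off

export UCX_NET_DEVICES=eth0       # intended TCP-only run - still appears to use RoCE
mpirun IMB-MPI1 pingpong -iter_policy off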

HW information from the ibstat or ibv_devinfo -vv command:

        hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         20.36.1010
        node_guid:                      fa16:3eff:fe4f:f5e9
        sys_image_guid:                 0c42:a103:0003:5d82
        vendor_id:                      0x02c9
        vendor_part_id:                 4124
        hw_ver:                         0x0
        board_id:                       MT_0000000224
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

How ompi is configured, from ompi_info | grep Configure:

 Configured architecture: x86_64-pc-linux-gnu
 Configured by: abuild
 Configured on: Thu Aug  3 14:25:15 UTC 2023
 Configure command line: '--prefix=/opt/ohpc/pub/mpi/openmpi4-gnu12/4.1.5'
                                             '--disable-static' '--enable-builtin-atomics'
                                             '--with-sge' '--enable-mpi-cxx'
                                             '--with-hwloc=/opt/ohpc/pub/libs/hwloc'
                                             '--with-libfabric=/opt/ohpc/pub/mpi/libfabric/1.18.0'
                                             '--with-ucx=/opt/ohpc/pub/mpi/ucx-ohpc/1.14.0'
                                             '--without-verbs' '--with-tm=/opt/pbs/'

Following the advice from Here, this is apparently due to the higher priority of OpenMPI's btl/openib component, but I don't think that can be the case here, since Open MPI was built with --without-verbs and openib does not appear in ompi_info | grep btl.

As suggested in the UCX issue, adding -mca pml_ucx_tls any -mca pml_ucx_devices any to my mpirun has fixed this problem, but I was wondering what precisely in the MCA causes this behaviour.
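
My guess is that the restriction comes from the pml/ucx component's own transport/device filters rather than from UCX itself; the defaults on this build can be inspected with ompi_info (a sketch, assuming the openmpi4 module is loaded):

ompi_info --param pml ucx --level 9 | grep -E 'pml_ucx_(tls|devices)'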

Here's my batch script:

#!/usr/bin/env bash

#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.out
#SBATCH --exclusive
#SBATCH --partition=standard

module load gnu12 openmpi4 imb

export UCX_NET_DEVICES=mlx5_0:1

echo SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST
echo SLURM_JOB_ID: $SLURM_JOB_ID
echo UCX_NET_DEVICES: $UCX_NET_DEVICES

export UCX_LOG_LEVEL=data
mpirun -mca pml_ucx_tls any -mca pml_ucx_devices any IMB-MPI1 pingpong -iter_policy off
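
In case it's relevant, this is roughly how I check which devices and transports UCX itself can see on a compute node (a sketch; output omitted):

srun --ntasks=1 ucx_info -d | grep -E 'Transport|Device'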