
Test the latest master on AMD GPUs on LUMI with ROCm 6.0 #998

Closed
valassi opened this issue Sep 16, 2024 · 6 comments · Fixed by #1006
valassi commented Sep 16, 2024

Test the latest master on AMD GPUs on LUMI

This is something that I did not do during the goodhel/for360 PRs, and that I may have done only superficially in the june24 PR (which contained a lot of AMD-relevant changes).

valassi self-assigned this Sep 16, 2024
valassi commented Sep 16, 2024

This now fails to link (maybe the ROCm installation changed?)

$(gpu_fcheckmain): $(BUILDDIR)/fcheck_sa_fortran.o $(BUILDDIR)/fsampler_$(GPUSUFFIX).o $(LIBDIR)/lib$(MG5AMC_GPULIB).so $(gpu_objects_exe)
ifneq ($(findstring hipcc,$(GPUCC)),) # link fortran/c++/hip using $FC when hipcc is used #802
	$(FC) -o $@ $(BUILDDIR)/fcheck_sa_fortran.o $(BUILDDIR)/fsampler_$(GPUSUFFIX).o $(LIBFLAGS) -lgfortran -L$(LIBDIR) -l$(MG5AMC_GPULIB) $(gpu_objects_exe) -lstdc++ -L$(shell dirname $(shell $(GPUCC) -print-prog-name=clang))/../../lib -lamdhip64
else
	$(GPUCC) -o $@ $(BUILDDIR)/fcheck_sa_fortran.o $(BUILDDIR)/fsampler_$(GPUSUFFIX).o $(LIBFLAGS) -lgfortran -L$(LIBDIR) -l$(MG5AMC_GPULIB) $(gpu_objects_exe)
endif

The problem is this

gfortran -march=znver3 -D__CRAY_X86_TRENTO -D__CRAY_AMD_GFX90A -D__CRAYXT_COMPUTE_LINUX_TARGET -D__TARGET_LINUX__ -ffixed-line-length-132 -o fcheck_hip.exe ./fcheck_sa_fortran.o ./fsampler_hip.o -L../../lib -lmg5amc_common_hip -Xlinker -rpath=$ORIGIN/../../lib -lgfortran -L../../lib -lmg5amc_gg_ttx_hip ./CommonRandomNumberKernel_hip.o ./RamboSamplingKernels_hip.o -lstdc++ -L/opt/rocm-6.0.3/llvm/bin/../../lib -lamdhip64 -Wl,-rpath=/opt/cray/pe/gcc-libs -Wl,-Bdynamic -Wl,--as-needed,-lgfortran,-lquadmath,--no-as-needed -Wl,--as-needed,-lpthread,--no-as-needed -Wl,--disable-new-dtags 

where

ls /opt/rocm-6.0.3/lib/libamdhip64.so
/opt/rocm-6.0.3/lib/libamdhip64.so@

but

ls /opt/rocm-6.0.3/llvm/bin/../../lib/libamdhip64.so
ls: cannot access '/opt/rocm-6.0.3/llvm/bin/../../lib/libamdhip64.so': No such file or directory

that is to say, /opt/rocm-6.0.3/llvm/bin/../.. does not resolve to /opt/rocm-6.0.3/ (presumably because llvm is a symlink, so the physical `..` climbs out of a different directory).
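A quick way to see why the logical and physical interpretations of `..` diverge (a minimal sketch with a throwaway tree; the paths are hypothetical, but the mechanism matches the suspected ROCm 6 layout where llvm is a symlink into lib/):

```shell
# Build a toy tree where "llvm" is a symlink into "lib/" (illustrative only).
root=$(mktemp -d)
mkdir -p "$root/lib/llvm/bin"
ln -s "$root/lib/llvm" "$root/llvm"

# Logical traversal (shell 'cd -L', the default): ".." cancels textually.
cd -L "$root/llvm/bin/../.." && pwd     # prints $root

# Physical traversal (what the kernel, and hence ls and the linker, do):
# the symlink is resolved first, so ".." climbs from lib/llvm instead.
cd -P "$root/llvm/bin/../.." && pwd     # prints $root/lib
```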

valassi commented Sep 16, 2024

OK, fixed with this ugly construction based on `cd -L`, which preserves the logical meaning of `..` across symlinks

diff --git a/epochX/cudacpp/gg_tt.mad/SubProcesses/cudacpp.mk b/epochX/cudacpp/gg_tt.mad/SubProcesses/cudacpp.mk
index 47e2f4233..5ffb286fe 100644
--- a/epochX/cudacpp/gg_tt.mad/SubProcesses/cudacpp.mk
+++ b/epochX/cudacpp/gg_tt.mad/SubProcesses/cudacpp.mk
@@ -874,7 +874,7 @@ endif
 $(gpu_fcheckmain): LIBFLAGS += $(GPULIBFLAGSRPATH) # avoid the need for LD_LIBRARY_PATH
 $(gpu_fcheckmain): $(BUILDDIR)/fcheck_sa_fortran.o $(BUILDDIR)/fsampler_$(GPUSUFFIX).o $(LIBDIR)/lib$(MG5AMC_GPULIB).so $(gpu_objects_exe)
 ifneq ($(findstring hipcc,$(GPUCC)),) # link fortran/c++/hip using $FC when hipcc is used #802
-	$(FC) -o $@ $(BUILDDIR)/fcheck_sa_fortran.o $(BUILDDIR)/fsampler_$(GPUSUFFIX).o $(LIBFLAGS) -lgfortran -L$(LIBDIR) -l$(MG5AMC_GPULIB) $(gpu_objects_exe) -lstdc++ -L$(shell dirname $(shell $(GPUCC) -print-prog-name=clang))/../../lib -lamdhip64
+	$(FC) -o $@ $(BUILDDIR)/fcheck_sa_fortran.o $(BUILDDIR)/fsampler_$(GPUSUFFIX).o $(LIBFLAGS) -lgfortran -L$(LIBDIR) -l$(MG5AMC_GPULIB) $(gpu_objects_exe) -lstdc++ -L$(shell cd -L $(shell dirname $(shell $(GPUCC) -print-prog-name=clang))/../..; pwd)/lib -lamdhip64
 else
 	$(GPUCC) -o $@ $(BUILDDIR)/fcheck_sa_fortran.o $(BUILDDIR)/fsampler_$(GPUSUFFIX).o $(LIBFLAGS) -lgfortran -L$(LIBDIR) -l$(MG5AMC_GPULIB) $(gpu_objects_exe)
 endif
@@ -975,7 +975,7 @@ else # link only runTest_$(GPUSUFFIX).o (new: in the past, this was linking both
 $(gpu_testmain): LIBFLAGS += $(GPULIBFLAGSRPATH) # avoid the need for LD_LIBRARY_PATH
 $(gpu_testmain): $(LIBDIR)/lib$(MG5AMC_COMMONLIB).so $(gpu_objects_lib) $(gpu_objects_exe) $(GTESTLIBS)
 ifneq ($(findstring hipcc,$(GPUCC)),) # link fortran/c++/hip using $FC when hipcc is used #802
-	$(FC) -o $@ $(gpu_objects_lib) $(gpu_objects_exe) -ldl $(LIBFLAGS) -lstdc++ -lpthread -L$(shell dirname $(shell $(GPUCC) -print-prog-name=clang))/../../lib -lamdhip64
+	$(FC) -o $@ $(gpu_objects_lib) $(gpu_objects_exe) -ldl $(LIBFLAGS) -lstdc++ -lpthread -L$(shell cd -L $(shell dirname $(shell $(GPUCC) -print-prog-name=clang))/../..; pwd)/lib -lamdhip64
 else
 	$(GPUCC) -o $@ $(gpu_objects_lib) $(gpu_objects_exe) -ldl $(LIBFLAGS) -lcuda
 endif
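To sanity-check the fixed expansion outside make, the nested `$(shell ...)` calls can be replayed as plain shell. This is a sketch using a toy tree in place of the real `hipcc -print-prog-name=clang` output, since the exact LUMI paths are not guaranteed:

```shell
# Toy tree standing in for a ROCm install whose llvm/ is a symlink into lib/.
root=$(mktemp -d)
mkdir -p "$root/lib/llvm/bin"
ln -s "$root/lib/llvm" "$root/llvm"
touch "$root/lib/libamdhip64.so"        # placeholder file, not a real library

clang="$root/llvm/bin/clang"            # stands in for: hipcc -print-prog-name=clang
clangdir=$(dirname "$clang")

# Old rule: physical resolution of $clangdir/../../lib misses the library.
ls "$clangdir/../../lib/libamdhip64.so" 2>/dev/null || echo "old rule: not found"

# Fixed rule: logical 'cd -L ...; pwd' recovers the intended install root.
libdir="$(cd -L "$clangdir/../.." && pwd)/lib"
ls "$libdir/libamdhip64.so" && echo "fixed rule: found"
```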

valassi commented Sep 16, 2024

Next ugly thing

gfortran -march=znver3 -D__CRAY_X86_TRENTO -D__CRAY_AMD_GFX90A -D__CRAYXT_COMPUTE_LINUX_TARGET -D__TARGET_LINUX__ -ffixed-line-length-132 -o fcheck_hip.exe ./fcheck_sa_fortran.o ./fsampler_hip.o -L../../lib -lmg5amc_common_hip -Xlinker -rpath=$ORIGIN/../../lib -lgfortran -L../../lib -lmg5amc_gg_ttx_hip ./CommonRandomNumberKernel_hip.o ./RamboSamplingKernels_hip.o -lstdc++ -L/opt/rocm-6.0.3/lib -lamdhip64 -Wl,-rpath=/opt/cray/pe/gcc-libs -Wl,-Bdynamic -Wl,--as-needed,-lgfortran,-lquadmath,--no-as-needed -Wl,--as-needed,-lpthread,--no-as-needed -Wl,--disable-new-dtags 
/usr/bin/ld: ../../lib/libmg5amc_common_hip.so: undefined reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'
collect2: error: ld returned 1 exit status
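For context, `GLIBCXX_3.4.32` is a libstdc++ symbol version that only recent GCC 13.x builds provide, so the error means libmg5amc_common_hip.so was built against a newer libstdc++ than the one the linking gfortran picks up. One generic way to check which symbol versions a given libstdc++ offers (a sketch; the library path varies per system):

```shell
# Locate the libstdc++ the current GCC would link against, then list the
# GLIBCXX symbol versions it defines (newest last after a version sort).
libstdcxx=$(gcc -print-file-name=libstdc++.so.6)
strings "$libstdcxx" | grep '^GLIBCXX_' | sort -V | tail -3
# If GLIBCXX_3.4.32 is absent from the output, this toolchain cannot link
# objects produced by the newer compiler that built the shared library.
```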

valassi commented Sep 16, 2024

This is probably a problem in my setup

module load LUMI/23.09 partition/G
module load cpeGNU/23.09
export CC="cc --cray-bypass-pkgconfig -craype-verbose"
export CXX="CC --cray-bypass-pkgconfig -craype-verbose"
export FC="ftn --cray-bypass-pkgconfig -craype-verbose -ffixed-line-length-132"

The login screen says the following, so I need to update:

NOTE: The default version of both the Cray PE and LUMI stack is now 24.03 and this is also the only version of the Cray PE officially supported on the current system. We recommend moving to 24.03 when possible. Base libraries for 24.03 and 23.12 are already on the system and much user-installable software for 24.03 is already available also.

Unfortunately, due to the late decision to move directly to ROCm 6.0 rather than the originally planned 5.7, we are not able to keep our promise of fully supporting 23.09 also. We did test many of the build recipes and tried to fix problems when possible, but a recompile of GPU software may be needed.

valassi commented Sep 16, 2024

OK, this solves the issue:

module load LUMI/24.03 partition/G
module load cpeGNU/24.03
export CC="cc --cray-bypass-pkgconfig -craype-verbose"
export CXX="CC --cray-bypass-pkgconfig -craype-verbose"
export FC="ftn --cray-bypass-pkgconfig -craype-verbose -ffixed-line-length-132"

valassi added a commit to valassi/madgraph4gpu that referenced this issue Sep 18, 2024
…th to libamdhip64 madgraph5#998

Also fix the LUMI setup to solve a second issue (move from 23.09 to 24.03)
  module load LUMI/24.03 partition/G
  module load cpeGNU/24.03
  export CC="cc --cray-bypass-pkgconfig -craype-verbose"
  export CXX="CC --cray-bypass-pkgconfig -craype-verbose"
  export FC="ftn --cray-bypass-pkgconfig -craype-verbose -ffixed-line-length-132"

(I checked that gg_tt.mad is regenerated as expected)
valassi linked a pull request Sep 19, 2024 that will close this issue
valassi changed the title Test the latest master on AMD GPUs on LUMI Test the latest master on AMD GPUs on LUMI with ROCm 6.0 Sep 19, 2024
valassi commented Sep 19, 2024

Renamed to mention ROCm 6.0

This batch of tests is complete; the usual recurring problems on LUMI are still there.

The test logs are in #1006. Closing

valassi closed this as completed Sep 19, 2024
zeniheisser pushed a commit to zeniheisser/madgraph4gpu that referenced this issue Sep 23, 2024
(same commit message as above)