
Multi-node benchmarks with Legion #97

Open
ysfess22 opened this issue Oct 5, 2023 · 3 comments

@ysfess22

ysfess22 commented Oct 5, 2023

Hi,
I'm trying to benchmark Legion on a two-node cluster and would like to confirm whether the output I'm getting is the expected behaviour.
To set up the benchmarks, I did the following:

USE_GASNET=1 USE_LEGION=1 ./get_deps.sh
USE_GASNET=1 CONDUIT=udp ./build_all.sh

I designated the two nodes (export SSH_SERVERS=node1,node2) and set up password-less SSH authentication between them.
Running ./task_bench -width 40 -steps 100 -type stencil_1d -kernel compute_bound -iter 1024
gives a GASNet-related error asking me to specify the number of nodes.

So I'm running ./task_bench 2 -width 40 -steps 100 -type stencil_1d -kernel compute_bound -iter 1024 instead, passing the node count as the first argument.
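The launch steps above can be collected into one script. This is a minimal sketch, not part of Task Bench: it assumes build_all.sh has completed, that ./task_bench is the Legion binary in the current directory, and that password-less SSH to node1 and node2 works. The DRY_RUN and ANNOUNCE switches are hypothetical conveniences added here; by default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch of a two-node Legion Task Bench launch over GASNet's udp conduit.
# Assumes ./task_bench (Legion build) is in the current directory and
# password-less SSH to node1 and node2 is already configured.
# DRY_RUN and ANNOUNCE are hypothetical switches, not Task Bench options.
set -euo pipefail

# Shell assignments must not have spaces around '='.
export SSH_SERVERS=node1,node2

NODES=2
ARGS=(-width 40 -steps 100 -type stencil_1d -kernel compute_bound -iter 1024)

# Optional: enable announce logging to help confirm that the nodes connected.
if [[ "${ANNOUNCE:-0}" == 1 ]]; then
  ARGS+=(-level announce=2)
fi

# With the udp conduit, the node count goes first, as the GASNet error requested.
CMD=(./task_bench "$NODES" "${ARGS[@]}")

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  # Dry run (default): only print the command line.
  printf '%s\n' "${CMD[*]}"
else
  "${CMD[@]}"
fi
```

If saved as, say, launch.sh (a hypothetical name), DRY_RUN=0 ./launch.sh launches the run and DRY_RUN=0 ANNOUNCE=1 ./launch.sh adds the announce logging.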

With this, 8000 tasks are launched: 4000 on node 1 and 4000 on node 2.
Once both nodes finish their 4000 tasks, I get the benchmark results (running time, FLOP/s, etc.).

Is this the result I should expect, or is there a way to make the two nodes split the 4000 tasks between them?

Thank you.

@elliottslaughter
Contributor

Can you do a run with -level announce=2 and let me know what you see in the logs?

This behavior makes it sound like GASNet is failing to connect properly, resulting in each job being run in isolation. The announce logging would confirm if that's the case or not.

@ysfess22
Author

ysfess22 commented Oct 5, 2023


My benchmark now runs as expected: the tasks are split between the nodes and no extra tasks are spawned. I think the problem was that I overlooked the warm-up run that the Legion implementation performs, which is why the number of tasks looked doubled.
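The doubled count can be checked with a quick tally. This assumes that for stencil_1d the task graph has one task per column per timestep (width × steps tasks in total) and that the warm-up run executes the same graph once more, as the doubled count suggests:

```latex
\begin{aligned}
\text{tasks per graph execution} &= \text{width} \times \text{steps} = 40 \times 100 = 4000 \\
\text{tasks including warm-up} &= 2 \times 4000 = 8000 \\
\text{tasks per node} &= 8000 / 2 = 4000
\end{aligned}
```

So 4000 tasks per node is consistent with the two nodes sharing each run's 4000 tasks, with the warm-up run doubling the total.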

@elliottslaughter
Contributor

Ok, cool. Let me know if you have any other questions.
