Issue with --local command line parameter. #49

Open
DitiD opened this issue Oct 21, 2022 · 6 comments


DitiD commented Oct 21, 2022

While running the blocked_transform test case from the runtime directory (https://github.com/bcosenza/sycl-bench/blob/master/runtime/blocked_transform.cpp), we get a core dump.

Command used to execute - ./blocked_transform --device=gpu

Output -

********** Results for Runtime_BlockedTransform_iter_64_blocksize_0**********
problem-size: 3072
local-size: 1024
device-name: NVIDIA RTX A4000
sycl-implementation: LLVM CUDA (Codeplay)
blocked_transform: /tmp/llvm-sycl-nightly-20220222/sycl/source/detail/scheduler/commands.cpp:1826: void cl::sycl::detail::adjustNDRangePerKernel(cl::sycl::detail::NDRDescT&, cl::sycl::detail::pi::PiKernel, const cl::sycl::detail::device_impl&): Assertion `NDR.NumWorkGroups[0] != 0 && NDR.LocalSize[0] == 0' failed.
Aborted (core dumped)

However, when we explicitly set the --local parameter to 256 (which is the default value) on the command line, it runs without any errors.

Command used to execute - ./blocked_transform --device=gpu --local=256

Is there a fix for this issue? If so, where can we get the revised code?

DitiD changed the title from "Issue with --local command line paramete." to "Issue with --local command line parameter." on Oct 21, 2022
DitiD (Author) commented Oct 27, 2022

Hi, we would like to know if there are any solutions for the above-mentioned issue. Thanks.

illuhad (Collaborator) commented Oct 27, 2022

I cannot reproduce the issue with hipSYCL. Your output indicates that somehow a block size of 0 enters the benchmark. This value is derived from the local size. I had a quick look at the code paths, and I don't understand how this could happen - it does not for me.
There's an assert that checks that the block size is non-zero. Can you check what happens when compiling with debug assertions enabled?
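
To illustrate the kind of check described above, here is a minimal sketch in plain C++ (no SYCL required). The names `launch_config_is_valid` and `require_valid_launch_config` and their parameters are hypothetical and are not taken from the sycl-bench source. The sketch only shows how a zero block size, or a local size that does not divide the problem size, could be caught before a kernel is launched.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of sycl-bench: returns true only if the sizes
// that would be used to build an ND-range are usable.
inline bool launch_config_is_valid(std::size_t problem_size,
                                   std::size_t local_size,
                                   std::size_t block_size) {
  if (local_size == 0 || block_size == 0) {
    return false;  // a zero size cannot describe a work-group or a block
  }
  // The global range of an ND-range must be a multiple of the local range.
  return problem_size % local_size == 0;
}

// Hypothetical wrapper: aborts in builds where assert() is active (NDEBUG not
// defined) and does nothing otherwise.
inline void require_valid_launch_config(std::size_t problem_size,
                                        std::size_t local_size,
                                        std::size_t block_size) {
  assert(launch_config_is_valid(problem_size, local_size, block_size) &&
         "invalid problem/local/block size combination");
  (void)problem_size;  // avoid unused-parameter warnings when NDEBUG is set
  (void)local_size;
  (void)block_size;
}
```

With the values from the output above (problem size 3072, local size 1024), such a check would pass for a block size of 1024 and fail for a block size of 0.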

DitiD (Author) commented Oct 31, 2022

Hi, we are not working with hipSYCL. The issue that we are facing is occurring during runtime. The test case is failing to execute when we are not passing the local parameter (as in, when it is taking the value of local parameter as 256 by default).

Command being used to execute - ./blocked_transform --device=gpu

However, it is working fine when we are explicitly defining the local parameter to 256 during runtime.

Command being used to execute - ./blocked_transform --device=gpu --local=256

We are not sure as to why this issue is occurring.

Thanks.

illuhad (Collaborator) commented Oct 31, 2022

> Hi, we are not working with hipSYCL. The issue that we are facing is occurring during runtime.

I'm aware of this. But I don't have an installation of the DPC++ SYCL implementation with CUDA backend here. I'm just saying I cannot reproduce this with my setup. And I don't understand why DPC++ or hipSYCL would behave differently here anyway. The error does not seem to be related to SYCL specific functionality.

> The test case is failing to execute when we are not passing the local parameter (as in, when it is taking the value of local parameter as 256 by default).
> Command being used to execute - ./blocked_transform --device=gpu
> However, it is working fine when we are explicitly defining the local parameter to 256 during runtime.
> Command being used to execute - ./blocked_transform --device=gpu --local=256

I understood this. As I've said, I cannot reproduce this here. Command line option handling is the same for DPC++ and hipSYCL. For further investigation into the issue, I asked you the following:

> There's an assert that checks that the block size is non-zero. Can you check what happens when compiling with debug assertions enabled?

i.e. make sure that the NDEBUG macro is not set when building.
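
For reference, a minimal sketch of how NDEBUG controls assert(). Defining NDEBUG, whether in the source with `#define NDEBUG` or on the compiler command line with `-DNDEBUG` (which CMake typically adds for Release builds with GCC and Clang), turns assert() into a no-op. Leaving it undefined keeps assertions active. The order matters: assert is (re)defined each time <cassert> or <assert.h> is included, based on whether NDEBUG is defined at that point. The helper `assertions_enabled` is hypothetical and exists only for this illustration.

```cpp
// Force assertions on for this translation unit, whatever the build flags say.
// The #undef must come before the include, because <cassert> looks at NDEBUG
// at the moment it is included.
#ifdef NDEBUG
#undef NDEBUG
#endif
#include <cassert>

// Hypothetical helper: reports whether assert() is active here.
inline bool assertions_enabled() {
  bool enabled = false;
  // The assignment inside assert() is evaluated only when assertions are
  // active; with NDEBUG defined the whole expression is discarded.
  assert((enabled = true));
  return enabled;
}
```

In practice it is usually enough to remove any `#define NDEBUG` from the source and to build without `-DNDEBUG`, for example with a Debug build type. The `#undef` above is only a way to make the behavior explicit in a single file.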

DitiD (Author) commented Nov 7, 2022

Hi, as suggested, I've added the following to blocked_transform.cpp and rebuilt it.

#include<assert.h>
#define NDEBUG

It seems that, by default, the local size is taken as 1024 (please see the attached screenshot below).

[Image: Capture]

However, when I explicitly set '--local' to either 256 (default value) or 1024, it works fine.

Command being used:

./blocked_transform --device=gpu --local=256
./blocked_transform --device=gpu --local=1024

Could this be a bug in the code?
Thanks.
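
One way to narrow down a difference between the default and an explicit `--local` value is to print the value the parser actually returns in both cases. The sketch below is a hypothetical stand-alone parser, not the sycl-bench implementation. The function name `parse_size_option` is an assumption, and the default of 1024 in the usage note is taken from the output reported in this thread, not from the benchmark source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parser, not the sycl-bench code: looks for "<name>=<value>"
// among the arguments and falls back to default_value when it is absent.
// std::stoull throws std::invalid_argument if the value is not a number.
inline std::size_t parse_size_option(const std::vector<std::string>& args,
                                     const std::string& name,
                                     std::size_t default_value) {
  assert(!name.empty() && "option name must not be empty");
  const std::string prefix = name + "=";
  for (const std::string& arg : args) {
    // Match only arguments that start with "<name>=".
    if (arg.compare(0, prefix.size(), prefix) == 0) {
      return static_cast<std::size_t>(std::stoull(arg.substr(prefix.size())));
    }
  }
  return default_value;
}
```

Under these assumptions, `parse_size_option({"--device=gpu"}, "--local", 1024)` yields 1024, while adding `"--local=256"` to the arguments yields 256. Logging this value next to the derived block size in both runs would show at which point the two code paths diverge.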

DitiD (Author) commented Nov 29, 2022

Hi, is there any update regarding this issue? Thanks.
