Image support across multiple devices #218

Closed
agerlach opened this issue Feb 8, 2017 · 5 comments

Comments

@agerlach

agerlach commented Feb 8, 2017

In issues #202 and #213 you added image support for multiple devices for vex::symbolic by creating an image on each device and then passing a vector of the images as a kernel argument. Is there a logical way to extend this capability for non-symbolic code?

For example, in tests/image.cpp you have the following in the OpenCL test:

...
cl::Image1D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            cl::ImageFormat(CL_RGBA, CL_FLOAT), m, imdata.data());

vex::vector<int> p(q1, n);
p = vex::element_index() % m;

vex::vector<cl_float4> x(q1, n);

x = imread(image, p);

Unfortunately, this fails when p is a vex::vector that is spread across multiple devices, because image only exists on the first device.

@ddemidov
Owner

ddemidov commented Feb 8, 2017

Is there a logical way to extend this capability for non-symbolic code?

Not at the moment. For now, you could do this explicitly, wrapping OpenCL buffers owned by vex::vector and allocated on different devices into temporary vectors:

for(int dev = 0; dev < ctx.size(); ++dev) {
  vex::vector<cl_float4> tx(ctx.queue(dev), x(dev));
  vex::vector<int> tp(ctx.queue(dev), p(dev));
  // The following will be applied to the chunk of x located on the current device:
  tx = imread(image[dev], tp);
}

Here image is an std::vector<cl::Image1D>, created in the same way as in #202/#213.
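
The per-device image vector described above can be built with a loop over the devices in the context. The following is a minimal sketch, not code from VexCL: make_per_device is a hypothetical helper that calls a user-supplied factory once per device index and collects the results. It depends only on the standard library, so the VexCL-specific part is shown as a comment; that comment assumes ctx, m, and imdata as in the tests/image.cpp excerpt quoted earlier.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of VexCL: builds one object per device by
// calling make(d) for every device index d in [0, ndev).
template <class Obj, class Factory>
std::vector<Obj> make_per_device(std::size_t ndev, Factory make) {
    std::vector<Obj> objs;
    objs.reserve(ndev);
    for (std::size_t d = 0; d < ndev; ++d)
        objs.push_back(make(d));
    return objs;
}

// Assumed usage with VexCL and the OpenCL C++ bindings (ctx, m, and imdata
// as in tests/image.cpp):
//
//   std::vector<cl::Image1D> image = make_per_device<cl::Image1D>(
//       ctx.size(), [&](std::size_t d) {
//           return cl::Image1D(ctx.context(d),
//                   CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
//                   cl::ImageFormat(CL_RGBA, CL_FLOAT), m, imdata.data());
//       });
```

Because the factory receives the device index, the same helper covers both cases discussed in this thread: identical images on every device, or a different object per device.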

It should be possible to provide a generic wrapper class that would allow using a std::vector of arbitrary objects in vexcl expressions and would use the corresponding object on each of the compute devices. Any ideas on how to name such a class?

@agerlach
Author

agerlach commented Feb 8, 2017

It should be possible to provide a generic wrapper class that would allow using a std::vector of arbitrary objects in vexcl expressions

I don't have any solid ideas for this, just brainstorming...

What is the use case and value of being able to pass different objects to each device vs. the same object to multiple devices? In my application I have a single image that needs to be copied to all devices. Additionally, these objects are all constants on the device. What about adding the ability to specify a constant that is an arbitrary object?

For this case it would be nice to be able to do something like this:

cl::Image1D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            cl::ImageFormat(CL_RGBA, CL_FLOAT), m, imdata.data());

vex::constant<cl::Image1D> imageConst(ctx, image);

Here, vex::constant is used analogously to vex::vector but each element corresponds to an object across devices, vs across threads.

I think the code you posted is a solution to my specific question. A couple of questions about the code you posted:

  1. Is data actually copied when doing vex::vector<int> tp(ctx.context(dev), p(dev))? My understanding from your documentation is that no copy occurs. I just want to verify that that is true.
  2. Are any of the lines in the for loop blocking, i.e., is this effectively calling imread serially for each device?

@ddemidov
Owner

ddemidov commented Feb 9, 2017

What about adding the ability to specify a constant that is an arbitrary object?

Each compute device has a separate memory space, so any object with state needs an instance on each of the devices. So even if you use the same image on each GPU, you need to create it explicitly on each GPU. And if we have to pass a vector of instances anyway, why restrict ourselves to the requirement that all instances are the same?

For this case it would be nice to be able to do something like this:

vex::constant in vexcl is basically a stateless literal that is inserted verbatim into kernel source (and hence does not need kernel parameters to work). That is not the case with images.

Is data actually copied when doing vex::vector<int> tp(ctx.context(dev), p(dev))? My understanding from your documentation is that no copy occurs. I just want to verify that that is true.

That is correct, no data is copied (except for some host-side handles).
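
As a rough analogy only, written in plain C++ rather than VexCL or OpenCL, the sketch below shows what "wrapping without copying" means: the wrapper stores a reference-counted handle to existing storage, so only the handle is duplicated. The names buffer_handle and wrapped_vector are hypothetical and do not correspond to VexCL types.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a device buffer handle (hypothetical): copying the handle
// copies a pointer and bumps a reference count, not the elements.
using buffer_handle = std::shared_ptr<std::vector<int>>;

// Toy stand-in for a vector that wraps an already allocated buffer.
struct wrapped_vector {
    buffer_handle buf;  // shares storage with whoever created the buffer

    explicit wrapped_vector(buffer_handle b) : buf(std::move(b)) {}

    // Element access goes straight to the shared storage.
    int& operator[](std::size_t i) { return (*buf)[i]; }
};
```

A write through the wrapper is visible through the original handle, which is the behavior relied on when assigning to the temporary per-device vectors in the loop above.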

Are any of the lines in the for loop blocking, i.e., is this effectively calling imread serially for each device?

No, each iteration in the loop and the loop as a whole should be completely asynchronous. This is basically how things work under the hood anyway.

@agerlach
Author

agerlach commented Feb 9, 2017

That all makes sense. Thanks.

As for the "generic wrapper class", the solution you posted above is simple and explicit, so I don't know if a special class is necessary. When I posed the questions, I knew there had to be a way to operate on the vector parts, but it wasn't obvious to me how to do that. I don't know how I missed that in the documentation. In retrospect, it is pretty clear.

@ddemidov
Owner

ddemidov commented Feb 9, 2017

Ok, I currently don't see where else such a class would be useful, so I'll close this for now.

ddemidov closed this as completed Feb 9, 2017