Image support across multiple devices #218

agerlach · 2017-02-08T17:57:33Z

In issues #202 and #213 you added image support for multiple devices for vex::symbolic by creating an image on each device and then passing a vector of the images as a kernel argument. Is there a logical way to extend this capability for non-symbolic code?

For example, in tests/image.cpp you have the following in the OpenCL test:

...
cl::Image1D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            cl::ImageFormat(CL_RGBA, CL_FLOAT), m, imdata.data());

vex::vector<int> p(q1, n);
p = vex::element_index() % m;

vex::vector<cl_float4> x(q1, n);

x = imread(image, p);

Unfortunately, this fails when p is a vex::vector spread across multiple devices, because image only exists on the first device (it was created in ctx.context(0)).

ddemidov · 2017-02-08T18:22:13Z

> Is there a logical way to extend this capability for non-symbolic code?

Not at the moment. For now, you could do this explicitly by wrapping the OpenCL buffers that a vex::vector owns on each device into temporary single-device vectors:

for(unsigned dev = 0; dev < ctx.size(); ++dev) {
  // Wrap the per-device buffers of x and p into single-device vectors (no data is copied):
  vex::vector<cl_float4> tx(ctx.queue(dev), x(dev));
  vex::vector<int>       tp(ctx.queue(dev), p(dev));
  // The following will be applied to the chunk of x located on the current device:
  tx = imread(image[dev], tp);
}

Here, image is a std::vector<cl::Image1D> holding one image per device, created in the same way as in #202/#213.
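The effect of this loop can be modelled on the host. The following is a minimal sketch in plain C++, not VexCL code: it assumes each device owns a contiguous, roughly equal chunk of x and p (VexCL's actual partitioning of a vex::vector may weight devices differently), uses float instead of cl_float4, and the names Image and simulate_multi_device_imread are hypothetical. The point it shows is that each chunk is read only through the image copy belonging to its own device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device-resident image: one copy per device.
using Image = std::vector<float>;

// Emulates x = imread(image, p) for a vector split across image.size() devices.
// Each device reads only from its own image copy, indexed by its own chunk of p.
std::vector<float> simulate_multi_device_imread(const std::vector<Image> &image,
                                                const std::vector<int> &p) {
    const std::size_t ndev = image.size();
    assert(ndev > 0);
    const std::size_t n = p.size();
    std::vector<float> x(n);
    for (std::size_t dev = 0; dev < ndev; ++dev) {
        // Contiguous chunk [begin, end) owned by this device (equal split assumed).
        const std::size_t begin = dev * n / ndev;
        const std::size_t end = (dev + 1) * n / ndev;
        for (std::size_t i = begin; i < end; ++i)
            x[i] = image[dev][static_cast<std::size_t>(p[i])];
    }
    return x;
}
```

With identical image copies on every device, the result matches a single-device lookup of imdata[i % m].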

It should be possible to provide a generic wrapper class that would allow using a std::vector of arbitrary objects in vexcl expressions, and would use the corresponding object on each of the compute devices. Any ideas on how to name such a class?
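One possible shape for such a class, sketched in plain C++ under the assumption that it only needs to own one object per device and hand out the object for a given device index (mirroring how x(dev) returns the buffer of a vex::vector on device dev). The name per_device is hypothetical, and the parts needed to plug it into VexCL expressions (kernel parameter declaration, setting the argument per device) are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical wrapper: holds one instance of T per compute device.
template <class T>
class per_device {
public:
    explicit per_device(std::vector<T> objects) : objs(std::move(objects)) {}

    // Number of devices this wrapper covers.
    std::size_t size() const { return objs.size(); }

    // Object that lives on device d (compare vex::vector's x(d)).
    const T &operator()(std::size_t d) const {
        assert(d < objs.size());
        return objs[d];
    }

private:
    std::vector<T> objs;
};
```

Inside a multi-device kernel launch, the wrapper would pass w(d) as the argument for device d, exactly as the explicit loop does with image[dev].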

agerlach · 2017-02-08T19:13:13Z

> It should be possible to provide a generic wrapper class that would allow to use a std::vector of arbitrary objects in vexcl expressions

I don't have any solid ideas for this, just brainstorming...

What is the use case for, and value of, being able to pass different objects to each device rather than the same object to all devices? In my application I have a single image that needs to be copied to all devices. Additionally, these objects are all constant on the device. What about adding the ability to specify a constant that is an arbitrary object?

For this case it would be nice to be able to do something like this:

cl::Image1D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            cl::ImageFormat(CL_RGBA, CL_FLOAT), m, imdata.data());

vex::constant<cl::Image1D> imageConst(ctx, image);

Here, vex::constant would be used analogously to vex::vector, except that each element corresponds to one device rather than one thread.

I think the code you posted solves my specific question. A couple of questions about the code you posted:

Is data actually copied when doing vex::vector<int> tp(ctx.queue(dev), p(dev))? My understanding from your documentation is that no copy occurs. I just want to verify that this is true.
Are any of the lines in the for loop blocking, i.e., is this effectively calling imread serially for each device?

ddemidov · 2017-02-09T05:43:00Z

> What about adding the ability to specify a constant that is an arbitrary object?

Each compute device has a separate memory space, so any stateful object needs an instance on each of the devices. So even if you use the same image on every GPU, you need to create it explicitly on each GPU. And if we have to pass a vector of instances anyway, why restrict ourselves by requiring all instances to be the same?
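In code, "create it explicitly on each GPU" amounts to calling a factory once per device. A minimal sketch, assuming a hypothetical make_per_device helper and a toy DeviceImage type that only records which device it was created on; a real version would construct a cl::Image1D on ctx.context(d) inside the factory. The "same image everywhere" case is then just a factory that uploads the same host data for every d.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a device-side image: remembers its owning device.
struct DeviceImage {
    std::size_t device;
    std::vector<float> texels;
};

// Hypothetical helper: builds one instance per device by calling make(d).
template <class Factory>
auto make_per_device(std::size_t ndev, Factory make)
    -> std::vector<decltype(make(std::size_t{}))> {
    assert(ndev > 0);  // at least one device is expected
    std::vector<decltype(make(std::size_t{}))> objs;
    objs.reserve(ndev);
    for (std::size_t d = 0; d < ndev; ++d)
        objs.push_back(make(d));
    return objs;
}
```

Because the factory receives the device index, the same helper covers both identical and per-device-specific instances.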

> For this case it would be nice to be able to do something like this:

In VexCL, vex::constant is essentially a stateless literal that is inserted verbatim into the kernel source (and hence does not need a kernel parameter). That is not the case with images.
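The distinction can be illustrated with a toy kernel-source generator. This is not VexCL's code generator, and make_kernel_source is a hypothetical name: the scale factor is a stateless literal pasted into the OpenCL C source as text, so the kernel signature is unaffected, whereas the image has to be declared as a kernel parameter whose value must be supplied at launch time, separately for each device.

```cpp
#include <cassert>
#include <string>

// Toy generator: builds an OpenCL C kernel that scales a value read from an image.
// The scale factor is a stateless literal, so it is pasted into the source verbatim.
// The image cannot be pasted in; it has to be declared as a kernel parameter.
std::string make_kernel_source(const std::string &scale_literal) {
    assert(!scale_literal.empty());
    return "kernel void scale_img(ulong n, global float4 *x,\n"
           "                      read_only image1d_t img, global const int *p)\n"
           "{\n"
           "  const sampler_t s = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;\n"
           "  for (size_t i = get_global_id(0); i < n; i += get_global_size(0))\n"
           "    x[i] = " + scale_literal + " * read_imagef(img, s, p[i]);\n"
           "}\n";
}
```

Changing the literal produces a different source string, but no extra argument; changing the image requires a different argument on every launch.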

> Is data actually copied when doing vex::vector<int> tp(ctx.queue(dev), p(dev))? My understanding from your documentation is that no copy occurs. I just want to verify that this is true.

That is correct, no data is copied (except for some host-side handles).
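The "host-side handles" part can be modelled with a reference-counted handle, which is roughly how the OpenCL C++ bindings treat a cl::Buffer: copying it retains the same memory object instead of duplicating device memory. The types below are hypothetical toys that only illustrate that wrapping shares storage rather than copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy device buffer: copying the handle shares the storage behind it.
struct BufferHandle {
    std::shared_ptr<std::vector<int>> storage;
};

// Toy single-device view, analogous to vex::vector<int> tp(ctx.queue(dev), p(dev)):
// it keeps a copy of the handle only, so both refer to the same elements.
struct DeviceView {
    explicit DeviceView(BufferHandle h) : buf(std::move(h)) { assert(buf.storage); }
    int &operator[](std::size_t i) { return (*buf.storage)[i]; }
    BufferHandle buf;
};
```

Writing through the view is visible through the original handle, and only the reference count grows.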

> Are any of the lines in the for loop blocking, i.e., is this effectively calling imread serially for each device?

No, each iteration in the loop and the loop as a whole should be completely asynchronous. This is basically how things work under the hood anyway.
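A toy model of why the loop does not block: each device has a queue, and issuing an operation only records it, so the host loop returns immediately; waiting happens only when the queues are finished (in VexCL, for example via ctx.finish() or a blocking copy back to the host). ToyQueue is hypothetical, and unlike this model, a real device starts executing enqueued kernels on its own while the host keeps going.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy command queue: enqueue() returns immediately and only records the work.
class ToyQueue {
public:
    void enqueue(std::function<void()> work) {
        assert(work);
        pending.push_back(std::move(work));
    }

    // Blocks until all recorded work is done (here: runs it in order).
    void finish() {
        for (auto &w : pending) w();
        pending.clear();
    }

    std::size_t size() const { return pending.size(); }

private:
    std::vector<std::function<void()>> pending;
};
```

The per-device loop only calls enqueue(), so none of its iterations waits on any device.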

agerlach · 2017-02-09T14:29:37Z

That all makes sense. Thanks.

As for the "generic wrapper class", the solution you posted above is simple and explicit, so I don't know if a special class is necessary. When I posed the questions, I knew there had to be a way to operate on the vector parts, but it wasn't obvious to me how to do that. I don't know how I missed that in the documentation; in retrospect, it is pretty clear.

ddemidov · 2017-02-09T15:18:37Z

Ok, I currently don't see where else such a class would be useful, so I'll close this for now.

ddemidov closed this as completed Feb 9, 2017
