Image support across multiple devices #218
Not at the moment. For now, you could do this explicitly, wrapping the OpenCL buffers owned by each device:

```cpp
for(int dev = 0; dev < ctx.size(); ++dev) {
    vex::vector<cl_float4> tx(ctx.queue(dev), x(dev));
    vex::vector<int>       tp(ctx.queue(dev), p(dev));
    // The following will be applied to the chunk of x located on the current device:
    tx = imread(image[dev], tp);
}
```

It should be possible to provide a generic wrapper class that would allow the use of a …
I don't have any solid ideas for this, just brainstorming... What is the use case and value of being able to pass different objects to each device vs the same object to multiple devices? In my application I have a single image that needs to be copied to all devices. Additionally, these objects are all constants on the device. What about adding the ability to specify a constant that is an arbitrary object? For this case it would be nice to be able to do something like this:

```cpp
cl::Image1D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                  cl::ImageFormat(CL_RGBA, CL_FLOAT), m, imdata.data());
vex::constant<cl::Image1D> imageConst(ctx, image);
```

Here, vex::constant is used analogously to vex::vector, but each element corresponds to an object across devices rather than across threads. I think the code you posted is a solution to my specific question. A couple of questions about the code you posted:
Each compute device has a separate memory space, so any object with a state needs to have an instance on each of the devices. So even if you use the same image on each GPU, you need to create it explicitly on each GPU. And if we have to pass a vector of instances anyway, then why restrict ourselves with the requirement that all instances be the same?
That is correct: no data is copied (except for some host-side handles).
No, each iteration in the loop, and the loop as a whole, should be completely asynchronous. This is basically how things work under the hood anyway.
That all makes sense. Thanks. As far as the "generic wrapper class" goes, the solution you posted above is simple and explicit; I don't know if a special class is necessary. When I posed the questions, I knew there had to be a way to operate on the vector parts, but it wasn't obvious to me how to do that. I don't know how I missed that in the documentation. In retrospect, it is pretty clear.
OK, I currently don't see where else such a class would be useful, so I'll close this for now.
In issues #202 and #213 you added image support for multiple devices for vex::symbolic by creating an image on each device and then passing a vector of the images as a kernel argument. Is there a logical way to extend this capability to non-symbolic code?
For example, in tests/image.cpp you have the following in the OpenCL test:
Unfortunately, this fails when p is a vex::vector that is spread out across multiple devices, because the image only exists on the first device.