Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solving an ODE w/ Interpolation #202

Closed
agerlach opened this issue Apr 29, 2016 · 25 comments
Closed

Solving an ODE w/ Interpolation #202

agerlach opened this issue Apr 29, 2016 · 25 comments

Comments

@agerlach
Copy link

I have been using ODEINT for doing Monte Carlo simulations of a parachute system descending to the ground model as a point mass. I would like to utilize VexCL to leverage GPUs to speed up computation. I have mostly been able to modify your symbolic example for my particular equations of motion, but I have run into an issue. I need to interpolate a 3D wind field as a function of the position of the system in order to determine the drag forces acting on the system. I would ideally like to leverage the texture interpolation capabilities of the GPU to do this. Do you have any suggestions?

Thanks

If it helps, here is description of the system model:

eom

@ddemidov
Copy link
Owner

Hi,

Unfortunately, VexCL does not expose the OpenCL images/CUDA textures to the user, but it is possible to use those with a little bit of magic. Below is an example that creates an image reading function that works both in a normal vector expression and in the symbolic context. Note that the example is OpenCL-specific, and it depends on the latest commit (e3a6960). It uses 2D image, but I guess it should be possible to convert it for 3D (I have little experience with Image types in OpenCL, so I can not really help you more).

#include <vexcl/vexcl.hpp>

namespace vex {

// Allow cl::Image2D in vector expressions:
namespace traits {
template <> struct is_vector_expr_terminal<cl::Image2D> : std::true_type {};
}

// Let the kernel generator know how to spell cl::Image2D type:
template <>
struct type_name_impl<cl::Image2D> {
    static std::string get() {
        return "image2d_t";
    }
};

}

// Define custom user function that reads a value from an image.
// Need to do it like this (as opposed to using VEX_FUNCTION macro)
// to be able to define sampler object with the function definition.
struct imread_t : vex::UserFunction< imread_t, float(cl::Image2D, cl_int2)> {
    static std::string name() {
        return "imread";
    }

    static void define(vex::backend::source_generator &src) {
        src << VEX_STRINGIZE_SOURCE(
                __constant sampler_t sampler =
                      CLK_NORMALIZED_COORDS_FALSE
                    | CLK_ADDRESS_CLAMP_TO_EDGE
                    | CLK_FILTER_NEAREST;

                float imread(__read_only image2d_t im, int2 i) {
                    return read_imagef(im, sampler, i).x;
                }
                );
    }
} const imread;

int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));
    std::cout << ctx << std::endl;

    // Create 2D Image, fill it with data.
    const int m = 16;
    std::vector<float> imdata(m * m * 3, 42.0f);
    cl::Image2D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            cl::ImageFormat(CL_RGBA, CL_FLOAT), m, m, 0, imdata.data());


    // Vector of image coordinates (all pointing to the same location for simplicity).
    vex::vector<cl_int2>  p(ctx, 16);
    p = 1;

    // The output vector.
    vex::vector<float> x(ctx, 16);

    // This works:
    x = imread(image, p);
    std::cout << "v1: x = " << x << std::endl;

    // Generating a kernel that uses the function is also possible:
    x = 0;

    // Kernel body:
    std::ostringstream body;
    vex::generator::set_recorder(body);

    // Symbolic variables:
    vex::symbolic<float>       X(vex::symbolic<float>::VectorParameter);
    vex::symbolic<cl::Image2D> I(vex::symbolic<cl::Image2D>::ScalarParameter);
    vex::symbolic<cl_int2>     P(vex::symbolic<cl_int2>::VectorParameter, vex::symbolic<cl_int2>::Const);

    // Record the image reading operation:
    X = imread(I, P);

    // Generate the kernel:
    auto the_answer = vex::generator::build_kernel(ctx, "the_answer", body.str(), X, I, P);

    // Call the kernel:
    the_answer(x, image, p);

    std::cout << "v2: x = " << x << std::endl;
}

@ddemidov
Copy link
Owner

Writing the example made me think the scope_type and constness_type enums should be moved out of the symbolic class template to make the declaration of symbolic variables easier.

@agerlach
Copy link
Author

This "magic" looks really good. I need to take some more time to understand what you have here, though. FYI, I was getting the following compilation error w/ LLVM 6.0:

Default initialization of an object of const type 'const struct imread_t' requires a user-provided default constructor

Adding imread_t() {}; to struct imreat_t, fixed the issue.

@agerlach
Copy link
Author

After looking at this some more, it is unclear to me how to use the result of imread() in an ODE. I've modified your example slightly to consider 3D images and tried using it with your symbolic example using a Lorenz system. Here the first two components of the interpolated image are the sigma and b parameters in the model.

The issue I have is that I don't know what I can do with interpResult in sys_func(). I would like to set sigma and b to interpResult.x and interpResult.y, respectively. Do you have any thoughts, or would you approach this differently?

Thanks.

#include <iostream>
#include <vector>
#include <array>
#include <functional>

#include <vexcl/devlist.hpp>
#include <vexcl/vector.hpp>
#include <vexcl/multivector.hpp>
#include <vexcl/generator.hpp>
#include <vexcl/element_index.hpp>

// http://headmyshoulder.github.com/odeint-v2
#include <boost/numeric/odeint.hpp>

#include <vexcl/vexcl.hpp>
using namespace std;
namespace odeint = boost::numeric::odeint;

typedef float value_type;
typedef vex::symbolic< value_type > sym_value;
typedef std::array<sym_value, 3> sym_state;

namespace vex {

    // Allow cl::Image3D in vector expressions:
    namespace traits {
        template <> struct is_vector_expr_terminal<cl::Image3D> : std::true_type {};
    }

    // Let the kernel generator know how to spell cl::Image3D type:
    template <>
    struct type_name_impl<cl::Image3D> {
        static std::string get() {
            return "image3d_t";
        }
    };

}

// Define custom user function that reads a value from an image.
// Need to do it like this (as opposed to using VEX_FUNCTION macro)
// to be able to define sampler object with the function definition.
struct imread_t : vex::UserFunction< imread_t, cl_float4(cl::Image3D, float,float,float)>{    static std::string name() {
        return "imread";
    }

    imread_t() {};
    static void define(vex::backend::source_generator &src) {
        src << VEX_STRINGIZE_SOURCE(
                                    __constant sampler_t sampler =
                                    CLK_NORMALIZED_COORDS_TRUE
                                    | CLK_ADDRESS_CLAMP_TO_EDGE
                                    | CLK_FILTER_NEAREST;

                                    float4 imread(__read_only image3d_t data, float x, float y, float z){
                                        return read_imagef(data, sampler, (float4)(x,y,z,1.));
                                    }
                                    );
    }
} const imread;

struct sys_func
{

    const sym_value &R;
    const vex::symbolic<cl::Image3D>  &image;
    vex::symbolic<cl_float4> interpResult;

    sys_func(const vex::symbolic<cl::Image3D> &image, const sym_value &R, vex::symbolic<cl_float4> &interpResult) : image(image), R(R), interpResult(interpResult) {}

    template <class Sig>
    struct result {
        typedef void type;
    };

    void operator()( const sym_state &x , sym_state &dxdt , value_type ) const
    {
        interpResult = imread(image, x[0],x[1],x[2]);

        value_type sigma = 10.0;// Needs to be interpResult.x
        value_type b = 8.0/3.0; // Needs to be interpResult.y

        dxdt[0] = sigma * (x[1] - x[0]);
        dxdt[1] = R * x[0] - x[1] - x[0] * x[2];
        dxdt[2] = x[0] * x[1] - b * x[2];
    }
};

int main( int argc , char **argv )
{
    size_t n;
    const value_type dt = 0.01;
    const value_type t_max = 100.0;

    using namespace std;

    n = argc > 1 ? atoi( argv[1] ) : 1024;

    vex::Context ctx( vex::Filter::DoublePrecision && vex::Filter::GPU );
    cout << ctx << endl;

    // Custom kernel body will be recorded here:
    std::ostringstream body;
    vex::generator::set_recorder(body);

    // State types that would become kernel parameters:
    sym_state sym_S = {
        sym_value(sym_value::VectorParameter),
        sym_value(sym_value::VectorParameter),
        sym_value(sym_value::VectorParameter)
    };

    // Const kernel parameter.
    sym_value sym_R(sym_value::VectorParameter, sym_value::Const);

    vex::symbolic<cl_float4>       sym_InterpResult(vex::symbolic<cl_float4>::VectorParameter);
    vex::symbolic<cl::Image3D> sym_Image(vex::symbolic<cl::Image3D>::ScalarParameter);


    // Symbolic stepper:
    odeint::runge_kutta4<
                    sym_state , value_type , sym_state , value_type ,
                    odeint::range_algebra , odeint::default_operations
                    > sym_stepper;

    sys_func sys(sym_Image,sym_R,sym_InterpResult);
    sym_stepper.do_step(std::ref(sys), sym_S, 0, dt);

    auto kernel = vex::generator::build_kernel(ctx, "lorenz", body.str(),
                                               sym_S[0], sym_S[1], sym_S[2], sym_R, sym_Image, sym_InterpResult
                                               );

    // Real state initialization:
    value_type Rmin = 0.1;
    value_type Rmax = 50.0;
    value_type dR   = (Rmax - Rmin) / (n - 1);

    vex::vector<value_type> X(ctx, n);
    vex::vector<value_type> Y(ctx, n);
    vex::vector<value_type> Z(ctx, n);
    vex::vector<value_type> R(ctx, n);
    vex::vector<cl_float4> interpResult(ctx, n);

    // Create 3D Image, fill it with data.
    const int m = 25;
    std::vector<float> imdata(m * m *m * 3, 42.0f);
    for (int i = 0; i < m * m * m * 3; i+=3) {
        imdata[i] = 10.;
        imdata[i+1] = -8.0/3.0;
        imdata[i+2] = 1;
    }

    cl::ImageFormat format(CL_RGB, CL_FLOAT);
    cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());

    // Initialize
    X = 1.0;
    Y = 10.0;
    Z = 10.0;
    interpResult = 0.0;
    R = Rmin + dR * vex::element_index();

    // Integration loop:
    for(value_type t = 0; t < t_max; t += dt)
        kernel(X, Y, Z, R, image, interpResult);

    // Copy and print results
    std::vector< value_type > result( n );
    vex::copy( X , result );
    for (int ii = 0; ii < result.size(); ++ii)
        cout << result[ii] << endl;
}

@ddemidov
Copy link
Owner

ddemidov commented Apr 29, 2016

If I understand correctly, you need to introduce a couple of helper functions:

VEX_FUNCTION(float, getx, (cl_float4, v), return v.x; );
VEX_FUNCTION(float, gety, (cl_float4, v), return v.y; );

And use the functions inside sys_func::operator():

value_type sigma = getx(interpResult);
value_type b = gety(interpResult);

Would that work?

@ddemidov
Copy link
Owner

Also you probably want to use CLK_FILTER_LINEAR instead of CLK_FILTER_NEAREST in the definition of the sampler object to actually get the interpolated values. Sorry for my misleading example.

@agerlach
Copy link
Author

agerlach commented May 2, 2016

I had already made the change to CLK_FILTER_LINEAR in my code, but I forgot to change in the Lorenz example I posted.

The helper functions you suggested work perfectly, however I had compilation errors and needed to change the use of the function inside sys_func::operator() from

value_type sigma = getx(interpResult);
value_type b = gety(interpResult);

to

vex::symbolic<float> sigma = getx(interpResult);
vex::symbolic<float> b = gety(interpResult);

I've included the final working source code below for anybody else who comes across this. Do you think the ODEINT team would be interested in including such an example? System parameters interpolation and / or table look-ups are quite common in ODEs.

#include <iostream>
#include <vector>
#include <array>
#include <functional>

#include <vexcl/devlist.hpp>
#include <vexcl/vector.hpp>
#include <vexcl/multivector.hpp>
#include <vexcl/generator.hpp>
#include <vexcl/element_index.hpp>

// http://headmyshoulder.github.com/odeint-v2
#include <boost/numeric/odeint.hpp>

#include <vexcl/vexcl.hpp>
using namespace std;
namespace odeint = boost::numeric::odeint;

typedef float value_type;
typedef vex::symbolic< value_type > sym_value;
typedef std::array<sym_value, 3> sym_state;

VEX_FUNCTION(float, getx, (cl_float4, v), return v.x; );
VEX_FUNCTION(float, gety, (cl_float4, v), return v.y; );

namespace vex {

    // Allow cl::Image3D in vector expressions:
    namespace traits {
        template <> struct is_vector_expr_terminal<cl::Image3D> : std::true_type {};
    }

    // Let the kernel generator know how to spell cl::Image3D type:
    template <>
    struct type_name_impl<cl::Image3D> {
        static std::string get() {
            return "image3d_t";
        }
    };

}

// Define custom user function that reads a value from an image.
// Need to do it like this (as opposed to using VEX_FUNCTION macro)
// to be able to define sampler object with the function definition.
struct imread_t : vex::UserFunction< imread_t, cl_float4(cl::Image3D, float,float,float)>{    static std::string name() {
        return "imread";
    }

    imread_t() {};
    static void define(vex::backend::source_generator &src) {
        src << VEX_STRINGIZE_SOURCE(
                                    __constant sampler_t sampler =
                                    CLK_NORMALIZED_COORDS_TRUE
                                    | CLK_ADDRESS_CLAMP_TO_EDGE
                                    | CLK_FILTER_LINEAR;

                                    float4 imread(__read_only image3d_t data, float x, float y, float z){
                                        return read_imagef(data, sampler, (float4)(x,y,z,1.));
                                    }
                                    );
    }
} const imread;

struct sys_func
{

    const sym_value &R;
    const vex::symbolic<cl::Image3D>  &image;
    vex::symbolic<cl_float4> interpResult;

    sys_func(const vex::symbolic<cl::Image3D> &image, const sym_value &R, vex::symbolic<cl_float4> &interpResult) : image(image), R(R), interpResult(interpResult) {}

    template <class Sig>
    struct result {
        typedef void type;
    };



    void operator()( const sym_state &x , sym_state &dxdt , value_type ) const
    {
        interpResult = imread(image, x[0],x[1],x[2]);  // Do interpolation

        sym_value sigma = getx(interpResult);  // Get X component of float4
        sym_value b = gety(interpResult);      // Get Y component of float4

        dxdt[0] = sigma * (x[1] - x[0]);
        dxdt[1] = R * x[0] - x[1] - x[0] * x[2];
        dxdt[2] = x[0] * x[1] - b * x[2];
    }
};

int main( int argc , char **argv )
{
    size_t n;
    const value_type dt = 0.01;
    const value_type t_max = 100.0;

    using namespace std;

    n = argc > 1 ? atoi( argv[1] ) : 1024;

    vex::Context ctx( vex::Filter::DoublePrecision && vex::Filter::GPU );
    cout << ctx << endl;

    // Custom kernel body will be recorded here:
    std::ostringstream body;
    vex::generator::set_recorder(body);

    // State types that would become kernel parameters:
    sym_state sym_S = {
        sym_value(sym_value::VectorParameter),
        sym_value(sym_value::VectorParameter),
        sym_value(sym_value::VectorParameter)
    };

    // Const kernel parameter.
    sym_value sym_R(sym_value::VectorParameter, sym_value::Const);

    vex::symbolic<cl_float4>       sym_InterpResult(vex::symbolic<cl_float4>::VectorParameter);
    vex::symbolic<cl::Image3D> sym_Image(vex::symbolic<cl::Image3D>::ScalarParameter);


    // Symbolic stepper:
    odeint::runge_kutta4<
                    sym_state , value_type , sym_state , value_type ,
                    odeint::range_algebra , odeint::default_operations
                    > sym_stepper;

    sys_func sys(sym_Image,sym_R,sym_InterpResult);
    sym_stepper.do_step(std::ref(sys), sym_S, 0, dt);

    auto kernel = vex::generator::build_kernel(ctx, "lorenz", body.str(),
                                               sym_S[0], sym_S[1], sym_S[2], sym_R, sym_Image, sym_InterpResult
                                               );

    // Real state initialization:
    value_type Rmin = 0.1;
    value_type Rmax = 50.0;
    value_type dR   = (Rmax - Rmin) / (n - 1);

    vex::vector<value_type> X(ctx, n);
    vex::vector<value_type> Y(ctx, n);
    vex::vector<value_type> Z(ctx, n);
    vex::vector<value_type> R(ctx, n);
    vex::vector<cl_float4> interpResult(ctx, n);

    // Create 3D Image, fill it with data.
    const int m = 25;
    std::vector<float> imdata(m * m *m * 3, 42.0f);
    for (int i = 0; i < m * m * m * 3; i+=3) {
        imdata[i] = 10.;            // R component
        imdata[i+1] = 8.0/3.0;      // G component
        imdata[i+2] = 1;            // B component
    }

    cl::ImageFormat format(CL_RGB, CL_FLOAT);
    cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());

    // Initialize
    X = 1.0;
    Y = 10.0;
    Z = 10.0;
    interpResult = 0.0;
    R = Rmin + dR * vex::element_index();

    // Integration loop:
    for(value_type t = 0; t < t_max; t += dt)
        kernel(X, Y, Z, R, image, interpResult);

    // Copy and print results
    std::vector< value_type > result( n );
    vex::copy( X , result );
    for (int ii = 0; ii < result.size(); ++ii)
        cout << result[ii] << endl;
}

@agerlach agerlach closed this as completed May 2, 2016
@ddemidov
Copy link
Owner

ddemidov commented May 2, 2016

This looks nice!

I've included the final working source code below for anybody else who comes across this. Do you think the ODEINT team would be interested in including such an example? System parameters interpolation and / or table look-ups are quite common in ODEs.

Pinging @headmyshoulder and @mariomulansky. What do you guys think? Is it worth a PR?

@ddemidov
Copy link
Owner

ddemidov commented May 2, 2016

One note: I think you can keep interpResult as a local variable. That is, just change the line

interpResult = imread(image, x[0],x[1],x[2]);

inside sym_func::operator() to

vex::symbolic<cl_float4> interpResult = imread(image, x[0],x[1],x[2]);

and remove all references to it outside of the operator(). That way you can save some global memory bandwidth.

@agerlach
Copy link
Author

agerlach commented May 2, 2016

Good catch! Thanks for your help.

Now that I have this working, I would like to implement an event detector w/ VexCL and ODEINT, e.g. save the state of the parachute system after descenting every 10 m. I would also use it to detect ground impact. I currently have my application working using odeint-v2/examples/find_crossing.cpp. However, it is unclear to me how to extend that example with VexCL in an efficient way. I will submit a separate issue, after I look at it some more. Do you think the issue belongs here or would be better suited on ODEINT's github page?

Thanks again.

@ddemidov
Copy link
Owner

ddemidov commented May 2, 2016

Feel free to open a new issue here.

@ddemidov
Copy link
Owner

ddemidov commented May 4, 2016

Do you think the ODEINT team would be interested in including such an example?

Another thought: this and #203 could become a nice section on symbolics in vexcl documentation. I would be grateful if you cared to provide it :).

@agerlach
Copy link
Author

agerlach commented May 5, 2016

this and #203 could become a nice section on symbolics in vexcl documentation

Agreed. VexCL is a great tool, and I would be happy to contribute in some way. Let me get this part of my project completed and then I'll get back to you on this. I'll be in touch.

@ddemidov
Copy link
Owner

ddemidov commented May 6, 2016

72da2a7 adds minimal support for using OpenCL images/CUDA textures in vexcl expressions. Basically, it just provides the required specifications for vex::traits::is_vector_expr_terminal and vex::type_name_impl structs.

@jr-jacket
Copy link

Hi - I work with @agerlach and I tried to compile and run the lorenz example above. I'm using Boost v1.6 and CUDA 7.5 with driver version 352.39. I am running on a Tesla K40c.

When I run the above Lorenz code I get:

user@ubuntu1404:~/dir$ ./executable 1000 1

  1. Tesla K40c (NVIDIA CUDA)

OpenCL error: clEnqueueNDRangeKernel(-5: Out of resources)
terminate called after throwing an instance of 'cl::Error'
what(): clEnqueueReadBuffer
Aborted (core dumped)
user@ubuntu1404:~/dir$

When I remove the 3D image in the code, it runs fine (no errors). So I have confirmed that the problem is related to the 3D texture. I suspect this is a driver issue, although our drivers are not that old. Also I can successfully run the devlist example in vexcl - that worked out of the box.

Any ideas why I am getting the above error, or what else I can try to diagnose it?

p.s. to get the above output I added the try/catch loops from:

https://gist.github.com/ddemidov/2925718#file-hello-cpp-L39-L42

around the kernel execution for loop.

@ddemidov
Copy link
Owner

ddemidov commented May 9, 2016

I never tried the code above, so may be @agerlach would be able to comment on this. I can reproduce the error on Nvidia K40, and AMD W9100 gives slightly different clCreateImage(-39: Invalid image format descriptor). The following diff fixes the error for me:

--- a/hello.cpp
+++ b/hello.cpp
@@ -133,13 +133,13 @@ int main( int argc , char **argv )
     // Create 3D Image, fill it with data.
     const int m = 25;

-    std::vector<float> imdata(m * m * m * 3, 42.0f);
-    for (int i = 0; i < m * m * m * 3; i+=3) {
+    std::vector<float> imdata(m * m *m * 4, 42.0f);
+    for (int i = 0; i < m * m * m * 4; i+=4) {
         imdata[i] = 10.;            // R component
         imdata[i+1] = 8.0/3.0;      // G component
         imdata[i+2] = 1;            // B component
     }

-    cl::ImageFormat format(CL_RGB, CL_FLOAT);
+    cl::ImageFormat format(CL_RGBA, CL_FLOAT);
     cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                       format, m, m, m, 0,0, imdata.data());

@ddemidov
Copy link
Owner

ddemidov commented May 9, 2016

This thread seems to be relevant: https://devtalk.nvidia.com/default/topic/475976/image-formats-cl_rgb-not-supported-by-nvidia-/

EDIT: CL_RGB may only be used with a couple of very specific integer data types: https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/cl_image_format.html

@jr-jacket
Copy link

That fixed it! Thanks so much.

@agerlach
Copy link
Author

agerlach commented May 10, 2016

This thread seems to be relevant: https://devtalk.nvidia.com/default/topic/475976/image-formats-cl_rgb-not-supported-by-nvidia-/

EDIT: CL_RGB may only be used with a couple of very specific integer data types: https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/cl_image_format.html

This is very interesting. I am using an NVIDIA GT 750M on my Mac Book Pro and CL_RBG works without issue. I assume this is from a difference in Apple's OpenCL implementation.

@agerlach
Copy link
Author

@ddemidov Thanks for all your previous help on this. My attention was diverted to some other work the last few months and I am finally getting back to this. I have not forgotten about providing an example for you documentation.

I have a follow up question on the usage of OpenCL images w/ VexCL. How could the above example be extended to run with multiple devices (e.g. across 2 Tesla K40s)? I understand that the vex::vector types are distributed across both devices, but how should the cl::Image3D be handled? I essentially would like exact copies of it on each device.

In the above examples we used the following exclusively:

vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));
...
cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());

Would it be possible to do something like this...

cl::Image3D image0(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());
cl::Image3D image1(ctx.context(1), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());
...
// Integration loop:
for(value_type t = 0; t < t_max; t += dt)
    kernel(X, Y, Z, R, image0, image1, interpResult);

where the kernel has some sort of check to check the context and then use the appropriate image. Do you have any other ideas or suggestions?

@agerlach agerlach reopened this Oct 27, 2016
ddemidov added a commit that referenced this issue Oct 27, 2016
Kernel args packed into std::vector will be unpacked and passed
to the generated kernels on respective devices.

See #202
@ddemidov
Copy link
Owner

#213 should allow you to do this. Namely, you should be able to create a std::vector<cl::Image3D> (an image per compute device), and pass it to the generated kernel:

std::vector<cl::Image3D> images;
images.emplace_back(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());
images.emplace_back(ctx.context(1), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());

...

for(value_type t = 0; t < t_max; t += dt)
    kernel(X, Y, Z, R, images, interpResult);

Please see if #213 works for you, I'll merge it when you report back.

@agerlach
Copy link
Author

Beautiful! Everything seems to be working perfectly with a 2 Tesla GPU setup. Run time is almost cut in half as expected.

Oddly, when I run with a context of the CPU and 1 GPU I get 50% worse performance. This isn't really a use case I'm interested in, but I thought it would try it.

@ddemidov
Copy link
Owner

Oddly, when I run with a context of the CPU and 1 GPU I get 50% worse performance

That's probably because you get an unbalanced partitioning across your devices and end up with performance of the slowest one. You could try to create your own weighting function to create a better partitioning, but I doubt it would be worth it.

ddemidov added a commit that referenced this issue Oct 27, 2016
Kernel args packed into std::vector will be unpacked and passed
to the generated kernels on respective devices.

See #202
@ddemidov
Copy link
Owner

Merged #213, thanks for testing!

@agerlach
Copy link
Author

That's probably because you get an unbalanced partitioning across your devices and end up with performance of the slowest one.

That is what I figured. Although I don't plan on using a CPU+GPU configuration it is good to know about the weighting functions.

thanks for testing!

My pleasure! Thanks again for all your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants