Solving an ODE w/ Interpolation #202

agerlach · 2016-04-29T00:32:57Z

I have been using ODEINT for doing Monte Carlo simulations of a parachute system descending to the ground model as a point mass. I would like to utilize VexCL to leverage GPUs to speed up computation. I have mostly been able to modify your symbolic example for my particular equations of motion, but I have run into an issue. I need to interpolate a 3D wind field as a function of the position of the system in order to determine the drag forces acting on the system. I would ideally like to leverage the texture interpolation capabilities of the GPU to do this. Do you have any suggestions?

Thanks

If it helps, here is description of the system model:

ref #202

ddemidov · 2016-04-29T08:12:26Z

Hi,

Unfortunately, VexCL does not expose the OpenCL images/CUDA textures to the user, but it is possible to use those with a little bit of magic. Below is an example that creates an image reading function that works both in a normal vector expression and in the symbolic context. Note that the example is OpenCL-specific, and it depends on the latest commit (e3a6960). It uses 2D image, but I guess it should be possible to convert it for 3D (I have little experience with Image types in OpenCL, so I can not really help you more).

#include <vexcl/vexcl.hpp>

namespace vex {

// Allow cl::Image2D in vector expressions:
namespace traits {
template <> struct is_vector_expr_terminal<cl::Image2D> : std::true_type {};
}

// Let the kernel generator know how to spell cl::Image2D type:
template <>
struct type_name_impl<cl::Image2D> {
    static std::string get() {
        return "image2d_t";
    }
};

}

// Define custom user function that reads a value from an image.
// Need to do it like this (as opposed to using VEX_FUNCTION macro)
// to be able to define sampler object with the function definition.
struct imread_t : vex::UserFunction< imread_t, float(cl::Image2D, cl_int2)> {
    static std::string name() {
        return "imread";
    }

    static void define(vex::backend::source_generator &src) {
        src << VEX_STRINGIZE_SOURCE(
                __constant sampler_t sampler =
                      CLK_NORMALIZED_COORDS_FALSE
                    | CLK_ADDRESS_CLAMP_TO_EDGE
                    | CLK_FILTER_NEAREST;

                float imread(__read_only image2d_t im, int2 i) {
                    return read_imagef(im, sampler, i).x;
                }
                );
    }
} const imread;

int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));
    std::cout << ctx << std::endl;

    // Create 2D Image, fill it with data.
    const int m = 16;
    std::vector<float> imdata(m * m * 3, 42.0f);
    cl::Image2D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            cl::ImageFormat(CL_RGBA, CL_FLOAT), m, m, 0, imdata.data());


    // Vector of image coordinates (all pointing to the same location for simplicity).
    vex::vector<cl_int2>  p(ctx, 16);
    p = 1;

    // The output vector.
    vex::vector<float> x(ctx, 16);

    // This works:
    x = imread(image, p);
    std::cout << "v1: x = " << x << std::endl;

    // Generating a kernel that uses the function is also possible:
    x = 0;

    // Kernel body:
    std::ostringstream body;
    vex::generator::set_recorder(body);

    // Symbolic variables:
    vex::symbolic<float>       X(vex::symbolic<float>::VectorParameter);
    vex::symbolic<cl::Image2D> I(vex::symbolic<cl::Image2D>::ScalarParameter);
    vex::symbolic<cl_int2>     P(vex::symbolic<cl_int2>::VectorParameter, vex::symbolic<cl_int2>::Const);

    // Record the image reading operation:
    X = imread(I, P);

    // Generate the kernel:
    auto the_answer = vex::generator::build_kernel(ctx, "the_answer", body.str(), X, I, P);

    // Call the kernel:
    the_answer(x, image, p);

    std::cout << "v2: x = " << x << std::endl;
}

ddemidov · 2016-04-29T08:16:43Z

Writing the example made me think the scope_type and constness_type enums should be moved out of the symbolic class template to make the declaration of symbolic variables easier.

agerlach · 2016-04-29T15:03:53Z

This "magic" looks really good. I need to take some more time to understand what you have here, though. FYI, I was getting the following compilation error w/ LLVM 6.0:

Default initialization of an object of const type 'const struct imread_t' requires a user-provided default constructor

Adding imread_t() {}; to struct imreat_t, fixed the issue.

agerlach · 2016-04-29T19:29:14Z

After looking at this some more, it is unclear to me how to use the result of imread() in an ODE. I've modified your example slightly to consider 3D images and tried using it with your symbolic example using a Lorenz system. Here the first two components of the interpolated image are the sigma and b parameters in the model.

The issue I have is that I don't know what I can do with interpResult in sys_func(). I would like to set sigma and b to interpResult.x and interpResult.y, respectively. Do you have any thoughts, or would you approach this differently?

Thanks.

#include <iostream>
#include <vector>
#include <array>
#include <functional>

#include <vexcl/devlist.hpp>
#include <vexcl/vector.hpp>
#include <vexcl/multivector.hpp>
#include <vexcl/generator.hpp>
#include <vexcl/element_index.hpp>

// http://headmyshoulder.github.com/odeint-v2
#include <boost/numeric/odeint.hpp>

#include <vexcl/vexcl.hpp>
using namespace std;
namespace odeint = boost::numeric::odeint;

typedef float value_type;
typedef vex::symbolic< value_type > sym_value;
typedef std::array<sym_value, 3> sym_state;

namespace vex {

    // Allow cl::Image3D in vector expressions:
    namespace traits {
        template <> struct is_vector_expr_terminal<cl::Image3D> : std::true_type {};
    }

    // Let the kernel generator know how to spell cl::Image3D type:
    template <>
    struct type_name_impl<cl::Image3D> {
        static std::string get() {
            return "image3d_t";
        }
    };

}

// Define custom user function that reads a value from an image.
// Need to do it like this (as opposed to using VEX_FUNCTION macro)
// to be able to define sampler object with the function definition.
struct imread_t : vex::UserFunction< imread_t, cl_float4(cl::Image3D, float,float,float)>{    static std::string name() {
        return "imread";
    }

    imread_t() {};
    static void define(vex::backend::source_generator &src) {
        src << VEX_STRINGIZE_SOURCE(
                                    __constant sampler_t sampler =
                                    CLK_NORMALIZED_COORDS_TRUE
                                    | CLK_ADDRESS_CLAMP_TO_EDGE
                                    | CLK_FILTER_NEAREST;

                                    float4 imread(__read_only image3d_t data, float x, float y, float z){
                                        return read_imagef(data, sampler, (float4)(x,y,z,1.));
                                    }
                                    );
    }
} const imread;

struct sys_func
{

    const sym_value &R;
    const vex::symbolic<cl::Image3D>  &image;
    vex::symbolic<cl_float4> interpResult;

    sys_func(const vex::symbolic<cl::Image3D> &image, const sym_value &R, vex::symbolic<cl_float4> &interpResult) : image(image), R(R), interpResult(interpResult) {}

    template <class Sig>
    struct result {
        typedef void type;
    };

    void operator()( const sym_state &x , sym_state &dxdt , value_type ) const
    {
        interpResult = imread(image, x[0],x[1],x[2]);

        value_type sigma = 10.0;// Needs to be interpResult.x
        value_type b = 8.0/3.0; // Needs to be interpResult.y

        dxdt[0] = sigma * (x[1] - x[0]);
        dxdt[1] = R * x[0] - x[1] - x[0] * x[2];
        dxdt[2] = x[0] * x[1] - b * x[2];
    }
};

int main( int argc , char **argv )
{
    size_t n;
    const value_type dt = 0.01;
    const value_type t_max = 100.0;

    using namespace std;

    n = argc > 1 ? atoi( argv[1] ) : 1024;

    vex::Context ctx( vex::Filter::DoublePrecision && vex::Filter::GPU );
    cout << ctx << endl;

    // Custom kernel body will be recorded here:
    std::ostringstream body;
    vex::generator::set_recorder(body);

    // State types that would become kernel parameters:
    sym_state sym_S = {
        sym_value(sym_value::VectorParameter),
        sym_value(sym_value::VectorParameter),
        sym_value(sym_value::VectorParameter)
    };

    // Const kernel parameter.
    sym_value sym_R(sym_value::VectorParameter, sym_value::Const);

    vex::symbolic<cl_float4>       sym_InterpResult(vex::symbolic<cl_float4>::VectorParameter);
    vex::symbolic<cl::Image3D> sym_Image(vex::symbolic<cl::Image3D>::ScalarParameter);


    // Symbolic stepper:
    odeint::runge_kutta4<
                    sym_state , value_type , sym_state , value_type ,
                    odeint::range_algebra , odeint::default_operations
                    > sym_stepper;

    sys_func sys(sym_Image,sym_R,sym_InterpResult);
    sym_stepper.do_step(std::ref(sys), sym_S, 0, dt);

    auto kernel = vex::generator::build_kernel(ctx, "lorenz", body.str(),
                                               sym_S[0], sym_S[1], sym_S[2], sym_R, sym_Image, sym_InterpResult
                                               );

    // Real state initialization:
    value_type Rmin = 0.1;
    value_type Rmax = 50.0;
    value_type dR   = (Rmax - Rmin) / (n - 1);

    vex::vector<value_type> X(ctx, n);
    vex::vector<value_type> Y(ctx, n);
    vex::vector<value_type> Z(ctx, n);
    vex::vector<value_type> R(ctx, n);
    vex::vector<cl_float4> interpResult(ctx, n);

    // Create 3D Image, fill it with data.
    const int m = 25;
    std::vector<float> imdata(m * m *m * 3, 42.0f);
    for (int i = 0; i < m * m * m * 3; i+=3) {
        imdata[i] = 10.;
        imdata[i+1] = -8.0/3.0;
        imdata[i+2] = 1;
    }

    cl::ImageFormat format(CL_RGB, CL_FLOAT);
    cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());

    // Initialize
    X = 1.0;
    Y = 10.0;
    Z = 10.0;
    interpResult = 0.0;
    R = Rmin + dR * vex::element_index();

    // Integration loop:
    for(value_type t = 0; t < t_max; t += dt)
        kernel(X, Y, Z, R, image, interpResult);

    // Copy and print results
    std::vector< value_type > result( n );
    vex::copy( X , result );
    for (int ii = 0; ii < result.size(); ++ii)
        cout << result[ii] << endl;
}

ddemidov · 2016-04-29T19:59:51Z

If I understand correctly, you need to introduce a couple of helper functions:

VEX_FUNCTION(float, getx, (cl_float4, v), return v.x; );
VEX_FUNCTION(float, gety, (cl_float4, v), return v.y; );

And use the functions inside sys_func::operator():

value_type sigma = getx(interpResult);
value_type b = gety(interpResult);

Would that work?

ddemidov · 2016-04-29T20:05:37Z

Also you probably want to use CLK_FILTER_LINEAR instead of CLK_FILTER_NEAREST in the definition of the sampler object to actually get the interpolated values. Sorry for my misleading example.

agerlach · 2016-05-02T14:03:38Z

I had already made the change to CLK_FILTER_LINEAR in my code, but I forgot to change in the Lorenz example I posted.

The helper functions you suggested work perfectly, however I had compilation errors and needed to change the use of the function inside sys_func::operator() from

value_type sigma = getx(interpResult);
value_type b = gety(interpResult);

to

vex::symbolic<float> sigma = getx(interpResult);
vex::symbolic<float> b = gety(interpResult);

I've included the final working source code below for anybody else who comes across this. Do you think the ODEINT team would be interested in including such an example? System parameters interpolation and / or table look-ups are quite common in ODEs.

#include <iostream>
#include <vector>
#include <array>
#include <functional>

#include <vexcl/devlist.hpp>
#include <vexcl/vector.hpp>
#include <vexcl/multivector.hpp>
#include <vexcl/generator.hpp>
#include <vexcl/element_index.hpp>

// http://headmyshoulder.github.com/odeint-v2
#include <boost/numeric/odeint.hpp>

#include <vexcl/vexcl.hpp>
using namespace std;
namespace odeint = boost::numeric::odeint;

typedef float value_type;
typedef vex::symbolic< value_type > sym_value;
typedef std::array<sym_value, 3> sym_state;

VEX_FUNCTION(float, getx, (cl_float4, v), return v.x; );
VEX_FUNCTION(float, gety, (cl_float4, v), return v.y; );

namespace vex {

    // Allow cl::Image3D in vector expressions:
    namespace traits {
        template <> struct is_vector_expr_terminal<cl::Image3D> : std::true_type {};
    }

    // Let the kernel generator know how to spell cl::Image3D type:
    template <>
    struct type_name_impl<cl::Image3D> {
        static std::string get() {
            return "image3d_t";
        }
    };

}

// Define custom user function that reads a value from an image.
// Need to do it like this (as opposed to using VEX_FUNCTION macro)
// to be able to define sampler object with the function definition.
struct imread_t : vex::UserFunction< imread_t, cl_float4(cl::Image3D, float,float,float)>{    static std::string name() {
        return "imread";
    }

    imread_t() {};
    static void define(vex::backend::source_generator &src) {
        src << VEX_STRINGIZE_SOURCE(
                                    __constant sampler_t sampler =
                                    CLK_NORMALIZED_COORDS_TRUE
                                    | CLK_ADDRESS_CLAMP_TO_EDGE
                                    | CLK_FILTER_LINEAR;

                                    float4 imread(__read_only image3d_t data, float x, float y, float z){
                                        return read_imagef(data, sampler, (float4)(x,y,z,1.));
                                    }
                                    );
    }
} const imread;

struct sys_func
{

    const sym_value &R;
    const vex::symbolic<cl::Image3D>  &image;
    vex::symbolic<cl_float4> interpResult;

    sys_func(const vex::symbolic<cl::Image3D> &image, const sym_value &R, vex::symbolic<cl_float4> &interpResult) : image(image), R(R), interpResult(interpResult) {}

    template <class Sig>
    struct result {
        typedef void type;
    };



    void operator()( const sym_state &x , sym_state &dxdt , value_type ) const
    {
        interpResult = imread(image, x[0],x[1],x[2]);  // Do interpolation

        sym_value sigma = getx(interpResult);  // Get X component of float4
        sym_value b = gety(interpResult);      // Get Y component of float4

        dxdt[0] = sigma * (x[1] - x[0]);
        dxdt[1] = R * x[0] - x[1] - x[0] * x[2];
        dxdt[2] = x[0] * x[1] - b * x[2];
    }
};

int main( int argc , char **argv )
{
    size_t n;
    const value_type dt = 0.01;
    const value_type t_max = 100.0;

    using namespace std;

    n = argc > 1 ? atoi( argv[1] ) : 1024;

    vex::Context ctx( vex::Filter::DoublePrecision && vex::Filter::GPU );
    cout << ctx << endl;

    // Custom kernel body will be recorded here:
    std::ostringstream body;
    vex::generator::set_recorder(body);

    // State types that would become kernel parameters:
    sym_state sym_S = {
        sym_value(sym_value::VectorParameter),
        sym_value(sym_value::VectorParameter),
        sym_value(sym_value::VectorParameter)
    };

    // Const kernel parameter.
    sym_value sym_R(sym_value::VectorParameter, sym_value::Const);

    vex::symbolic<cl_float4>       sym_InterpResult(vex::symbolic<cl_float4>::VectorParameter);
    vex::symbolic<cl::Image3D> sym_Image(vex::symbolic<cl::Image3D>::ScalarParameter);


    // Symbolic stepper:
    odeint::runge_kutta4<
                    sym_state , value_type , sym_state , value_type ,
                    odeint::range_algebra , odeint::default_operations
                    > sym_stepper;

    sys_func sys(sym_Image,sym_R,sym_InterpResult);
    sym_stepper.do_step(std::ref(sys), sym_S, 0, dt);

    auto kernel = vex::generator::build_kernel(ctx, "lorenz", body.str(),
                                               sym_S[0], sym_S[1], sym_S[2], sym_R, sym_Image, sym_InterpResult
                                               );

    // Real state initialization:
    value_type Rmin = 0.1;
    value_type Rmax = 50.0;
    value_type dR   = (Rmax - Rmin) / (n - 1);

    vex::vector<value_type> X(ctx, n);
    vex::vector<value_type> Y(ctx, n);
    vex::vector<value_type> Z(ctx, n);
    vex::vector<value_type> R(ctx, n);
    vex::vector<cl_float4> interpResult(ctx, n);

    // Create 3D Image, fill it with data.
    const int m = 25;
    std::vector<float> imdata(m * m *m * 3, 42.0f);
    for (int i = 0; i < m * m * m * 3; i+=3) {
        imdata[i] = 10.;            // R component
        imdata[i+1] = 8.0/3.0;      // G component
        imdata[i+2] = 1;            // B component
    }

    cl::ImageFormat format(CL_RGB, CL_FLOAT);
    cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());

    // Initialize
    X = 1.0;
    Y = 10.0;
    Z = 10.0;
    interpResult = 0.0;
    R = Rmin + dR * vex::element_index();

    // Integration loop:
    for(value_type t = 0; t < t_max; t += dt)
        kernel(X, Y, Z, R, image, interpResult);

    // Copy and print results
    std::vector< value_type > result( n );
    vex::copy( X , result );
    for (int ii = 0; ii < result.size(); ++ii)
        cout << result[ii] << endl;
}

ddemidov · 2016-05-02T14:09:32Z

This looks nice!

I've included the final working source code below for anybody else who comes across this. Do you think the ODEINT team would be interested in including such an example? System parameters interpolation and / or table look-ups are quite common in ODEs.

Pinging @headmyshoulder and @mariomulansky. What do you guys think? Is it worth a PR?

ddemidov · 2016-05-02T14:21:52Z

One note: I think you can keep interpResult as a local variable. That is, just change the line

interpResult = imread(image, x[0],x[1],x[2]);

inside sym_func::operator() to

vex::symbolic<cl_float4> interpResult = imread(image, x[0],x[1],x[2]);

and remove all references to it outside of the operator(). That way you can save some global memory bandwidth.

agerlach · 2016-05-02T14:40:46Z

Good catch! Thanks for your help.

Now that I have this working, I would like to implement an event detector w/ VexCL and ODEINT, e.g. save the state of the parachute system after descenting every 10 m. I would also use it to detect ground impact. I currently have my application working using odeint-v2/examples/find_crossing.cpp. However, it is unclear to me how to extend that example with VexCL in an efficient way. I will submit a separate issue, after I look at it some more. Do you think the issue belongs here or would be better suited on ODEINT's github page?

Thanks again.

ddemidov · 2016-05-02T15:06:33Z

Feel free to open a new issue here.

ddemidov · 2016-05-04T15:18:08Z

Do you think the ODEINT team would be interested in including such an example?

Another thought: this and #203 could become a nice section on symbolics in vexcl documentation. I would be grateful if you cared to provide it :).

agerlach · 2016-05-05T17:12:26Z

this and #203 could become a nice section on symbolics in vexcl documentation

Agreed. VexCL is a great tool, and I would be happy to contribute in some way. Let me get this part of my project completed and then I'll get back to you on this. I'll be in touch.

ddemidov · 2016-05-06T08:49:50Z

72da2a7 adds minimal support for using OpenCL images/CUDA textures in vexcl expressions. Basically, it just provides the required specifications for vex::traits::is_vector_expr_terminal and vex::type_name_impl structs.

jr-jacket · 2016-05-09T19:20:36Z

Hi - I work with @agerlach and I tried to compile and run the lorenz example above. I'm using Boost v1.6 and CUDA 7.5 with driver version 352.39. I am running on a Tesla K40c.

When I run the above Lorenz code I get:

user@ubuntu1404:~/dir$ ./executable 1000 1

Tesla K40c (NVIDIA CUDA)

OpenCL error: clEnqueueNDRangeKernel(-5: Out of resources)
terminate called after throwing an instance of 'cl::Error'
what(): clEnqueueReadBuffer
Aborted (core dumped)
user@ubuntu1404:~/dir$

When I remove the 3D image in the code, it runs fine (no errors). So I have confirmed that the problem is related to the 3D texture. I suspect this is a driver issue, although our drivers are not that old. Also I can successfully run the devlist example in vexcl - that worked out of the box.

Any ideas why I am getting the above error, or what else I can try to diagnose it?

p.s. to get the above output I added the try/catch loops from:

https://gist.github.com/ddemidov/2925718#file-hello-cpp-L39-L42

around the kernel execution for loop.

ddemidov · 2016-05-09T19:56:04Z

I never tried the code above, so may be @agerlach would be able to comment on this. I can reproduce the error on Nvidia K40, and AMD W9100 gives slightly different clCreateImage(-39: Invalid image format descriptor). The following diff fixes the error for me:

--- a/hello.cpp
+++ b/hello.cpp
@@ -133,13 +133,13 @@ int main( int argc , char **argv )
     // Create 3D Image, fill it with data.
     const int m = 25;

-    std::vector<float> imdata(m * m * m * 3, 42.0f);
-    for (int i = 0; i < m * m * m * 3; i+=3) {
+    std::vector<float> imdata(m * m *m * 4, 42.0f);
+    for (int i = 0; i < m * m * m * 4; i+=4) {
         imdata[i] = 10.;            // R component
         imdata[i+1] = 8.0/3.0;      // G component
         imdata[i+2] = 1;            // B component
     }

-    cl::ImageFormat format(CL_RGB, CL_FLOAT);
+    cl::ImageFormat format(CL_RGBA, CL_FLOAT);
     cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                       format, m, m, m, 0,0, imdata.data());

ddemidov · 2016-05-09T19:57:02Z

This thread seems to be relevant: https://devtalk.nvidia.com/default/topic/475976/image-formats-cl_rgb-not-supported-by-nvidia-/

EDIT: CL_RGB may only be used with a couple of very specific integer data types: https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/cl_image_format.html

jr-jacket · 2016-05-09T20:29:22Z

That fixed it! Thanks so much.

agerlach · 2016-05-10T12:25:14Z

This thread seems to be relevant: https://devtalk.nvidia.com/default/topic/475976/image-formats-cl_rgb-not-supported-by-nvidia-/

EDIT: CL_RGB may only be used with a couple of very specific integer data types: https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/cl_image_format.html

This is very interesting. I am using an NVIDIA GT 750M on my Mac Book Pro and CL_RBG works without issue. I assume this is from a difference in Apple's OpenCL implementation.

agerlach · 2016-10-27T15:18:47Z

@ddemidov Thanks for all your previous help on this. My attention was diverted to some other work the last few months and I am finally getting back to this. I have not forgotten about providing an example for you documentation.

I have a follow up question on the usage of OpenCL images w/ VexCL. How could the above example be extended to run with multiple devices (e.g. across 2 Tesla K40s)? I understand that the vex::vector types are distributed across both devices, but how should the cl::Image3D be handled? I essentially would like exact copies of it on each device.

In the above examples we used the following exclusively:

vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));
...
cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());

Would it be possible to do something like this...

cl::Image3D image0(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());
cl::Image3D image1(ctx.context(1), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());
...
// Integration loop:
for(value_type t = 0; t < t_max; t += dt)
    kernel(X, Y, Z, R, image0, image1, interpResult);

where the kernel has some sort of check to check the context and then use the appropriate image. Do you have any other ideas or suggestions?

Kernel args packed into std::vector will be unpacked and passed to the generated kernels on respective devices. See #202

ddemidov · 2016-10-27T16:49:36Z

#213 should allow you to do this. Namely, you should be able to create a std::vector<cl::Image3D> (an image per compute device), and pass it to the generated kernel:

std::vector<cl::Image3D> images;
images.emplace_back(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());
images.emplace_back(ctx.context(1), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      format, m, m, m, 0,0, imdata.data());

...

for(value_type t = 0; t < t_max; t += dt)
    kernel(X, Y, Z, R, images, interpResult);

Please see if #213 works for you, I'll merge it when you report back.

agerlach · 2016-10-27T19:25:05Z

Beautiful! Everything seems to be working perfectly with a 2 Tesla GPU setup. Run time is almost cut in half as expected.

Oddly, when I run with a context of the CPU and 1 GPU I get 50% worse performance. This isn't really a use case I'm interested in, but I thought it would try it.

ddemidov · 2016-10-27T19:47:31Z

Oddly, when I run with a context of the CPU and 1 GPU I get 50% worse performance

That's probably because you get an unbalanced partitioning across your devices and end up with performance of the slowest one. You could try to create your own weighting function to create a better partitioning, but I doubt it would be worth it.

Kernel args packed into std::vector will be unpacked and passed to the generated kernels on respective devices. See #202

ddemidov · 2016-10-27T19:48:31Z

Merged #213, thanks for testing!

agerlach · 2016-10-27T19:50:59Z

That's probably because you get an unbalanced partitioning across your devices and end up with performance of the slowest one.

That is what I figured. Although I don't plan on using a CPU+GPU configuration it is good to know about the weighting functions.

thanks for testing!

My pleasure! Thanks again for all your help.

ddemidov added a commit that referenced this issue Apr 29, 2016

Do not create local copies for scalar symbolic parameters

e3a6960

ref #202

agerlach closed this as completed May 2, 2016

This was referenced May 13, 2016

boost::fusion::vector_tie(args...) Error #206

Closed

Different results depending on runtime interface used #207

Closed

agerlach reopened this Oct 27, 2016

ddemidov added a commit that referenced this issue Oct 27, 2016

Allow passing generated kernel args for each GPU

0e13583

Kernel args packed into std::vector will be unpacked and passed to the generated kernels on respective devices. See #202

ddemidov mentioned this issue Oct 27, 2016

Allow passing generated kernel args for each GPU #213

Merged

ddemidov added a commit that referenced this issue Oct 27, 2016

Allow passing generated kernel args for each GPU (#213)

6578ce2

Kernel args packed into std::vector will be unpacked and passed to the generated kernels on respective devices. See #202

agerlach closed this as completed Oct 27, 2016

agerlach mentioned this issue Feb 8, 2017

Image support across multiple devices #218

Closed

agerlach mentioned this issue Jun 21, 2018

GPU Parallel Monte Carlo SciML/DifferentialEquations.jl#308

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Solving an ODE w/ Interpolation #202

Solving an ODE w/ Interpolation #202

agerlach commented Apr 29, 2016

ddemidov commented Apr 29, 2016

ddemidov commented Apr 29, 2016

agerlach commented Apr 29, 2016

agerlach commented Apr 29, 2016

ddemidov commented Apr 29, 2016 •

edited

Loading

ddemidov commented Apr 29, 2016

agerlach commented May 2, 2016

ddemidov commented May 2, 2016

ddemidov commented May 2, 2016 •

edited

Loading

agerlach commented May 2, 2016

ddemidov commented May 2, 2016

ddemidov commented May 4, 2016

agerlach commented May 5, 2016 •

edited

Loading

ddemidov commented May 6, 2016

jr-jacket commented May 9, 2016

ddemidov commented May 9, 2016 •

edited

Loading

ddemidov commented May 9, 2016 •

edited

Loading

jr-jacket commented May 9, 2016

agerlach commented May 10, 2016 •

edited

Loading

agerlach commented Oct 27, 2016

ddemidov commented Oct 27, 2016

agerlach commented Oct 27, 2016

ddemidov commented Oct 27, 2016

ddemidov commented Oct 27, 2016

agerlach commented Oct 27, 2016

Solving an ODE w/ Interpolation #202

Solving an ODE w/ Interpolation #202

Comments

agerlach commented Apr 29, 2016

ddemidov commented Apr 29, 2016

ddemidov commented Apr 29, 2016

agerlach commented Apr 29, 2016

agerlach commented Apr 29, 2016

ddemidov commented Apr 29, 2016 • edited Loading

ddemidov commented Apr 29, 2016

agerlach commented May 2, 2016

ddemidov commented May 2, 2016

ddemidov commented May 2, 2016 • edited Loading

agerlach commented May 2, 2016

ddemidov commented May 2, 2016

ddemidov commented May 4, 2016

agerlach commented May 5, 2016 • edited Loading

ddemidov commented May 6, 2016

jr-jacket commented May 9, 2016

ddemidov commented May 9, 2016 • edited Loading

ddemidov commented May 9, 2016 • edited Loading

jr-jacket commented May 9, 2016

agerlach commented May 10, 2016 • edited Loading

agerlach commented Oct 27, 2016

ddemidov commented Oct 27, 2016

agerlach commented Oct 27, 2016

ddemidov commented Oct 27, 2016

ddemidov commented Oct 27, 2016

agerlach commented Oct 27, 2016

ddemidov commented Apr 29, 2016 •

edited

Loading

ddemidov commented May 2, 2016 •

edited

Loading

agerlach commented May 5, 2016 •

edited

Loading

ddemidov commented May 9, 2016 •

edited

Loading

ddemidov commented May 9, 2016 •

edited

Loading

agerlach commented May 10, 2016 •

edited

Loading