
OpenMP state #5

Open
neapel opened this issue Jun 11, 2013 · 7 comments

@neapel

neapel commented Jun 11, 2013

No description provided.

@ghost ghost assigned neapel Jun 11, 2013
@mariomulansky

For my understanding:
The OpenMP state is basically a collection of chunks of data, and the omp_algebra then distributes those chunks across the OpenMP threads? Each chunk is processed with the algebra S that is given to omp_algebra as a template parameter? If that's correct, I think one might not even need a separate openmp_operation.

@neapel
Author

neapel commented Jun 14, 2013

I've implemented a simple algebra in 1725e45 that just uses a random-access container like std::vector or std::array from multiple threads, but it doesn't support a dispatcher. The system function is called once from the main thread, and the user needs to take care of multi-threading there.

To get to a common parallel interface, I think odeint should call the system function transparently from multiple threads, with partial views of the state, so the user won't have to worry about parallelization; later, MPI would look the same.

@ddemidov

Special OpenMP state may be required in order to initialize the underlying memory properly. This is important for performance on NUMA systems: each OpenMP thread should be the first to touch its chunk of memory (see e.g. this presentation for an explanation). This should probably be handled by odeint's resizer implementation for the state.

@headmyshoulder
Member

and you don't even need a separate state type :)


@mariomulansky

I do think an omp state is a good idea, to have more control over the parallelization. How else could you specialize the resizer if you don't have an omp state type to specialize on?

@headmyshoulder
Member


Ok, you are right. I think there are possibilities with SFINAE and enable_if magic, but this might be overkill :)



neapel pushed a commit that referenced this issue Jun 18, 2013
State splits a given Range into an InnerState, one for each thread. The algebra's for_eachN calls for_eachN in parallel on each part, using the InnerState's algebra. There's an openmp_wrapper to parallelize the system function; this needs a way to pass on the offset.

The idea was that this design should allow using OpenMP on each MPI node with a single-threaded inner_state: mpi_state< openmp_state< inner_state > > with mpi_wrapper(openmp_wrapper(system_function))
neapel pushed a commit that referenced this issue Jul 19, 2013
openmp_range_algebra: parallel for over a random access container.
openmp_nested_algebra: processes parts of a split container in parallel.
openmp_state: a split container based on vector<vector<>>.
openmp_algebra: use a range_algebra on each part of that container.
@neapel
Author

neapel commented Jul 19, 2013

openmp_state<T> is now an alias for std::vector<std::vector<T>>; the number of splits can be forced by using the normal constructors to initialize a number of elements. The actual splitting happens in odeint::copy: copying from a vector<T> to an openmp_state<T> splits the data (if the openmp_state<T> has no size, the number of threads is used), and copying from an openmp_state<T> back to a vector<T> joins the data again.

4 participants