OpenMP benchmark #3

Open
neapel opened this issue Jun 11, 2013 · 5 comments
@neapel

neapel commented Jun 11, 2013

Measure speedup

@ghost ghost assigned neapel Jun 11, 2013
@mariomulansky

I suggest using nonlinear oscillator lattices; I already have some tuned code for that which we can check against.

@neapel
Author

neapel commented Jun 14, 2013

1725e45 has a trivial benchmark to check whether there's any kind of speedup, but the values vary wildly with gcc's OpenMP library; with Intel's they are more stable...

@mariomulansky

Maybe 1024 Lorenz systems is still not enough to benefit from parallelization? For a proper benchmark I suggest looking at the scaling with cores. Such a Lorenz example, which is completely uncoupled, should scale almost perfectly with the number of cores (or memory bandwidth, for that matter).
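A minimal sketch of such a scaling check, assuming a plain explicit Euler step and hand-written OpenMP loops rather than the library's steppers. The names `Lorenz`, `euler_step`, `time_run` and `print_scaling` are hypothetical and only serve the illustration. Because the systems are uncoupled, each loop iteration touches only its own element and needs no synchronization.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical state of one Lorenz system.
struct Lorenz { double x, y, z; };

// One explicit Euler step for every system in s.
// The systems are independent, so the loop parallelizes trivially.
void euler_step(std::vector<Lorenz>& s, double dt) {
    assert(dt > 0.0);
    const double sigma = 10.0, rho = 28.0, beta = 8.0 / 3.0;
    const long n = static_cast<long>(s.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        const Lorenz p = s[i];  // copy the old state before overwriting it
        s[i].x = p.x + dt * sigma * (p.y - p.x);
        s[i].y = p.y + dt * (p.x * (rho - p.z) - p.y);
        s[i].z = p.z + dt * (p.x * p.y - beta * p.z);
    }
}

// Wall-clock seconds for n_steps Euler steps of n_systems systems.
// n_threads is ignored when the code is built without OpenMP.
double time_run(std::size_t n_systems, int n_steps, int n_threads) {
#ifdef _OPENMP
    omp_set_num_threads(n_threads);
#else
    (void)n_threads;
#endif
    std::vector<Lorenz> s(n_systems);
    for (std::size_t i = 0; i < n_systems; ++i)
        s[i] = Lorenz{1.0 + 1e-3 * static_cast<double>(i), 1.0, 1.0};
    const auto t0 = std::chrono::steady_clock::now();
    for (int k = 0; k < n_steps; ++k)
        euler_step(s, 1e-4);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Prints time and speedup relative to one thread for 1, 2, 4, ... threads.
void print_scaling(std::size_t n_systems, int n_steps, int max_threads) {
    const double t1 = time_run(n_systems, n_steps, 1);
    for (int p = 1; p <= max_threads; p *= 2) {
        const double tp = time_run(n_systems, n_steps, p);
        std::printf("threads=%d time=%.4fs speedup=%.2f\n", p, tp, t1 / tp);
    }
}
```

Calling something like `print_scaling(1024, 100000, 8)` from `main` and building with `-fopenmp` on GCC would give one line per thread count. Repeating the run with a much larger number of systems should show whether 1024 is simply too small for the threading overhead to pay off.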

neapel pushed a commit that referenced this issue Jul 19, 2013
Based on <https://github.com/mariomulansky/hpx_odeint/tree/9792ca4f330bf0cffde4f000e900fb4c1c254891/osc_chain_1d/openmp2>

Use osc_chain_speedup.{sh,gnu} to compute and plot speedup.
"split" uses openmp_state/openmp_algebra;
"simple" uses vector/openmp_range_algebra
@neapel
Author

neapel commented Jul 19, 2013

Benchmark results for GCC 4.7.3 and ICC 13.1.1 on an i7-3770 with (short) n=4096, 1024 steps; (long) n=4194304, 1 step; using either (split) openmp_algebra=openmp_nested_algebra<range_algebra> with openmp_state, or (simple) openmp_range_algebra with vector. All cases use schedule(runtime) and OMP_SCHEDULE=static.
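For readers unfamiliar with `schedule(runtime)`: the loop schedule is not fixed at compile time but taken from the `OMP_SCHEDULE` environment variable (or `omp_set_schedule`) when the program runs. The sketch below shows this on a simple element-wise operation. It is not the library's algebra code; `axpby` and `runtime_schedule_name` are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// out[i] = a * x[i] + b * y[i], with the loop schedule chosen at run time.
void axpby(std::vector<double>& out, double a, const std::vector<double>& x,
           double b, const std::vector<double>& y) {
    assert(x.size() == out.size() && y.size() == out.size());
    const long n = static_cast<long>(out.size());
    #pragma omp parallel for schedule(runtime)
    for (long i = 0; i < n; ++i)
        out[i] = a * x[i] + b * y[i];
}

// Name of the schedule that schedule(runtime) loops currently use.
// Returns "serial" when the code is built without OpenMP.
const char* runtime_schedule_name() {
#ifdef _OPENMP
    omp_sched_t kind;
    int chunk;
    omp_get_schedule(&kind, &chunk);
    if (kind == omp_sched_static)  return "static";
    if (kind == omp_sched_dynamic) return "dynamic";
    if (kind == omp_sched_guided)  return "guided";
    if (kind == omp_sched_auto)    return "auto";
    return "other";
#else
    return "serial";
#endif
}
```

Running the same binary with `OMP_SCHEDULE=static` and then with, say, `OMP_SCHEDULE=dynamic,64` lets the schedules be compared without recompiling, which is presumably why the benchmark uses `schedule(runtime)`.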

[Image: osc_chain_speedup]

@neapel
Author

neapel commented Jul 19, 2013

Times for split/simple are not comparable because the simple case doesn't store values between cycles of the loop (see here). All results are from release builds; speedups with debug builds tend to be much larger, but the times are longer.
