I have a model expressed as a sum of many (roughly 10 to 40) SHO kernels, and I have been experimenting with both tinygp and celerite2 (its JAX implementation). In my tests, celerite2 is faster than tinygp (see figure below) when using a sum of multiple semi-separable kernels.
Could you give me some insight into why there is such a difference in runtime between the two libraries?
Also, would it be possible to reach celerite2's speed by modifying the tinygp implementation? I am currently reading the tinygp code to understand what could explain such a difference.
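For context, here is a minimal, self-contained sketch of the kind of model I mean: a covariance built from a sum of SHO terms, written directly from the celerite SHO formula (underdamped case, Q > 1/2). The parameter values (10 terms, made-up frequencies, Q = 10, unit amplitudes) are arbitrary placeholders, and the dense matrix is formed only for illustration; the whole point of both libraries is that the semi-separable structure avoids ever building it.

```python
import jax.numpy as jnp

def sho_kernel(tau, sigma, omega, quality):
    # Underdamped (Q > 1/2) SHO covariance as a function of lag tau >= 0
    eta = jnp.sqrt(1.0 - 1.0 / (4.0 * quality**2))
    decay = jnp.exp(-omega * tau / (2.0 * quality))
    arg = eta * omega * tau
    return sigma**2 * decay * (jnp.cos(arg) + jnp.sin(arg) / (2.0 * eta * quality))

# Hypothetical parameters: 10 SHO terms with placeholder frequencies
omegas = jnp.linspace(1.0, 5.0, 10)

def kernel_sum(tau):
    # Sum of SHO terms; each term adds 2 to the semi-separable width J
    return sum(sho_kernel(tau, sigma=1.0, omega=w, quality=10.0) for w in omegas)

# Dense N x N covariance, for illustration only (the fast solvers never form it)
t = jnp.linspace(0.0, 10.0, 200)
tau = jnp.abs(t[:, None] - t[None, :])
K = kernel_sum(tau)
```

In tinygp this sum would be expressed with `tinygp.kernels.quasisep.SHO` terms added together, and in celerite2 with `SHOTerm` objects; the runtime gap I observe grows with the number of terms.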
Thanks,
Well, celerite2 is written in tuned C++ and tinygp is written in Python and JIT-compiled to XLA, so it would be hard to make a direct comparison! The benefit of the tinygp implementation is that it is much more flexible in the kinds of models it supports, but it can be harder to tune the performance and memory usage.
I haven't done much benchmarking with "wide" models like you're considering, but the relevant algorithms are here: