I have a model expressed as a sum of many (roughly 10 to 40) SHO kernels, and I have been experimenting with both tinygp and celerite2 (its JAX implementation). In my tests, celerite2 is faster than tinygp (see figure below) when using a sum of multiple semi-separable kernels.
Could you give me some insight into why there is such a difference in runtime between the two libraries?
Also, would it be possible to reach celerite2's speed by modifying the tinygp implementation? I am currently reading the tinygp code to understand what could explain such a difference.
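For context, here is a minimal, self-contained sketch of the kind of model I mean: a covariance built from a sum of SHO terms, written directly from the celerite SHO formula (underdamped case, Q > 1/2). The parameter values (10 terms, made-up frequencies, Q = 10, unit amplitudes) are arbitrary placeholders, and the dense matrix is formed only for illustration; the whole point of both libraries is that the semi-separable structure avoids ever building it.

```python
import jax.numpy as jnp

def sho_kernel(tau, sigma, omega, quality):
    # Underdamped (Q > 1/2) SHO covariance as a function of lag tau >= 0
    eta = jnp.sqrt(1.0 - 1.0 / (4.0 * quality**2))
    decay = jnp.exp(-omega * tau / (2.0 * quality))
    arg = eta * omega * tau
    return sigma**2 * decay * (jnp.cos(arg) + jnp.sin(arg) / (2.0 * eta * quality))

# Hypothetical parameters: 10 SHO terms with placeholder frequencies
omegas = jnp.linspace(1.0, 5.0, 10)

def kernel_sum(tau):
    # Sum of SHO terms; each term adds 2 to the semi-separable width J
    return sum(sho_kernel(tau, sigma=1.0, omega=w, quality=10.0) for w in omegas)

# Dense N x N covariance, for illustration only (the fast solvers never form it)
t = jnp.linspace(0.0, 10.0, 200)
tau = jnp.abs(t[:, None] - t[None, :])
K = kernel_sum(tau)
```

In tinygp this sum would be expressed with `tinygp.kernels.quasisep.SHO` terms added together, and in celerite2 with `SHOTerm` objects; the runtime gap I observe grows with the number of terms.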
Thanks,
Well, celerite2 is written in tuned C++ and tinygp is written in Python and JIT-compiled to XLA, so it would be hard to make a direct comparison! The benefit of the tinygp implementation is that it is much more flexible in the kinds of models it supports, but it can be harder to tune the performance and memory usage.
I haven't done much benchmarking with "wide" models like you're considering, but the relevant algorithms are here: