Thermohaline hackathon: speeding up FRG24 implementation #633

Draft · wants to merge 10 commits into base: thermohaline_hackathon
Conversation

@afraser3 commented Mar 20, 2024

The FRG24 model for MHD thermohaline mixing, as currently implemented in the thermohaline_hackathon feature branch, is very slow (as pointed out by @evbauer in d453d85 in the context of #605). I've TeXed up some notes (TL;DR on page 2) on different ways to speed up the model---some with zero loss in accuracy, some with negligible loss in accuracy, and some with a loss in accuracy that is a factor of a few (but still more accurate than HG19 or the other models currently implemented). This draft PR covers these different speedups. Broadly speaking, they involve a change of basis that block-diagonalizes the matrix we're dealing with, asymptotic reductions of the equations, and reliance on HG19 where appropriate.
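To give a feel for why the block diagonalization pays off: the spectrum of a block-diagonal matrix is the union of the spectra of its blocks, and dense eigensolvers scale like O(N^3), so solving two N/2 blocks costs roughly a quarter of a full solve. A toy python illustration (random matrices standing in for the actual FRG24 blocks):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
N = 200
A1 = rng.standard_normal((N // 2, N // 2))  # stand-ins for the two blocks
A2 = rng.standard_normal((N // 2, N // 2))
A = block_diag(A1, A2)

# eigenvalues of the full matrix equal the union of the blocks' eigenvalues
full = np.sort_complex(np.linalg.eigvals(A))
blocks = np.sort_complex(np.concatenate((np.linalg.eigvals(A1),
                                         np.linalg.eigvals(A2))))
print(np.max(np.abs(full - blocks)))  # ~1e-13, i.e. the same spectrum
```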

Some speedups this doesn't cover, but are worth exploring down the road if needed:

  • The matrix whose eigenvalues we seek is very sparse, so ARPACK-NG's sparse iterative methods are likely to be much faster than the currently-implemented LAPACK calls, especially in cases where the matrix size needs to be large. An example where I've tested this in the python implementation can be seen here (note that scipy.sparse.linalg's eigensolvers are wrappers around ARPACK-NG, just as numpy.linalg wraps LAPACK, as I understand it). A sketch of this approach, combined with the k_z sweep from the next item, is given below this list.
  • Right now we're just sweeping over a bunch of k_z's when trying to maximize sigma vs k_z, and we're sort of shooting in the dark. Empirically, the peak always seems to fall between k_z/l_f = 10^-5 and 1, but that's a huge range! It should be possible to get at least an order-of-magnitude estimate of where the peak lies by using perturbative matrix methods (as in this paper).
  • In general, reducing the number of Fourier modes N or the density of k_z points scanned will directly speed things up at the cost of accuracy.
  • Manipulating the matrix to have real-valued coefficients, which Rich has done in the past (Pascale has too, for a similar problem), would let us use DGEEV instead of ZGEEV. Presumably that's faster, but I don't know by how much; the second sketch below gives a quick way to measure it.
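A sketch of how the first two items might combine in the python prototype (build_matrix is a hypothetical stand-in for whatever assembles the sparse FRG24 matrix at a given vertical wavenumber; only the eigenvalue with the largest real part is needed for the growth rate):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def max_growth_rate(build_matrix, kz_grid):
    """Scan k_z, returning the largest growth rate and where it occurs."""
    best_sigma, best_kz = -np.inf, None
    for kz in kz_grid:
        A = sp.csr_matrix(build_matrix(kz))
        # ARPACK ('LR' = largest real part) finds just the one eigenvalue
        # we care about, instead of ZGEEV computing the full spectrum
        lam = spla.eigs(A, k=1, which='LR', return_eigenvectors=False)
        sigma = lam[0].real
        if sigma > best_sigma:
            best_sigma, best_kz = sigma, kz
    return best_sigma, best_kz

# empirically the peak falls between k_z/l_f = 1e-5 and 1, so e.g.:
# sigma, kz = max_growth_rate(build_matrix, l_f * np.logspace(-5, 0, 100))
```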
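And for the DGEEV/ZGEEV question: scipy.linalg.eig dispatches to DGEEV for real input and ZGEEV for complex input, so the difference is easy to time on a matrix of representative size:

```python
import timeit
import numpy as np
from scipy.linalg import eig

N = 400  # pick something representative of the actual matrix size
A_real = np.random.default_rng(0).standard_normal((N, N))
A_cplx = A_real.astype(np.complex128)  # same entries, complex storage

t_d = timeit.timeit(lambda: eig(A_real, right=False), number=10)  # DGEEV
t_z = timeit.timeit(lambda: eig(A_cplx, right=False), number=10)  # ZGEEV
print(f"DGEEV {t_d:.2f} s, ZGEEV {t_z:.2f} s, ratio {t_z / t_d:.1f}x")
```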

@Debraheem added the enhancement and hackathon labels Mar 20, 2024
@afraser3 (Author) commented Mar 20, 2024

As of this point, I've implemented the speedups that have zero error (change of basis / block diagonalization) and the ones with asymptotically small error (the low-Péclet-number limit, which introduces errors of order tau, usually between 10^-3 and 10^-7). This should be just about 10 times faster than it was before, but I'm not sure how best to test that; a rough timing harness is sketched below.
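One rough way to test it in the python prototype would be something like this (solve_old and solve_new are hypothetical stand-ins for the pre- and post-speedup solvers, and the parameter values are placeholders):

```python
import timeit
from frg24_prototype import solve_old, solve_new  # hypothetical module

args = dict(pr=1e-6, tau=1e-7, r0=5.0, hb=1e-7, db=1e-6, ks=0.1, n=17)
t_old = timeit.timeit(lambda: solve_old(**args), number=20)
t_new = timeit.timeit(lambda: solve_new(**args), number=20)
print(f"speedup: {t_old / t_new:.1f}x")
```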

Next step is to implement the low-Rm limit (Sec V of the PDF I linked above), and use HG19 as guardrails to keep the resulting error to acceptable levels. Also, general cleanup.

…ences in turb_plotter.png if you squint (expect larger differences for the WD case)
… (with low-Rm limit) and trying to use HG19 for first block, but doesn't work right now; also implements "safety" integer for picking which of these new speedups to employ
@afraser3 (Author) commented Mar 22, 2024

I think this is working as intended. I've added a safety option to choose different combinations of the above speedups to employ (the dispatch is sketched after this list):

  • safety = 0: low-tau limit, low-Rm limit, ditch "block a" (the block matrix with the ordinary KH mode), use HG19 away from large R0
  • safety = 1: low-tau limit, low-Rm limit for "block b" (the block matrix with the weird KH mode), still calculates "block a"
  • safety = 2: low-tau limit only
  • safety = 3 (max): use the full model (I don't think this option will ever be needed since tau is always tiny)
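In python-style pseudocode, the dispatch amounts to this (eig_block_a, eig_block_b, and full_model are stand-ins for the actual solvers, not the real code):

```python
def growth_rate(kz, params, safety):
    # schematic mirror of the fortran dispatch
    if safety >= 3:
        return full_model(kz, params)            # no approximations at all
    # the low-tau limit is always applied below this point
    sigma_b = eig_block_b(kz, params,
                          low_rm=(safety <= 1))  # low-Rm limit for block b
    if safety == 0:
        # block a dropped entirely (the HG19 fallback away from large R0
        # happens downstream, not shown here)
        return sigma_b
    return max(sigma_b, eig_block_a(kz, params))
```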

Here is a .zip with the resulting turb/plotter plots for each case, and you can see they all look very similar!

In the notes I TeXed above, I claimed that for safety=0 the appropriate thing to do is to let D_C be the min between HG19's D_C and the one calculated with this method. What I've implemented here instead lets w_f be the min between HG19's w_f and the one calculated here (see the sketch below). I'm not sure off the top of my head which is more appropriate.
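In other words, the two options are (hypothetical variable names):

```python
# what the notes suggest: clamp the diffusion coefficient itself
D_C = min(D_C_hg19, D_C_frg24)

# what's implemented here: clamp the finger velocity, and let D_C
# (and anything else downstream) be computed from the clamped w_f
w_f = min(w_f_hg19, w_f_frg24)
```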

… methods can be compared against the old in timing tests
@afraser3 (Author) commented
For now, safety=4 reverts to the model as implemented before I began this PR, for the sake of comparison with the other options. Note that safety=4 and safety=3 give identical results, since at safety=3 no approximations have been made; it just puts the matrix into a form that lets you get the eigenvalues faster. So safety=4 should be removed before this PR is merged.

@@ -130,14 +131,23 @@ subroutine calc_frg24_w(pr, tau, r0, hb, db, ks, n, w, withTC, ierr, lamhat, l2h
if (present(lamhat)) then
! lamhat and l2hat already calculated, so can pass as arguments
if(withTC) then
w = wf_withTC(pr, tau, r0, hb, db, ks, n, .false., lamhat, l2hat)
if (safety == 0) then
! TODO: is it bad for this minimization to be occurring here AND in thermohaline.f90?
Member commented:
I think this is fine. This calc_frg24_w routine just exists to give a public interface for calling the velocities so we can use it in turb/plotter. Any actual stellar evolution goes through the get_D_thermohaline routine, which calls the stuff in thermohaline.f90 independently of this, because right now mesa/star just wants a thermohaline composition diffusion coefficient.

Member commented:
But this structure does end up causing a lot of duplicated code at this point. It would be better to have all that code live in a velocity routine in private/thermohaline.f90, and leave this public interface as just a wrapper to call that private routine.
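Schematically (python pseudocode with hypothetical names, mirroring the suggested fortran layout):

```python
# private/thermohaline.f90 analogue: the one real implementation
def _frg24_velocity(pr, tau, r0, hb, db, ks, n, with_tc, safety):
    ...  # all the velocity logic lives here, in one place

# public interface: a thin wrapper, so turb/plotter and
# get_D_thermohaline share a single code path
def calc_frg24_w(*args, **kwargs):
    return _frg24_velocity(*args, **kwargs)
```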
