Thermohaline hackathon: speeding up FRG24 implementation #633

Draft · wants to merge 10 commits into base: thermohaline_hackathon
Conversation

@afraser3 commented Mar 20, 2024

The FRG24 model for MHD thermohaline mixing, as currently implemented in the thermohaline_hackathon feature branch, is very slow (as pointed out by @evbauer in d453d85 in the context of #605). I've TeXed up some notes (TL;DR on page 2) on different ways to speed up the model---some with zero loss in accuracy, some with negligible loss in accuracy, and some with a loss in accuracy that is a factor of a few (but still more accurate than HG19 or the other models currently implemented). This draft PR covers these different speedups. Broadly speaking, they involve a change of basis that block-diagonalizes the matrix we're dealing with, asymptotic reductions of the equations, and reliance on HG19 where appropriate.
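To give a feel for why the block diagonalization pays off: the spectrum of a block-diagonal matrix is the union of the spectra of its blocks, and dense eigensolvers scale like O(N^3), so solving two N/2 blocks costs roughly a quarter of a full solve. A toy python illustration (random matrices standing in for the actual FRG24 blocks):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
N = 200
A1 = rng.standard_normal((N // 2, N // 2))  # stand-ins for the two blocks
A2 = rng.standard_normal((N // 2, N // 2))
A = block_diag(A1, A2)

# eigenvalues of the full matrix equal the union of the blocks' eigenvalues
full = np.sort_complex(np.linalg.eigvals(A))
blocks = np.sort_complex(np.concatenate((np.linalg.eigvals(A1),
                                         np.linalg.eigvals(A2))))
print(np.max(np.abs(full - blocks)))  # ~1e-13, i.e. the same spectrum
```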

Some speedups this doesn't cover, but are worth exploring down the road if needed:

  • The matrix whose eigenvalues we seek is very sparse, so ARPACK-NG's sparse iterative methods are likely to be much faster than the currently-implemented LAPACK calls, especially in cases where the matrix size needs to be large. An example where I've tested this in the python implementation can be seen here (note that scipy.sparse.linalg's eigensolvers are wrappers around ARPACK-NG, just as numpy.linalg wraps LAPACK, as I understand it). A sketch of this approach, combined with the k_z sweep from the next item, is given below this list.
  • Right now we're just sweeping over a bunch of k_z's when trying to maximize sigma vs k_z, and we're sort of shooting in the dark. Empirically, the peak always seems to fall between k_z/l_f = 10^-5 and 1, but that's a huge range! It should be possible to get at least an order-of-magnitude estimate of where the peak lies by using perturbative matrix methods (as in this paper).
  • In general, reducing the number of Fourier modes N or the density of k_z points scanned will directly speed things up at the cost of accuracy.
  • Manipulating the matrix to have real-valued coefficients, which Rich has done in the past (Pascale has too, for a similar problem), would let us use DGEEV instead of ZGEEV. Presumably that's faster, but I don't know by how much; the second sketch below gives a quick way to measure it.
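A sketch of how the first two items might combine in the python prototype (build_matrix is a hypothetical stand-in for whatever assembles the sparse FRG24 matrix at a given vertical wavenumber; only the eigenvalue with the largest real part is needed for the growth rate):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def max_growth_rate(build_matrix, kz_grid):
    """Scan k_z, returning the largest growth rate and where it occurs."""
    best_sigma, best_kz = -np.inf, None
    for kz in kz_grid:
        A = sp.csr_matrix(build_matrix(kz))
        # ARPACK ('LR' = largest real part) finds just the one eigenvalue
        # we care about, instead of ZGEEV computing the full spectrum
        lam = spla.eigs(A, k=1, which='LR', return_eigenvectors=False)
        sigma = lam[0].real
        if sigma > best_sigma:
            best_sigma, best_kz = sigma, kz
    return best_sigma, best_kz

# empirically the peak falls between k_z/l_f = 1e-5 and 1, so e.g.:
# sigma, kz = max_growth_rate(build_matrix, l_f * np.logspace(-5, 0, 100))
```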
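And for the DGEEV/ZGEEV question: scipy.linalg.eig dispatches to DGEEV for real input and ZGEEV for complex input, so the difference is easy to time on a matrix of representative size:

```python
import timeit
import numpy as np
from scipy.linalg import eig

N = 400  # pick something representative of the actual matrix size
A_real = np.random.default_rng(0).standard_normal((N, N))
A_cplx = A_real.astype(np.complex128)  # same entries, complex storage

t_d = timeit.timeit(lambda: eig(A_real, right=False), number=10)  # DGEEV
t_z = timeit.timeit(lambda: eig(A_cplx, right=False), number=10)  # ZGEEV
print(f"DGEEV {t_d:.2f} s, ZGEEV {t_z:.2f} s, ratio {t_z / t_d:.1f}x")
```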

@Debraheem added the enhancement and hackathon labels Mar 20, 2024
@afraser3 (Author) commented Mar 20, 2024

As of this point, I've implemented the speedups that have zero error (change of basis / block diagonalization) and the ones with asymptotically small error (the low-Péclet-number limit, which introduces errors of order tau, usually between 10^-3 and 10^-7). This should be just about 10 times faster than it was before, but I'm not sure how best to test that; a rough timing harness is sketched below.
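One rough way to test it in the python prototype would be something like this (solve_old and solve_new are hypothetical stand-ins for the pre- and post-speedup solvers, and the parameter values are placeholders):

```python
import timeit
from frg24_prototype import solve_old, solve_new  # hypothetical module

args = dict(pr=1e-6, tau=1e-7, r0=5.0, hb=1e-7, db=1e-6, ks=0.1, n=17)
t_old = timeit.timeit(lambda: solve_old(**args), number=20)
t_new = timeit.timeit(lambda: solve_new(**args), number=20)
print(f"speedup: {t_old / t_new:.1f}x")
```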

Next step is to implement the low-Rm limit (Sec V of the PDF I linked above), and use HG19 as guardrails to keep the resulting error to acceptable levels. Also, general cleanup.

…ences in turb_plotter.png if you squint (expect larger differences for the WD case)
… (with low-Rm limit) and trying to use HG19 for first block, but doesn't work right now; also implements "safety" integer for picking which of these new speedups to employ
@afraser3 (Author) commented Mar 22, 2024

I think this is working as intended. I've added a safety option to choose different combinations of the above speedups to employ (the dispatch is sketched after this list):

  • safety = 0: low-tau limit, low-Rm limit, ditch "block a" (the block matrix with the ordinary KH mode), use HG19 away from large R0
  • safety = 1: low-tau limit, low-Rm limit for "block b" (the block matrix with the weird KH mode), still calculates "block a"
  • safety = 2: low-tau limit only
  • safety = 3 (max): use the full model (I don't think this option will ever be needed since tau is always tiny)
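In python-style pseudocode, the dispatch amounts to this (eig_block_a, eig_block_b, and full_model are stand-ins for the actual solvers, not the real code):

```python
def growth_rate(kz, params, safety):
    # schematic mirror of the fortran dispatch
    if safety >= 3:
        return full_model(kz, params)            # no approximations at all
    # the low-tau limit is always applied below this point
    sigma_b = eig_block_b(kz, params,
                          low_rm=(safety <= 1))  # low-Rm limit for block b
    if safety == 0:
        # block a dropped entirely (the HG19 fallback away from large R0
        # happens downstream, not shown here)
        return sigma_b
    return max(sigma_b, eig_block_a(kz, params))
```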

Here is a .zip with the resulting turb/plotter plots for each case, and you can see they all look very similar!

In the notes I TeXed above, I claimed that for safety=0 the appropriate thing to do is to let D_C be the min between HG19's D_C and the one calculated with this method. What I've implemented here instead lets w_f be the min between HG19's w_f and the one calculated here (see the sketch below). I'm not sure off the top of my head which is more appropriate.
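In other words, the two options are (hypothetical variable names):

```python
# what the notes suggest: clamp the diffusion coefficient itself
D_C = min(D_C_hg19, D_C_frg24)

# what's implemented here: clamp the finger velocity, and let D_C
# (and anything else downstream) be computed from the clamped w_f
w_f = min(w_f_hg19, w_f_frg24)
```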

… methods can be compared against the old in timing tests
@afraser3 (Author) commented
For now, safety=4 reverts to the model as implemented before I began this PR, for the sake of comparison with the other options. Note that safety=4 and safety=3 give identical results, since at safety=3 no approximations have been made; it just puts the matrix into a form that lets you get the eigenvalues faster. So safety=4 should be removed before this PR is merged.

@@ -130,14 +131,23 @@ subroutine calc_frg24_w(pr, tau, r0, hb, db, ks, n, w, withTC, ierr, lamhat, l2h
if (present(lamhat)) then
! lamhat and l2hat already calculated, so can pass as arguments
if(withTC) then
w = wf_withTC(pr, tau, r0, hb, db, ks, n, .false., lamhat, l2hat)
if (safety == 0) then
! TODO: is it bad for this minimization to be occurring here AND in thermohaline.f90?
Member commented:
I think this is fine. This calc_frg24_w routine just exists to give a public interface for calling the velocities so we can use it in turb/plotter. Any actual stellar evolution goes through the get_D_thermohaline routine, which calls the stuff in thermohaline.f90 independently of this, because right now mesa/star just wants a thermohaline composition diffusion coefficient.

Member commented:
But this structure does end up causing a lot of duplicated code at this point. It would be better to have all that code live in a velocity routine in private/thermohaline.f90, and leave this public interface as just a wrapper to call that private routine.
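Schematically (python pseudocode with hypothetical names, mirroring the suggested fortran layout):

```python
# private/thermohaline.f90 analogue: the one real implementation
def _frg24_velocity(pr, tau, r0, hb, db, ks, n, with_tc, safety):
    ...  # all the velocity logic lives here, in one place

# public interface: a thin wrapper, so turb/plotter and
# get_D_thermohaline share a single code path
def calc_frg24_w(*args, **kwargs):
    return _frg24_velocity(*args, **kwargs)
```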
