
Decrease number of guard cells allocated as a function of interpolation order #2336

Open
wants to merge 19 commits into development

Conversation

@NeilZaim (Member) commented Sep 24, 2021

I'm wondering if we're not overly cautious in the number of guard cells we allocate as a function of interpolation order. For instance, for a regular simulation with order 3 shape factors, we end up with 4 guard cells (because we later round the number up to the nearest even integer), while I thought that 2 would be sufficient.

So in this PR I'm trying to reduce this number, to see whether it affects our automated tests.

(This should not affect PSATD simulations, where the number of allocated guard cells is dominated by the needs of the field solver and not by the interpolation order)
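
For concreteness, here is a minimal sketch of the two counting rules at play (my own illustration with assumed semantics, not actual WarpX code):

    // Sketch (assumed semantics, not actual WarpX code) of the guard-cell
    // counts discussed above, as a function of the shape-factor order.
    // Current rule: start from the order itself, then round up to even.
    constexpr int ng_current (int order) { return order + (order % 2); }  // 3 -> 4
    // Rule this PR aims for: half the stencil width, since an order-n
    // shape spans n+1 cells centered on the particle.
    constexpr int ng_proposed (int order) { return (order + 1) / 2; }     // 3 -> 2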

Edit: updates after the CI tests. Some tests failed and I had to make a few modifications:

  • Some tests with PEC boundary conditions were failing. This is because the boxes used inside the PEC routines were grown directly by the shape factor, which is no longer compatible with this PR (the total number of guard cells can now be lower than the shape factor, e.g. for order 3). This resulted in out-of-bounds array accesses.
  • The benchmarks for the test averaged_galilean_2d_psatd_hybrid have changed. After investigating, this is because this PR changes the number of allocated guard cells from (9,16) to (8,16). Interestingly, the number of guard cells in the x direction was determined not by the field solver order but by the high CFL used, which means that particles can move by 4 cells in the x direction within a timestep; this resulted in a computed value of ng_alloc_Rho of (9,6) (since the number of allocated guard cells is eventually the same for all arrays, all arrays ended up with (9,16) guard cells). However, the value of 9 was obtained by assuming that 3 guard cells are needed for a shape factor of 3, which is not the case (only 2 are needed). Hence, I think it makes sense that this test now needs only 8 guard cells in the x direction and that we can reset the benchmarks (which I've done).
  • I've also had to reset the benchmark for the momentum of the ions in the x direction in the test LaserAccelerationBoost (which was off by ~1e-7). I've investigated this and it seems to be due to machine-precision errors in the current deposition (which end up producing ~1e-7 errors in the ion momentum, which is practically 0 in the x direction). With the changes of this PR, the number of guard cells for the J field decreases from 4 to 3 in this test. This can affect the results (at machine precision) via the calculation of xyzmin here:
    const std::array<Real, 3>& xyzmin = WarpX::LowerCorner(tilebox, galilean_shift, depos_lev);

    I've verified that changing the number of guard cells of the J array back from 3 to 4 recovers the previous benchmark, and that slightly perturbing the values of xyzmin with lines like xyzmin[0] = xyzmin[0] + xyzmin[0]*5.e-15; also modifies this checksum by ~1e-7 (see the standalone sketch after this list). So I think that we can safely reset this benchmark as well.
  • The same also occurred for the test PEC_particle, except that this time the numerical errors came from both the current deposition (same as above) and the field gathering. In the latter, it is the computation of xyzmin here:
    const std::array<Real, 3>& xyzmin = WarpX::LowerCorner(box, galilean_shift, gather_lev);

    that is sensitive to machine-precision errors and that changed when the number of guard cells for the E and B fields was reduced from 4 to 2. This resulted in very large relative changes (>1) for some checksums, but those checksums had values extremely close to 0, so it is not surprising that they are sensitive to machine-precision errors.
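
To make the round-off effect above concrete, here is a small standalone program (hypothetical values, not WarpX code) showing how the same physical quantity, computed relative to tile corners that differ by one guard cell, can pick up ~1e-16 relative differences:

    // Standalone illustration (not WarpX code) of the machine-precision
    // sensitivity described above.
    #include <cstdio>

    int main ()
    {
        const double dx  = 0.03e-6;   // hypothetical cell size
        const double xlo = -20.e-6;   // hypothetical valid-box lower corner
        const double x   = 1.2345e-6; // hypothetical particle position

        // Tile corner (xyzmin) with 4 vs 3 guard cells:
        const double xmin4 = xlo - 4*dx;
        const double xmin3 = xlo - 3*dx;

        // Same physical quantity (position in cell units relative to the
        // valid box), computed via each corner:
        const double s4 = (x - xmin4)/dx - 4.0;
        const double s3 = (x - xmin3)/dx - 3.0;

        // Typically nonzero at the ~1e-16 level; such differences then
        // propagate through the deposition and show up as ~1e-7 changes
        // in checksums whose values are close to 0.
        std::printf("s4 - s3 = %g\n", s4 - s3);
        return 0;
    }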

@ax3l ax3l requested review from EZoni and RemiLehe September 27, 2021 22:15
@NeilZaim (Member, Author)

@EZoni @RemiLehe I will probably need some help from you on this PR (nothing urgent, can definitely wait till after the milestone).

In the end, I have a single CI test failing (multi_J_rz_psatd) due to the reduced number of guard cells. More specifically, reducing the value of ng_depos_J in the z direction from 4 to 3 leads to out-of-bounds array accesses during the current deposition. Would you know if there is a part of this test that makes it need one extra guard cell for the current deposition compared to the other tests (RZ-PSATD, multi-J, direct current deposition, or a combination of these)?

@EZoni (Member) commented Sep 28, 2021

Hi @NeilZaim, thank you for this PR! I think it might be best to discuss your question, and this PR in general, in person; I will try to coordinate on Slack. In general, we might need to be careful about differences between even and odd shape factors. I think the nice plot below, taken from this chapter, will help guide our discussion (it will help reviewers go through your PR as well):

[plot omitted]

In particular, I would think that the guard cells that you set in GuardCellManager.cpp as

constexpr int FGcell[4] = {0,1,1,2};

should rather be

constexpr int FGcell[4] = {1,1,2,2};

based on the assumption that the shape factor should be centered on a given particle's position, if we think about the deposition of the particle's charge and current. I might be wrong about this, though. Or I might be confusing something at the moment. I think it will be easier to discuss in person with you and Remi.
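
For what it's worth, here is how I read the two tables (a sketch with assumed semantics, in case it helps the discussion): they coincide at odd orders and differ only at even orders, where the shape is centered on the nearest node rather than between nodes, so a particle can reach one cell further out depending on which side of a node it sits:

    // Assumed semantics: FGcell[order] = guard cells needed at a given
    // shape order. The two tables differ only at even orders:
    constexpr int ng_between_nodes (int n) { return (n + 1) / 2; } // {0,1,1,2}
    constexpr int ng_on_nodes      (int n) { return n / 2 + 1;   } // {1,1,2,2}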

@NeilZaim (Member, Author) commented Oct 1, 2021

Hi @EZoni. Sure, it's a good idea to have a meeting to talk about this. I think I'll be available most mornings (your time) in the coming weeks.

@ax3l ax3l changed the title Decrease number of guard cells allocated as a function of interpolation order [WIP] Decrease number of guard cells allocated as a function of interpolation order Oct 4, 2021
@RemiLehe (Member) commented Oct 4, 2021

Quick summary of an offline discussion with @EZoni and @NeilZaim: the failing tests with multi-J might be because WarpX currently does not take into account, in the allocation of the guard cells, the fact that particles can move up to 2*dt ahead and deposit current.
@NeilZaim will correct this (potentially in a separate PR). In the meantime, this PR is set to [WIP].
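
A hypothetical sketch of the missing accounting (names and the sizing rule are illustrative, not WarpX code):

    #include <cmath>

    // Illustrative helper: with multi-J, particles can effectively
    // deposit current up to 2*dt ahead, so the guard region for J must
    // cover up to 2*cfl cells of travel (speed <= c) on top of the
    // stencil half-width.
    int ng_depos_J_needed (double cfl, int shape_order, bool multi_J)
    {
        const double max_travel_cells = (multi_J ? 2.0 : 1.0) * cfl;
        const int ng_shape = (shape_order + 1) / 2; // this PR's sizing rule
        return ng_shape + static_cast<int>(std::ceil(max_travel_cells));
    }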

@EZoni EZoni added the component: parallelization Guard cell exchanges and particle redistribution label Oct 5, 2021
@NeilZaim NeilZaim mentioned this pull request Jun 21, 2022
@NeilZaim NeilZaim changed the title [WIP] Decrease number of guard cells allocated as a function of interpolation order Decrease number of guard cells allocated as a function of interpolation order Jun 24, 2022
@NeilZaim (Member, Author)

So I think that this PR is ready now. I've had to modify a few benchmarks for the following reasons:

  • Some close-to-zero values are affected by machine-precision errors (for example in the calculation of xyzmin in the current deposition routines, see above). This concerns the tests LaserAccelerationBoost, LaserAcceleration_single_precision_comms, PEC_particles and RepellingParticles. Where possible, I've removed the close-to-zero values from the benchmarks.
  • In some cases (averaged_galilean_2d_psatd_hybrid and averaged_galilean_3d_psatd_hybrid), the number of allocated guard cells changes, which affects the PSATD-related truncation errors.
  • Some Vay deposition tests (VayDeposition2D and VayDeposition3D) are affected by the change in ng_depos_J, but this seems to be due to an error in the implementation of the Vay algorithm (see Bug w/ Vay Deposition #3189), which should be fixed in another PR.

@NeilZaim (Member, Author)

cc. @EZoni @RemiLehe

@ax3l (Member) commented Aug 1, 2022

Trigger CI again

@ax3l ax3l closed this Aug 1, 2022
@ax3l ax3l reopened this Aug 1, 2022
@NeilZaim (Member, Author) commented Aug 5, 2022

Ok, it looks like the macOS test consistently segfaults at timestep 98, but I am not able to reproduce this locally (I don't have a Mac, though). I'm currently trying with all Valgrind checks enabled, but it's taking a million years.

@ax3l in case I cannot reproduce the issue locally, do you know what is the best way to debug the segfault? Am I doomed to debug in CI by adding print statements everywhere? Is there a way to access the backtraces generated during CI?

@NeilZaim (Member, Author) commented Aug 9, 2022

Ok, it looks like the macOS issue is indeed fixed by #3294.

@NeilZaim (Member, Author) commented Aug 9, 2022

The NCI corrector benchmarks had also been failing ever since they were added in #3252. This is due to machine-precision differences when reducing the number of guard cells (same as LaserAccelerationBoost, LaserAcceleration_single_precision_comms, PEC_particles and RepellingParticles). The somewhat high relative errors observed (~1e-7) come from the fact that the NCI tests are numerically unstable, so small initial relative errors coming from machine precision can grow with time.

@EZoni (Member) commented Sep 16, 2024

I pushed a few commits to update this old branch and see if we can still get these changes to work.

Labels: component: parallelization (Guard cell exchanges and particle redistribution)