
Unexpected behavior with np.nan only initialized vectors #138

Closed
a-monsch opened this issue Aug 27, 2021 · 3 comments · Fixed by #139

Comments

a-monsch commented Aug 27, 2021

I am currently using this package together with pandas for study exercises in particle physics and have encountered some unexpected behavior. To keep my data structure homogeneous, I fill missing values (particles) with np.nan. Consequently, it often happens that a vector obj/array/... is initialized entirely with np.nan. This works perfectly well, in particular for property queries, except for pseudorapidity, rapidity, and related quantities.

A small example:

import vector
import numpy as np

# initially, this part is stored inside a pandas DataFrame
v = np.array([[4.0, 3.0, 2.0, 1.0], [np.nan, np.nan, np.nan, np.nan]]).view(
    [("E", float), ("px", float), ("py", float), ("pz", float)]
).view(vector.MomentumNumpy4D)


# this creates a [True, False] mask; as expected 
mask1 = np.abs(v.rapidity) < 2.5   

# This creates a [True, True] mask, since the second value is not np.nan but 0.0
mask2 = np.abs(v.pseudorapidity) < 2.5  

My expectation would be that any quantity of a vector initialized entirely with np.nan should likewise return np.nan, not an actual number.

Is this behavior intended, especially in the case described above?

P.S.: I know that vector.awk makes it possible to work with inhomogeneous data structures, but our workgroup decided against that in favor of an approach using pandas.
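As an interim workaround (a plain-NumPy sketch, not part of vector's API), the nan rows can be masked explicitly before applying the cut:

```python
import numpy as np

# structured array standing in for the MomentumNumpy4D example above
arr = np.array(
    [(4.0, 3.0, 2.0, 1.0), (np.nan, np.nan, np.nan, np.nan)],
    dtype=[("E", float), ("px", float), ("py", float), ("pz", float)],
)

# pseudorapidity computed directly; an all-nan row propagates nan here
with np.errstate(invalid="ignore"):
    eta = np.arctanh(
        arr["pz"] / np.sqrt(arr["px"] ** 2 + arr["py"] ** 2 + arr["pz"] ** 2)
    )

# combining the cut with an explicit nan check yields the expected mask
mask = ~np.isnan(eta) & (np.abs(eta) < 2.5)
print(mask)  # [ True False]
```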

jpivarski (Member) commented

The "zero vs nan" decisions are determined partly by the coordinate system—if you're in cylindrical coordinates, for example, a nan in phi won't affect a calculation of (pseudo)rapidity—and partly by what ROOT's ROOT::Math::LorentzVector does.

Since you're in Cartesian coordinates, the relevant calculator for pseudorapidity is

def xy_z(lib, x, y, z):
    return lib.nan_to_num(lib.arctanh(z / lib.sqrt(x ** 2 + y ** 2 + z ** 2)), nan=0.0)

and yes, that deliberately replaces nan with zero.
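Stepping through it in plain NumPy shows where the nan gets swallowed:

```python
import numpy as np

# for an all-nan vector, the arctanh argument is nan, so the raw result is nan...
x = y = z = np.nan
raw = np.arctanh(z / np.sqrt(x ** 2 + y ** 2 + z ** 2))

# ...and nan_to_num maps it to 0.0, which is why the reported mask is [True, True]
print(np.isnan(raw))                # True
print(np.nan_to_num(raw, nan=0.0))  # 0.0
```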

Looking again at ROOT's implementation, ours is not quite right, since they would have an all-nan vector return nan. ROOT either calculates it from rho and z:

https://github.com/root-project/root/blob/30ab704cbe645d2b2f36cb071905e462771df386/math/genvector/inc/Math/GenVector/eta.h#L47-L77

or from theta:

https://github.com/root-project/root/blob/30ab704cbe645d2b2f36cb071905e462771df386/math/genvector/inc/Math/GenVector/eta.h#L84-L96

What we were trying to avoid with the nan_to_num was the case in which 0/0 would return nan. Empirically, we can see that ROOT makes that substitution:

>>> ROOT.Math.XYZVector(0, 0, 0).Eta()
0.0
>>> ROOT.Math.PxPyPzEVector(0, 0, 0, 0).Eta()
0.0

Let's explore this more:

>>> ROOT.Math.XYZVector(0, 0, np.nan).Eta()
nan
>>> ROOT.Math.XYZVector(0, np.nan, 0).Eta()
0.0
>>> ROOT.Math.XYZVector(np.nan, 0, 0).Eta()
0.0
>>> ROOT.Math.PxPyPzEVector(0, 0, 0, np.nan).Eta()
0.0
>>> ROOT.Math.PxPyPzEVector(0, 0, np.nan, 0).Eta()
nan
>>> ROOT.Math.PxPyPzEVector(0, np.nan, 0, 0).Eta()
0.0
>>> ROOT.Math.PxPyPzEVector(np.nan, 0, 0, 0).Eta()
0.0

Only a nan in the z coordinate results in the expression being nan.

We have an additional constraint that our formulas need to be pure expressions (i.e. no "if" statements) since they need to be vectorizable in all current and future backends. I suppose we could get the "z is nan ⇒ output is nan" behavior back by multiplying our expression

def xy_z(lib, x, y, z):
    return lib.nan_to_num(lib.arctanh(z / lib.sqrt(x ** 2 + y ** 2 + z ** 2)), nan=0.0)

by lib.absolute(lib.sign(z)):

>>> z = 0; np.absolute(np.sign(z))
0
>>> z = 1e-16; np.absolute(np.sign(z))
1.0
>>> z = -1e-16; np.absolute(np.sign(z))
1.0
>>> z = 1e16; np.absolute(np.sign(z))
1.0
>>> z = -1e16; np.absolute(np.sign(z))
1.0
>>> z = np.nan; np.absolute(np.sign(z))
nan
>>> z = np.inf; np.absolute(np.sign(z))
1.0
>>> z = -np.inf; np.absolute(np.sign(z))
1.0

There might be other cases where this is needed; we'd need to do a general survey. I think that's what #36 is.

a-monsch (Author) commented

Thank you very much for the detailed and fast answer.

Judging by #36, you always compare against ROOT. Originally that was not even my point—it makes no sense to calculate anything at all from a vector that contains only np.nan—but it is nice to see that ROOT actually behaves as expected, although it only takes the z and phi components into account, as you pointed out.

The lib.absolute(lib.sign(z)) solution would fix the problem. Will it be implemented in this or a similar form in one of the next versions, or will it take longer, given that the TODO list in #36 for the 4D vectors has not been addressed yet, as far as I can see?

jpivarski linked a pull request Aug 30, 2021 that will close this issue

jpivarski (Member) commented

I'm implementing the lib.absolute(lib.sign(z)) solution now. PR #36 is waylaid because the author had to work on other things. We won't get complete parity with ROOT until those tests are added, but we can fix this one that you noticed now: it will be implemented in PR #139.
