Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: Add page describing how to build Upper Limits table #57

Merged
merged 12 commits into from
Aug 18, 2023

Conversation

kratsg
Copy link
Contributor

@kratsg kratsg commented Aug 8, 2023

This came out of scikit-hep/pyhf#2263.

This describes how to build an upper limit table exactly to what ATLAS publishes in various papers, particularly for the SUSY standard tables. This builds on an existing published analysis EWK 2L2J and its corresponding HEPData entry.

The current rendering can be seen here.

  • Summarize commit messages into a comprehensive review of the PR
* Add rich to requirements.txt and rebuild lock file.
* Add notebook on procedure for building upper limits table common to
  search analyses.
* Add .DS_Store to .gitignore.

@kratsg kratsg added documentation Improvements or additions to documentation enhancement New feature or request labels Aug 8, 2023
Copy link
Member

@matthewfeickert matthewfeickert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't be able to give this a full review until later in the week, but I don't think we don't have rich in book/requirements.txt and so we need to add at least that there and then relock the book/requirements.lock.

@kratsg
Copy link
Contributor Author

kratsg commented Aug 8, 2023

@matthewfeickert - it seems black-jupyter is modifying the file in one way that gets undone by the nbqa-isort hook. Essentially something about a trailing new line that black-jupyter is adding that gets cleaned up by nbqa-isort.

$ pre-commit run black-jupyter --all-files
black-jupyter............................................................Failed
- hook id: black-jupyter
- files were modified by this hook

reformatted book/UpperLimitsTable.ipynb

All done! ✨ 🍰 ✨
1 file reformatted, 14 files left unchanged.

$ git diff
diff --git a/book/UpperLimitsTable.ipynb b/book/UpperLimitsTable.ipynb
index bf9b40d..03e6ae6 100644
--- a/book/UpperLimitsTable.ipynb
+++ b/book/UpperLimitsTable.ipynb
@@ -20,6 +20,7 @@
     "pyhf.set_backend(\"numpy\", \"minuit\")\n",
     "\n",
     "from rich import box\n",
+    "\n",
     "# for making tables\n",
     "from rich.console import Console\n",
     "from rich.table import Table"

$ pre-commit run nbqa-isort --all-files
nbqa-isort...............................................................Failed
- hook id: nbqa-isort
- files were modified by this hook

Fixing /Users/kratsg/pyhf-tutorial/book/UpperLimitsTable.ipynb

$ git diff
$

@kratsg
Copy link
Contributor Author

kratsg commented Aug 8, 2023

I won't be able to give this a full review until later in the week, but I don't think we don't have rich in book/requirements.txt and so we need to add at least that there and then relock the book/requirements.lock.

Why here and not binder/requirements.txt? I tried to run this on my Mac and got a segfault so I'm not sure how this lockfile resolution is working.

@matthewfeickert
Copy link
Member

Why here and not binder/requirements.txt?

Sorry, you're right. I was going from memory, but you do want that to be in binder/requirements.txt.

I tried to run this on my Mac and got a segfault so I'm not sure how this lockfile resolution is working.

Oh bummer. If you just run nox it segfaults on you? If so, can you just update binder/requirements.txt and I'll fix the lock file and then we can switch over to a Docker based solution for the lock file so that this doesn't happen?

@kratsg
Copy link
Contributor Author

kratsg commented Aug 8, 2023

Oh bummer. If you just run nox it segfaults on you? If so, can you just update binder/requirements.txt and I'll fix the lock file and then we can switch over to a Docker based solution for the lock file so that this doesn't happen?

nox seems to work, I was running it manually.. so not sure why there's a difference. I do get errors however, but unrelated to my changes

 nox
nox > Running session lock
nox > Creating virtual environment (virtualenv) using python in .nox/lock
nox > python -m pip install --upgrade pip setuptools wheel
nox > python -m pip install --upgrade pip-tools
nox > cat /Users/kratsg/pyhf-tutorial/binder/requirements.txt /Users/kratsg/pyhf-tutorial/book/requirements.txt
nox > pip-compile --resolver=backtracking --generate-hashes --output-file /Users/kratsg/pyhf-tutorial/book/requirements.lock /Users/kratsg/pyhf-tutorial/requirements.txt
  ERROR: Cannot install pyhf[contrib,minuit,xmlio]==0.7.2 because these package versions have conflicting dependencies.
Discarding numpy==1.25.1 (from -r /Users/kratsg/pyhf-tutorial/book/requirements.lock (line 915)) to proceed the resolution
  ERROR: Cannot install pyhf[contrib,minuit,xmlio]==0.7.2 because these package versions have conflicting dependencies.
Discarding scipy==1.11.1 (from -r /Users/kratsg/pyhf-tutorial/book/requirements.lock (line 1423)) to proceed the resolution
  ERROR: Cannot install -r /Users/kratsg/pyhf-tutorial/requirements.txt (line 3) because these package versions have conflicting dependencies.
Discarding ipython==8.14.0 (from -r /Users/kratsg/pyhf-tutorial/book/requirements.lock (line 539)) to proceed the resolution

@matthewfeickert
Copy link
Member

matthewfeickert commented Aug 10, 2023

I do get errors however, but unrelated to my changes

That's because you're apparently using Python 3.8

This file is autogenerated by pip-compile with Python 3.8

and the lock file was built assuming Python 3.10+ (again, I should Dockerize this, sorry). I'll fix this now and review hopefully tonight.

@matthewfeickert
Copy link
Member

Rebased to pick up PR #58 to get pyhf v0.7.3. Reviewing after I make dinner. 👍

* Correct some typos and note Table 20

* Change word order to make it clear the models are from the published
  analysis being described.

* Remove cell output (gets generated at build)

* Add abbreviations being used, add "DOI: " to make it clear what is being
  referenced.

* Add `level=0.05` kwarg to make it explicit that the 95% CL upper limit is
  being used.

* Add "ifb" to print statement to make it more explicit (already have it in
  a comment which is good)

* Capitalize Equation 52, and use arXiv abstract instead of PDF, use "b" for
  background only as b-only does not render well and it is already assumed
  if b is by itself that it is background only.

* Added "[fb]" to XSec column. Split out "S95Obs" to "S 95 Obs" as "95Obs" is
  difficult to read without mentally flipping the "0" to a zero.
  Can revise this as needed.

  Also just reduced CLb(S 95 Obs) to CLb to match table column better.

* As not all people acknowledged for their help are on the ATLAS Stats Forum
  (apologies if I'm wrong!) I thought this read better.
Copy link
Member

@matthewfeickert matthewfeickert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is great — many thanks for writing this!

I made some minor language/typo/revision edits in the commit I pushed but I tried to make it clear what they all were in the commit message. Obviously they can all be rolled back if you disagree with them.

I've tried to just break down my review comments by section as trying to make comments in the notebook on GitHub isn't great. :( Happy to chat of course and clarify my thinking (it is after 02:00 when I write this).

I'd also say that you should double check my option on everything with @alexander-held.

  • Expected Events

References:

Is there a reason that these references are chosen? Is there some HEP reference paper on stats that cites the J. Phys. Chem. A 2001 paper? If not, let's check the PDG to see if this is included there, unless there was something particularly striking about this reference.

  • Discovery $p$-values

From Equation 52 in arXiv:1503.07622:

Maybe I need to read this section more carefully, but how does Equation 52 show the following statements? Regardless of how it does, can we make this more explicit here?

  • $p_{\mu}$ (and $\mathrm{CL}_b$)

$\mathrm{CL}_\mathrm{b}$ provides a measure of compatibility of the observed data with the 95% CL signal strength hypothesis relative to fluctuations of the background

This is mixing multiple ideas of an upper limit signal strength and the definition of $\mathrm{CL}_b$. $\mathrm{CL}_b$ is just a statement of the $p$-value of the background only hypothesis given the observed test-statistic, and then flipped with the $1-p_b$ part for geometric reasons for use in the ratio that gives $\mathrm{CL}_s$ — it is a construction that exists only by necessity for the $\mathrm{CL}_s$ method. The $\mathrm{CL}_b$ makes no statement on what is the observed test statistic that you use. Here we are choosing the test stat for the 95% CL upper limit but that is just a choice as we wanted a test size of $\alpha=0.05$, but it could have been anything. So I we should not include the test size in the definition (3 line equality) of $\mathrm{CL}_b$.

See the paragraph above plot on page two in this document.

Is there another reference that we can use? I know that this one comes up in search results because it is the only bloody thing that isn't firewalled by ATLAS, but I've never been a huge fan of this note and it kinda bugs me that it doesn't have any stable reference. It isn't even on CDS (private to ATLAS) as far as I know, but I would love to be wrong on this!

  • $p_0$

$p(s)=0$ measures compatibility of the observed data with the background-only (zero signal strength) hypothesis relative to fluctuations of the background.

$p$-values do not measure compatibility of a model and data. I know that the caption for Table 20 literally says this, but it is wrong. $p$-values are the probability given a model hypothesis to have observed a test statistic as extreme or more so (extreme in terms of disagreement with the model hypothesis) than the observed test statistic. They can not make statements on model compatibility with the data. So I would suggest we rephrase this section.

  • Relationship to $\mathrm{CL}_s$

I'm not really crazy about these slides in general and I feel like while they do provide context in the lecture he gave that they don't actually help here. So I would personally suggest removing this section and that we write up something better later.

Making the table

  • It would probably be useful to mention that DRInt_cuts should correspond to Signal Region "DR-Int-EWK" and comment on why the results we get here are slightly different form the published table. This is maybe beating the point over the head, but I think it would help (would help 02:00 Matthew!).

Once again though, I think this is an excellent educational material and I'm super thankful that you went into a deep dive to write it. 🚀

@kratsg
Copy link
Contributor Author

kratsg commented Aug 18, 2023

Is there a reason that these references are chosen? Is there some HEP reference paper on stats that cites the J. Phys. Chem. A 2001 paper? If not, let's check the PDG to see if this is included there, unless there was something particularly striking about this reference.

39.2.5 has a really small blurb and not useful for anyone imo. The 2001 paper is not only the top result when googling for the method but it's also been linked by stats forum in emails and by @alexander-held as well. I'm not sure there's anything better.

Maybe I need to read this section more carefully, but how does Equation 52 show the following statements? Regardless of how it does, can we make this more explicit here?


you'd have to read through equation 52 to see how it sets the value of $\mu$ here in order to determine the hypotheses. It is written more explicitly by writing out what the hypotheses are.

This is mixing multiple ideas of an upper limit signal strength and the definition of CLb. CLb is just a statement of the p-value of the background only hypothesis given the observed test-statistic, and then flipped with the 1−pb part for geometric reasons for use in the ratio that gives CLs — it is a construction that exists only by necessity for the CLs method. The CLb makes no statement on what is the observed test statistic that you use. Here we are choosing the test stat for the 95% CL upper limit but that is just a choice as we wanted a test size of α=0.05, but it could have been anything. So I we should not include the test size in the definition (3 line equality) of CLb.

This is the official ATLAS/CMS statement here. I didn't change anything because this is agreed upon. Take it up with those collaborations from their decision in like 2012. As in, this is the definition that has been used to make the tables in quite nearly every single SUSY paper published by both collaborations. Which is why it has to be written out more explicitly here, or nobody will figure out how to get those numbers.

Is there another reference that we can use? I know that this one comes up in search results because it is the only bloody thing that isn't firewalled by ATLAS, but I've never been a huge fan of this note and it kinda bugs me that it doesn't have any stable reference. It isn't even on CDS (private to ATLAS) as far as I know, but I would love to be wrong on this!

Nope.

-values do not measure compatibility of a model and data. I know that the caption for Table 20 literally says this, but it is wrong. p-values are the probability given a model hypothesis to have observed a test statistic as extreme or more so (extreme in terms of disagreement with the model hypothesis) than the observed test statistic. They can not make statements on model compatibility with the data. So I would suggest we rephrase this section.

The caption is the official agreed upon text from the ATLAS Stats Committee + ATLAS SUSY group. That will have to be taken up with them. I can even link you the corresponding twikis that have this phrasing: https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StandardSUSYRefs#Suggested_text_agreed_with_StatF

I'm not really crazy about these slides in general and I feel like while they do provide context in the lecture he gave that they don't actually help here. So I would personally suggest removing this section and that we write up something better later.

These are added until something better comes along which is not going to be covered in this notebook so trivially which is already complicated enough.

  • It would probably be useful to mention that DRInt_cuts should correspond to Signal Region "DR-Int-EWK" and comment on why the results we get here are slightly different form the published table. This is maybe beating the point over the head, but I think it would help (would help 02:00 Matthew!).

I don't know for sure why they're different. Part of the problem is that the table up there has some additional rounding that comes into play in certain pieces. I already confirmed that the calculations I show match up with the calculations from regular RooFit/RooStats -- so there's other stuff going on. The differences are close enough given the larger uncertainties/variations that I'm not concerned anyone will be confused that it's not exactly the same.


Just remember this notebook isn't really anything particularly from my own words, but just combining all the various bits and pieces scattered across various documents into one place, to make sure everything that goes into making that table is actually described. Whether or not the descriptions are correct/factual is up to those documents unfortunately.

book/UpperLimitsTable.ipynb Outdated Show resolved Hide resolved
book/UpperLimitsTable.ipynb Outdated Show resolved Hide resolved
@matthewfeickert
Copy link
Member

Just remember this notebook isn't really anything particularly from my own words.
...
Whether or not the descriptions are correct/factual is up to those documents unfortunately.

I don't understand. This is not an ATLAS document and it is not a summary of ATLAS documents, it is a summary of a procedure in our own words. We're under no constraints to write things that aren't correct because they appear somewhere else or to have to use language that was written once a decade ago by someone in ATLAS and then never updated.

@alexander-held
Copy link

A few comments on discussion items: I think I got the reference for error propagation from either Google or some Stackoverflow thread. A better reference if we were to write a paper is probably some textbook, but I think this one does its job well.

I don't think the figures from Alex fit in here really well. They would at least need a detailed explanation to connect them to the text. Right now they feel disconnected.

For the CLs reference: alternatives might be the CLs paper (probably harder to digest). The statistics info note for the Higgs discovery https://cds.cern.ch/record/1379837 also has something on this I believe.

* Add back additional information to tabel label for CLb

* Add back language of ATLAS Stats Forum
@kratsg
Copy link
Contributor Author

kratsg commented Aug 18, 2023

I don't understand. This is not an ATLAS document and it is not a summary of ATLAS documents, it is a summary of a procedure in our own words. We're under no constraints to write things that aren't correct because they appear somewhere else or to have to use language that was written once a decade ago by someone in ATLAS and then never updated.

We can drop the images for sure. The problem is more that people in ATLAS and CMS (and outside) will use pyhf to make these tables, but also to reproduce what is in the papers. Right now, that procedure as described in the notebook is exactly what is being done. We could make this an ATLAS-internal only thing instead (but we don't have an ATLAS-internal pyhf documentation -- and this also impacts CMS as they are using the same definitions). Also this language has gotten updated somewhat frequently.

@matthewfeickert
Copy link
Member

matthewfeickert commented Aug 18, 2023

@kratsg I think we're going to have a lot of discussion on these points, but you've also done a lot of work here that would be better served getting merged in and then having us revise it as needed. I'd propose that you apply @alexander-held's comments, basically just

I don't think the figures from Alex fit in here really well. They would at least need a detailed explanation to connect them to the text. Right now they feel disconnected.

We can drop the images for sure.

and then we merge this and I open up a GitHub Issue from my original comments where we can discuss this and then make any necessary revisions in a follow up PR.

I also think it is fine to drop the images even though you mentioned

These are added until something better comes along which is not going to be covered in this notebook so trivially which is already complicated enough.

as I plan to do exactly this in scikit-hep/pyhf#2274.


I want to "get things right" (to my mind) usually in the first pass, but I also realize that I slow down our code velocity quite a bit with this and so I'm trying (bear with me while I slowly work on growing here) to take a few more pages out of Matt Rocklin's views on the purpose and roles of review: Small Scope and Fast Review: The secret to happy devs

The problem is more that people in ATLAS and CMS (and outside) will use pyhf to make these tables, but also to reproduce what is in the papers. Right now, that procedure as described in the notebook is exactly what is being done.

I agree. The procedure is fine! Things that I want to change are not the procedure, but the description of what is happening in the procedure. Some of the statements (made by others that you are just reproducing, so not an attack on you) are just not true at all. We don't have any obligation to repeat such things, and I would argue we should actively avoid doing so!

We could make this an ATLAS-internal only thing instead (but we don't have an ATLAS-internal pyhf documentation -- and this also impacts CMS as they are using the same definitions).

No, I would say that we should actively avoid making anything pyhf related ATLAS internal. There are obvious things that only can be done with ATLAS at the moment (i.e. full probability models) and if we inside of ATLAS want to produce internal resources for the collaboration that use pyhf that's fine (but why make it internal?), but as a pyhf team product I think it is important that everything is always public.

So we're in agreement here about wanting to avoid this. 👍

Also this language has gotten updated somewhat frequently.

Then it should hopefully be easier to correct internally at some point in the future.

@kratsg
Copy link
Contributor Author

kratsg commented Aug 18, 2023

Done, images dropped. I added a quick blurb for the link to scikit-hep/pyhf#2274 and that should cover that portion of the section there.

Thanks for the thorough review (admittedly, I still haven't gotten any feedback from the Stats Forum on this notebook).

Copy link
Member

@matthewfeickert matthewfeickert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kratsg Thanks for this. Can you check if the commit message summary I added to the PR body looks good? If so, go ahead and merge when ready.

@kratsg kratsg merged commit b26dab0 into main Aug 18, 2023
4 checks passed
@kratsg kratsg deleted the feat/addULTablePage branch August 18, 2023 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants