cms-2016-simulated-datasets: richer dataset sample
Uses 1k CMS 2016 MC dataset for a richer dataset sample.

Enriches the documentation.

Adds output files to global `.gitignore` file.
tiborsimko committed Jan 16, 2024
1 parent f6a86a7 commit d037721
Showing 4 changed files with 4,223 additions and 64 deletions.
62 changes: 33 additions & 29 deletions .gitignore
@@ -1,43 +1,34 @@
# Environments
*.err
*.pyc
.env
.venv
env/
venv/

*.pyc
*.err
cms-2010-collision-datasets/outputs/*.json
cms-2010-simulated-datasets/outputs/*.json
cms-2011-collision-datasets-runb-update/inputs/config-store
cms-2011-collision-datasets-runb-update/inputs/das-json-config-store
cms-2011-collision-datasets-runb-update/inputs/das-json-store
cms-2011-collision-datasets-runb-update/outputs/*.json
cms-2011-collision-datasets/code/das.py
cms-2011-collision-datasets/inputs/das-json-store
cms-2011-collision-datasets/outputs/*.xml
cms-2011-collision-datasets-runb-update/inputs/das-json-store
cms-2011-collision-datasets-runb-update/inputs/das-json-config-store
cms-2011-collision-datasets-runb-update/inputs/config-store
cms-2011-collision-datasets-runb-update/outputs/*.json
cms-2011-hlt-triggers/outputs/*.html
cms-2011-hlt-triggers/outputs/*.xml
cms-2011-l1-triggers/outputs/*.xml
cms-2011-simulated-datasets/inputs/das-json-store
cms-2011-simulated-datasets/outputs/*.xml
cms-2012-collision-datasets/inputs/das-json-store
cms-2012-collision-datasets/outputs/*.json
cms-2012-collision-datasets-update/inputs/das-json-store
cms-2012-collision-datasets-update/inputs/das-json-config-store
cms-2012-collision-datasets-update/inputs/config-store
cms-2012-collision-datasets-update/inputs/das-json-config-store
cms-2012-collision-datasets-update/inputs/das-json-store
cms-2012-collision-datasets-update/outputs/*.json
cms-2012-collision-datasets/inputs/das-json-store
cms-2012-collision-datasets/outputs/*.json
cms-2012-event-display-files/inputs/ig/
cms-2012-event-display-files/outputs/*.json
cms-2012-simulated-datasets/inputs/config-store
cms-2012-simulated-datasets/inputs/das-json-store
cms-2012-simulated-datasets/outputs/*.json
cms-2012-simulated-datasets/outputs/create-config-store.sh
cms-2012-simulated-datasets/outputs/create-das-json-store.sh
cms-2012-simulated-datasets/outputs/*.json
cms-2013-hlt-triggers/outputs
cms-2013-simulated-datasets-hi/inputs/das-json-store
cms-2013-simulated-datasets-hi/inputs/mcm-store
cms-2013-simulated-datasets-hi/inputs/config-store
cms-2013-simulated-datasets-hi/outputs/
cms-2013-collision-datasets-hi-ppref/inputs/config-store
cms-2013-collision-datasets-hi-ppref/inputs/das-json-config-store
cms-2013-collision-datasets-hi-ppref/inputs/das-json-store
@@ -46,25 +37,38 @@ cms-2013-collision-datasets-hi/inputs/config-store
cms-2013-collision-datasets-hi/inputs/das-json-config-store
cms-2013-collision-datasets-hi/inputs/das-json-store
cms-2013-collision-datasets-hi/outputs/*.json
cms-2015-collision-datasets/inputs/das-json-store
cms-2015-collision-datasets/inputs/das-json-config-store
cms-2015-collision-datasets/outputs/*.json
cms-2013-hlt-triggers/outputs
cms-2013-simulated-datasets-hi/inputs/config-store
cms-2013-simulated-datasets-hi/inputs/das-json-store
cms-2013-simulated-datasets-hi/inputs/mcm-store
cms-2013-simulated-datasets-hi/outputs/
cms-2015-collision-datasets-hi-ppref/inputs/config-store
cms-2015-collision-datasets-hi-ppref/inputs/das-json-store
cms-2015-collision-datasets-hi-ppref/inputs/das-json-config-store
cms-2015-collision-datasets-hi-ppref/inputs/das-json-store
cms-2015-collision-datasets-hi-ppref/outputs/*.json
cms-2015-collision-datasets/inputs/das-json-config-store
cms-2015-collision-datasets/inputs/das-json-store
cms-2015-collision-datasets/outputs/*.json
cms-2015-simulated-datasets/inputs/config-store
cms-2015-simulated-datasets/inputs/das-json-store
cms-2015-simulated-datasets/inputs/mcm-store
cms-2015-simulated-datasets/inputs/config-store
cms-2015-simulated-datasets/outputs/
cms-2015-simulated-datasets/lhe_generators
cod2-to-cod3/outputs/*.json
opera-2017-multiplicity-studies/outputs/opera-events.json
cms-2015-simulated-datasets/outputs/
cms-2016-simulated-datasets/cookies.txt
cms-2016-simulated-datasets/inputs/config-store
cms-2016-simulated-datasets/inputs/das-json-store
cms-2016-simulated-datasets/inputs/mcm-store
cms-2016-simulated-datasets/lhe_generators
cms-2016-simulated-datasets/outputs/
cms-YYYY-simulated-datasets/cache
cms-YYYY-simulated-datasets/outputs/*.csv
cms-YYYY-simulated-datasets/outputs/*.err
cms-YYYY-simulated-datasets/outputs/*.json
cod2-to-cod3/outputs/*.json
cod2-to-cod3/outputs/*.json
env/
opera-2017-multiplicity-studies/outputs/opera-events.json
opera-2017-multiplicity-studies/outputs/opera-events.json
opera-2019-neutrino-induced-charm/outputs/opera-events.json
opera-2019-electron-neutrinos/outputs/opera-events.json
opera-2019-neutrino-induced-charm/outputs/opera-events.json
venv/
67 changes: 41 additions & 26 deletions cms-2016-simulated-datasets/README.md
@@ -3,10 +3,11 @@
This directory contains helper scripts used to prepare CMS 2016 open data
release regarding MC simulated datasets.


- `code/` folder contains the python code.
- `code/` folder contains the python code;
- `inputs/` folder contains input text files with the list of datasets for each
year and input files.
year and input files;
- `outputs/` folder contains generated JSON records to be included as the CERN
Open Data portal fixtures.

Every step necessary to produce the final `*.json` files is handled by the
`cmc-mc/interface.py` script. Details about it can be queried with the command:
@@ -15,61 +16,75 @@ Every step necessary to produce the final `*.json` files is handled by the
```console
$ python3 code/interface.py --help
```

Make sure to start voms-proxy before creating cache
Please make sure to get the VOMS proxy file before running these scripts:

```console
$ voms-proxy-init --voms cms --rfc --valid 190:00
```

Set the eos path with
Please make sure to set the EOS instance to EOSPUBLIC before running these scripts:

```console
$ export EOS_MGM_URL=root://eospublic.cern.ch
```
Please make sure to have a valid `userkey.nodes.pem` certificate present in
`$HOME/.globus`. If not, you have to run the following on top of the regular
CMS certificate documentation:

```console
$ cd $HOME/.globus
$ ls userkey.nodes.pem
$ openssl pkcs12 -in myCert.p12 -nocerts -nodes -out userkey.nodes.pem # if not present
$ cd -
```
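
The environment prerequisites above can be checked before starting a long run. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` and its parameters are hypothetical, and it only covers the `EOS_MGM_URL` setting and the presence of `userkey.nodes.pem`. It does not verify that the VOMS proxy is valid.

```python
import os
from pathlib import Path

EXPECTED_EOS_URL = "root://eospublic.cern.ch"


def check_prerequisites(environ=None, home=None):
    """Return a list of problems found; an empty list means both checks passed."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    problems = []

    # The README asks for EOS_MGM_URL to point at EOSPUBLIC.
    if environ.get("EOS_MGM_URL") != EXPECTED_EOS_URL:
        problems.append(f"EOS_MGM_URL is not set to {EXPECTED_EOS_URL}")

    # The README asks for userkey.nodes.pem to be present in $HOME/.globus.
    key_file = home / ".globus" / "userkey.nodes.pem"
    if not key_file.is_file():
        problems.append(f"{key_file} is missing")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The `environ` and `home` parameters exist only so the check can be exercised against a scratch directory instead of the real home directory.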

Warning: creating the full local cache might take a long time!
Warning: Creating the full local cache might take a long time.

First step is to create EOS file index cache:

```console
$ python3 ./code/interface.py --create-eos-indexes ../cms-YYYY-simulated-datasets/inputs/CMS-2016-mc-datasets.txt
$ time python3 ./code/interface.py --create-eos-indexes inputs/CMS-2016-mc-datasets.txt
```

This requires the file to be in place in their final location.

For early testing, on lxplus, all steps can be run without the EOS file index cache with the flag `--ignore-eos-store`.

To build sample records (with a limited number of datasets in the input file) do the following:
This requires the data files to be placed in their final location. However, for
early testing on LXPLUS, all steps can be run without the EOS file index cache
by means of adding the command-line option `--ignore-eos-store` to the commands below.

We can now build sample records by doing:

```console
$ python3 ./code/interface.py --create-das-json-store --ignore-eos-store DATASET_LIST
$ time python3 ./code/interface.py --create-das-json-store --ignore-eos-store inputs/CMS-2016-mc-datasets.txt

$ auth-get-sso-cookie -u https://cms-pdmv.cern.ch/mcm -o cookies.txt
$ python3 ./code/interface.py --create-mcm-store --ignore-eos-store DATASET_LIST
$ time python3 ./code/interface.py --create-mcm-store --ignore-eos-store inputs/CMS-2016-mc-datasets.txt

$ openssl pkcs12 -in myCert.p12 -nocerts -nodes -out userkey.nodes.pem # if not present
$ python3 ./code/interface.py --get-conf-files --ignore-eos-store DATASET_LIST
$ time python3 ./code/interface.py --get-conf-files --ignore-eos-store inputs/CMS-2016-mc-datasets.txt

$ python3 code/lhe_generators.py
$ time python3 code/lhe_generators.py

$ python3 ./code/interface.py --create-records --ignore-eos-store DATASET_LIST
$ python3 ./code/interface.py --create-conffiles-records --ignore-eos-store DATASET_LIST
$ time python3 ./code/interface.py --create-records --ignore-eos-store inputs/CMS-2016-mc-datasets.txt
$ time python3 ./code/interface.py --create-conffiles-records --ignore-eos-store inputs/CMS-2016-mc-datasets.txt
```
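
The sequence above can also be driven from a small wrapper so that a failing step stops the run. This is a sketch under assumptions: `build_commands` and `run_all` are hypothetical helper names, the flags are copied from the commands listed above, and the `auth-get-sso-cookie` and `openssl` steps still have to be run by hand beforehand.

```python
import subprocess

INTERFACE = "./code/interface.py"

# Steps in the order given in the README; None marks the standalone LHE script.
STEPS = [
    "--create-das-json-store",
    "--create-mcm-store",
    "--get-conf-files",
    None,
    "--create-records",
    "--create-conffiles-records",
]


def build_commands(dataset_list, ignore_eos_store=True, python="python3"):
    """Return the argument lists for every step, without running anything."""
    commands = []
    for step in STEPS:
        if step is None:
            commands.append([python, "code/lhe_generators.py"])
            continue
        command = [python, INTERFACE, step]
        if ignore_eos_store:
            command.append("--ignore-eos-store")
        command.append(dataset_list)
        commands.append(command)
    return commands


def run_all(dataset_list, ignore_eos_store=True):
    """Run the steps one by one; stop at the first failing step."""
    for command in build_commands(dataset_list, ignore_eos_store):
        print("Running:", " ".join(command), flush=True)
        subprocess.run(command, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Dry run: only print the commands that run_all() would execute.
    for command in build_commands("inputs/CMS-2016-mc-datasets.txt"):
        print(" ".join(command))
```

Calling `run_all("inputs/CMS-2016-mc-datasets.txt")` from the `cms-2016-simulated-datasets` directory would then execute the steps in order.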

Note that to build the test records an (empty) input file for DOI's and a recid info file must be present in the inputs directory.
Each step builds a subdirectory with a cache (`das-json-store`, `mcm-store` and `config-store`). They are large, do not upload them to the repository.
Note that to build the test records an (empty) input file for DOIs and a recid
info file must be present in the inputs directory.

The output json file for dataset records go to the `outputs` directory.
Each step builds a subdirectory with a cache (`das-json-store`, `mcm-store` and
`config-store`). They are large, do not upload them to the repository, respect
the `.gitignore`.
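
To see how large the caches have grown, a sketch like the following can help. It assumes the three stores live under `inputs/`, as the `.gitignore` entries for this directory suggest; the helper names `directory_size` and `cache_sizes` are hypothetical.

```python
import os
from pathlib import Path

CACHE_DIRS = ["das-json-store", "mcm-store", "config-store"]


def directory_size(path):
    """Sum the sizes in bytes of all regular files below path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def cache_sizes(inputs_dir="inputs"):
    """Map each existing cache directory name to its size in bytes."""
    sizes = {}
    for name in CACHE_DIRS:
        path = Path(inputs_dir) / name
        if path.is_dir():
            sizes[name] = directory_size(path)
    return sizes


if __name__ == "__main__":
    for name, size in cache_sizes().items():
        print(f"{name}: {size / 1e6:.1f} MB")
```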

The output JSON files for the dataset records will be generated in the
`outputs` directory.

## lhe_generators


```console
python3 code/lhe_generators.py 2> errors > output &
```
- This will get lhe generator parameters from gridpacks for datasets listed in `./inputs/CMS-2016-mc-datasets.txt`
- It works on lxplus or with mounted EOS
- number of threads is set to 20 which is ideal for lxplus

> :warning: There are many cases with various steps to get generator parameters for LHE -see [#97](https://github.com/cernopendata/data-curation/issues/97)-. Thus, in some few cases, the script MIGHT not work as expected so make sure to read it, check errors, and make any necessary tweaks
- This will get lhe generator parameters from gridpacks for datasets listed in `./inputs/CMS-2016-mc-datasets.txt`.
- It works on LXPLUS or with mounted EOS.
- Number of threads is set to 20 which is ideal for LXPLUS.

> :warning: There are many cases with various steps to get generator parameters for LHE -see [#97](https://github.com/cernopendata/data-curation/issues/97)-. Thus, in some few cases, the script MIGHT not work as expected so make sure to read it, check errors, and make any necessary tweaks
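
Because individual datasets may fail, it helps to collect errors per dataset instead of aborting the whole run. The sketch below illustrates that pattern with a pool of 20 threads. It is an illustration only and not how `code/lhe_generators.py` is known to be implemented: `process_all` and the `worker` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all(datasets, worker, max_workers=20):
    """Run worker(dataset) for every dataset in a thread pool.

    Returns (results, failures): two dicts keyed by dataset name, so that
    one failing dataset does not hide the results of the others.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, dataset): dataset for dataset in datasets}
        for future in as_completed(futures):
            dataset = futures[future]
            try:
                results[dataset] = future.result()
            except Exception as error:  # record the failure and keep going
                failures[dataset] = repr(error)
    return results, failures
```

After a run, the `failures` dict lists the datasets that need a manual look, which matches the advice above to check the errors and tweak the script where needed.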