This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

Upstream master -> dev (this seems hard for some reason) #431

Open
wants to merge 1 commit into
base: dev


jkamins7
Contributor

@jkamins7 commented Aug 6, 2021

  • Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch

  • RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

Co-authored-by: Joseph Lemaitre [email protected]

  • Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.

  • Removing source for packages installable from cran

  • Updated the python rules for reticulate (tests still pass)

  • Removing source for packages installable from cran

  • Updated the python rules for reticulate (tests still pass)

  • Updated based on review

  • Fixed filter issues with makefile setup in case dynfilter isn't provided in config

  • Updated makefile

  • Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from do.call(), which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

  • Packrat (Packrat #253)

  • Adding final form of previous packrat + docker setup after merging weirdness

  • Switching .so to git lfs

  • Removing source for packages installable from cran

  • Updated the python rules for reticulate (tests still pass)

  • Updated based on review

  • Updated to use dev's docker instead of dataseed's

  • Added reticulate zoo and xts

  • Updated docker with git-lfs

  • Packrat (Packrat #267)

  • Adding final form of previous packrat + docker setup after merging weirdness

  • Switching .so to git lfs

  • Removing source for packages installable from cran

  • Updated the python rules for reticulate (tests still pass)

  • Updated based on review

  • Updated to use dev's docker instead of dataseed's

  • Added reticulate zoo and xts

  • Updated docker with git-lfs

  • Updating docker to install current versions of local packages

  • Update .Rprofile

  • Update dockerhub.yaml

  • Update aws.yaml

  • Yet another packrat attempt

  • Update ci.yml

  • Generic version of the batch job launcher/runner (Generic version of the batch job launcher/runner #257)

  • Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

  • Fixes from running stuff on some test jobs

  • Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 [email protected]

The biggest cost in a single-simulation SEIR run was importing Numba and the JIT compilation. Compiling ahead of time instead removes these startup costs and gives a nice 60% lift in single-run SEIR performance, which will be valuable for our large inference runs.

The benefit is minor when running many simulations, because the JIT costs are amortized away.

Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs

  • Add Python build directory to .gitignore

  • Integrate build_US_setup into pipeline and... (Integrate build_US_setup into pipeline and... #271)

  • Add hard-coded territory data to build_US_setup

  • Create csv of island area census data since it cannot be accessed by API

  • Change the report targets to follow the conventions of make_makefile

  • Integrate build_US_setup into pipeline

  • Some bug fixes

  • git lfs pull of commute_data.csv and switch docker image

  • Update ci.yml

  • Update ci.yml

  • Remove generated files

  • Update make_makefile.R

  • Update run_tests.py

  • pull census year from config

  • Use census year from config to build_US_setup

  • Update build_US_setup.R

Co-authored-by: eclee25 [email protected]

  • Add check to hospitalization that geodata geoids are in geoid-params.csv (Add check to hospitalization that geodata geoids are in geoid-params.csv #283)

  • added state level script for creating csv reporting out quantiles

  • Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

  • Added countylevel script

  • Various fixes and updates to post run summarization scripts.

  • Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

  • Integrate QuantileSummarizeGeoExtent.R into pipeline

  • Create QuantileSummarizeGeoidLevel.py

  • Working on the python script

  • Integrate quantile scripts into Makefile

  • Delete QuantileSummarizeGeoidLevel.py

  • perf fix for quantile_report_script

  • QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of QuantileSummarizeGeoidLevel.R running on Apache Spark. The job essentially computes quantiles grouped by geoid and time, with Spark providing the shuffle and quantile estimation mechanism to perform this aggregation efficiently. The job can be run locally within the container (fine for a USA run, but takes ~45 minutes on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, inside the container.

  • add --name_filter to quantile_summarize_geoid_level as per feedback

  • Adjust quantile scripts so they all have the same interface

    • Fixed bug in both R scripts where num_files was set incorrectly
    • Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

  • Revert make_makefile.R to dev branch version

  • setup file for international countries

  • Fatiguing NPI

  • tested MVP

  • other implementation, maybe cleaner

  • update to hosp_run to take specified geoid-params

  • Added mild infections as output of hospitalization

  • minor

  • Hospitalization package update

  • dev setup

  • fixed rate

  • adding apl deployment to ecr

  • international seeding and setup files created

  • Update to report template docs for country reports

  • update to non-US scripts

  • update to international branch country setup

  • non-US setup Rmd and other scripts finished.

  • update

  • minor print edit

  • updates to script to make international functional with master

  • minor update to report and setup scripts

  • setup fix

  • non-us update

  • dev setup relative min

  • relative min ready

    1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R
    2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

  • Delete jhucsse_case_data_crude.csv

accidental data commit

  • vignette fix

  • Removed man folders from packages

  • fixes in the international branch before the merge

  • Do not update packages

  • Update covidImportation to v1.6.1

  • minor fix

  • fix non-US setup

  • Update local_install.R

  • Fix merge error

  • Reload covidImportation v1.6.1 to fix tidyverse dependency

  • seeding update with inputted incidence multiplier

  • minor names fix

  • Minor fixes to build_US and build_nonUS integration tests

  • deleted a comma

  • minor bug fix

  • Fix reversed international tag

  • fixed error message

  • fixed python error

  • minor

  • Adding updated severity parameters

  • fixing US seeding

  • adding print message

  • Update covidImportation with bug fix

  • minor update

  • Fix filter issue

  • integration testing fixes

  • Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".

  • make_makefile.R now includes both US and non-US functionality

  • make_makefile white space fix

  • Add tictoc package to dev docker

  • Updated to fix a docker bug

  • Report devel2 into dev (Report devel2 into dev #352)

  • updates to state template

  • fix load_cum_inf_geounit_dates to use hosp only

  • add hosp method chunks from report_devel

  • adding generic mapping function

  • removing grouping by time for appropriate cumsum in load_cum_inf

  • fixing error in load_cum_inf

  • add ventilator to scenario tbl

  • add warning about loading infections from hosp data

  • deprecate old functions, integration testing temp

  • recreating clean NAMESPACE to remove export of setup_testing_environment preventing pkg install

  • adding sim_num before post_process in load_hosp_sims_filtered for output that does not contain sim_num but requires it for post-processing

  • adding warning about variable name to load_hosp_geounit_threshold

  • moving make_excess_heatmap to deprecated functions

  • prep report_devel2 for dev merge (prep report_devel2 for dev merge #351)

  • Version with pyarrow included

  • Dependencies for arrow in R as well

  • Fixed check_model script

  • Updated for feather integration

  • Updated test cases since n is reserved in yml

  • adding make_excess_heatmap function for hosp outcomes

  • Fixing parallelization mistake

  • Minor fixes

    • Use the "optimize" covidImportation version
    • Always upgrade local packages if an upgrade is available (vs. silently ignoring it)
    • check_model_reports should ensure axes are dates

  • new figure relative to threshold heatmap

  • Update importation.R to match covidImportation package updates

  • Updated model code to use the new covidImportation package, and also seed to E instead of I (and keep population fixed)

  • Fixed typo

  • Final fix to avoid numba

  • Fixed path to install_local script

  • Added package

  • Fixed seeding creation

  • rm NAs and fix create_seeding.R

  • add new cum hosp/deaths check to check_models scr

  • update indexes in check model script

  • long form mobility

  • Update reference to geoid-params.csv inside of hosp_run.R

  • 10x seeding file

  • Write the npi when writing parquet output

  • template

  • report after simulation

  • Removed geodata read from hosp_run.R since it's not being used

  • Updated things that feed into mobility

  • Updated build_US_setup.R to account for the move

  • These files got removed in a previous commit

  • Removing unused (as far as I can tell anyway) data

  • Fix bug when the places are also a number

  • Changing back test cases to use size/prob instead of n/p

  • Updated name to pass checks on case sensitive OS

  • Updated to use file_extension argument

  • Fix broken tests, though I recommend we eliminate the mean and var checks since they'll be flaky

  • Updated build_US_setup.R to work with the current setup

  • Renamed parameters to avoid confusion; print out simid as 9 digits

SEIR and hospitalization phases have more standardized file format

  • read parquet file times correctly

  • Revert "read parquet file times correctly"

This reverts commit 521dd25.

Co-authored-by: hrmeredith12 [email protected]

  • Report devel (Report devel #208)

  • fix unit test code

  • fix unit test for real

  • fix unit tests

  • adding ability to filter geoids in relative heatmap function

  • adding template for county-specific report for a given state

  • lower tolerance for distribution tests

  • planning_models chunk

  • planning scenario chunk

  • add names to dev team

Co-authored-by: eclee25 [email protected]
Co-authored-by: Kyra Grantz [email protected]
Co-authored-by: hrmeredith12 [email protected]

Co-authored-by: hrmeredith12 [email protected]
Co-authored-by: Elizabeth Lee [email protected]

  • Delete build-model-input.R (Delete build-model-input.R #217)

  • Dataseed merge (Dataseed merge #215)

  • Adding Javier

  • Adding commute data back in

  • rm fixed param and comment out bad plot

  • commit namespace report gen

  • fix NVentCurr name

  • formatting changes to county report template, removing defaults that should be modified for each report

  • adding references for county report template

  • change importation seeding

  • table formatting

  • limitations chunk considering age specific hosp calculations

  • removing build_hospdeath_geoid_par - old version not used in hosprun.R

  • removing legacy hospitalization scripts. everything runs through hosp_run.R now

  • using current default durations to minimize confusion

Co-authored-by: hrmeredith12 [email protected]
Co-authored-by: Elizabeth Lee [email protected]
Co-authored-by: Kyra Grantz [email protected]

Co-authored-by: jkamins7 [email protected]

Co-authored-by: Josh Wills [email protected]
Co-authored-by: kkintaro [email protected]

Co-authored-by: Elizabeth Lee [email protected]

  • RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

Co-authored-by: Joseph Lemaitre [email protected]

  • Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.

  • Removing source for packages installable from cran

  • Updated the python rules for reticulate (tests still pass)

  • Removing source for packages installable from cran

  • Updated the python rules for reticulate (tests still pass)

  • Updated based on review

  • Fixed filter issues with makefile setup in case dynfilter isn't provided in config

  • Updated makefile

  • Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from do.call(), which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

  • Packrat (Packrat #253)

  • Adding final form of previous packrat + docker setup after merging weirdness

  • Switching .so to git lfs

  • Removing source for packages installable from cran

  • Updated the python rules for reticulate (tests still pass)

  • Updated based on review

  • Updated to use dev's docker instead of dataseed's

  • Added reticulate zoo and xts

  • Updated docker with git-lfs

  • Packrat (Packrat #267)

  • Adding final form of previous packrat + docker setup after merging weirdness

  • Switching .so to git lfs

  • Removing source for packages installable from cran

  • Updated the python rules for reticulate (tests still pass)

  • Updated based on review

  • Updated to use dev's docker instead of dataseed's

  • Added reticulate zoo and xts

  • Updated docker with git-lfs

  • Updating docker to install current versions of local packages

  • Update .Rprofile

  • Update dockerhub.yaml

  • Update aws.yaml

  • Yet another packrat attempt

  • Update ci.yml

  • Generic version of the batch job launcher/runner (Generic version of the batch job launcher/runner #257)

  • Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

  • Fixes from running stuff on some test jobs

  • Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 [email protected]

The biggest cost in a single-simulation SEIR run was importing Numba and the JIT compilation. Compiling ahead of time instead removes these startup costs and gives a nice 60% lift in single-run SEIR performance, which will be valuable for our large inference runs.

The benefit is minor when running many simulations, because the JIT costs are amortized away.

Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs

  • Add Python build directory to .gitignore

  • Integrate build_US_setup into pipeline and... (Integrate build_US_setup into pipeline and... #271)

  • Add hard-coded territory data to build_US_setup

  • Create csv of island area census data since it cannot be accessed by API

  • Change the report targets to follow the conventions of make_makefile

  • Integrate build_US_setup into pipeline

  • Some bug fixes

  • git lfs pull of commute_data.csv and switch docker image

  • Update ci.yml

  • Update ci.yml

  • Remove generated files

  • Update make_makefile.R

  • Update run_tests.py

  • pull census year from config

  • Use census year from config to build_US_setup

  • Update build_US_setup.R

Co-authored-by: eclee25 [email protected]

  • Add check to hospitalization that geodata geoids are in geoid-params.csv (Add check to hospitalization that geodata geoids are in geoid-params.csv #283)

  • added state level script for creating csv reporting out quantiles

  • Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

  • Added countylevel script

  • Various fixes and updates to post run summarization scripts.

  • Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

  • Integrate QuantileSummarizeGeoExtent.R into pipeline

  • Create QuantileSummarizeGeoidLevel.py

  • Working on the python script

  • Integrate quantile scripts into Makefile

  • Delete QuantileSummarizeGeoidLevel.py

  • perf fix for quantile_report_script

  • QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of QuantileSummarizeGeoidLevel.R running on Apache Spark. The job essentially computes quantiles grouped by geoid and time, with Spark providing the shuffle and quantile estimation mechanism to perform this aggregation efficiently. The job can be run locally within the container (fine for a USA run, but takes ~45 minutes on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, inside the container.

  • add --name_filter to quantile_summarize_geoid_level as per feedback

  • Adjust quantile scripts so they all have the same interface

    • Fixed bug in both R scripts where num_files was set incorrectly
    • Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

  • Revert make_makefile.R to dev branch version

  • setup file for international countries

  • Fatiguing NPI

  • tested MVP

  • other implementation, maybe cleaner

  • update to hosp_run to take specified geoid-params

  • Added mild infections as output of hospitalization

  • minor

  • Hospitalization package update

  • dev setup

  • fixed rate

  • adding apl deployment to ecr

  • international seeding and setup files created

  • Update to report template docs for country reports

  • update to non-US scripts

  • update to international branch country setup

  • non-US setup Rmd and other scripts finished.

  • update

  • minor print edit

  • updates to script to make international functional with master

  • minor update to report and setup scripts

  • setup fix

  • non-us update

  • dev setup relative min

  • relative min ready

    1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R
    2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

  • Delete jhucsse_case_data_crude.csv

accidental data commit

  • vignette fix

  • Removed man folders from packages

  • fixes in the international branch before the merge

  • Do not update packages

  • Update covidImportation to v1.6.1

  • minor fix

  • fix non-US setup

  • Update local_install.R

  • Fix merge error

  • Reload covidImportation v1.6.1 to fix tidyverse dependency

  • seeding update with inputted incidence multiplier

  • minor names fix

  • Minor fixes to build_US and build_nonUS integration tests

  • deleted a comma

  • minor bug fix

  • Fix reversed international tag

  • fixed error message

  • fixed python error

  • minor

  • Adding updated severity parameters

  • fixing US seeding

  • adding print message

  • Update covidImportation with bug fix

  • minor update

  • Fix filter issue

  • integration testing fixes

  • Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".

  • make_makefile.R now includes both US and non-US functionality

  • make_makefile white space fix

  • Add tictoc package to dev docker

  • Updated to fix a docker bug

Co-authored-by: Josh Wills [email protected]
Co-authored-by: jkamins7 [email protected]
Co-authored-by: kkintaro [email protected]
Co-authored-by: Kyra Grantz [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: shauntruelove [email protected]
Co-authored-by: chadi [email protected]
Co-authored-by: hrmeredith12 [email protected]
Co-authored-by: Josh Wills [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: Dave [email protected]
Co-authored-by: Shaun Truelove [email protected]

  • rename report.generation folder

  • update report.generation path in workflow test

Co-authored-by: Kyra Grantz [email protected]
Co-authored-by: juanderone [email protected]
Co-authored-by: Josh Wills [email protected]
Co-authored-by: jkamins7 [email protected]
Co-authored-by: kkintaro [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: shauntruelove [email protected]
Co-authored-by: chadi [email protected]
Co-authored-by: hrmeredith12 [email protected]
Co-authored-by: Josh Wills [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: Dave [email protected]
Co-authored-by: Shaun Truelove [email protected]

Co-authored-by: jkamins7 [email protected]
Co-authored-by: chadi [email protected]

  • readme file changes

  • change to latest docker image

  • dev image

  • make sensible load_config err + test

  • Updated docker file

  • Removed failing workflow

  • Removed more rstudio config from docker file

  • Removed more rstudio config from docker file

  • Removed outdated vignettes

  • Updated covidImportation version in docker

  • Updated packrat

Co-authored-by: Josh Wills [email protected]
Co-authored-by: kkintaro [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: jkamins7 [email protected]
Co-authored-by: Joseph Lemaitre [email protected]
Co-authored-by: Josh Wills [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: shauntruelove [email protected]
Co-authored-by: Dave [email protected]
Co-authored-by: Shaun Truelove [email protected]
Co-authored-by: Kyra Grantz [email protected]
Co-authored-by: juanderone [email protected]
Co-authored-by: hrmeredith12 [email protected]

* Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch
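A minimal sketch of how such a variable might be used to keep output names unique. The variable name `JOB_BLOCK_ID`, the naming scheme, and the `.parquet` extension are all hypothetical; the commit message does not name the real variable or file layout.

```python
import os


def output_filename(prefix, sim_id, environ=None):
    """Build an output name that stays unique across blocks of batch jobs.

    JOB_BLOCK_ID is a hypothetical variable name used for illustration only.
    """
    environ = os.environ if environ is None else environ
    block = environ.get("JOB_BLOCK_ID", "0")  # fall back to "0" when unset
    return f"{prefix}_{block}_{sim_id}.parquet"


# Two job blocks that both run simulation 1 no longer write to the same file.
print(output_filename("seir", 1, {"JOB_BLOCK_ID": "a"}))
print(output_filename("seir", 1, {"JOB_BLOCK_ID": "b"}))
```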

* RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

* Update covidImportation package to v1.6 (#10)

* Update covidImportation package to v1.6 (#250)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Updated indexing in simulations and hospitalization

* Added better indexing for hospitalization

* Add ability to reduce alpha, sigma, and gamma (#241)

* Add the ability to reduce multiple parameters

* Add Reduce scenario template to test_simple and documentation

* minor bug test fix

* Minor bugs

Co-authored-by: Joseph Lemaitre <[email protected]>

* Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.
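The pattern behind this change can be sketched as follows. The function names and placeholder data are hypothetical stand-ins, not the pipeline's real code; the counter only exists to show that the load happens once.

```python
load_count = 0


def load_spatial_setup():
    """Stand-in for an expensive load; counts how often it runs."""
    global load_count
    load_count += 1
    return {"nodes": ["A", "B"]}  # hypothetical placeholder data


def run_scenarios(scenarios):
    spatial = load_spatial_setup()  # loaded once, outside the loop
    # Every scenario reuses the same spatial setup object.
    return [(name, len(spatial["nodes"])) for name in scenarios]


results = run_scenarios(["low", "mid", "high"])
print(results, "loads:", load_count)
```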

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Fixed filter issues with makefile setup in case dynfilter isn't provided in config

* Updated makefile

* Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

* Packrat (#253)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Packrat (#267)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Updating docker to install current versions of local packages

* Update .Rprofile

* Update dockerhub.yaml

* Update aws.yaml

* Yet another packrat attempt

* Update ci.yml

* Generic version of the batch job launcher/runner (#257)

* Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

* Fixes from running stuff on some test jobs

* Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 <[email protected]>

* Reduce SEIR startup costs (#273)

* 60% speedup in one run SEIR performance

The biggest cost in a single-simulation SEIR run was importing Numba and the JIT compilation. Compiling ahead of time instead removes these startup costs and gives a nice 60% lift in single-run SEIR performance, which will be valuable for our large inference runs.

The benefit is minor when running many simulations, because the JIT costs are amortized away.

```
Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
```
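As a rough check on these numbers, the sketch below works out the reduction implied by the two benchmark means and shows why the benefit shrinks over many simulations. The amortization model is an assumption, not a measurement: it treats the difference between the two means as a one-time startup cost and the AOT mean as the per-simulation cost.

```python
jit_mean = 13.429  # seconds, Benchmark #1 (JIT)
aot_mean = 5.129   # seconds, Benchmark #2 (AOT)

reduction = 1 - aot_mean / jit_mean  # fraction of wall time saved in one run
speedup = jit_mean / aot_mean


def saving_fraction(n_sims, startup=jit_mean - aot_mean, per_sim=aot_mean):
    """Fraction of total time saved over n_sims runs in one process.

    Assumes the startup cost is paid once and each simulation costs
    per_sim seconds; both are simplifications for illustration.
    """
    return startup / (startup + n_sims * per_sim)


print(f"reduction: {reduction:.1%}, speedup: {speedup:.2f}x")
for n in (1, 10, 100):
    print(n, f"{saving_fraction(n):.1%}")
```

Under this simplified model the saving is about 62% for a single run and falls below 2% once 100 simulations share one startup.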

* Add Python build directory to .gitignore

* Integrate build_US_setup into pipeline and... (#271)

* Add hard-coded territory data to build_US_setup

* Create csv of island area census data since it cannot be accessed by API

* Change the report targets to follow the conventions of make_makefile

* Integrate build_US_setup into pipeline

* Some bug fixes

* git lfs pull of commute_data.csv and switch docker image

* Update ci.yml

* Update ci.yml

* Remove generated files

* Update make_makefile.R

* Update run_tests.py

* pull census year from config

* Use census year from config to build_US_setup

* Update build_US_setup.R

Co-authored-by: eclee25 <[email protected]>

* Add check to hospitalization that geodata geoids are in geoid-params.csv (#283)
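A check of this kind could look like the sketch below. The function name and error message are hypothetical; the actual check lives in the hospitalization code and may differ.

```python
def check_geoids(geodata_geoids, param_geoids):
    """Raise if any geodata geoid has no row in the parameter table."""
    missing = sorted(set(geodata_geoids) - set(param_geoids))
    if missing:
        raise ValueError(f"geoids missing from geoid-params.csv: {missing}")


# Passes silently: every geodata geoid is present in the parameter table.
check_geoids(["06037", "36061"], ["06037", "36061", "17031"])
```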

* added state level script for creating csv reporting out quantiles

* Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

* Added countylevel script

* Various fixes and updates to post run summarization scripts.

* Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

* Integrate QuantileSummarizeGeoExtent.R into pipeline

* Create QuantileSummarizeGeoidLevel.py

* Working on the python script

* Integrate quantile scripts into Makefile

* Delete QuantileSummarizeGeoidLevel.py

* perf fix for quantile_report_script

* QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job essentially computes quantiles grouped by geoid and time, with Spark providing the shuffle and quantile estimation mechanism to perform this aggregation efficiently. The job can be run locally within the container (fine for a USA run, but takes ~45 minutes on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, inside the container.
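The aggregation itself can be illustrated in plain Python. This is only a sketch of the group-by-then-quantile logic on made-up rows, not the Spark job: it computes exact quantiles in memory, whereas the commit describes Spark's quantile estimation, and the row layout is an assumption.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical rows: (geoid, time, sim_id, value). Not real model output.
rows = [
    ("06037", "2020-04-01", sim, float(value))
    for sim, value in enumerate([10, 20, 30, 40, 50], start=1)
] + [
    ("36061", "2020-04-01", sim, float(value))
    for sim, value in enumerate([5, 15, 25, 35, 45], start=1)
]


def summarize(rows, n=4):
    """Group values by (geoid, time) and return the n-quantile cut points."""
    groups = defaultdict(list)
    for geoid, time, _sim_id, value in rows:
        groups[(geoid, time)].append(value)
    return {
        key: quantiles(values, n=n, method="inclusive")
        for key, values in sorted(groups.items())
    }


summary = summarize(rows)
for key, cuts in summary.items():
    print(key, cuts)
```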

* add `--name_filter` to quantile_summarize_geoid_level as per feedback

* Adjust quantile scripts so they all have the same interface

- Fixed bug in both R scripts where `num_files` was set incorrectly
- Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

* Revert make_makefile.R to dev branch version

* setup file for international countries

* Fatiguing NPI

* tested MVP

* other implementation, maybe cleaner

* update to hosp_run to take specified geoid-params

* Added mild infections as output of hospitalization

* minor

* Hospitalization package update

* dev setup

* fixed rate

* adding apl deployment to ecr

* international seeding and setup files created

* Update to report template docs for country reports

* update to non-US scripts

* update to international branch country setup

* non-US setup Rmd and other scripts finished.

* update

* minor print edit

* updates to script to make international functional with master

* minor update to report and setup scripts

* setup fix

* non-us update

* dev setup relative min

* relative min ready

* 1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

* Delete jhucsse_case_data_crude.csv

accidental data commit

* vignette fix

* Removed man folders from packages

* fixes in the international branch before the merge

* Do not update packages

* Update covidImportation to v1.6.1

* minor fix

* fix non-US setup

* Update local_install.R

* Fix merge error

* Reload covidImportation v1.6.1 to fix tidyverse dependency

* seeding update with inputted incidence multiplier

* minor names fix

* Minor fixes to build_US and build_nonUS integration tests

* deleted a comma

* minor bug fix

* Fix reversed international tag

* fixed error message

* fixed python error

* minor

* Adding updated severity parameters

* fixing US seeding

* adding print message

* Update covidImportation with bug fix

* minor update

* Fix filter issue

* integration testing fixes

* Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".
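The prefix-scoped clean can be sketched in Python as below. The makefile's real rule is not shown in this message, so the `<setup_name>_*` naming pattern and the file names are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def clean_outputs(output_dir, setup_name):
    """Delete only the files whose names start with this setup's prefix."""
    removed = []
    for path in sorted(Path(output_dir).glob(f"{setup_name}_*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical outputs from two different setups sharing one directory.
    for name in ("usa_seir.parquet", "usa_hosp.parquet", "france_seir.parquet"):
        (Path(tmp) / name).touch()
    removed = clean_outputs(tmp, "usa")
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed, remaining)
```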

* make_makefile.R now includes both US and non-US functionality

* make_makefile white space fix

* Add tictoc package to dev docker

* Updated to fix a docker bug

* Report devel2 into dev (#352)

* updates to state template

* fix load_cum_inf_geounit_dates to use hosp only

* add hosp method chunks from report_devel

* adding generic mapping function

* removing grouping by time for appropriate cumsum in load_cum_inf

* fixing error in load_cum_inf

* add ventilator to scenario tbl

* add warning about loading infections from hosp data

* deprecate old functions, integration testing temp

* recreating clean NAMESPACE to remove export of setup_testing_environment preventing pkg install

* adding sim_num before post_process in load_hosp_sims_filtered for output that does not contain sim_num but requires it for post-processing

* adding warning about variable name to load_hosp_geounit_threshold

* moving make_excess_heatmap to deprecated functions

* prep report_devel2 for dev merge (#351)

* Version with pyarrow included

* Dependencies for arrow in R as well

* Fixed check_model script

* Updated for feather integration

* Updated test cases since `n` is reserved in yml

* adding make_excess_heatmap function for hosp outcomes

* Fixing parallelization mistake

* Minor fixes

- Use the "optimize" covidImportation version
- Always upgrade local packages if upgrade available (vs silently ignore)
- check_model_reports should ensure axes are dates

* new figure relative to threshold heatmap

* Update importation.R to match covidImportation package updates

* Updated model code to use the new covidImportation package, and also seed to E instead of I (and keep population fixed)

* Fixed typo

* Final fix to avoid numba

* Fixed path to install_local script

* Added package

* Fixed seeding creation

* rm NAs and fix create_seeding.R

* add new cum hosp/deaths check to check_models scr

* update indexes in check model script

* long form mobility

* Update reference to geoid-params.csv inside of hosp_run.R

* 10x seeding file

* Write the npi when writing parquet output

* template

* report after simulation

* Removed geodata read from hosp_run.R since it's not being used

* Updated things that feed into mobility

* Updated build_US_setup.R to account for the move

* These files got removed in a previous commit

* Removing unused (as far as I can tell anyway) data

* Fix bug when the places are also a number

* Changing back test cases to use size/prob instead of n/p

* Updated name to pass checks on case sensitive OS

* Updated to use file_extension argument

* Fix broken tests, though I recommend we eliminate the mean and var checks since they'll be flaky

* Updated build_US_setup.R to work with the current setup

* Renamed parameters to avoid confusion; print out simid as 9 digits

SEIR and hospitalization phases have more standardized file format

* read parquet file times correctly

* Revert "read parquet file times correctly"

This reverts commit 521dd25.

* parquet date fixes (#207)

Co-authored-by: hrmeredith12 <[email protected]>

* Report devel (#208)

* fix unit test code

* fix unit test for real

* fix unit tests

* adding ability to filter geoids in relative heatmap function

* adding template for county-specific report for a given state

* lower tolerance for distribution tests

* planning_models chunk

* planning scenario chunk

* add names to dev team

Co-authored-by: eclee25 <[email protected]>
Co-authored-by: Kyra Grantz <[email protected]>
Co-authored-by: hrmeredith12 <[email protected]>

* Adding Javier (#210)

Co-authored-by: hrmeredith12 <[email protected]>
Co-authored-by: Elizabeth Lee <[email protected]>

* Delete build-model-input.R (#217)

* Dataseed merge (#215)

* Adding Javier

* Adding commute data back in

* rm fixed param and comment out bad plot

* commit namespace report gen

* fix NVentCurr name

* formatting changes to county report template, removing defaults that should be modified for each report

* adding references for county report template

* change importation seeding

* table formatting

* limitations chunk considering age specific hosp calculations

* removing build_hospdeath_geoid_par - old version not used in hosprun.R

* removing legacy hospitalization scripts. everything runs through hosp_run.R now

* using current default durations to minimize confusion

Co-authored-by: hrmeredith12 <[email protected]>
Co-authored-by: Elizabeth Lee <[email protected]>
Co-authored-by: Kyra Grantz <[email protected]>

* Removing config.yml and changing the variable name in create_seeding to be truthful. (#219)

* Fixed the low in followup issue (#224)

* Fixed the low in followup issue

* Adding initial ^

* adding county report template yaml (#221)

Co-authored-by: jkamins7 <[email protected]>

* Fix load-bearing typo (#225)

* Fix load-bearing typo

* pretty sure it's supposed to be this

Co-authored-by: Josh Wills <[email protected]>
Co-authored-by: kkintaro <[email protected]>

* Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch

* fix for 1 scenario (#230)

Co-authored-by: Elizabeth Lee <[email protected]>

* RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

* Update covidImportation package to v1.6 (#10)

* Update covidImportation package to v1.6 (#250)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Updated indexing in simulations and hospitalization

* Added better indexing for hospitalization

* Add ability to reduce alpha, sigma, and gamma (#241)

* Add the ability to reduce multiple parameters

* Add Reduce scenario template to test_simple and documentation

* minor bug test fix

* Minor bugs

Co-authored-by: Joseph Lemaitre <[email protected]>

* Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Fixed filter issues with makefile setup in case dynfilter isn't provided in config

* Updated makefile

* Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

* Packrat (#253)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Packrat (#267)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Updating docker to install current versions of local packages

* Update .Rprofile

* Update dockerhub.yaml

* Update aws.yaml

* Yet another packrat attempt

* Update ci.yml

* Generic version of the batch job launcher/runner (#257)

* Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

* Fixes from running stuff on some test jobs

* Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 <[email protected]>

* changing covidImportation tag to 1.6.1

* Reduce SEIR startup costs (#273)

* 60% speedup in one run SEIR performance

The biggest cost in a single sim SEIR run was importation of Numba and the JIT compilation. Change this to compile ahead of time, which results in a nice 60% lift in one run SEIR performance by saving these startup costs---which will be valuable for our large inference runs.

Minor performance benefit when running many simulations as JIT costs are amortized away.

```
Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
```
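As a quick check of the "60%" figure, the arithmetic on the two benchmark means above can be sketched as follows. The function names are hypothetical; only the two mean timings come from the benchmark output.

```python
# Mean wall-clock times copied from the benchmark output above (seconds).
JIT_MEAN_S = 13.429
AOT_MEAN_S = 5.129


def time_reduction(before: float, after: float) -> float:
    """Fraction of the original run time that was saved."""
    return (before - after) / before


def speedup_factor(before: float, after: float) -> float:
    """How many times faster the new run is than the old one."""
    return before / after


if __name__ == "__main__":
    print(f"time reduction: {time_reduction(JIT_MEAN_S, AOT_MEAN_S):.1%}")
    print(f"speedup factor: {speedup_factor(JIT_MEAN_S, AOT_MEAN_S):.2f}x")
```

On these means, the run time drops by roughly 62%, which corresponds to the single-sim run being about 2.6 times faster.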

* Add Python build directory to .gitignore

* Integrate build_US_setup into pipeline and... (#271)

* Add hard-coded territory data to build_US_setup

* Create csv of island area census data since it cannot be accessed by API

* Change the report targets to follow the conventions of make_makefile

* Integrate build_US_setup into pipeline

* Some bug fixes

* git lfs pull of commute_data.csv and switch docker image

* Update ci.yml

* Update ci.yml

* Remove generated files

* Update make_makefile.R

* Update run_tests.py

* pull census year from config

* Use census year from config to build_US_setup

* Update build_US_setup.R

Co-authored-by: eclee25 <[email protected]>

* Add check to hospitalization that geodata geoids are in geoid-params.csv (#283)

* added state level script for creating csv reporting out quantiles

* Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

* Added countylevel script

* Various fixes and updates to post run summarization scripts.

* Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

* Integrate QuantileSummarizeGeoExtent.R into pipeline

* Create QuantileSummarizeGeoidLevel.py

* Working on the python script

* Integrate quantile scripts into Makefile

* Delete QuantileSummarizeGeoidLevel.py

* perf fix for quantile_report_script

* QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job essentially computes quantiles grouped by geoid and time whereby Spark provides the shuffle and quantile estimation mechanism to perform this aggregation efficiently. The job can be run locally within the container (fine for USA run but takes ~45mins on a r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark and consequently Java inside the container.
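The aggregation described above (quantiles grouped by geoid and time) can be illustrated with a small in-memory sketch. This is not the Spark job: it uses only the Python standard library, computes exact quantiles with linear interpolation where Spark may use an approximate estimator, and the names `quantile` and `summarize` plus the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[str, str, float]  # (geoid, time, value); hypothetical row layout


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Exact quantile of pre-sorted values using linear interpolation."""
    if not sorted_values:
        raise ValueError("quantile of empty sequence")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize(
    rows: Iterable[Row], probs: Sequence[float] = (0.25, 0.5, 0.75)
) -> Dict[Tuple[str, str], Dict[float, float]]:
    """Group values by (geoid, time), then compute the requested quantiles per group."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for geoid, time, value in rows:
        groups[(geoid, time)].append(value)  # the "shuffle" step, done in memory here
    return {
        key: {q: quantile(sorted(values), q) for q in probs}
        for key, values in groups.items()
    }


# Made-up sample: five simulated values for one geoid/date, one value for another.
SAMPLE_ROWS: List[Row] = [
    ("06037", "2020-04-01", 50.0),
    ("06037", "2020-04-01", 10.0),
    ("06037", "2020-04-01", 30.0),
    ("06037", "2020-04-01", 20.0),
    ("06037", "2020-04-01", 40.0),
    ("36061", "2020-04-01", 7.0),
]
```

Holding every group in memory is what stops scaling for a full USA run, which is presumably why the commit hands the grouping and quantile estimation to Spark.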

* add `--name_filter` to quantile_summarize_geoid_level as per feedback

* Adjust quantile scripts so they all have the same interface

- Fixed bug in both R scripts where `num_files` was set incorrectly
- Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

* Revert make_makefile.R to dev branch version

* setup file for international countries

* Fatiguing NPI

* tested MVP

* other implementation, maybe cleaner

* update to hosp_run to take specified geoid-params

* Added mild infections as output of hospitalization

* minor

* Hospitalization package update

* dev setup

* fixed rate

* adding apl deployment to ecr

* international seeding and setup files created

* Update to report template docs for country reports

* update to non-US scripts

* update to international branch country setup

* non-US setup Rmd and other scripts finished.

* update

* minor print edit

* updates to script to make international functional with master

* minor update to report and setup scripts

* setup fix

* non-us update

* dev setup relative min

* relative min ready

* 1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

* Delete jhucsse_case_data_crude.csv

accidental data commit

* vignette fix

* Removed man folders from packages

* fixes in the international branch before the merge

* Do not update packages

* Update covidImportation to v1.6.1

* minor fix

* fix non-US setup

* Update local_install.R

* Fix merge error

* Reload covidImportation v1.6.1 to fix tidyverse dependency

* seeding update with inputted incidence multiplier

* minor names fix

* Minor fixes to build_US and build_nonUS integration tests

* deleted a comma

* minor bug fix

* Fix reversed international tag

* fixed error message

* fixed python error

* minor

* Adding updated severity parameters

* fixing US seeding

* adding print message

* Update covidImportation with bug fix

* minor update

* Fix filter issue

* integration testing fixes

* Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".

* make_makefile.R now includes both US and non-US functionality

* make_makefile white space fix

* Add tictoc package to dev docker

* Updated to fix a docker bug

Co-authored-by: Josh Wills <[email protected]>
Co-authored-by: jkamins7 <[email protected]>
Co-authored-by: kkintaro <[email protected]>
Co-authored-by: Kyra Grantz <[email protected]>
Co-authored-by: Sam Shah <[email protected]>
Co-authored-by: shauntruelove <[email protected]>
Co-authored-by: chadi <[email protected]>
Co-authored-by: hrmeredith12 <[email protected]>
Co-authored-by: Josh Wills <[email protected]>
Co-authored-by: Sam Shah <[email protected]>
Co-authored-by: Dave <[email protected]>
Co-authored-by: Shaun Truelove <[email protected]>

* rename report.generation folder

* update report.generation path in workflow test

Co-authored-by: Kyra Grantz <[email protected]>
Co-authored-by: juanderone <[email protected]>
Co-authored-by: Josh Wills <[email protected]>
Co-authored-by: jkamins7 <[email protected]>
Co-authored-by: kkintaro <[email protected]>
Co-authored-by: Sam Shah <[email protected]>
Co-authored-by: shauntruelove <[email protected]>
Co-authored-by: chadi <[email protected]>
Co-authored-by: hrmeredith12 <[email protected]>
Co-authored-by: Josh Wills <[email protected]>
Co-authored-by: Sam Shah <[email protected]>
Co-authored-by: Dave <[email protected]>
Co-authored-by: Shaun Truelove <[email protected]>

* configurable delay and ratio for seeding

* seeding file extra comma

* change path to report.generation

* rm double parens

* Dev make (#358)

* make_makefile - rm filter and add seeding & intl

* add parens

* typo

* Removed filter from tests

* fix parens issue

* fixes #338 by raising an error

* fixes #339 by raising an error

* better and correct message

* bugfixes

* better presentation

* consistency across messages

* accidentally deleted some tests, putting them back

* newlines

* Updated make_makefile.R to pass tests multiple times in a row

* integ test 2x, update local install

* try to fix 2x integ test

* rm unnecessary chdir

* fix typo in aws apl workflow

Co-authored-by: jkamins7 <[email protected]>
Co-authored-by: chadi <[email protected]>

* readme file changes

* change to latest docker image

* dev image

* make sensible load_config err + test

* Updated docker file

* Removed failing workflow

* Removed more rstudio config from docker file

* Removed more rstudio config from docker file

* Removed outdated vignettes

* Updated covidImportation version in docker

* Updated packrat

Co-authored-by: Josh Wills <[email protected]>
Co-authored-by: kkintaro <[email protected]>
Co-authored-by: Sam Shah <[email protected]>
Co-authored-by: jkamins7 <[email protected]>
Co-authored-by: Joseph Lemaitre <[email protected]>
Co-authored-by: Josh Wills <[email protected]>
Co-authored-by: Sam Shah <[email protected]>
Co-authored-by: shauntruelove <[email protected]>
Co-authored-by: Dave <[email protected]>
Co-authored-by: Shaun Truelove <[email protected]>
Co-authored-by: Kyra Grantz <[email protected]>
Co-authored-by: juanderone <[email protected]>
Co-authored-by: hrmeredith12 <[email protected]>
@jcblemai
Collaborator

jcblemai commented Mar 7, 2022

Does this squash the commits?
