Upstream master -> dev (this seems hard for some reason) #431

jkamins7 · 2021-08-06T15:18:35Z

Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch
RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

Update covidImportation package to v1.6 (Merge dataseed into master (at least most of it) #10)
Update covidImportation package to v1.6 (Update covidImportation package to v1.6 #250)
Adding final form of previous packrat + docker setup after merging weirdness
Switching .so to git lfs
Updated indexing in simulations and hospitalization
Added better indexing for hospitalization
Add ability to reduce alpha, sigma, and gamma (Add ability to reduce alpha, sigma, and gamma #241)
Add the ability to reduce multiple parameters
Add Reduce scenario template to test_simple and documentation
minor bug test fix
Minor bugs

Co-authored-by: Joseph Lemaitre [email protected]

Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.
Removing source for packages installable from cran
Updated the python rules for reticulate (tests still pass)
Removing source for packages installable from cran
Updated the python rules for reticulate (tests still pass)
Updated based on review
Fixed filter issues with makefile setup in case dynfilter isn't provided in config
Updated makefile
Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from do.call(), which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

Packrat (Packrat #253)
Adding final form of previous packrat + docker setup after merging weirdness
Switching .so to git lfs
Removing source for packages installable from cran
Updated the python rules for reticulate (tests still pass)
Updated based on review
Updated to use dev's docker instead of dataseed's
Added reticulate zoo and xts
Updated docker with git-lfs
Packrat (Packrat #267)
Adding final form of previous packrat + docker setup after merging weirdness
Switching .so to git lfs
Removing source for packages installable from cran
Updated the python rules for reticulate (tests still pass)
Updated based on review
Updated to use dev's docker instead of dataseed's
Added reticulate zoo and xts
Updated docker with git-lfs
Updating docker to install current versions of local packages
Update .Rprofile
Update dockerhub.yaml
Update aws.yaml
Yet another packrat attempt
Update ci.yml
Generic version of the batch job launcher/runner (Generic version of the batch job launcher/runner #257)
Generic version of batch from the union of jwills_dfU_run and dataseed_batch2
Fixes from running stuff on some test jobs
Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 [email protected]

Reduce SEIR startup costs (Reduce SEIR startup costs #273)
60% speedup in one run SEIR performance

The biggest cost in a single sim SEIR run was importation of Numba and the JIT compilation. Change this to compile ahead of time, which results in a nice 60% lift in one run SEIR performance by saving these startup costs---which will be valuable for our large inference runs.

Minor performance benefit when running many simulations as JIT costs are amortized away.

Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
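
As a quick check on the figures above, the arithmetic can be sketched in a few lines. The two means are taken from the benchmark; the `per_sim_saving` helper is hypothetical and assumes the compilation cost is paid once per process, which is one reading of "JIT costs are amortized away".

```python
# Mean wall-clock times reported in the benchmark above (seconds).
jit_mean = 13.429  # single sim, JIT compilation (current)
aot_mean = 5.129   # single sim, AOT compilation (new)

saved = jit_mean - aot_mean      # startup time saved per process
reduction = saved / jit_mean     # fraction of the run time removed
speedup = jit_mean / aot_mean    # ratio of old to new run time


def per_sim_saving(n_sims, startup_saving=saved):
    """Saving per simulation when one process runs n_sims simulations.

    Assumption (not from the benchmark): the compile cost is paid once per
    process, so the saving is spread evenly over the simulations it runs.
    """
    return startup_saving / n_sims


print(f"saved per run: {saved:.3f} s")
print(f"reduction:     {reduction:.1%}")
print(f"speedup:       {speedup:.2f}x")
print(f"saving per sim over 100 sims: {per_sim_saving(100):.3f} s")
```

Under that assumption the saving per simulation shrinks as more simulations share one process, which is why the gain matters most for single-sim runs.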

Add Python build directory to .gitignore
Integrate build_US_setup into pipeline and... (Integrate build_US_setup into pipeline and... #271)
Add hard-coded territory data to build_US_setup
Create csv of island area census data since it cannot be accessed by API
Change the report targets to follow the conventions of make_makefile
Integrate build_US_setup into pipeline
Some bug fixes
git lfs pull of commute_data.csv and switch docker image
Update ci.yml
Update ci.yml
Remove generated files
Update make_makefile.R
Update run_tests.py
pull census year from config
Use census year from config to build_US_setup
Update build_US_setup.R

Co-authored-by: eclee25 [email protected]

Add check to hospitalization that geodata geoids are in geoid-params.csv (Add check to hospitalization that geodata geoids are in geoid-params.csv #283)
added state level script for creating csv reporting out quantiles
Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script
Added countylevel script
Various fixes and updates to post-run summarization scripts.
Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)
Integrate QuantileSummarizeGeoExtent.R into pipeline
Create QuantileSummarizeGeoidLevel.py
Working on the python script
Integrate quantile scripts into Makefile
Delete QuantileSummarizeGeoidLevel.py
perf fix for quantile_report_script
QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of QuantileSummarizeGeoidLevel.R running on Apache Spark. The job computes quantiles grouped by geoid and time; Spark provides the shuffle and quantile estimation needed to perform this aggregation efficiently. The job can be run locally within the container (fine for a USA run, but it takes ~45 min on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, to the container.
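
The aggregation described above (quantiles grouped by geoid and time) can be illustrated without Spark. The following is a minimal plain-Python sketch, not the Spark job itself: the function name, the `(geoid, time, value)` row layout, and the sample values are all assumptions made for illustration, and the quantiles here are exact rather than estimated.

```python
from collections import defaultdict
from statistics import quantiles


def quantiles_by_geoid_time(rows, n=4):
    """Group values by (geoid, time) and return the n-quantile cut points.

    rows: iterable of (geoid, time, value) tuples, one per simulation.
    This layout is hypothetical; the real job reads model output files.
    """
    groups = defaultdict(list)
    for geoid, time, value in rows:
        groups[(geoid, time)].append(value)
    # Each group needs at least two values for statistics.quantiles.
    return {
        key: quantiles(values, n=n, method="inclusive")
        for key, values in groups.items()
    }


# Made-up values standing in for five simulations of two geoids on one date.
rows = [("06037", "2020-05-01", v) for v in (1, 2, 3, 4, 5)]
rows += [("36061", "2020-05-01", v) for v in (10, 20, 30, 40, 50)]

result = quantiles_by_geoid_time(rows)
print(result[("06037", "2020-05-01")])
print(result[("36061", "2020-05-01")])
```

Holding every group in memory like this is what stops scaling for a full run; that is the part the commit hands to Spark's shuffle.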

add --name_filter to quantile_summarize_geoid_level as per feedback
Adjust quantile scripts so they all have the same interface

Fixed bug in both R scripts where num_files was set incorrectly
Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

Revert make_makefile.R to dev branch version
setup file for international countries
Fatiguing NPI
tested MVP
other implementation, maybe cleaner
update to hosp_run to take specified geoid-params
Added mild infections as output of hospitalization
minor
Hospitalization package update
dev setup
fixed rate
adding apl deployment to ecr
international seeding and setup files created
Update to report template docs for country reports
update to non-US scripts
update to international branch country setup
non-US setup Rmd and other scripts finished.
update
minor print edit
updates to script to make international functional with master
minor update to report and setup scripts
setup fix
non-us update
dev setup relative min
relative min ready
1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

Delete jhucsse_case_data_crude.csv

accidental data commit

vignette fix
Removed man folders from packages
fixes in the international branch before the merge
Do not update packages
Update covidImportation to v1.6.1
minor fix
fix non-US setup
Update local_install.R
Fix merge error
Reload covidImportation v1.6.1 to fix tidyverse dependency
seeding update with inputted incidence multiplier
minor names fix
Minor fixes to build_US and build_nonUS integration tests
deleted a comma
minor bug fix
Fix reversed international tag
fixed error message
fixed python error
minor
Adding updated severity parameters
fixing US seeding
adding print message
Update covidImportation with bug fix
minor update
Fix filter issue
integration testing fixes
Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".
make_makefile.R now includes both US and non-US functionality
make_makefile white space fix
Add tictoc package to dev docker
Updated to fix a docker bug
Report devel2 into dev (Report devel2 into dev #352)
updates to state template
fix load_cum_inf_geounit_dates to use hosp only
add hosp method chunks from report_devel
adding generic mapping function
removing grouping by time for appropriate cumsum in load_cum_inf
fixing error in load_cum_inf
add ventilator to scenario tbl
add warning about loading infections from hosp data
deprecate old functions, integration testing temp
recreating clean NAMESPACE to remove export of setup_testing_environment preventing pkg install
adding sim_num before post_process in load_hosp_sims_filtered for output that does not contain sim_num but requires it for post-processing
adding warning about variable name to load_hosp_geounit_threshold
moving make_excess_heatmap to deprecated functions
prep report_devel2 for dev merge (prep report_devel2 for dev merge #351)
Version with pyarrow included
Dependencies for arrow in R as well
Fixed check_model script
Updated for feather integration
Updated test cases since n is reserved in yml
adding make_excess_heatmap function for hosp outcomes
Fixing parallelization mistake
Minor fixes

Use the "optimize" covidImportation version
Always upgrade local packages if upgrade available (vs silently ignore)
check_model_reports should ensure axis are dates

new figure relative to threshold heatmap
Update importation.R to match covidImportation package updates
Updated model code to use the new covidImportation package, and also seed to E instead of I (and keep population fixed)
Fixed typo
Final fix to avoid numba
Fixed path to install_local script
Added package
Fixed seeding creation
rm NAs and fix create_seeding.R
add new cum hosp/deaths check to check_models scr
update indexes in check model script
long form mobility
Update reference to geoid-params.csv inside of hosp_run.R
10x seeding file
Write the npi when writing parquet output
template
report after simulation
Removed geodata read from hosp_run.R since it's not being used
Updated things that feed into mobility
Updated build_US_setup.R to account for the move
These files got removed in a previous commit
Removing unused (as far as I can tell anyway) data
Fix bug when the places are also a number
Changing back test cases to use size/prob instead of n/p
Updated name to pass checks on case sensitive OS
Updated to use file_extension argument
Fix broken tests, though I recommend we eliminate the mean and var checks since they'll be flaky
Updated build_US_setup.R to work with the current setup
Renamed parameters to avoid confusion; print out simid as 9 digits

SEIR and hospitalization phases have more standardized file format

read parquet file times correctly
Revert "read parquet file times correctly"

This reverts commit 521dd25.

parquet date fixes (parquet date fixes #207)

Co-authored-by: hrmeredith12 [email protected]

Report devel (Report devel #208)
fix unit test code
fix unit test for real
fix unit tests
adding ability to filter geoids in relative heatmap function
adding template for county-specific report for a given state
lower tolerance for distribution tests
planning_models chunk
planning scenario chunk
add names to dev team

Co-authored-by: eclee25 [email protected]
Co-authored-by: Kyra Grantz [email protected]
Co-authored-by: hrmeredith12 [email protected]

Adding Javier (Dataseed merge #210)

Co-authored-by: hrmeredith12 [email protected]
Co-authored-by: Elizabeth Lee [email protected]

Delete build-model-input.R (Delete build-model-input.R #217)
Dataseed merge (Dataseed merge #215)
Adding Javier
Adding commute data back in
rm fixed param and comment out bad plot
commit namespace report gen
fix NVentCurr name
formatting changes to county report template, removing defaults that should be modified for each report
adding references for county report template
change importation seeding
table formatting
limitations chunk considering age specific hosp calculations
removing build_hospdeath_geoid_par - old version not used in hosprun.R
removing legacy hospitalization scripts. everything runs through hosp_run.R now
using current default durations to minimize confusion

Co-authored-by: hrmeredith12 [email protected]
Co-authored-by: Elizabeth Lee [email protected]
Co-authored-by: Kyra Grantz [email protected]

Removing config.yml and changing the variable name in create_seeding to be truthful. (Removing config.yml and changing the variable name in create_seeding … #219)
Fixed the low in followup issue (Fixed the low in followup issue #224)
Fixed the low in followup issue
Adding initial ^
adding county report template yaml (adding county report template yaml #221)

Co-authored-by: jkamins7 [email protected]

Fix load-bearing typo (Fix load-bearing typo #225)
Fix load-bearing typo
pretty sure it's supposed to be this

Co-authored-by: Josh Wills [email protected]
Co-authored-by: kkintaro [email protected]

Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch
fix for 1 scenario (Fix in case there is only one scenario to plot... #230)

Co-authored-by: Elizabeth Lee [email protected]

RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

Update covidImportation package to v1.6 (Merge dataseed into master (at least most of it) #10)
Update covidImportation package to v1.6 (Update covidImportation package to v1.6 #250)
Adding final form of previous packrat + docker setup after merging weirdness
Switching .so to git lfs
Updated indexing in simulations and hospitalization
Added better indexing for hospitalization
Add ability to reduce alpha, sigma, and gamma (Add ability to reduce alpha, sigma, and gamma #241)
Add the ability to reduce multiple parameters
Add Reduce scenario template to test_simple and documentation
minor bug test fix
Minor bugs

Co-authored-by: Joseph Lemaitre [email protected]

Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.
Removing source for packages installable from cran
Updated the python rules for reticulate (tests still pass)
Removing source for packages installable from cran
Updated the python rules for reticulate (tests still pass)
Updated based on review
Fixed filter issues with makefile setup in case dynfilter isn't provided in config
Updated makefile
Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from do.call(), which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

Packrat (Packrat #253)
Adding final form of previous packrat + docker setup after merging weirdness
Switching .so to git lfs
Removing source for packages installable from cran
Updated the python rules for reticulate (tests still pass)
Updated based on review
Updated to use dev's docker instead of dataseed's
Added reticulate zoo and xts
Updated docker with git-lfs
Packrat (Packrat #267)
Adding final form of previous packrat + docker setup after merging weirdness
Switching .so to git lfs
Removing source for packages installable from cran
Updated the python rules for reticulate (tests still pass)
Updated based on review
Updated to use dev's docker instead of dataseed's
Added reticulate zoo and xts
Updated docker with git-lfs
Updating docker to install current versions of local packages
Update .Rprofile
Update dockerhub.yaml
Update aws.yaml
Yet another packrat attempt
Update ci.yml
Generic version of the batch job launcher/runner (Generic version of the batch job launcher/runner #257)
Generic version of batch from the union of jwills_dfU_run and dataseed_batch2
Fixes from running stuff on some test jobs
Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 [email protected]

changing covidImportation tag to 1.6.1
Reduce SEIR startup costs (Reduce SEIR startup costs #273)
60% speedup in one run SEIR performance

The biggest cost in a single sim SEIR run was importation of Numba and the JIT compilation. Change this to compile ahead of time, which results in a nice 60% lift in one run SEIR performance by saving these startup costs---which will be valuable for our large inference runs.

Minor performance benefit when running many simulations as JIT costs are amortized away.

Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs

Add Python build directory to .gitignore
Integrate build_US_setup into pipeline and... (Integrate build_US_setup into pipeline and... #271)
Add hard-coded territory data to build_US_setup
Create csv of island area census data since it cannot be accessed by API
Change the report targets to follow the conventions of make_makefile
Integrate build_US_setup into pipeline
Some bug fixes
git lfs pull of commute_data.csv and switch docker image
Update ci.yml
Update ci.yml
Remove generated files
Update make_makefile.R
Update run_tests.py
pull census year from config
Use census year from config to build_US_setup
Update build_US_setup.R

Co-authored-by: eclee25 [email protected]

Add check to hospitalization that geodata geoids are in geoid-params.csv (Add check to hospitalization that geodata geoids are in geoid-params.csv #283)
added state level script for creating csv reporting out quantiles
Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script
Added countylevel script
Various fixes and updates to post-run summarization scripts.
Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)
Integrate QuantileSummarizeGeoExtent.R into pipeline
Create QuantileSummarizeGeoidLevel.py
Working on the python script
Integrate quantile scripts into Makefile
Delete QuantileSummarizeGeoidLevel.py
perf fix for quantile_report_script
QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of QuantileSummarizeGeoidLevel.R running on Apache Spark. The job computes quantiles grouped by geoid and time; Spark provides the shuffle and quantile estimation needed to perform this aggregation efficiently. The job can be run locally within the container (fine for a USA run, but it takes ~45 min on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, to the container.

add --name_filter to quantile_summarize_geoid_level as per feedback
Adjust quantile scripts so they all have the same interface

Fixed bug in both R scripts where num_files was set incorrectly
Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

Revert make_makefile.R to dev branch version
setup file for international countries
Fatiguing NPI
tested MVP
other implementation, maybe cleaner
update to hosp_run to take specified geoid-params
Added mild infections as output of hospitalization
minor
Hospitalization package update
dev setup
fixed rate
adding apl deployment to ecr
international seeding and setup files created
Update to report template docs for country reports
update to non-US scripts
update to international branch country setup
non-US setup Rmd and other scripts finished.
update
minor print edit
updates to script to make international functional with master
minor update to report and setup scripts
setup fix
non-us update
dev setup relative min
relative min ready
1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

Delete jhucsse_case_data_crude.csv

accidental data commit

vignette fix
Removed man folders from packages
fixes in the international branch before the merge
Do not update packages
Update covidImportation to v1.6.1
minor fix
fix non-US setup
Update local_install.R
Fix merge error
Reload covidImportation v1.6.1 to fix tidyverse dependency
seeding update with inputted incidence multiplier
minor names fix
Minor fixes to build_US and build_nonUS integration tests
deleted a comma
minor bug fix
Fix reversed international tag
fixed error message
fixed python error
minor
Adding updated severity parameters
fixing US seeding
adding print message
Update covidImportation with bug fix
minor update
Fix filter issue
integration testing fixes
Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".
make_makefile.R now includes both US and non-US functionality
make_makefile white space fix
Add tictoc package to dev docker
Updated to fix a docker bug

Co-authored-by: Josh Wills [email protected]
Co-authored-by: jkamins7 [email protected]
Co-authored-by: kkintaro [email protected]
Co-authored-by: Kyra Grantz [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: shauntruelove [email protected]
Co-authored-by: chadi [email protected]
Co-authored-by: hrmeredith12 [email protected]
Co-authored-by: Josh Wills [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: Dave [email protected]
Co-authored-by: Shaun Truelove [email protected]

rename report.generation folder
update report.generation path in workflow test

Co-authored-by: Kyra Grantz [email protected]
Co-authored-by: juanderone [email protected]
Co-authored-by: Josh Wills [email protected]
Co-authored-by: jkamins7 [email protected]
Co-authored-by: kkintaro [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: shauntruelove [email protected]
Co-authored-by: chadi [email protected]
Co-authored-by: hrmeredith12 [email protected]
Co-authored-by: Josh Wills [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: Dave [email protected]
Co-authored-by: Shaun Truelove [email protected]

configurable delay and ratio for seeding
seeding file extra comma
change path to report.generation
rm double parens
Dev make (Dev make #358)
make_makefile - rm filter and add seeding & intl
add parens
typo
Removed filter from tests
fix parens issue
fixes "Zero-population regions break the SEIR" (#338) by raising an error
fixes "The model runs without warning if the sum of all mobility from a region is greater than the region's population" (#339) by raising an error
better and correct message
bugfixes
better presentation
consistency across messages
accidentally deleted some tests, putting them back
newlines
Updated make_makefile.R to pass tests multiple times in a row
integ test 2x, update local install
try to fix 2x integ test
rm unnecessary chdir
fix typo in aws apl workflow
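
The two fixes in this group (#338 and #339) both turn a silent failure into an explicit error. A minimal sketch of that kind of check follows; the function name, the list-based inputs, and the error messages are hypothetical and do not reproduce the pipeline's actual code.

```python
def check_spatial_setup(population, mobility):
    """Raise ValueError for inputs that should stop a run before it starts.

    population: list of region populations.
    mobility:   square matrix, mobility[i][j] = people moving from i to j.
    Both structures are assumptions made for this illustration.
    """
    for i, pop in enumerate(population):
        # Issue #338: a region with no population cannot be simulated.
        if pop <= 0:
            raise ValueError(f"region {i} has non-positive population ({pop})")
        # Issue #339: total outbound mobility cannot exceed the population.
        outflow = sum(m for j, m in enumerate(mobility[i]) if j != i)
        if outflow > pop:
            raise ValueError(
                f"region {i}: outbound mobility {outflow} exceeds population {pop}"
            )


# A consistent setup passes silently.
check_spatial_setup([100, 200], [[0, 30], [50, 0]])

# An inconsistent one is rejected up front.
try:
    check_spatial_setup([100, 200], [[0, 150], [50, 0]])
except ValueError as err:
    print(err)
```

Failing early like this keeps a bad setup file from producing output that looks plausible.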

Co-authored-by: jkamins7 [email protected]
Co-authored-by: chadi [email protected]

readme file changes
change to latest docker image
dev image
make sensible load_config err + test
Updated docker file
Removed failing workflow
Removed more rstudio config from docker file
Removed more rstudio config from docker file
Removed outdated vignettes
Updated covidImportation version in docker
Updated packrat

Co-authored-by: Josh Wills [email protected]
Co-authored-by: kkintaro [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: jkamins7 [email protected]
Co-authored-by: Joseph Lemaitre [email protected]
Co-authored-by: Josh Wills [email protected]
Co-authored-by: Sam Shah [email protected]
Co-authored-by: shauntruelove [email protected]
Co-authored-by: Dave [email protected]
Co-authored-by: Shaun Truelove [email protected]
Co-authored-by: Kyra Grantz [email protected]
Co-authored-by: juanderone [email protected]
Co-authored-by: hrmeredith12 [email protected]

* Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch * RStudio in the Docker container RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production. * Update covidImportation package to v1.6 (#10) * Update covidImportation package to v1.6 (#250) * Adding final form of previous packrat + docker setup after merging weirdness * Switching .so to git lfs * Updated indexing in simulations and hospitalization * Added better indexing for hospitalization * Add ability to reduce alpha, sigma, and gamma (#241) * Add the ability to reduce multiple parameters * Add Reduce scenario template to test_simple and documentation * minor bug test fix * Minor bugs Co-authored-by: Joseph Lemaitre <[email protected]> * Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario. * Removing source for packages installable from cran * Updated the python rules for reticulate (tests still pass) * Removing source for packages installable from cran * Updated the python rules for reticulate (tests still pass) * Updated based on review * Fixed filter issues with makefile setup in case dynfilter isn't provided in config * Updated makefile * Reduce hospitalization memory pressure Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing. 
* Packrat (#253) * Adding final form of previous packrat + docker setup after merging weirdness * Switching .so to git lfs * Removing source for packages installable from cran * Updated the python rules for reticulate (tests still pass) * Updated based on review * Updated to use dev's docker instead of dataseed's * Added reticulate zoo and xts * Updated docker with git-lfs * Packrat (#267) * Adding final form of previous packrat + docker setup after merging weirdness * Switching .so to git lfs * Removing source for packages installable from cran * Updated the python rules for reticulate (tests still pass) * Updated based on review * Updated to use dev's docker instead of dataseed's * Added reticulate zoo and xts * Updated docker with git-lfs * Updating docker to install current versions of local packages * Update .Rprofile * Update dockerhub.yaml * Update aws.yaml * Yet another packrat attempt * Update ci.yml * Generic version of the batch job launcher/runner (#257) * Generic version of batch from the union of jwills_dfU_run and dataseed_batch2 * Fixes from running stuff on some test jobs * Add a vcpu CLI option and update sims_per_job to refer to slots per job Co-authored-by: jkamins7 <[email protected]> * Reduce SEIR startup costs (#273) * 60% speedup in one run SEIR performance The biggest cost in a single sim SEIR run was importation of Numba and the JIT compilation. Change this to compile ahead of time, which results in a nice 60% lift in one run SEIR performance by saving these startup costs---which will be valuable for our large inference runs. Minor performance benefit when running many simulations as JIT costs are amortized away. 
``` Benchmark #1: single sim JIT compilation (current) Time (mean ± σ): 13.429 s ± 0.537 s Range (min … max): 12.973 s … 14.867 s 100 runs Benchmark #2: single sim AOT compilation (new) Time (mean ± σ): 5.129 s ± 0.125 s Range (min … max): 4.901 s … 5.364 s 100 runs ``` * Add Python build directory to .gitignore * Integrate build_US_setup into pipeline and... (#271) * Add hard-coded territory data to build_US_setup * Create csv of island area census data since it cannot be accessed by API * Change the report targets to follow the conventions of make_makefile * Integrate build_US_setup into pipeline * Some bug fixes * git lfs pull of commute_data.csv and switch docker image * Update ci.yml * Update ci.yml * Remove generated files * Update make_makefile.R * Update run_tests.py * pull census year from config * Use census year from config to build_US_setup * Update build_US_setup.R Co-authored-by: eclee25 <[email protected]> * Add check to hospitalization that geodata geoids are in geoid-params.csv (#283) * added state level script for creating csv reporting out quantiles * Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script * Added countylevel script * Varios fixes and updates to post run summarization scripts. * Integrate QuantileSummarizeGeoExtent.R into pipeline (untested) * Integrate QuantileSummarizeGeoExtent.R into pipeline * Create QuantileSummarizeGeoidLevel.py * Working on the python script * Integrate quantile scripts into Makefile * Delete QuantileSummarizeGeoidLevel.py * perf fix for quantile_report_script * QuantileSummarizeGeoidLevel on Apache Spark This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job essentially computes quantiles grouped by geoid and time whereby Spark provides the shuffle and quantile estimation mechanism to perform this aggregation efficiently. 
The job can be run locally within the container (fine for USA run but takes ~45mins on a r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark and consequently Java inside the container. * add `--name_filter` to quantile_summarize_geoid_level as per feedback * Adjust quantile scripts so they all have the same interface - Fixed bug in both R scripts where `num_files` was set incorrectly - Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts * Revert make_makefile.R to dev branch version * setup file for international countries * Fatiguing NPI * tested MVP * other implementation, maybe cleaner * update to hosp_run to take specified geoid-params * Added mild infections as output of hospitalization * minor * Hospitalization package update * dev setup * fixed rate * adding apl deployment to ecr * international seeding and setup files created * Update to report template docs for country reports * update to non-US scripts * update to international branch country setup * non-US setup Rmd and other scripts finished. * update * minor print edit * updates to script to make international functional with master * minor update to report and setup scripts * setup fix * non-us update * dev setup relative min * relative min ready * 1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R 2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run. 
* Delete jhucsse_case_data_crude.csv accidental data commit * vignette fix * Removed man folders from packages * fixes in the international branch before the merge * Do not update packages * Update covidImportation to v1.6.1 * minor fix * fix non-US setup * Update local_install.R * Fix merge error * Reload covidImportation v1.6.1 to fix tidyverse dependency * seeding update with inputted incidence multiplier * minor names fix * Minor fixes to build_US and build_nonUS integration tests * deleted a comma * minor bug fix * Fix reversed international tag * fixed error message * fixed python error * minor * Adding updated severity parameters * fixing US seeding * adding print message * Update covidImportation with bug fix * minor update * Fix filter issue * integration testing fixes * Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean". 
* make_makefile.R now includes both US and non-US functionality * make_makefile white space fix * Add tictoc package to dev docker * Updated to fix a docker bug * Report devel2 into dev (#352) * updates to state template * fix load_cum_inf_geounit_dates to use hosp only * add hosp method chunks from report_devel * adding generic mapping function * removing grouping by time for appropriate cumsum in load_cum_inf * fixing error in load_cum_inf * add ventilator to scenario tbl * add warning about loading infections from hosp data * deprecate old functions, integration testing temp * recreating clean NAMESPACE to remove export of setup_testing_environment preventing pkg install * adding sim_num before post_process in load_hosp_sims_filtered for output that does not contain sim_num but requires it for post-processing * adding warning about variable name to load_hosp_geounit_threshold * moving make_excess_heatmap to deprecated functions * prep report_devel2 for dev merge (#351) * Version with pyarrow included * Dependencies for arrow in R as well * Fixed check_model script * Updated for feather integration * Updated test cases since `n` is reserved in yml * adding make_excess_heatmap function for hosp outcomes * Fixing parallelization mistake * Minor fixes - Use the "optimize" covidImportation version - Always upgrade local packages if upgrade available (vs silently ignore) - check_model_reports should ensure axis are dates * new figure relative to threshold heatmap * Update importation.R to match covidImportation package updates * Updated model code to use the new covidImportation package, and also seed to E instead of I (and keep population fixed * Fixed typo * Final fix to avoid numba * Fixed path to install_local script * Added package * Fixed seeding creation * rm NAs and fix create_seeding.R * add new cum hosp/deaths check to check_models scr * update indexes in check model script * long form mobility * Update reference to geoid-params.csv inside of hosp_run.R * 
10x seeding file * Write the npi when writing parquet output * template * report after simulation * Removed geodata read from hosp_run.R since it's not being used * Updated things that feed into mobility * Updated build_US_setup.R to account for the move * These files got removed in a previous commit * Removing unused (as far as I can tell anyway) data * Fix bug when the places are also a number * Changing back test cases to use size/prob instead of n/p * Updated name to pass checks on case sensitive OS * Updated to use file_extension argument * Fix broken tests, though I recommend we eliminate the mean and var checks since they'll be flaky * Updated build_US_setup.R to work with the current setup * Renamed parameters to avoid confusion; print out simid as 9 digits SEIR and hospitalization phases have more standardized file format * read parquet file times correctly * Revert "read parquet file times correctly" This reverts commit 521dd25. * parquet date fixes (#207) Co-authored-by: hrmeredith12 <[email protected]> * Report devel (#208) * fix unit test code * fix unit test for real * fix unit tests * adding ability to filter geoids in relative heatmap function * adding template for county-specific report for a given state * lower tolerance for distribution tests * planning_models chunk * planning scenario chunk * add names to dev team Co-authored-by: eclee25 <[email protected]> Co-authored-by: Kyra Grantz <[email protected]> Co-authored-by: hrmeredith12 <[email protected]> * Adding Javier (#210) Co-authored-by: hrmeredith12 <[email protected]> Co-authored-by: Elizabeth Lee <[email protected]> * Delete build-model-input.R (#217) * Dataseed merge (#215) * Adding Javier * Adding commute data back in * rm fixed param and comment out bad plot * commit namespace report gen * fix NVentCurr name * formatting changes to county report template, removing defaults that should be modified for each report * adding references for county report template * change importation 
seeding * table formatting * limitations chunk considering age specific hosp calculations * removing build_hospdeath_geoid_par - old version not used in hosprun.R * removing legacy hospitalization scripts. everything runs through hosp_run.R now * using current default durations to minimize confusion Co-authored-by: hrmeredith12 <[email protected]> Co-authored-by: Elizabeth Lee <[email protected]> Co-authored-by: Kyra Grantz <[email protected]> * Removing config.yml and changing the variable name in create_seeding to be truthful. (#219) * Fixed the low in followup issue (#224) * Fixed the low in followup issue * Adding initial ^ * adding county report template yaml (#221) Co-authored-by: jkamins7 <[email protected]> * Fix load-bearing typo (#225) * Fix load-bearing typo * pretty sure it's supposed to be this Co-authored-by: Josh Wills <[email protected]> Co-authored-by: kkintaro <[email protected]> * Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch * fix for 1 scenario (#230) Co-authored-by: Elizabeth Lee <[email protected]> * RStudio in the Docker container RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production. * Update covidImportation package to v1.6 (#10) * Update covidImportation package to v1.6 (#250) * Adding final form of previous packrat + docker setup after merging weirdness * Switching .so to git lfs * Updated indexing in simulations and hospitalization * Added better indexing for hospitalization * Add ability to reduce alpha, sigma, and gamma (#241) * Add the ability to reduce multiple parameters * Add Reduce scenario template to test_simple and documentation * minor bug test fix * Minor bugs Co-authored-by: Joseph Lemaitre <[email protected]> * Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario. 
* Removing source for packages installable from cran * Updated the python rules for reticulate (tests still pass) * Removing source for packages installable from cran * Updated the python rules for reticulate (tests still pass) * Updated based on review * Fixed filter issues with makefile setup in case dynfilter isn't provided in config * Updated makefile * Reduce hospitalization memory pressure Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing. * Packrat (#253) * Adding final form of previous packrat + docker setup after merging weirdness * Switching .so to git lfs * Removing source for packages installable from cran * Updated the python rules for reticulate (tests still pass) * Updated based on review * Updated to use dev's docker instead of dataseed's * Added reticulate zoo and xts * Updated docker with git-lfs * Packrat (#267) * Adding final form of previous packrat + docker setup after merging weirdness * Switching .so to git lfs * Removing source for packages installable from cran * Updated the python rules for reticulate (tests still pass) * Updated based on review * Updated to use dev's docker instead of dataseed's * Added reticulate zoo and xts * Updated docker with git-lfs * Updating docker to install current versions of local packages * Update .Rprofile * Update dockerhub.yaml * Update aws.yaml * Yet another packrat attempt * Update ci.yml * Generic version of the batch job launcher/runner (#257) * Generic version of batch from the union of jwills_dfU_run and dataseed_batch2 * Fixes from running stuff on some test jobs * Add a vcpu CLI option and update sims_per_job to refer to slots per job Co-authored-by: jkamins7 <[email protected]> * changing covidImportation tag to 1.6.1 * Reduce SEIR startup costs (#273) * 60% speedup in one run SEIR performance The biggest cost in a single sim SEIR run was importation of Numba and the JIT compilation. 
Change this to compile ahead of time, which results in a nice 60% lift in one run SEIR performance by saving these startup costs---which will be valuable for our large inference runs. Minor performance benefit when running many simulations as JIT costs are amortized away.

```
Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
```

* Add Python build directory to .gitignore * Integrate build_US_setup into pipeline and... (#271) * Add hard-coded territory data to build_US_setup * Create csv of island area census data since it cannot be accessed by API * Change the report targets to follow the conventions of make_makefile * Integrate build_US_setup into pipeline * Some bug fixes * git lfs pull of commute_data.csv and switch docker image * Update ci.yml * Update ci.yml * Remove generated files * Update make_makefile.R * Update run_tests.py * pull census year from config * Use census year from config to build_US_setup * Update build_US_setup.R Co-authored-by: eclee25 <[email protected]> * Add check to hospitalization that geodata geoids are in geoid-params.csv (#283) * added state level script for creating csv reporting out quantiles * Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script * Added countylevel script * Various fixes and updates to post run summarization scripts. 
* Integrate QuantileSummarizeGeoExtent.R into pipeline (untested) * Integrate QuantileSummarizeGeoExtent.R into pipeline * Create QuantileSummarizeGeoidLevel.py * Working on the python script * Integrate quantile scripts into Makefile * Delete QuantileSummarizeGeoidLevel.py * perf fix for quantile_report_script * QuantileSummarizeGeoidLevel on Apache Spark This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job essentially computes quantiles grouped by geoid and time whereby Spark provides the shuffle and quantile estimation mechanism to perform this aggregation efficiently. The job can be run locally within the container (fine for USA run but takes ~45mins on a r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark and consequently Java inside the container. * add `--name_filter` to quantile_summarize_geoid_level as per feedback * Adjust quantile scripts so they all have the same interface - Fixed bug in both R scripts where `num_files` was set incorrectly - Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts * Revert make_makefile.R to dev branch version * setup file for international countries * Fatiguing NPI * tested MVP * other implementation, maybe cleaner * update to hosp_run to take specified geoid-params * Added mild infections as output of hospitalization * minor * Hospitalization package update * dev setup * fixed rate * adding apl deployment to ecr * international seeding and setup files created * Update to report template docs for country reports * update to non-US scripts * update to international branch country setup * non-US setup Rmd and other scripts finished. * update * minor print edit * updates to script to make international functional with master * minor update to report and setup scripts * setup fix * non-us update * dev setup relative min * relative min ready * 1. 
Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R 2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run. * Delete jhucsse_case_data_crude.csv accidental data commit * vignette fix * Removed man folders from packages * fixes in the international branch before the merge * Do not update packages * Update covidImportation to v1.6.1 * minor fix * fix non-US setup * Update local_install.R * Fix merge error * Reload covidImportation v1.6.1 to fix tidyverse dependency * seeding update with inputted incidence multiplier * minor names fix * Minor fixes to build_US and build_nonUS integration tests * deleted a comma * minor bug fix * Fix reversed international tag * fixed error message * fixed python error * minor * Adding updated severity parameters * fixing US seeding * adding print message * Update covidImportation with bug fix * minor update * Fix filter issue * integration testing fixes * Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean". 
* make_makefile.R now includes both US and non-US functionality * make_makefile white space fix * Add tictoc package to dev docker * Updated to fix a docker bug Co-authored-by: Josh Wills <[email protected]> Co-authored-by: jkamins7 <[email protected]> Co-authored-by: kkintaro <[email protected]> Co-authored-by: Kyra Grantz <[email protected]> Co-authored-by: Sam Shah <[email protected]> Co-authored-by: shauntruelove <[email protected]> Co-authored-by: chadi <[email protected]> Co-authored-by: hrmeredith12 <[email protected]> Co-authored-by: Josh Wills <[email protected]> Co-authored-by: Sam Shah <[email protected]> Co-authored-by: Dave <[email protected]> Co-authored-by: Shaun Truelove <[email protected]> * rename report.generation folder * update report.generation path in workflow test Co-authored-by: Kyra Grantz <[email protected]> Co-authored-by: juanderone <[email protected]> Co-authored-by: Josh Wills <[email protected]> Co-authored-by: jkamins7 <[email protected]> Co-authored-by: kkintaro <[email protected]> Co-authored-by: Sam Shah <[email protected]> Co-authored-by: shauntruelove <[email protected]> Co-authored-by: chadi <[email protected]> Co-authored-by: hrmeredith12 <[email protected]> Co-authored-by: Josh Wills <[email protected]> Co-authored-by: Sam Shah <[email protected]> Co-authored-by: Dave <[email protected]> Co-authored-by: Shaun Truelove <[email protected]> * configurable delay and ratio for seeding * seeding file extra comma * change path to report.generation * rm double parens * Dev make (#358) * make_makefile - rm filter and add seeding & intl * add parens * typo * Removed filter from tests * fix parens issue * fixes #338 by raising an error * fixes #339 by raising an error * better and correct message * bugfixes * better presentation * consistency across messages * accidentally deleted some test, putting them back * newlines * Updated make_makefile.R to pass tests multiple times in a row * integ test 2x, update local install * try to fix 2x 
integ test * rm unnecessary chdir * fix typo in aws apl workflow Co-authored-by: jkamins7 <[email protected]> Co-authored-by: chadi <[email protected]> * readme file changes * change to latest docker image * dev image * make sensible load_config err + test * Updated docker file * Removed failing workflow * Removed more rstudio config from docker file * Removed more rstudio config from docker file * Removed outdated vignettes * Updated covidImportation version in docker * Updated packrat Co-authored-by: Josh Wills <[email protected]> Co-authored-by: kkintaro <[email protected]> Co-authored-by: Sam Shah <[email protected]> Co-authored-by: jkamins7 <[email protected]> Co-authored-by: Joseph Lemaitre <[email protected]> Co-authored-by: Josh Wills <[email protected]> Co-authored-by: Sam Shah <[email protected]> Co-authored-by: shauntruelove <[email protected]> Co-authored-by: Dave <[email protected]> Co-authored-by: Shaun Truelove <[email protected]> Co-authored-by: Kyra Grantz <[email protected]> Co-authored-by: juanderone <[email protected]> Co-authored-by: hrmeredith12 <[email protected]>

jcblemai · 2022-03-07T15:52:06Z

Does this squash the commits?
