Merge pull request #82 from m2lines/notebooks-cleanup
Notebooks cleanup
MarionBWeinzierl committed Oct 3, 2023
2 parents e82bbe6 + edbc5f8 commit 6bae25d
Showing 22 changed files with 590 additions and 220,898 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -8,6 +8,8 @@ __pycache__/*

# MLflow output
/mlruns/*
/examples/jupyter-notebooks/mlruns/*

# Jupyter notebook cache files
.ipynb_checkpoints/
/.pytest_cache/
25 changes: 13 additions & 12 deletions README.md
@@ -55,15 +55,15 @@ With `pip` installed, run the following in the root directory:
[Poetry](https://python-poetry.org/). To use, rename `pyproject-poetry.toml` to
`pyproject.toml` (overwriting the existing file) and use Poetry as normal. Note
that the Poetry build is not actively supported-- if it fails, check that the
-dependencies are up to date with the setuptools `pyproject.toml`.)*
+dependencies are up-to-date with the setuptools `pyproject.toml`.)*
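
For reference, a minimal sketch of the rename-and-install flow described in that note, assuming the commands are run from the repository root (an illustrative session, not part of the README itself):

```
# sketch: switch to the Poetry build (run from the repository root)
mv pyproject-poetry.toml pyproject.toml   # overwrites the setuptools pyproject.toml
poetry install                            # resolve and install the dependencies
```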

#### System
Some graphing code uses cartopy, which requires [GEOS](https://libgeos.org/). To
install on Ubuntu:

sudo apt install libgeos-dev

-On MacOS, via Homebrew:
+On macOS, via Homebrew:

brew install geos

@@ -100,16 +100,18 @@ with `--no-conda`
In order to make sure that data in- and output locations are well-defined, the
environment variable `MLFLOW_TRACKING_URI` must be set to the intended data location:

-> export MLFLOW_TRACKING_URI="/path/to/data/dir"
+export MLFLOW_TRACKING_URI="/path/to/data/dir"

in Linux, or
-> %env MLFLOW_TRACKING_URI /path/to/data/dir
+```
+%env MLFLOW_TRACKING_URI /path/to/data/dir
+```

in a Jupyter Notebook, or

```
import os
-os.environ['MLFLOW_TRACKING_URI] = '/path/to/data/dir'
+os.environ['MLFLOW_TRACKING_URI'] = '/path/to/data/dir'
```
in Python.
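
As a quick sanity check (an addition to these notes, not part of the README itself), one can confirm that MLflow actually picks up the variable before launching any runs; this assumes a POSIX shell and that `mlflow` is installed in the active environment:

```
export MLFLOW_TRACKING_URI="/path/to/data/dir"
python3 -c 'import mlflow; print(mlflow.get_tracking_uri())'   # should print the directory set above
```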

@@ -161,7 +163,7 @@ MLflow call example:

```
mlflow run . --experiment-name <name> -e train --env-manager=local \
--P exp_id=692154129919725696 -P run_id=c57b36da385e4fc4a967e7790192ecb2 \
+-P run_id=<run id> \
-P learning_rate=0/5e-4/15/5e-5/30/5e-6 -P n_epochs=200 -P weight_decay=0.00 -P train_split=0.8 \
-P test_split=0.85 -P model_module_name=models.models1 -P model_cls_name=FullyCNN -P batchsize=4 \
-P transformation_cls_name=SoftPlusTransform -P submodel=transform3 \
@@ -175,7 +177,7 @@ Relevant parameters:
* `run_id`: id of the run that generated the forcing data that will be used for
training.
* `loss_cls_name`: name of the class that defines the loss. This class should be
-defined in train/losses.py in order for the script to find it. Currently the
+defined in train/losses.py in order for the script to find it. Currently, the
main available options are:
* `HeteroskedasticGaussianLossV2`: this corresponds to the loss used in the
2021 paper
@@ -212,17 +214,16 @@ In this step it is particularly important to set the environment variable `MLFLOW_TRACKING_URI`
in order for the data to be found and stored in a sensible place.

One can run the inference step by interactively
-running the following project root directory:
+running the following in the project root directory:

->python3 -m gz21_ocean_momentum.inference.main --n_splits=40
+python3 -m gz21_ocean_momentum.inference.main --n_splits=40

with `n_splits` being the number of subsets which the dataset is split
into for the processing, before being put back together for the final output.
This is done in order to avoid memory issues for large datasets.
Other useful arguments for this call would be
-- `to_experiment`: the name of the mlflow experiment used for this run
-n_splits: the number of splits applied to the data
-- `batch_size`: the batch size used in running the neural network on the data
+- `to_experiment`: the name of the mlflow experiment used for this run (default is "test").
+- `batch_size`: the batch size used in running the neural network on the data.
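
Putting the pieces above together, a sketch of a full inference invocation; the exact spellings of the `to_experiment` and `batch_size` flags are assumptions based on the parameter names listed above, so check them against the script's own usage output:

```
# sketch: point MLflow at the data location, then run inference from the project root
export MLFLOW_TRACKING_URI="/path/to/data/dir"
python3 -m gz21_ocean_momentum.inference.main \
    --n_splits=40 \
    --to_experiment=test \
    --batch_size=4   # flag names assumed from the parameter list above
```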


After the script has started running, it will first require
