hydra: plugins_path and advanced config (#5097)
* hydra: plugins_path and advanced config

* drop unused link

* add link to hydra plugins

* explain you can run code with or without hydra
dberenbaum committed Feb 2, 2024
1 parent 5eb2717 commit 881165d
Showing 2 changed files with 107 additions and 3 deletions.
105 changes: 102 additions & 3 deletions content/docs/user-guide/experiment-management/hydra-composition.md
@@ -5,7 +5,7 @@ supports Hydra's [config composition] as a way to configure [experiment runs].

<admon type="info">

-At the moment you must explicitly enable this feature with:
+You must explicitly enable this feature with:

```cli
$ dvc config hydra.enabled True
@@ -139,8 +139,9 @@ We parametrize the shell commands above (`mkdir`, `tar`, `wget`) as well as

<admon type="tip">

-You can use `dvc.api.params_show()` to load params in Python code. For other
-languages, use [dictionary unpacking] or a YAML parsing library.
+You can load the params with any YAML parsing library. In Python, you can use
+the built-in `dvc.api.params_show()` or `OmegaConf.load("params.yaml")` (which
+comes with Hydra).

[dictionary unpacking]:
/doc/user-guide/project-structure/dvcyaml-files#dictionary-unpacking
@@ -221,4 +222,102 @@ Stage 'train' didn't change, skipping

</admon>

`dvc exp run` will compose a new `params.yaml` each time you run it, so it is
not a reliable way to reproduce past experiments. Instead, use `dvc repro` when
you want to reproduce a previously run experiment.

[debug]: /doc/user-guide/pipelines/running-pipelines#debugging-stages

## Migrating Hydra Projects

If you already have Hydra configured and want to start using DVC alongside it,
you may need to refactor your code slightly. DVC will not pass the Hydra config
to `@hydra.main()`, so it should be dropped from the code. Instead, DVC composes
the Hydra config before your code runs and dumps the results to `params.yaml`.

Using the example above, here's how the Python code in `train.py` might look
using Hydra without DVC:

```python
import hydra
from omegaconf import DictConfig

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # train model using cfg parameters
    ...

if __name__ == "__main__":
    main()
```

To convert the same code to use DVC with Hydra composition enabled:

```python
from omegaconf import OmegaConf

def main() -> None:
    cfg = OmegaConf.load("params.yaml")
    # train model using cfg parameters

if __name__ == "__main__":
    main()
```

You no longer need to import Hydra in your code. The `main()` function is
included in this example because it is good practice, but it is not required.
Separating config from code also helps with debugging, because the entire config
generated by Hydra is written to `params.yaml` before the experiment starts.
You can run the same code with or without Hydra (or DVC), and you can reuse
`params.yaml` across multiple scripts in different stages of a DVC pipeline.

## Advanced Hydra config

You can configure how DVC works with Hydra.

By default, DVC will look for Hydra [config groups] in a `conf` directory, but
you can set a different directory using `dvc config hydra.config_dir other_dir`.
This is equivalent to the `config_path` argument in `@hydra.main()`.

Within that directory, DVC will look for the [defaults list] in `config.yaml`,
but you can set a different path using `dvc config hydra.config_name other.yaml`.
This is equivalent to the `config_name` argument in `@hydra.main()`.

Hydra will automatically discover [plugins] in the `hydra_plugins` directory. By
default, DVC will look for `hydra_plugins` in the root directory of the DVC
repository, but you can set a different path with
`dvc config hydra.plugins_path other_path`.

### Custom resolvers

You can register [OmegaConf custom resolvers] as plugins by writing them to a
file inside `hydra_plugins`. DVC will use these custom resolvers when composing
the Hydra config. For example, add a custom resolver to
`hydra_plugins/my_resolver.py`:

```python
import os

from omegaconf import OmegaConf

# Register a `join` resolver that joins two path components.
OmegaConf.register_new_resolver('join', lambda x, y: os.path.join(x, y))
```

You can use that custom resolver inside the Hydra config:

```yaml
dir: raw/data
relpath: dataset.csv
fullpath: ${join:${dir},${relpath}}
```

The final `params.yaml` will look like:

```yaml
dir: raw/data
relpath: dataset.csv
fullpath: raw/data/dataset.csv
```

[plugins]:
https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process
[OmegaConf custom resolvers]:
https://omegaconf.readthedocs.io/en/latest/custom_resolvers.html
5 changes: 5 additions & 0 deletions content/docs/user-guide/project-structure/configuration.md
@@ -258,12 +258,17 @@ Composition].
  groups]. Defaults to `conf`.
- `hydra.config_name` - the name of the file containing the Hydra [defaults
  list] (located inside `hydra.config_dir`). Defaults to `config.yaml`.
- `hydra.plugins_path` - location of the parent directory of `hydra_plugins`,
  where Hydra will automatically discover [plugins]. Defaults to the root of the
  DVC repository.

[config composition]:
https://hydra.cc/docs/tutorials/basic/your_first_app/composition/
[config groups]:
https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/
[defaults list]: https://hydra.cc/docs/tutorials/basic/your_first_app/defaults/
[plugins]:
https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process
</details>