From 881165d9a487bd0de225a78c9fedd97ed5edc091 Mon Sep 17 00:00:00 2001
From: Dave Berenbaum
Date: Fri, 2 Feb 2024 14:39:24 +0000
Subject: [PATCH] hydra: plugins_path and advanced config (#5097)

* hydra: plugins_path and advanced config

* drop unused link

* add link to hydra plugins

* explain you can run code with or without hydra
---
 .../hydra-composition.md                      | 105 +++++++++++++++++-
 .../project-structure/configuration.md        |   5 +
 2 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/content/docs/user-guide/experiment-management/hydra-composition.md b/content/docs/user-guide/experiment-management/hydra-composition.md
index 87afcefee3..e6a0a6ae60 100644
--- a/content/docs/user-guide/experiment-management/hydra-composition.md
+++ b/content/docs/user-guide/experiment-management/hydra-composition.md
@@ -5,7 +5,7 @@ supports Hydra's [config composition] as a way to configure [experiment runs].
-At the moment you must explicitly enable this feature with:
+You must explicitly enable this feature with:
 
 ```cli
 $ dvc config hydra.enabled True
@@ -139,8 +139,9 @@ We parametrize the shell commands above (`mkdir`, `tar`, `wget`) as well as
-You can use `dvc.api.params_show()` to load params in Python code. For other
-languages, use [dictionary unpacking] or a YAML parsing library.
+You can load the params with any YAML parsing library. In Python, you can use
+the built-in `dvc.api.params_show()` or `OmegaConf.load("params.yaml")` (which
+comes with Hydra).
 
 [dictionary unpacking]:
   /doc/user-guide/project-structure/dvcyaml-files#dictionary-unpacking
@@ -221,4 +222,102 @@ Stage 'train' didn't change, skipping
+`dvc exp run` will compose a new `params.yaml` each time you run it, so it is
+not a reliable way to reproduce past experiments. Instead, use `dvc repro` when
+you want to reproduce a previously run experiment.
+
 [debug]: /doc/user-guide/pipelines/running-pipelines#debugging-stages
+
+## Migrating Hydra Projects
+
+If you already have Hydra configured and want to start using DVC alongside it,
+you may need to refactor your code slightly. DVC will not pass the Hydra config
+to `@hydra.main()`, so it should be dropped from the code. Instead, DVC composes
+the Hydra config before your code runs and dumps the results to `params.yaml`.
+
+Using the example above, here's how the Python code in `train.py` might look
+using Hydra without DVC:
+
+```python
+import hydra
+from omegaconf import DictConfig
+
+@hydra.main(version_base=None, config_path="conf", config_name="config")
+def main(cfg: DictConfig) -> None:
+    # train model using cfg parameters
+
+if __name__ == "__main__":
+    main()
+```
+
+To convert the same code to use DVC with Hydra composition enabled:
+
+```python
+from omegaconf import OmegaConf
+
+def main() -> None:
+    cfg = OmegaConf.load("params.yaml")
+    # train model using cfg parameters
+
+if __name__ == "__main__":
+    main()
+```
+
+You no longer need to import Hydra into your code. A `main()` method is included
+in this example because it is good practice, but it's not necessary. This
+separation between config and code can help debug because the entire config
+generated by Hydra gets written to `params.yaml` before the experiment starts.
+You can run the same code with or without Hydra (or DVC). You can also reuse
+`params.yaml` across multiple scripts in different stages of a DVC pipeline.
+
+## Advanced Hydra config
+
+You can configure how DVC works with Hydra.
+
+By default, DVC will look for Hydra [config groups] in a `conf` directory, but
+you can set a different directory using `dvc config hydra.config_dir other_dir`.
+This is equivalent to the `config_path` argument in `@hydra.main()`.
+
+Within that directory, DVC will look for [defaults list] in `config.yaml`, but
+you can set a different path using `dvc config hydra.config_name other.yaml`.
+This is equivalent to the `config_name` argument in `@hydra.main()`.
+
+Hydra will automatically discover [plugins] in the `hydra_plugins` directory. By
+default, DVC will look for `hydra_plugins` in the root directory of the DVC
+repository, but you can set a different path with
+`dvc config hydra.plugins_path other_path`.
+
+### Custom resolvers
+
+You can register [OmegaConf custom resolvers] as plugins by writing them to a
+file inside `hydra_plugins`. DVC will use these custom resolvers when composing
+the Hydra config. For example, add a custom resolver to
+`hydra_plugins/my_resolver.py`:
+
+```python
+import os
+from omegaconf import OmegaConf
+
+OmegaConf.register_new_resolver('join', lambda x, y : os.path.join(x, y))
+```
+
+You can use that custom resolver inside the Hydra config:
+
+```yaml
+dir: raw/data
+relpath: dataset.csv
+fullpath: ${join:${dir},${relpath}}
+```
+
+The final `params.yaml` will look like:
+
+```yaml
+dir: raw/data
+relpath: dataset.csv
+fullpath: raw/data/dataset.csv
+```
+
+[plugins]:
+  https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process
+[OmegaConf custom resolvers]:
+  https://omegaconf.readthedocs.io/en/latest/custom_resolvers.html
diff --git a/content/docs/user-guide/project-structure/configuration.md b/content/docs/user-guide/project-structure/configuration.md
index 586d8ff4ee..19c96586f5 100644
--- a/content/docs/user-guide/project-structure/configuration.md
+++ b/content/docs/user-guide/project-structure/configuration.md
@@ -258,12 +258,17 @@ Composition].
   groups]. Defaults to `conf`.
 - `hydra.config_name` - the name of the file containing the Hydra [defaults
   list] (located inside `hydra.config_dir`). Defaults to `config.yaml`.
+- `hydra.plugins_path` - location of the parent directory of `hydra_plugins`,
+  where Hydra will automatically discover [plugins]. Defaults to the root of the
+  DVC repository.
 
 [config composition]:
   https://hydra.cc/docs/tutorials/basic/your_first_app/composition/
 [config groups]:
   https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/
 [defaults list]: https://hydra.cc/docs/tutorials/basic/your_first_app/defaults/
+[plugins]:
+  https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process