hydra: plugins_path and advanced config (#5097)

* hydra: plugins_path and advanced config
* drop unused link
* add link to hydra plugins
* explain you can run code with or without hydra
iterative · Feb 2, 2024 · 881165d
1 parent 5eb2717
commit 881165d
Showing 2 changed files with 107 additions and 3 deletions.
diff --git a/content/docs/user-guide/experiment-management/hydra-composition.md b/content/docs/user-guide/experiment-management/hydra-composition.md
@@ -5,7 +5,7 @@ supports Hydra's [config composition] as a way to configure [experiment runs].
 
 <admon type="info">
 
-At the moment you must explicitly enable this feature with:
+You must explicitly enable this feature with:
 
 ```cli
 $ dvc config hydra.enabled True
@@ -139,8 +139,9 @@ We parametrize the shell commands above (`mkdir`, `tar`, `wget`) as well as
 
 <admon type="tip">
 
-You can use `dvc.api.params_show()` to load params in Python code. For other
-languages, use [dictionary unpacking] or a YAML parsing library.
+You can load the params with any YAML parsing library. In Python, you can use
+the built-in `dvc.api.params_show()` or `OmegaConf.load("params.yaml")` (which
+comes with Hydra).
 
 [dictionary unpacking]:
   /doc/user-guide/project-structure/dvcyaml-files#dictionary-unpacking
@@ -221,4 +222,102 @@ Stage 'train' didn't change, skipping
 
 </admon>
 
+`dvc exp run` will compose a new `params.yaml` each time you run it, so it is
+not a reliable way to reproduce past experiments. Instead, use `dvc repro` when
+you want to reproduce a previously run experiment.
+
 [debug]: /doc/user-guide/pipelines/running-pipelines#debugging-stages
+
+## Migrating Hydra Projects
+
+If you already have Hydra configured and want to start using DVC alongside it,
+you may need to refactor your code slightly. DVC will not pass the Hydra config
+to `@hydra.main()`, so drop that decorator from the code. Instead, DVC composes
+the Hydra config before your code runs and dumps the results to `params.yaml`.
+
+Using the example above, here's how the Python code in `train.py` might look
+using Hydra without DVC:
+
+```python
+import hydra
+from omegaconf import DictConfig
+
+@hydra.main(version_base=None, config_path="conf", config_name="config")
+def main(cfg: DictConfig) -> None:
+    ...  # train model using cfg parameters
+
+if __name__ == "__main__":
+    main()
+```
+
+To convert the same code to use DVC with Hydra composition enabled:
+
+```python
+from omegaconf import OmegaConf
+
+def main() -> None:
+    cfg = OmegaConf.load("params.yaml")
+    # train model using cfg parameters
+
+if __name__ == "__main__":
+    main()
+```
+
+You no longer need to import Hydra into your code. A `main()` function is
+included in this example because it is good practice, but it's not necessary.
+This separation between config and code can aid debugging because the entire
+config generated by Hydra is written to `params.yaml` before the experiment
+starts. You can run the same code with or without Hydra (or DVC). You can also
+reuse `params.yaml` across scripts in different stages of a DVC pipeline.
+
+## Advanced Hydra config
+
+You can configure how DVC works with Hydra.
+
+By default, DVC will look for Hydra [config groups] in a `conf` directory, but
+you can set a different directory using `dvc config hydra.config_dir other_dir`.
+This is equivalent to the `config_path` argument in `@hydra.main()`.
+
+Within that directory, DVC will look for the [defaults list] in `config.yaml`,
+but you can set another file with `dvc config hydra.config_name other.yaml`.
+This is equivalent to the `config_name` argument in `@hydra.main()`.
+
+Hydra will automatically discover [plugins] in the `hydra_plugins` directory. By
+default, DVC will look for `hydra_plugins` in the root directory of the DVC
+repository, but you can set a different path with
+`dvc config hydra.plugins_path other_path`.
+
+### Custom resolvers
+
+You can register [OmegaConf custom resolvers] as plugins by writing them to a
+file inside `hydra_plugins`. DVC will use these custom resolvers when composing
+the Hydra config. For example, add a custom resolver to
+`hydra_plugins/my_resolver.py`:
+
+```python
+import os
+from omegaconf import OmegaConf
+
+OmegaConf.register_new_resolver('join', lambda x, y: os.path.join(x, y))
+```
+
+You can use that custom resolver inside the Hydra config:
+
+```yaml
+dir: raw/data
+relpath: dataset.csv
+fullpath: ${join:${dir},${relpath}}
+```
+
+The final `params.yaml` will look like:
+
+```yaml
+dir: raw/data
+relpath: dataset.csv
+fullpath: raw/data/dataset.csv
+```
+
+[plugins]:
+  https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process
+[OmegaConf custom resolvers]:
+  https://omegaconf.readthedocs.io/en/latest/custom_resolvers.html
diff --git a/content/docs/user-guide/project-structure/configuration.md b/content/docs/user-guide/project-structure/configuration.md
@@ -258,12 +258,17 @@ Composition].
   groups]. Defaults to `conf`.
 - `hydra.config_name` - the name of the file containing the Hydra [defaults
   list] (located inside `hydra.config_dir`). Defaults to `config.yaml`.
+- `hydra.plugins_path` - location of the parent directory of `hydra_plugins`,
+  where Hydra will automatically discover [plugins]. Defaults to the root of the
+  DVC repository.
 
 [config composition]:
   https://hydra.cc/docs/tutorials/basic/your_first_app/composition/
 [config groups]:
   https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/
 [defaults list]: https://hydra.cc/docs/tutorials/basic/your_first_app/defaults/
+[plugins]:
+  https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process
 
 </details>