Showing 2 changed files with 159 additions and 89 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,28 +1,30 @@ | ||
![example workflow](https://github.com/MArpogaus/dvc-stage/actions/workflows/pre-commit.yml/badge.svg) | ||
![example workflow](https://github.com/MArpogaus/dvc-stage/actions/workflows/run_demo.yaml/badge.svg) | ||
![example workflow](https://github.com/MArpogaus/dvc-stage/actions/workflows/tox-gh.yaml/badge.svg) | ||
|
||
[![img](https://img.shields.io/github/contributors/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/graphs/contributors) | ||
[![img](https://img.shields.io/github/forks/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/network/members) | ||
[![img](https://img.shields.io/github/stars/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/stargazers) | ||
[![img](https://img.shields.io/github/issues/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/issues) | ||
[![img](https://img.shields.io/github/license/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/blob/master/COPYING) | ||
[![img](https://img.shields.io/github/license/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/blob/main/LICENSE) | ||
[![img](https://img.shields.io/github/actions/workflow/status/MArpogaus/dvc-stage/test.yaml.svg?label=test&style=flat-square)](https://github.com/MArpogaus/dvc-stage/actions/workflows/test.yaml) | ||
[![img](https://img.shields.io/github/actions/workflow/status/MArpogaus/dvc-stage/release.yaml.svg?label=release&style=flat-square)](https://github.com/MArpogaus/dvc-stage/actions/workflows/release.yaml) | ||
[![img](https://img.shields.io/badge/pre--commit-enabled-brightgreen.svg?logo=pre-commit&style=flat-square)](https://github.com/MArpogaus/dvc-stage/blob/main/.pre-commit-config.yaml) | ||
[![img](https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555)](https://linkedin.com/in/MArpogaus) | ||
|
||
[![img](https://img.shields.io/pypi/v/dvc-stage.svg?style=flat-square)](https://pypi.org/project/dvc-stage) | ||
|
||
|
||
# DVC-Stage | ||
|
||
1. [About The Project](#about-the-project) | ||
2. [Getting Started](#getting-started) | ||
1. [Prerequisites](#prerequisites) | ||
2. [Installation](#installation) | ||
3. [Usage](#usage) | ||
4. [License](#license) | ||
5. [Contact](#contact) | ||
6. [Acknowledgments](#acknowledgments) | ||
1. [About The Project](#org2ce5616) | ||
2. [Getting Started](#orgd5c9d2a) | ||
1. [Prerequisites](#orgb6c61f1) | ||
2. [Installation](#org4a4a796) | ||
3. [Usage](#org7523cf7) | ||
4. [Contributing](#org06cba36) | ||
5. [License](#org88720e8) | ||
6. [Contact](#orgbac8fb8) | ||
7. [Acknowledgments](#orgd27aafc) | ||
|
||
|
||
<a id="about-the-project"></a> | ||
<a id="org2ce5616"></a> | ||
|
||
## About The Project | ||
|
||
|
@@ -34,7 +36,7 @@ This python script provides a easy and parameterizeable way of defining typical | |
- data validation | ||
|
||
|
||
<a id="getting-started"></a> | ||
<a id="orgd5c9d2a"></a> | ||
|
||
## Getting Started | ||
|
||
|
@@ -43,7 +45,7 @@ project locally. To get a local copy up and running follow these simple | |
example steps. | ||
|
||
|
||
<a id="prerequisites"></a> | ||
<a id="orgb6c61f1"></a> | ||
|
||
### Prerequisites | ||
|
||
|
@@ -52,78 +54,98 @@ example steps. | |
- `pyyaml>=5` | ||
|
||
|
||
<a id="installation"></a> | ||
<a id="org4a4a796"></a> | ||
|
||
### Installation | ||
|
||
pip install git+https://github.com/MArpogaus/dvc-stage.git | ||
This package is available on [PyPI](https://pypi.org/project/dvc-stage/). | ||
You can install it and all of its dependencies using pip: | ||
|
||
pip install dvc-stage | ||
|
||
<a id="usage"></a> | ||
|
||
<a id="org7523cf7"></a> | ||
|
||
## Usage | ||
|
||
DVC-Stage works on top of two files: `dvc.yaml` and `params.yaml`. | ||
They are expected to be at the root of an initialized [dvc project](https://dvc.org/). | ||
From there you can execute `dvc-stage -h` to see available commands or `dvc-stage get-config STAGE` to generate the dvc stages from the `params.yaml` file. The tool then generates the respective yaml which you can then manually paste into the `dvc.yaml` file. Existing stages can then be updated in place using `dvc-stage update-stage STAGE`. | ||
DVC-Stage works on top of two files: `dvc.yaml` and `params.yaml`. They | ||
are expected to be at the root of an initialized [dvc | ||
project](https://dvc.org/). From there you can execute `dvc-stage -h` to see available | ||
commands or `dvc-stage get-config STAGE` to generate the dvc stages from | ||
the `params.yaml` file. The tool then generates the respective yaml | ||
which you can then manually paste into the `dvc.yaml` file. Existing | ||
stages can then be updated in place using `dvc-stage update-stage STAGE`. | ||
|
||
Stages are defined inside `params.yaml` in the following schema: | ||
```yaml | ||
STAGE_NAME: | ||
load: {} | ||
transformations: [] | ||
validations: [] | ||
write: {} | ||
``` | ||
The `load` and `write` sections both require the yaml-keys `path` and `format` to read and save data respectively. | ||
|
||
The `transformations` and `validations` sections require a sequence of functions to apply, where `transformations` return data and `validations` return a truth value (derived from data). | ||
Functions are defined by the key `id` and can be either: | ||
- Methods defined on Pandas Dataframes, e.g. | ||
```yaml | ||
transformations: | ||
- id: transpose | ||
``` | ||
- Imported from any python module, e.g. | ||
```yaml | ||
transformations: | ||
- id: custom | ||
description: duplikate rows | ||
import_from: demo.duplicate | ||
``` | ||
- Predefined by DVC-Stage, e.g. | ||
```yaml | ||
validations: | ||
- id: validate_pandera_schema | ||
schema: | ||
import_from: demo.get_schema | ||
``` | ||
|
||
When writing a custom function, you need to make sure the function gracefully handles data being `None`, which is required for type inference. Data is passed as first argument. Further arguments can be provided as additional keys, as shown above for `validate_pandera_schema`, where schema is passed as second argument to the function. | ||
|
||
STAGE_NAME: | ||
load: {} | ||
transformations: [] | ||
validations: [] | ||
write: {} | ||
|
||
The `load` and `write` sections both require the yaml-keys `path` and | ||
`format` to read and save data respectively. | ||
|
||
The `transformations` and `validations` sections require a sequence of | ||
functions to apply, where `transformations` return data and | ||
`validations` return a truth value (derived from data). Functions are | ||
defined by the key `id` and can be either: | ||
|
||
- Methods defined on Pandas Dataframes, e.g. | ||
|
||
transformations: | ||
- id: transpose | ||
|
||
- Imported from any python module, e.g. | ||
|
||
transformations: | ||
- id: custom | ||
description: duplikate rows | ||
import_from: demo.duplicate | ||
|
||
- Predefined by DVC-Stage, e.g. | ||
|
||
validations: | ||
- id: validate_pandera_schema | ||
schema: | ||
import_from: demo.get_schema | ||
|
||
When writing a custom function, you need to make sure the function | ||
gracefully handles data being `None`, which is required for type | ||
inference. Data is passed as first argument. Further arguments can be | ||
provided as additional keys, as shown above for | ||
`validate_pandera_schema`, where schema is passed as second argument to | ||
the function. | ||
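
As a rough illustration of the paragraph above, a custom transformation and a custom validation might look like the following sketch. The function names `duplicate` and `has_no_missing_values`, the `times` parameter, and the choice to return `None` (or `True`) when no data is passed are assumptions made for this example; they are not taken from the DVC-Stage sources.

```python
def duplicate(data, times=2):
    """Hypothetical transformation: repeat every row `times` times."""
    # Assumption: passing `None` through unchanged is an acceptable way
    # to handle the no-data case described above.
    if data is None:
        return None
    # Build the positional row order 0,0,1,1,... and select those rows.
    positions = [i for i in range(len(data)) for _ in range(times)]
    return data.iloc[positions].reset_index(drop=True)


def has_no_missing_values(data):
    """Hypothetical validation: truthy if the frame has no missing values."""
    # Assumption: with no data there is nothing to reject.
    if data is None:
        return True
    return not bool(data.isna().any().any())
```

If these functions lived in a module of your own, each would presumably be referenced through `import_from`, with extra arguments such as `times` given as additional keys next to `id`, in the same way `schema` is given for `validate_pandera_schema`.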
|
||
A working demonstration can be found at `examples/`. | ||
|
||
|
||
<a id="license"></a> | ||
<a id="org06cba36"></a> | ||
|
||
## Contributing | ||
|
||
Any contributions are greatly appreciated! If you have a question, an issue or would like to contribute, please read our [contributing guidelines](CONTRIBUTING.md). | ||
|
||
|
||
<a id="org88720e8"></a> | ||
|
||
## License | ||
|
||
Distributed under the [GNU General Public License v3](COPYING) | ||
|
||
|
||
<a id="contact"></a> | ||
<a id="orgbac8fb8"></a> | ||
|
||
## Contact | ||
|
||
[Marcel Arpogaus](https://github.com/MArpogaus/) - [[email protected]](mailto:marcel.arpogaus@gmail.com) | ||
[Marcel Arpogaus](https://github.com/MArpogaus/) - [[email protected]](mailto:[email protected]) (encrypted with [ROT13](<https://rot13.com/>)) | ||
|
||
Project Link: | ||
<https://github.com/MArpogaus/dvc-stage> | ||
|
||
|
||
<a id="acknowledgments"></a> | ||
<a id="orgd27aafc"></a> | ||
|
||
## Acknowledgments | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,75 +5,123 @@ | |
[[https://github.com/MArpogaus/dvc-stage/network/members][https://img.shields.io/github/forks/MArpogaus/dvc-stage.svg?style=flat-square]] | ||
[[https://github.com/MArpogaus/dvc-stage/stargazers][https://img.shields.io/github/stars/MArpogaus/dvc-stage.svg?style=flat-square]] | ||
[[https://github.com/MArpogaus/dvc-stage/issues][https://img.shields.io/github/issues/MArpogaus/dvc-stage.svg?style=flat-square]] | ||
[[https://github.com/MArpogaus/dvc-stage/blob/master/COPYING][https://img.shields.io/github/license/MArpogaus/dvc-stage.svg?style=flat-square]] | ||
[[https://github.com/MArpogaus/dvc-stage/blob/main/LICENSE][https://img.shields.io/github/license/MArpogaus/dvc-stage.svg?style=flat-square]] | ||
[[https://github.com/MArpogaus/dvc-stage/actions/workflows/test.yaml][https://img.shields.io/github/actions/workflow/status/MArpogaus/dvc-stage/test.yaml.svg?label=test&style=flat-square]] | ||
[[https://github.com/MArpogaus/dvc-stage/actions/workflows/release.yaml][https://img.shields.io/github/actions/workflow/status/MArpogaus/dvc-stage/release.yaml.svg?label=release&style=flat-square]] | ||
[[https://github.com/MArpogaus/dvc-stage/blob/main/.pre-commit-config.yaml][https://img.shields.io/badge/pre--commit-enabled-brightgreen.svg?logo=pre-commit&style=flat-square]] | ||
[[https://linkedin.com/in/MArpogaus][https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555]] | ||
|
||
[[https://pypi.org/project/dvc-stage][https://img.shields.io/pypi/v/dvc-stage.svg?style=flat-square]] | ||
|
||
* DVC-Stage | ||
|
||
#+TOC: headlines 2 local | ||
|
||
** About The Project | ||
:PROPERTIES: | ||
:CUSTOM_ID: about-the-project | ||
:END: | ||
|
||
This python script provides an easy and parameterizable way of defining typical dvc stages for: | ||
This python script provides an easy and parameterizable way of defining typical dvc (sub-)stages for: | ||
|
||
- data preprocessing | ||
- data transformation | ||
- data splitting | ||
- data validation | ||
|
||
|
||
** Getting Started | ||
:PROPERTIES: | ||
:CUSTOM_ID: getting-started | ||
:END: | ||
|
||
This is an example of how you may give instructions on setting up your | ||
project locally. To get a local copy up and running follow these simple | ||
example steps. | ||
|
||
*** Prerequisites | ||
:PROPERTIES: | ||
:CUSTOM_ID: prerequisites | ||
:END: | ||
|
||
- =pandas>=0.20.*= | ||
- =dvc>=2.12.*= | ||
- =pyyaml>=5= | ||
|
||
*** Installation | ||
:PROPERTIES: | ||
:CUSTOM_ID: installation | ||
:END: | ||
|
||
#+begin_src bash | ||
pip install git+https://github.com/MArpogaus/dvc-stage.git | ||
This package is available on [[https://pypi.org/project/dvc-stage/][PyPI]]. | ||
You can install it and all of its dependencies using pip: | ||
|
||
#+begin_src bash :exports code | ||
pip install dvc-stage | ||
#+end_src | ||
|
||
** Usage | ||
:PROPERTIES: | ||
:CUSTOM_ID: usage | ||
:END: | ||
... | ||
|
||
DVC-Stage works on top of two files: =dvc.yaml= and =params.yaml=. They | ||
are expected to be at the root of an initialized [[https://dvc.org/][dvc | ||
project]]. From there you can execute =dvc-stage -h= to see available | ||
commands or =dvc-stage get-config STAGE= to generate the dvc stages from | ||
the =params.yaml= file. The tool then generates the respective yaml | ||
which you can then manually paste into the =dvc.yaml= file. Existing | ||
stages can then be updated in place using =dvc-stage update-stage STAGE=. | ||
|
||
Stages are defined inside =params.yaml= in the following schema: | ||
|
||
#+begin_src yaml | ||
STAGE_NAME: | ||
load: {} | ||
transformations: [] | ||
validations: [] | ||
write: {} | ||
#+end_src | ||
|
||
The =load= and =write= sections both require the yaml-keys =path= and | ||
=format= to read and save data respectively. | ||
|
||
The =transformations= and =validations= sections require a sequence of | ||
functions to apply, where =transformations= return data and | ||
=validations= return a truth value (derived from data). Functions are | ||
defined by the key =id= and can be either: | ||
|
||
- Methods defined on Pandas Dataframes, e.g. | ||
#+begin_src yaml | ||
transformations: | ||
- id: transpose | ||
#+end_src | ||
|
||
- Imported from any python module, e.g. | ||
#+begin_src yaml | ||
transformations: | ||
- id: custom | ||
description: duplikate rows | ||
import_from: demo.duplicate | ||
#+end_src | ||
|
||
- Predefined by DVC-Stage, e.g. | ||
#+begin_src yaml | ||
validations: | ||
- id: validate_pandera_schema | ||
schema: | ||
import_from: demo.get_schema | ||
#+end_src | ||
|
||
When writing a custom function, you need to make sure the function | ||
gracefully handles data being =None=, which is required for type | ||
inference. Data is passed as first argument. Further arguments can be | ||
provided as additional keys, as shown above for | ||
=validate_pandera_schema=, where schema is passed as second argument to | ||
the function. | ||
|
||
A working demonstration can be found at =examples/=. | ||
|
||
** Contributing | ||
|
||
Any contributions are greatly appreciated! If you have a question, an issue or would like to contribute, please read our [[file:CONTRIBUTING.md][contributing guidelines]]. | ||
|
||
** License | ||
:PROPERTIES: | ||
:CUSTOM_ID: license | ||
:END: | ||
|
||
Distributed under the [[file:COPYING][GNU General Public License v3]] | ||
|
||
** Contact | ||
:PROPERTIES: | ||
:CUSTOM_ID: contact | ||
:END: | ||
[[https://github.com/MArpogaus/][Marcel Arpogaus]] - [[mailto:[email protected]][[email protected]]] | ||
|
||
[[https://github.com/MArpogaus/][Marcel Arpogaus]] - [[mailto:[email protected]][[email protected]]] (encrypted with [ROT13](https://rot13.com/)) | ||
|
||
Project Link: | ||
[[https://github.com/MArpogaus/dvc-stage]] | ||
|
||
** Acknowledgments | ||
:PROPERTIES: | ||
:CUSTOM_ID: acknowledgments | ||
:END: | ||
|
||
Parts of this work have been funded by the Federal Ministry for the Environment, Nature Conservation and Nuclear Safety due to a decision of the German Federal Parliament (AI4Grids: 67KI2012A). |