docs: update README
MArpogaus committed Sep 13, 2024
1 parent a8f49b4 commit 9b41fc1
Showing 2 changed files with 159 additions and 89 deletions.
140 changes: 81 additions & 59 deletions README.md
@@ -1,28 +1,30 @@
![example workflow](https://github.com/MArpogaus/dvc-stage/actions/workflows/pre-commit.yml/badge.svg)
![example workflow](https://github.com/MArpogaus/dvc-stage/actions/workflows/run_demo.yaml/badge.svg)
![example workflow](https://github.com/MArpogaus/dvc-stage/actions/workflows/tox-gh.yaml/badge.svg)

[![img](https://img.shields.io/github/contributors/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/graphs/contributors)
[![img](https://img.shields.io/github/forks/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/network/members)
[![img](https://img.shields.io/github/stars/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/stargazers)
[![img](https://img.shields.io/github/issues/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/issues)
[![img](https://img.shields.io/github/license/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/blob/master/COPYING)
[![img](https://img.shields.io/github/license/MArpogaus/dvc-stage.svg?style=flat-square)](https://github.com/MArpogaus/dvc-stage/blob/main/LICENSE)
[![img](https://img.shields.io/github/actions/workflow/status/MArpogaus/dvc-stage/test.yaml.svg?label=test&style=flat-square)](https://github.com/MArpogaus/dvc-stage/actions/workflows/test.yaml)
[![img](https://img.shields.io/github/actions/workflow/status/MArpogaus/dvc-stage/release.yaml.svg?label=release&style=flat-square)](https://github.com/MArpogaus/dvc-stage/actions/workflows/release.yaml)
[![img](https://img.shields.io/badge/pre--commit-enabled-brightgreen.svg?logo=pre-commit&style=flat-square)](https://github.com/MArpogaus/dvc-stage/blob/main/.pre-commit-config.yaml)
[![img](https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555)](https://linkedin.com/in/MArpogaus)

[![img](https://img.shields.io/pypi/v/dvc-stage.svg?style=flat-square)](https://pypi.org/project/dvc-stage)


# DVC-Stage

1. [About The Project](#about-the-project)
2. [Getting Started](#getting-started)
1. [Prerequisites](#prerequisites)
2. [Installation](#installation)
3. [Usage](#usage)
4. [License](#license)
5. [Contact](#contact)
6. [Acknowledgments](#acknowledgments)
1. [About The Project](#org2ce5616)
2. [Getting Started](#orgd5c9d2a)
1. [Prerequisites](#orgb6c61f1)
2. [Installation](#org4a4a796)
3. [Usage](#org7523cf7)
4. [Contributing](#org06cba36)
5. [License](#org88720e8)
6. [Contact](#orgbac8fb8)
7. [Acknowledgments](#orgd27aafc)


<a id="about-the-project"></a>
<a id="org2ce5616"></a>

## About The Project

@@ -34,7 +36,7 @@ This python script provides a easy and parameterizeable way of defining typical
- data validation


<a id="getting-started"></a>
<a id="orgd5c9d2a"></a>

## Getting Started

@@ -43,7 +45,7 @@ project locally. To get a local copy up and running follow these simple
example steps.


<a id="prerequisites"></a>
<a id="orgb6c61f1"></a>

### Prerequisites

@@ -52,78 +54,98 @@ example steps.
- `pyyaml>=5`


<a id="installation"></a>
<a id="org4a4a796"></a>

### Installation

pip install git+https://github.com/MArpogaus/dvc-stage.git
This package is available on [PyPI](https://pypi.org/project/dvc-stage/).
You can install it and all of its dependencies using pip:

pip install dvc-stage

<a id="usage"></a>

<a id="org7523cf7"></a>

## Usage

DVC-Stage works on top of two files: `dvc.yaml` and `params.yaml`.
They are expected to be at the root of an initialized [dvc project](https://dvc.org/).
From there you can execute `dvc-stage -h` to see available commands or `dvc-stage get-config STAGE` to generate the dvc stages from the `params.yaml` file. The tool then generates the respective YAML, which you can then manually paste into the `dvc.yaml` file. Existing stages can then be updated in place using `dvc-stage update-stage STAGE`.
DVC-Stage works on top of two files: `dvc.yaml` and `params.yaml`. They
are expected to be at the root of an initialized [dvc
project](https://dvc.org/). From there you can execute `dvc-stage -h` to see available
commands or `dvc-stage get-config STAGE` to generate the dvc stages from
the `params.yaml` file. The tool then generates the respective YAML,
which you can then manually paste into the `dvc.yaml` file. Existing
stages can then be updated in place using `dvc-stage update-stage STAGE`.

Stages are defined inside `params.yaml` in the following schema:
```yaml
STAGE_NAME:
  load: {}
  transformations: []
  validations: []
  write: {}
```
The `load` and `write` sections both require the yaml-keys `path` and `format` to read and save data respectively.

The `transformations` and `validations` sections require a sequence of functions to apply, where `transformations` return data and `validations` return a truth value (derived from data).
Functions are defined by the key `id` and can be either:
- Methods defined on Pandas DataFrames, e.g.
```yaml
transformations:
  - id: transpose
```
- Imported from any python module, e.g.
```yaml
transformations:
  - id: custom
    description: duplicate rows
    import_from: demo.duplicate
```
- Predefined by DVC-Stage, e.g.
```yaml
validations:
  - id: validate_pandera_schema
    schema:
      import_from: demo.get_schema
```

When writing a custom function, you need to make sure the function gracefully handles data being `None`, which is required for type inference. Data is passed as the first argument. Further arguments can be provided as additional keys, as shown above for `validate_pandera_schema`, where `schema` is passed as the second argument to the function.

    STAGE_NAME:
      load: {}
      transformations: []
      validations: []
      write: {}

The `load` and `write` sections both require the yaml-keys `path` and
`format` to read and save data respectively.
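
As a rough illustration of the `path`/`format` requirement, the following sketch checks a stage definition that has already been loaded into a Python dictionary. The helper `missing_io_keys`, the file paths, and the `csv` format value are assumptions made for this example; this is not DVC-Stage's own validation logic.

```python
# Hypothetical helper, not part of DVC-Stage: report which required keys
# are absent from the `load` and `write` sections of one stage definition.
REQUIRED_IO_KEYS = ("path", "format")


def missing_io_keys(stage):
    """Return a dict mapping section name to the list of missing keys."""
    problems = {}
    for section in ("load", "write"):
        # Treat a missing or empty section as an empty mapping.
        config = stage.get(section) or {}
        missing = [key for key in REQUIRED_IO_KEYS if key not in config]
        if missing:
            problems[section] = missing
    return problems


# Example stage as it might look after parsing params.yaml (values assumed).
stage = {
    "load": {"path": "data/input.csv", "format": "csv"},
    "transformations": [],
    "validations": [],
    "write": {"path": "data/output.csv"},  # "format" left out on purpose
}

print(missing_io_keys(stage))  # {'write': ['format']}
```

A check like this can be run before calling `dvc-stage get-config STAGE` to catch incomplete sections early.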

The `transformations` and `validations` sections require a sequence of
functions to apply, where `transformations` return data and
`validations` return a truth value (derived from data). Functions are
defined by the key `id` and can be either:

- Methods defined on Pandas DataFrames, e.g.

      transformations:
        - id: transpose

- Imported from any python module, e.g.

      transformations:
        - id: custom
          description: duplicate rows
          import_from: demo.duplicate

- Predefined by DVC-Stage, e.g.

      validations:
        - id: validate_pandera_schema
          schema:
            import_from: demo.get_schema

When writing a custom function, you need to make sure the function
gracefully handles data being `None`, which is required for type
inference. Data is passed as the first argument. Further arguments can
be provided as additional keys, as shown above for
`validate_pandera_schema`, where `schema` is passed as the second
argument to the function.
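
A custom transformation that follows these rules might look like the sketch below. It is a hypothetical stand-in for the `duplicate` function referenced by `import_from: demo.duplicate`; the project's actual demo code may differ, and the `times` parameter is an assumption used to show how an additional key could arrive as a keyword argument.

```python
# Hypothetical custom transformation; not taken from the DVC-Stage demo.
def duplicate(data, times=2):
    """Return `data` with all of its rows repeated `times` times."""
    # Per the README, the function may be called with data=None (for type
    # inference), so return early instead of touching DataFrame methods.
    if data is None:
        return None
    # Select every row position `times` times via positional indexing,
    # which pandas DataFrames support through `.iloc`.
    positions = list(range(len(data))) * times
    return data.iloc[positions]
```

Under this assumption, adding a key such as `times: 3` next to `id: custom` would pass `3` as the `times` argument, while the data itself is always the first argument.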

A working demonstration can be found at `examples/`.


<a id="license"></a>
<a id="org06cba36"></a>

## Contributing

Contributions are greatly appreciated! If you have a question or an issue, or would like to contribute, please read our [contributing guidelines](CONTRIBUTING.md).


<a id="org88720e8"></a>

## License

Distributed under the [GNU General Public License v3](COPYING)


<a id="contact"></a>
<a id="orgbac8fb8"></a>

## Contact

[Marcel Arpogaus](https://github.com/MArpogaus/) - [[email protected]](mailto:marcel.arpogaus@gmail.com)
[Marcel Arpogaus](https://github.com/MArpogaus/) - [[email protected]](mailto:[email protected]) (encrypted with [ROT13](<https://rot13.com/>))

Project Link:
<https://github.com/MArpogaus/dvc-stage>


<a id="acknowledgments"></a>
<a id="orgd27aafc"></a>

## Acknowledgments

108 changes: 78 additions & 30 deletions README.org
@@ -5,75 +5,123 @@
[[https://github.com/MArpogaus/dvc-stage/network/members][https://img.shields.io/github/forks/MArpogaus/dvc-stage.svg?style=flat-square]]
[[https://github.com/MArpogaus/dvc-stage/stargazers][https://img.shields.io/github/stars/MArpogaus/dvc-stage.svg?style=flat-square]]
[[https://github.com/MArpogaus/dvc-stage/issues][https://img.shields.io/github/issues/MArpogaus/dvc-stage.svg?style=flat-square]]
[[https://github.com/MArpogaus/dvc-stage/blob/master/COPYING][https://img.shields.io/github/license/MArpogaus/dvc-stage.svg?style=flat-square]]
[[https://github.com/MArpogaus/dvc-stage/blob/main/LICENSE][https://img.shields.io/github/license/MArpogaus/dvc-stage.svg?style=flat-square]]
[[https://github.com/MArpogaus/dvc-stage/actions/workflows/test.yaml][https://img.shields.io/github/actions/workflow/status/MArpogaus/dvc-stage/test.yaml.svg?label=test&style=flat-square]]
[[https://github.com/MArpogaus/dvc-stage/actions/workflows/release.yaml][https://img.shields.io/github/actions/workflow/status/MArpogaus/dvc-stage/release.yaml.svg?label=release&style=flat-square]]
[[https://github.com/MArpogaus/dvc-stage/blob/main/.pre-commit-config.yaml][https://img.shields.io/badge/pre--commit-enabled-brightgreen.svg?logo=pre-commit&style=flat-square]]
[[https://linkedin.com/in/MArpogaus][https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555]]

[[https://pypi.org/project/dvc-stage][https://img.shields.io/pypi/v/dvc-stage.svg?style=flat-square]]

* DVC-Stage

#+TOC: headlines 2 local

** About The Project
:PROPERTIES:
:CUSTOM_ID: about-the-project
:END:

This Python script provides an easy and parameterizable way of defining typical dvc stages for:
This Python script provides an easy and parameterizable way of defining typical dvc (sub-)stages for:

- data preprocessing
- data transformation
- data splitting
- data validation


** Getting Started
:PROPERTIES:
:CUSTOM_ID: getting-started
:END:

This is an example of how you may give instructions on setting up your
project locally. To get a local copy up and running follow these simple
example steps.

*** Prerequisites
:PROPERTIES:
:CUSTOM_ID: prerequisites
:END:

- =pandas>=0.20.*=
- =dvc>=2.12.*=
- =pyyaml>=5=

*** Installation
:PROPERTIES:
:CUSTOM_ID: installation
:END:

#+begin_src bash
pip install git+https://github.com/MArpogaus/dvc-stage.git
This package is available on [[https://pypi.org/project/dvc-stage/][PyPI]].
You can install it and all of its dependencies using pip:

#+begin_src bash :exports code
pip install dvc-stage
#+end_src

** Usage
:PROPERTIES:
:CUSTOM_ID: usage
:END:
...

DVC-Stage works on top of two files: =dvc.yaml= and =params.yaml=. They
are expected to be at the root of an initialized [[https://dvc.org/][dvc
project]]. From there you can execute =dvc-stage -h= to see available
commands or =dvc-stage get-config STAGE= to generate the dvc stages from
the =params.yaml= file. The tool then generates the respective YAML,
which you can then manually paste into the =dvc.yaml= file. Existing
stages can then be updated in place using =dvc-stage update-stage STAGE=.

Stages are defined inside =params.yaml= in the following schema:

#+begin_src yaml
STAGE_NAME:
  load: {}
  transformations: []
  validations: []
  write: {}
#+end_src

The =load= and =write= sections both require the yaml-keys =path= and
=format= to read and save data respectively.

The =transformations= and =validations= sections require a sequence of
functions to apply, where =transformations= return data and
=validations= return a truth value (derived from data). Functions are
defined by the key =id= and can be either:

- Methods defined on Pandas DataFrames, e.g.
  #+begin_src yaml
  transformations:
    - id: transpose
  #+end_src

- Imported from any python module, e.g.
  #+begin_src yaml
  transformations:
    - id: custom
      description: duplicate rows
      import_from: demo.duplicate
  #+end_src

- Predefined by DVC-Stage, e.g.
  #+begin_src yaml
  validations:
    - id: validate_pandera_schema
      schema:
        import_from: demo.get_schema
  #+end_src
#+end_src

When writing a custom function, you need to make sure the function
gracefully handles data being =None=, which is required for type
inference. Data is passed as the first argument. Further arguments can
be provided as additional keys, as shown above for
=validate_pandera_schema=, where =schema= is passed as the second
argument to the function.

A working demonstration can be found at =examples/=.

** Contributing

Contributions are greatly appreciated! If you have a question or an issue, or would like to contribute, please read our [[file:CONTRIBUTING.md][contributing guidelines]].

** License
:PROPERTIES:
:CUSTOM_ID: license
:END:

Distributed under the [[file:COPYING][GNU General Public License v3]]

** Contact
:PROPERTIES:
:CUSTOM_ID: contact
:END:
[[https://github.com/MArpogaus/][Marcel Arpogaus]] - [[mailto:[email protected]][[email protected]]]

[[https://github.com/MArpogaus/][Marcel Arpogaus]] - [[mailto:[email protected]][[email protected]]] (encrypted with [[https://rot13.com/][ROT13]])

Project Link:
[[https://github.com/MArpogaus/dvc-stage]]

** Acknowledgments
:PROPERTIES:
:CUSTOM_ID: acknowledgments
:END:

Parts of this work have been funded by the Federal Ministry for the Environment, Nature Conservation and Nuclear Safety due to a decision of the German Federal Parliament (AI4Grids: 67KI2012A).
