📝 Add the series of tutorials and trainings
veit committed Jul 29, 2024
1 parent 110aed6 commit 98ad7be
Showing 8 changed files with 120 additions and 13 deletions.
4 changes: 4 additions & 0 deletions docs/clean-prep/index.rst
@@ -15,6 +15,10 @@ we also use several small, specialised libraries like :doc:`dedupe
systems like `Great Expectations <https://greatexpectations.io/>`_ or `MobyDQ
<https://ubisoft.github.io/mobydq/>`_.

.. tip::
`cusy seminar: Cleanse and validate data with Python
<https://cusy.io/en/our-training-courses/cleanse-and-validate-data-with-python>`_

Overview
--------

4 changes: 4 additions & 0 deletions docs/data-processing/index.rst
@@ -17,6 +17,10 @@ three tools in more detail that make data accessible:
* :doc:`httpx/index`
* :doc:`intake/index`

.. tip::
`Read, write and provide data with Python
<https://cusy.io/en/our-training-courses/read-write-and-provide-data-with-python>`_

.. seealso::
`pandas I/O API <https://pandas.pydata.org/docs/user_guide/io.html>`_
The pandas I/O API is a set of top level ``reader`` functions that
82 changes: 80 additions & 2 deletions docs/index.rst
@@ -23,8 +23,86 @@ This tutorial is not intended to be an introduction to Python or programming in
general; for that there is the :doc:`python-basics:index` tutorial. Instead, it
is intended to show the Python data science stack – libraries such as
:doc:`/workspace/ipython/index`, :doc:`/workspace/numpy/index`,
:doc:`/workspace/pandas/index`, :doc:`pyviz:matplotlib/index` and related tools
– so that you can subsequently effectively analyse and visualise your data.
:doc:`/workspace/pandas/index`, and related tools – so that you can
subsequently effectively analyse your data. We also offer the `Jupyter Tutorial
<https://jupyter-tutorial.readthedocs.io/en/latest/index.html>`_ and the `PyViz
Tutorial <https://pyviz-tutorial.readthedocs.io/de/latest/index.html>`_ as well
as the instructions for `data visualisation
<https://www.cusy.design/de/latest/viz/index.html>`_ from the `cusy Design
System <https://www.cusy.design/de/latest/index.html>`_.

All tutorials serve as seminar documents for our harmonised training courses:

+---------------+--------------------------------------------------------------+
| Duration      | Topic                                                        |
+===============+==============================================================+
| 3 days        | `Introduction to Python`_                                    |
+---------------+--------------------------------------------------------------+
| 2 days        | `Advanced Python`_                                           |
+---------------+--------------------------------------------------------------+
| 2 days        | `Design patterns in Python`_                                 |
+---------------+--------------------------------------------------------------+
| 2 days        | `Efficient testing with Python`_                             |
+---------------+--------------------------------------------------------------+
| 1 day         | `Software documentation with Sphinx`_                        |
+---------------+--------------------------------------------------------------+
| 2 days        | `Technical writing`_                                         |
+---------------+--------------------------------------------------------------+
| 3 days        | `Jupyter notebooks for efficient data science workflows`_    |
+---------------+--------------------------------------------------------------+
| 2 days        | `Numerical calculations with NumPy`_                         |
+---------------+--------------------------------------------------------------+
| 2 days        | `Analysing data with pandas`_                                |
+---------------+--------------------------------------------------------------+
| 3 days        | `Read, write and provide data with Python`_                  |
+---------------+--------------------------------------------------------------+
| 2 days        | `Cleanse and validate data with Python`_                     |
+---------------+--------------------------------------------------------------+
| 5 days        | `Visualising data with Python`_                              |
+---------------+--------------------------------------------------------------+
| 1 day         | `Designing data visualisations`_                             |
+---------------+--------------------------------------------------------------+
| 2 days        | `Create dashboards`_                                         |
+---------------+--------------------------------------------------------------+
| 3 days        | `Versioned and reproducible storage of code and data`_       |
+---------------+--------------------------------------------------------------+
| Subscription  | `News from Python for data science`_                         |
| of 2 hours    |                                                              |
| per quarter   |                                                              |
+---------------+--------------------------------------------------------------+

.. _`Introduction to Python`:
https://cusy.io/en/our-training-courses/introduction-to-python
.. _`Advanced Python`:
https://cusy.io/en/our-training-courses/advanced-python
.. _`Design patterns in Python`:
https://cusy.io/en/our-training-courses/design-patterns-in-python
.. _`Efficient testing with Python`:
https://cusy.io/en/our-training-courses/efficient-testing-with-python
.. _`Software documentation with Sphinx`:
https://cusy.io/en/our-training-courses/software-documentation-with-sphinx
.. _`Technical writing`:
https://cusy.io/en/our-training-courses/technical-writing
.. _`Jupyter notebooks for efficient data science workflows`:
https://cusy.io/en/our-training-courses/jupyter-notebooks-for-efficient-data-science-workflows
.. _`Numerical calculations with NumPy`:
https://cusy.io/en/our-training-courses/numerical-calculations-with-numpy
.. _`Analysing data with pandas`:
https://cusy.io/en/our-training-courses/analysing-data-with-pandas
.. _`Read, write and provide data with Python`:
https://cusy.io/en/our-training-courses/read-write-and-provide-data-with-python
.. _`Cleanse and validate data with Python`:
https://cusy.io/en/our-training-courses/cleanse-and-validate-data-with-python
.. _`Visualising data with Python`:
https://cusy.io/en/our-training-courses/visualising-data-with-python
.. _`Designing data visualisations`:
https://cusy.io/en/our-training-courses/designing-data-visualisations
.. _`Create dashboards`:
https://cusy.io/en/our-training-courses/create-dashboards
.. _`Versioned and reproducible storage of code and data`:
https://cusy.io/en/our-training-courses/versioned-and-reproducible-storage-of-code-and-data
.. _`News from Python for data science`:
https://cusy.io/en/our-training-courses/news-from-python-for-data-science

.. toctree::
:hidden:
27 changes: 16 additions & 11 deletions docs/productive/dvc/index.rst
@@ -9,21 +9,22 @@ For data analysis, and especially machine learning, it is extremely valuable to
be able to reproduce different versions of analyses that have been carried out
with different data sets and parameters. However, in order to obtain
reproducible analyses, both the data and the model (including the algorithms,
parameters, etc.) must be versioned. Versioning data for reproducible analysis
is a bigger problem than versioning models because of the size of the data.
Tools like `DVC <https://dvc.org/>`_ help manage data by allowing users to
transfer it to a remote data store using a :doc:`Git <../git/index>` like
workflow. This simplifies the retrieval of certain versions of data in order to
reproduce an analysis.

DVC was developed to be able to use ML models and data sets together and to
manage them in a comprehensible manner. It works with different version
managements, but does not need them. In contrast to `DataLad
parameters, :abbr:`etc. (et cetera)`) must be versioned. Versioning data for
reproducible analysis is a bigger problem than versioning models because of the
size of the data. Tools like `DVC <https://dvc.org/>`_ help manage data by
allowing users to transfer it to a remote data store using a :doc:`Git
<../git/index>` like workflow. This simplifies the retrieval of certain versions
of data in order to reproduce an analysis.

DVC was developed to be able to use :abbr:`ML (Machine Learning)` models and
data sets together and to manage them in a comprehensible manner. It works with
different version managements, but does not need them. In contrast to `DataLad
<https://www.datalad.org/>`_/`git-annex <https://git-annex.branchable.com/>`_,
for example, it is not limited to Git as version management, but can also be
used together with Mercurial, see `github.com/crobarcro/dvc/dvc/scm.py
<https://github.com/crobarcro/dvc/blob/master/dvc/scm.py>`_. It also uses its
own system for storing files with support for SSH and HDFS, among others.
own system for storing files with support for :abbr:`SSH (Secure Shell)` and
:abbr:`HDFS (Hadoop Distributed File System)`, among others.

DataLad, on the other hand, focuses more on discovering and consuming datasets,
which are then easily managed with Git. DVC, on the other hand, stores each step
@@ -35,6 +36,10 @@ visualizing DAGs, see, for example, :doc:`visualisation of DAGs

External dependencies can also be specified with :ref:`dvc remote <dvc-remote>`.

.. tip::
`Versioned and reproducible storage of code and data
<https://cusy.io/en/our-training-courses/versioned-and-reproducible-storage-of-code-and-data>`_

.. seealso::
* `Tutorial <https://dvc.org/doc/tutorial>`_
* `Documentation <https://dvc.org/doc>`_
4 changes: 4 additions & 0 deletions docs/productive/git/index.rst
@@ -21,6 +21,10 @@ local repository can contain specific changes.
However, Git can not only be used in a distributed way, it is also performant,
secure and flexible.

.. tip::
`Versioned and reproducible storage of code and data
<https://cusy.io/en/our-training-courses/versioned-and-reproducible-storage-of-code-and-data>`_

Performance
-----------

4 changes: 4 additions & 0 deletions docs/productive/qa/code-smells.rst
@@ -10,6 +10,10 @@ design of a programme. For example, the overuse of isinstance checks against
concrete classes is a code smell, as it makes the programme more difficult to
extend to deal with new types in the future.
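
The ``isinstance`` smell mentioned above can be illustrated with a minimal
sketch. The ``Circle``, ``Square`` and ``Triangle`` classes and both ``area``
functions are hypothetical names invented for this illustration; they are not
part of the tutorial. The sketch uses ``functools.singledispatch`` as one
possible remedy, which is only one of several options (a method on each class
would work as well).

```python
import math
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Circle:  # hypothetical example type
    radius: float


@dataclass
class Square:  # hypothetical example type
    side: float


def area_with_smell(shape):
    # Smell: every new shape type forces another branch in this function.
    if isinstance(shape, Circle):
        return math.pi * shape.radius**2
    if isinstance(shape, Square):
        return shape.side**2
    raise TypeError(f"Unsupported shape: {type(shape).__name__}")


@singledispatch
def area(shape):
    # Fallback for types without a registered implementation.
    raise TypeError(f"Unsupported shape: {type(shape).__name__}")


@area.register
def _(shape: Circle):
    return math.pi * shape.radius**2


@area.register
def _(shape: Square):
    return shape.side**2


@dataclass
class Triangle:  # a type added later, without touching the code above
    base: float
    height: float


@area.register
def _(shape: Triangle):
    return 0.5 * shape.base * shape.height
```

With the dispatching variant, supporting ``Triangle`` only requires
registering a new implementation, whereas ``area_with_smell`` would have to be
edited and would otherwise raise a ``TypeError``.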

.. tip::
`Design patterns in Python
<https://cusy.io/en/our-training-courses/design-patterns-in-python>`_

Recognising code smells
-----------------------

4 changes: 4 additions & 0 deletions docs/workspace/numpy/index.rst
@@ -30,6 +30,10 @@ the main functionality of NumPy:
array-oriented programming and thinking is an important step on the way to becoming a
data scientist.

.. tip::
`cusy seminar: Numerical calculations with NumPy
<https://cusy.io/en/our-training-courses/numerical-calculations-with-numpy>`_

.. seealso::
* `Home
<https://numpy.org/>`_
4 changes: 4 additions & 0 deletions docs/workspace/pandas/index.rst
@@ -26,6 +26,10 @@ Python code. Mostly pandas is used to
:doc:`/data-processing/serialisation-formats/json/index` data
* prepare machine learning

.. tip::
`Analysing data with pandas
<https://cusy.io/en/our-training-courses/analysing-data-with-pandas>`_

.. seealso::
* `Home
<https://pandas.pydata.org/>`_