Merge pull request #863 from openml/develop
Release OpenML 0.10.1
mfeurer committed Nov 5, 2019
2 parents 0f36642 + 34d54d9 commit 949515f
Showing 76 changed files with 3,154 additions and 1,059 deletions.
12 changes: 6 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,18 @@
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

A python interface for [OpenML](http://openml.org). You can find the documentation on the [openml-python website](https://openml.github.io/openml-python).

Please commit to the right branches following the gitflow pattern:
http://nvie.com/posts/a-successful-git-branching-model/
A Python interface for [OpenML](http://openml.org), an online platform for open science collaboration in machine learning.
It can be used to download or upload OpenML data such as datasets and machine learning experiment results.
You can find the documentation on the [openml-python website](https://openml.github.io/openml-python).
If you wish to contribute to the package, please see our [contribution guidelines](https://github.com/openml/openml-python/blob/develop/CONTRIBUTING.md).

Master branch:

[![Build Status](https://travis-ci.org/openml/openml-python.svg?branch=master)](https://travis-ci.org/openml/openml-python)
[![Code Health](https://landscape.io/github/openml/openml-python/master/landscape.svg)](https://landscape.io/github/openml/openml-python/master)
[![Build status](https://ci.appveyor.com/api/projects/status/blna1eip00kdyr25?svg=true)](https://ci.appveyor.com/project/OpenML/openml-python)
[![Coverage Status](https://coveralls.io/repos/github/openml/openml-python/badge.svg?branch=master)](https://coveralls.io/github/openml/openml-python?branch=master)

Development branch:

[![Build Status](https://travis-ci.org/openml/openml-python.svg?branch=develop)](https://travis-ci.org/openml/openml-python)
[![Code Health](https://landscape.io/github/openml/openml-python/master/landscape.svg)](https://landscape.io/github/openml/openml-python/master)
[![Build status](https://ci.appveyor.com/api/projects/status/blna1eip00kdyr25/branch/develop?svg=true)](https://ci.appveyor.com/project/OpenML/openml-python/branch/develop)
[![Coverage Status](https://coveralls.io/repos/github/openml/openml-python/badge.svg?branch=develop)](https://coveralls.io/github/openml/openml-python?branch=develop)
2 changes: 1 addition & 1 deletion appveyor.yml
@@ -43,4 +43,4 @@ build: false

test_script:
- "cd C:\\projects\\openml-python"
- "%CMD_IN_ENV% pytest -n 4 --timeout=600 --timeout-method=thread -sv --ignore='test_OpenMLDemo.py'"
- "%CMD_IN_ENV% pytest -n 4 --timeout=600 --timeout-method=thread -sv"
9 changes: 6 additions & 3 deletions ci_scripts/install.sh
@@ -36,12 +36,13 @@ pip install -e '.[test]'
python -c "import numpy; print('numpy %s' % numpy.__version__)"
python -c "import scipy; print('scipy %s' % scipy.__version__)"

if [[ "$EXAMPLES" == "true" ]]; then
pip install -e '.[examples]'
fi
if [[ "$DOCTEST" == "true" ]]; then
pip install sphinx_bootstrap_theme
fi
if [[ "$DOCPUSH" == "true" ]]; then
conda install --yes gxx_linux-64 gcc_linux-64 swig
pip install -e '.[examples,examples_unix]'
fi
if [[ "$COVERAGE" == "true" ]]; then
pip install codecov pytest-cov
fi
@@ -52,3 +53,5 @@ fi
# Install scikit-learn last to make sure the openml package installation works
# from a clean environment without scikit-learn.
pip install scikit-learn==$SKLEARN_VERSION

conda list
2 changes: 1 addition & 1 deletion ci_scripts/test.sh
@@ -28,7 +28,7 @@ run_tests() {
PYTEST_ARGS=''
fi

pytest -n 4 --durations=20 --timeout=600 --timeout-method=thread -sv --ignore='test_OpenMLDemo.py' $PYTEST_ARGS $test_dir
pytest -n 4 --durations=20 --timeout=600 --timeout-method=thread -sv $PYTEST_ARGS $test_dir
}

if [[ "$RUN_FLAKE8" == "true" ]]; then
1 change: 1 addition & 0 deletions doc/api.rst
@@ -85,6 +85,7 @@ Modules

list_evaluations
list_evaluation_measures
list_evaluations_setups

:mod:`openml.flows`: Flow Functions
-----------------------------------
74 changes: 68 additions & 6 deletions doc/contributing.rst
@@ -21,20 +21,20 @@ you can use github's assign feature, otherwise you can just leave a comment.
Scope of the package
====================

The scope of the OpenML python package is to provide a python interface to
the OpenML platform which integrates well with pythons scientific stack, most
The scope of the OpenML Python package is to provide a Python interface to
the OpenML platform which integrates well with Python's scientific stack, most
notably `numpy <http://www.numpy.org/>`_ and `scipy <https://www.scipy.org/>`_.
To reduce opportunity costs and demonstrate the usage of the package, it also
implements an interface to the most popular machine learning package written
in python, `scikit-learn <http://scikit-learn.org/stable/index.html>`_.
in Python, `scikit-learn <http://scikit-learn.org/stable/index.html>`_.
Thereby it will automatically be compatible with many machine learning
libraries written in Python.

We aim to keep the package as light-weight as possible and we will try to
keep the number of potential installation dependencies as low as possible.
Therefore, the connection to other machine learning libraries such as
*pytorch*, *keras* or *tensorflow* should not be done directly inside this
package, but in a separate package using the OpenML python connector.
package, but in a separate package using the OpenML Python connector.

.. _issues:

@@ -52,7 +52,7 @@ contains longer-term goals.
How to contribute
=================

There are many ways to contribute to the development of the OpenML python
There are many ways to contribute to the development of the OpenML Python
connector and OpenML in general. We welcome all kinds of contributions,
especially:

@@ -158,5 +158,67 @@ Happy testing!
Connecting new machine learning libraries
=========================================

Coming soon - please stay tuned!
Content of the Library
~~~~~~~~~~~~~~~~~~~~~~

To leverage support from the community and to tap into the potential of OpenML, interfacing
with popular machine learning libraries is essential. However, the OpenML-Python team does
not have the capacity to develop and maintain such interfaces on its own. For this reason, we
have built an extension interface that allows others to contribute back. Building a suitable
extension therefore requires an understanding of the current OpenML-Python support.

`This example <examples/flows_and_runs_tutorial.html>`_
shows how scikit-learn currently works with OpenML-Python as an extension. The *sklearn*
extension packaged with the `openml-python <https://github.com/openml/openml-python>`_
repository can be used as a template/benchmark to build the new extension.


API
+++
* The extension scripts must import the `openml` package and be able to interface with
any function from the OpenML-Python `API <api.html>`_.
* The extension has to be defined as a Python class and must inherit from
:class:`openml.extensions.Extension`.
* This class needs to override all the functions from `class Extension` as required.
* The redefined functions should have adequate and appropriate docstrings. The
scikit-learn extension, :class:`openml.extensions.sklearn.SklearnExtension`,
is a good benchmark to follow.


Interfacing with OpenML-Python
++++++++++++++++++++++++++++++
Once the new extension class has been defined, it must be registered by calling
:func:`openml.extensions.register_extension`, which allows OpenML-Python to
interface with the new extension.
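
The define-then-register flow described above can be sketched with the standard library alone.
This is a minimal illustration of the pattern, not OpenML-Python's implementation: the names
``Extension`` and ``register_extension`` mirror the ones used in the text, while
``can_handle_model``, ``get_extension_by_model`` and ``DictModelExtension`` are assumptions made
up for this sketch.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Type


class Extension(ABC):
    """Hypothetical stand-in for the base class an extension inherits from."""

    @classmethod
    @abstractmethod
    def can_handle_model(cls, model: Any) -> bool:
        """Return True if this extension knows how to handle ``model``."""


# Module-level registry, filled by register_extension().
_extensions: List[Type[Extension]] = []


def register_extension(extension: Type[Extension]) -> None:
    """Make an extension class discoverable by the lookup below."""
    _extensions.append(extension)


def get_extension_by_model(model: Any) -> Type[Extension]:
    """Return the single registered extension that can handle ``model``."""
    candidates = [ext for ext in _extensions if ext.can_handle_model(model)]
    if len(candidates) != 1:
        raise ValueError(f"Expected exactly one extension, found {len(candidates)}")
    return candidates[0]


class DictModelExtension(Extension):
    """Toy extension (assumption for this sketch) that treats dicts as models."""

    @classmethod
    def can_handle_model(cls, model: Any) -> bool:
        return isinstance(model, dict)


# Registration is what makes the new extension visible to the lookup.
register_extension(DictModelExtension)
```

With this in place, ``get_extension_by_model({"k": 1})`` returns ``DictModelExtension``, while a
model that no registered extension claims raises a ``ValueError``. A real extension should follow
the interface documented for :class:`openml.extensions.Extension` rather than this toy one.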


Hosting the library
~~~~~~~~~~~~~~~~~~~

Each extension created should be a stand-alone repository, compatible with the
`OpenML-Python repository <https://github.com/openml/openml-python>`_.
The extension repository should work off-the-shelf with *OpenML-Python* installed.

Create a `public Github repo <https://help.github.com/en/articles/create-a-repo>`_ with
the following directory structure:

::

| [repo name]
| |-- [extension name]
| | |-- __init__.py
| | |-- extension.py
| | |-- config.py (optionally)


Recommended
~~~~~~~~~~~
* Test cases to keep the extension up to date with the `openml-python` upstream changes.
* Documentation of the extension API, especially if any new functionality is added to OpenML-Python's
extension design.
* Examples to show how the new extension interfaces and works with OpenML-Python.
* Create a PR to add the new extension to the OpenML-Python API documentation.


Happy contributing!
2 changes: 1 addition & 1 deletion doc/index.rst
@@ -38,7 +38,7 @@ Example
# Publish the experiment on OpenML (optional, requires an API key.
# You can get your own API key by signing up to OpenML.org)
run.publish()
print('View the run online: %s/run/%d' % (openml.config.server, run.run_id))
print(f'View the run online: {openml.config.server}/run/{run.run_id}')
You can find more examples in our `examples gallery <examples/index.html>`_.
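
The hunk above swaps ``%``-style formatting for an f-string. A small sketch of why the change is
behavior-preserving, assuming placeholder values: ``server`` and ``run_id`` below are hypothetical
stand-ins for ``openml.config.server`` and ``run.run_id``.

```python
# Hypothetical values standing in for openml.config.server and run.run_id.
server = "https://example.org/api"
run_id = 42

# Old style: positional %-formatting with a tuple of arguments.
old_style = 'View the run online: %s/run/%d' % (server, run_id)

# New style: the expressions are embedded directly in the string literal.
new_style = f'View the run online: {server}/run/{run_id}'

print(new_style)
```

Both variables hold the same string for an integer ``run_id``. Note that f-strings require
Python 3.6 or later.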

50 changes: 50 additions & 0 deletions doc/progress.rst
@@ -6,8 +6,57 @@
Changelog
=========

0.10.1
~~~~~~
* ADD #175: Automatically adds the docstring of scikit-learn objects to flow and its parameters.
* ADD #737: New evaluation listing call that includes the hyperparameter settings.
* ADD #744: It is now possible to only issue a warning and not raise an exception if the package
versions for a flow are not met when deserializing it.
* ADD #783: The URL to download the predictions for a run is now stored in the run object.
* ADD #790: Adds the uploader name and id as new filtering options for ``list_evaluations``.
* ADD #792: New convenience function ``openml.flows.get_flow_id``.
* ADD #861: Debug-level log information now being written to a file in the cache directory (at most 2 MB).
* DOC #778: Introduces instructions on how to publish an extension to support other libraries
than scikit-learn.
* DOC #785: The examples section is completely restructured into simple examples, advanced
examples and examples showcasing the use of OpenML-Python to reproduce papers which were done
with OpenML-Python.
* DOC #788: New example on manually iterating through the split of a task.
* DOC #789: Improve the usage of dataframes in the examples.
* DOC #791: New example for the paper *Efficient and Robust Automated Machine Learning* by Feurer
et al. (2015).
* DOC #803: New example for the paper *Don’t Rule Out Simple Models Prematurely:
A Large Scale Benchmark Comparing Linear and Non-linear Classifiers in OpenML* by Benjamin
Strang et al. (2018).
* DOC #808: New example demonstrating basic use cases of a dataset.
* DOC #810: New example demonstrating the use of benchmarking studies and suites.
* DOC #832: New example for the paper *Scalable Hyperparameter Transfer Learning* by
Valerio Perrone et al. (2019)
* DOC #834: New example showing how to plot the loss surface for a support vector machine.
* FIX #305: Do not require the external version in the flow XML when loading an object.
* FIX #734: Better handling of *"old"* flows.
* FIX #736: Attach a StreamHandler to the openml logger instead of the root logger.
* FIX #758: Fixes an error which made the client API crash when loading sparse data with
categorical variables.
* FIX #779: Do not fail on corrupt pickle
* FIX #782: Assign the study id to the correct class attribute.
* FIX #819: Automatically convert column names to type string when uploading a dataset.
* FIX #820: Make ``__repr__`` work for datasets which do not have an id.
* MAINT #796: Rename an argument to make the function ``list_evaluations`` more consistent.
* MAINT #811: Print the full error message given by the server.
* MAINT #828: Create base class for OpenML entity classes.
* MAINT #829: Reduce the number of data conversion warnings.
* MAINT #831: Warn if there's an empty flow description when publishing a flow.
* MAINT #837: Also print the flow XML if a flow fails to validate.
* FIX #838: Fix list_evaluations_setups to work when the number of evaluations is not a multiple of 100.
* FIX #847: Fixes an issue where the client API would crash when trying to download a dataset
when there are no qualities available on the server.
* MAINT #849: Move logic of most different ``publish`` functions into the base class.
* MAINT #850: Remove outdated test code.
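
The two logging entries above (FIX #736 and ADD #861) can be illustrated with the standard
``logging`` module. This is a sketch under stated assumptions, not the code shipped in this
release: the logger name ``openml_sketch``, the temporary directory standing in for the cache
directory, and the choice of one 1 MB file plus one backup to stay near a 2 MB bound are all
hypothetical.

```python
import logging
import logging.handlers
import os
import tempfile

# Hypothetical logger name and cache directory, used only for this sketch.
logger = logging.getLogger("openml_sketch")
logger.setLevel(logging.DEBUG)

cache_dir = tempfile.mkdtemp()
log_path = os.path.join(cache_dir, "sketch.log")

# Console output: attached to the package logger, not the root logger,
# so applications importing the package keep control of root logging.
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.WARNING)
logger.addHandler(console_handler)

# File output: debug-level records. The file is rotated at 1 MB and one
# backup is kept, so roughly 2 MB at most are stored on disk.
file_handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1_000_000, backupCount=1, delay=True
)
file_handler.setLevel(logging.DEBUG)
logger.addHandler(file_handler)

logger.debug("written to the file only")
logger.warning("written to the file and to the console")
```

Attaching handlers to a named logger instead of the root logger means that other libraries'
messages are not routed through these handlers.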

0.10.0
~~~~~~

* ADD #737: Add list_evaluations_setups to return hyperparameters along with list of evaluations.
* FIX #261: Test server is cleared of all files uploaded during unit testing.
* FIX #447: All files created by unit tests no longer persist in local.
@@ -25,6 +74,7 @@ Changelog
* ADD #412: The scikit-learn extension populates the short name field for flows.
* MAINT #726: Update examples to remove deprecation warnings from scikit-learn
* MAINT #752: Update OpenML-Python to be compatible with sklearn 0.21
* ADD #790: Add user ID and name to list_evaluations


0.9.0
10 changes: 5 additions & 5 deletions doc/usage.rst
@@ -21,11 +21,11 @@ Installation & Set up
~~~~~~~~~~~~~~~~~~~~~~

The OpenML Python package is a connector to `OpenML <https://www.openml.org/>`_.
It allows to use and share datasets and tasks, run
It allows you to use and share datasets and tasks, run
machine learning algorithms on them and then share the results online.

The following tutorial gives a short introduction on how to install and set up
the OpenML python connector, followed up by a simple example.
the OpenML Python connector, followed up by a simple example.

* `Introduction <examples/introduction_tutorial.html>`_

@@ -52,7 +52,7 @@ Working with tasks
~~~~~~~~~~~~~~~~~~

You can think of a task as an experimentation protocol, describing how to apply
a machine learning model to a dataset in a way that it is comparable with the
a machine learning model to a dataset in a way that is comparable with the
results of others (more on how to do that further down). Tasks are containers,
defining which dataset to use, what kind of task we're solving (regression,
classification, clustering, etc...) and which column to predict. Furthermore,
@@ -86,7 +86,7 @@ predictions of that run. When a run is uploaded to the server, the server
automatically calculates several metrics which can be used to compare the
performance of different flows to each other.

So far, the OpenML python connector works only with estimator objects following
So far, the OpenML Python connector works only with estimator objects following
the `scikit-learn estimator API <http://scikit-learn.org/dev/developers/contributing.html#apis-of-scikit-learn-objects>`_.
Those can be directly run on a task, and a flow will automatically be created or
downloaded from the server if it already exists.
@@ -114,7 +114,7 @@ requirements and how to download a dataset:
OpenML is about sharing machine learning results and the datasets they were
obtained on. Learn how to share your datasets in the following tutorial:

* `Upload a dataset <examples/create_upload_tutorial.html>`_
* `Upload a dataset <examples/30_extended/create_upload_tutorial.html>`_

~~~~~~~~~~~~~~~~~~~~~~~
Extending OpenML-Python
4 changes: 4 additions & 0 deletions examples/20_basic/README.txt
@@ -0,0 +1,4 @@
Introductory Examples
=====================

Introductory examples to the usage of the OpenML Python connector.
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
"""
Introduction
============
Setup
=====
An introduction to OpenML, followed up by a simple example.
An example of how to set up OpenML-Python, followed by a simple example.
"""
############################################################################
# OpenML is an online collaboration platform for machine learning which allows
@@ -61,7 +61,7 @@
openml.config.start_using_configuration_for_example()

############################################################################
# When using the main server, instead make sure your apikey is configured.
# When using the main server instead, make sure your apikey is configured.
# This can be done with the following line of code (uncomment it!).
# Never share your apikey with others.

@@ -96,7 +96,7 @@
# For this tutorial, our configuration publishes to the test server
# as to not crowd the main server with runs created by examples.
myrun = run.publish()
print("kNN on %s: http://test.openml.org/r/%d" % (data.name, myrun.run_id))
print(f"kNN on {data.name}: http://test.openml.org/r/{myrun.run_id}")

############################################################################
openml.config.stop_using_configuration_for_example()
68 changes: 68 additions & 0 deletions examples/20_basic/simple_datasets_tutorial.py
@@ -0,0 +1,68 @@
"""
========
Datasets
========
A basic tutorial on how to list, load and visualize datasets.
"""
############################################################################
# In general, we recommend working with tasks, so that the results can
# be easily reproduced. Furthermore, the results can be compared to existing results
# at OpenML. However, for the purposes of this tutorial, we are going to work with
# the datasets directly.

import openml
############################################################################
# List datasets
# =============

datasets_df = openml.datasets.list_datasets(output_format='dataframe')
print(datasets_df.head(n=10))

############################################################################
# Download a dataset
# ==================

# Iris dataset https://www.openml.org/d/61
dataset = openml.datasets.get_dataset(61)

# Print a summary
print(f"This is dataset '{dataset.name}', the target feature is "
f"'{dataset.default_target_attribute}'")
print(f"URL: {dataset.url}")
print(dataset.description[:500])

############################################################################
# Load a dataset
# ==============

# X - An array/dataframe where each row represents one example with
# the corresponding feature values.
# y - the classes for each example
# categorical_indicator - an array that indicates which feature is categorical
# attribute_names - the names of the features for the examples (X) and
# target feature (y)
X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format='dataframe',
    target=dataset.default_target_attribute
)
############################################################################
# Visualize the dataset
# =====================

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style("darkgrid")


def hide_current_axis(*args, **kwds):
    plt.gca().set_visible(False)


# We combine all the data so that we can map the different
# examples to different colors according to the classes.
combined_data = pd.concat([X, y], axis=1)
iris_plot = sns.pairplot(combined_data, hue="class")
iris_plot.map_upper(hide_current_axis)
plt.show()