deploy: 6148721
ErikBavenstrand committed Jul 4, 2024
0 parents commit 6749d6d
Showing 293 changed files with 70,917 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 195918d839e8cbd73b4d11462f31364e
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file added .doctrees/autoapi/index.doctree
Binary file not shown.
Binary file added .doctrees/autoapi/mleko/cache/index.doctree
Binary file not shown.
Binary file added .doctrees/autoapi/mleko/dataset/index.doctree
Binary file not shown.
Binary file added .doctrees/autoapi/mleko/index.doctree
Binary file not shown.
Binary file added .doctrees/autoapi/mleko/model/index.doctree
Binary file not shown.
Binary file added .doctrees/autoapi/mleko/model/tune/index.doctree
Binary file not shown.
Binary file added .doctrees/autoapi/mleko/pipeline/index.doctree
Binary file not shown.
Binary file added .doctrees/autoapi/mleko/utils/index.doctree
Binary file not shown.
Binary file added .doctrees/changelog.doctree
Binary file not shown.
Binary file added .doctrees/contributing.doctree
Binary file not shown.
Binary file added .doctrees/environment.pickle
Binary file not shown.
Binary file added .doctrees/index.doctree
Binary file not shown.
Binary file added .doctrees/license.doctree
Binary file not shown.
Binary file added .doctrees/usage.doctree
Binary file not shown.
Empty file added .nojekyll
Empty file.
11 changes: 11 additions & 0 deletions _sources/autoapi/index.rst.txt
@@ -0,0 +1,11 @@
API Reference
=============

This page contains auto-generated API reference documentation [#f1]_.

.. toctree::
:titlesonly:

/autoapi/mleko/index

.. [#f1] Created with `sphinx-autoapi <https://github.com/readthedocs/sphinx-autoapi>`_
225 changes: 225 additions & 0 deletions _sources/autoapi/mleko/cache/cache_mixin/index.rst.txt
@@ -0,0 +1,225 @@
:py:mod:`mleko.cache.cache_mixin`
=================================

.. py:module:: mleko.cache.cache_mixin
.. autoapi-nested-parse::

This module contains the basic `CacheMixin` class for caching the results of method calls.

This class can be used as a mixin to add caching functionality to a class. It provides the basic
functionality for caching the results of method calls based on user-defined cache keys and fingerprints.

Combine this class with the format mixins to add support for caching different data
formats, such as Vaex DataFrames in Arrow format.



Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

mleko.cache.cache_mixin.CacheMixin



Functions
~~~~~~~~~

.. autoapisummary::

mleko.cache.cache_mixin.get_qualified_name_from_frame
mleko.cache.cache_mixin.get_qualified_name_of_caller



Attributes
~~~~~~~~~~

.. autoapisummary::

mleko.cache.cache_mixin.logger


.. py:data:: logger
A module-level logger instance.

.. py:function:: get_qualified_name_from_frame(frame: inspect.FrameInfo) -> str
Gets the fully qualified name of the function or method associated with the provided frame.

:param frame: A `FrameInfo` object containing the information of the function or method call.

:returns: A string representing the fully qualified name, in the format "module.class.method" for class methods or
"module.function" for functions.


.. py:function:: get_qualified_name_of_caller(frame_depth: int) -> str
Gets the fully qualified name of the calling function or method.

The fully qualified name is in the format "module.class.method" for class methods or "module.function" for
functions.

:param frame_depth: The depth of the frame to inspect. A value of 2 refers to the frame of the calling
function or method. For each additional nested function or method, increase the frame depth by 1.

:returns: A string representing the fully qualified name of the calling function or method.
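
A minimal sketch of how such a lookup could work, using only the standard ``inspect`` module. This is not the mleko implementation: the function names, the default depth of 2, and the ``self``/``cls`` heuristic for detecting methods are assumptions made for illustration.

```python
import inspect


def qualified_name_from_frame(frame_info: inspect.FrameInfo) -> str:
    """Build "module.class.method" or "module.function" from a FrameInfo."""
    frame = frame_info.frame
    module_name = frame.f_globals.get("__name__", "<unknown>")
    function_name = frame_info.function
    # Heuristic (assumption): a local named `self` or `cls` marks a method call.
    if "self" in frame.f_locals:
        class_name = type(frame.f_locals["self"]).__name__
        return f"{module_name}.{class_name}.{function_name}"
    if inspect.isclass(frame.f_locals.get("cls")):
        class_name = frame.f_locals["cls"].__name__
        return f"{module_name}.{class_name}.{function_name}"
    return f"{module_name}.{function_name}"


def qualified_name_of_caller(frame_depth: int = 2) -> str:
    # stack()[0] is this function, stack()[1] is the helper that called it,
    # and stack()[2] is the code that called that helper.
    return qualified_name_from_frame(inspect.stack()[frame_depth])


def helper() -> str:
    """Hypothetical helper that wants to know who called it."""
    return qualified_name_of_caller(2)


class Pipeline:
    def run(self) -> str:
        return helper()


def standalone() -> str:
    return helper()
```

In this sketch, ``Pipeline().run()`` yields a name ending in ``.Pipeline.run`` and ``standalone()`` yields one ending in ``.standalone``; the module prefix depends on where the code is executed.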


.. py:class:: CacheMixin(cache_directory: str | pathlib.Path, disable_cache: bool)
A mixin class for caching the results of method calls based on user-defined cache keys and fingerprints.

The cache keys can be a mix of hashable values and tuples containing a value and a `BaseFingerprinter`
instance for generating fingerprints. Cache files are saved as pickle files in the specified cache directory,
using the cache key as the filename and the cache file suffix as the file extension.

.. warning::

This class maintains an ever-growing cache, which means that the cache size may increase indefinitely
with new method calls, possibly consuming a large amount of disk space. It does not implement any
cache eviction strategy. It is recommended to either clear the cache manually when needed or
use the LRUCacheMixin class, which extends this class to provide an LRU cache mechanism with
eviction of least recently used cache entries based on a specified maximum number of cache entries.

Initializes the `CacheMixin` with the provided cache directory.

.. note:: The cache directory will be created if it does not exist.

:param cache_directory: The directory where cache files will be stored.
:param disable_cache: Whether to disable the cache.

.. rubric:: Examples

>>> from mleko.cache.cache_mixin import CacheMixin
>>> class MyClass(CacheMixin):
...     def __init__(self):
...         super().__init__(".cache", False)
...
...     def my_method(self, x):
...         return self._cached_execute(lambda: x ** 2, [x])
...
>>> my_class = MyClass()
>>> my_class.my_method(2)
4 # This will be computed and cached
>>> my_class.my_method(2)
4 # This will be loaded from the cache
>>> my_class.my_method(3)
9 # This will be computed and cached

.. py:method:: _cached_execute(lambda_func: Callable[[], Any], cache_key_inputs: list[Hashable | tuple[Any, mleko.cache.fingerprinters.base_fingerprinter.BaseFingerprinter]], cache_group: str | None = None, force_recompute: bool = False, cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler] | None = None, disable_cache: bool = False) -> Any
Executes the given function, caching the results based on the provided cache keys and fingerprints.

.. warning::

The cache group keeps related cache keys apart to prevent collisions between cache keys
originating from the same method. For example, if a method is called during both the training and
testing phases of a machine learning pipeline, the two phases should use different cache groups.
Otherwise, cache entries written later might overwrite earlier ones.

:param lambda_func: A lambda function to execute.
:param cache_key_inputs: A list of cache keys that can be a mix of hashable values and tuples containing
a value and a BaseFingerprinter instance for generating fingerprints.
:param cache_group: A string representing the cache group, used to group related cache keys together when methods
are called independently.
:param force_recompute: A boolean indicating whether to force recompute the result and update the cache, even if a
cached result is available.
:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances. If None, the cache files will
be read using pickle. If a single CacheHandler instance is provided, it will be used for all cache
files. If a list of CacheHandler instances is provided, the handler at each index will be used for
the cache file at the same index.
:param disable_cache: Overrides the class-level `disable_cache` attribute. If set to True, disables the cache.

:returns: A tuple containing a boolean indicating whether the cached result was used, and the result of executing the
given function. If a cached result is available and `force_recompute` is False, the cached result will be
returned instead of recomputing the result.


.. py:method:: _compute_cache_key(cache_key_inputs: list[Hashable | tuple[Any, mleko.cache.fingerprinters.base_fingerprinter.BaseFingerprinter]], cache_group: str | None = None, frame_depth: int = 3) -> str
Computes the cache key based on the provided cache keys and the calling function's fully qualified name.

:param cache_key_inputs: A list of cache keys that can be a mix of hashable values and tuples containing a
value and a BaseFingerprinter instance for generating fingerprints.
:param cache_group: A string representing the cache group.
:param frame_depth: The depth of the frame to inspect. The default value is 3. For each additional
nested function or method, the frame depth should be increased by 1.

:raises ValueError: If the computed cache key is too long.

:returns: A string representing the computed cache key, which is the MD5 hash of the fully qualified name of the
calling function or method, along with the fingerprints of the provided cache keys.
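
The key derivation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: ``compute_cache_key``, ``ListFingerprinter``, the ``|`` separator, and the use of ``repr`` for plain hashable values are all hypothetical choices.

```python
import hashlib
from typing import Any, List, Optional


class ListFingerprinter:
    """Hypothetical stand-in for a BaseFingerprinter subclass."""

    def fingerprint(self, data: Any) -> str:
        return hashlib.md5(repr(data).encode("utf-8")).hexdigest()


def compute_cache_key(
    qualified_name: str,
    cache_key_inputs: List[Any],
    cache_group: Optional[str] = None,
) -> str:
    """Hash the caller's name, the cache group, and every key input with MD5."""
    parts = [qualified_name, cache_group or ""]
    for key_input in cache_key_inputs:
        is_fingerprint_pair = (
            isinstance(key_input, tuple)
            and len(key_input) == 2
            and hasattr(key_input[1], "fingerprint")
        )
        if is_fingerprint_pair:
            # (value, fingerprinter): delegate to the fingerprinter.
            value, fingerprinter = key_input
            parts.append(fingerprinter.fingerprint(value))
        else:
            # Plain hashable value: use its repr (an assumption of this sketch).
            parts.append(repr(key_input))
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


fingerprinter = ListFingerprinter()
train_key = compute_cache_key("pkg.Model.fit", [3, "a", ([1, 2], fingerprinter)], "train")
test_key = compute_cache_key("pkg.Model.fit", [3, "a", ([1, 2], fingerprinter)], "test")
```

Here ``train_key`` and ``test_key`` differ only because of the cache group, which is the collision-avoidance behavior the cache group parameter is meant to provide.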


.. py:method:: _load_from_cache(cache_key: str, cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler]) -> Any | None
Loads data from the cache based on the provided cache key.

:param cache_key: A string representing the cache key.
:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances. If a single CacheHandler
instance is provided, it will be used for all cache files. If a list of CacheHandler instances is
provided, the handler at each index will be used for the cache file at the same index.

:returns: The cached data if it exists, or None if there is no data for the given cache key.


.. py:method:: _get_handler(cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler], index: int = 0) -> mleko.cache.handlers.CacheHandler
Gets the cache handler at the given index.

:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances.
:param index: The index of the cache handler to get.

:returns: Handler at the given index. If a single CacheHandler instance is provided, it will be returned.


.. py:method:: _save_to_cache(cache_key: str, output: Any | Sequence[Any], cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler]) -> None
Saves the given data to the cache using the provided cache key.

If the output is a sequence, each element will be saved to a separate cache file. Otherwise, the output will be
saved to a single cache file. The cache file will be saved in the cache directory with the cache key as the
filename and the cache file suffix as the file extension.

:param cache_key: A string representing the cache key.
:param output: The data to be saved to the cache.
:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances. If a single CacheHandler
instance is provided, it will be used for all cache files. If a list of CacheHandler instances is
provided, the handler at each index will be used for the cache file at the same index.


.. py:method:: _write_to_cache_file(cache_key: str, output_item: Any, index: int, cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler], is_sequence_output: bool) -> None
Writes the given data to the cache file using the provided cache key.

If the output is None and the cache handler cannot handle None, the output will be saved using the pickle
cache handler. Otherwise, the output will be saved to a cache file using the provided cache handler.

:param cache_key: A string representing the cache key.
:param output_item: The data to be saved to the cache.
:param index: The index of the cache handler to use.
:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances.
:param is_sequence_output: Whether the output is a sequence or not. If True, the cache file will be saved with the
index appended to the cache key.


.. py:method:: _find_cache_type_name(cls: type) -> str | None
Recursively searches the class hierarchy for the name of the class that inherits from `CacheMixin`.

:param cls: The class to search.

:returns: The name of the class that inherits from `CacheMixin`, or None if no such class exists.



@@ -0,0 +1,52 @@
:py:mod:`mleko.cache.fingerprinters.base_fingerprinter`
=======================================================

.. py:module:: mleko.cache.fingerprinters.base_fingerprinter
.. autoapi-nested-parse::

This module contains the abstract base class for creating specialized fingerprinters.

The fingerprinter is used to generate a unique identifier for the given data, which is used
to detect changes in the data. The fingerprinter is used by the cache to determine whether
the data has changed since the last time it was cached.



Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

mleko.cache.fingerprinters.base_fingerprinter.BaseFingerprinter




.. py:class:: BaseFingerprinter
Bases: :py:obj:`abc.ABC`

Abstract base class for creating specialized fingerprinters.

.. py:method:: fingerprint(data: Any) -> str
:abstractmethod:

Generate a fingerprint for the given data.

The fingerprint should be a unique identifier for the given data, across different
runs of the program, i.e. the fingerprint should be the same for the same data
regardless of when the program is run.

:param data: Data that should be fingerprinted.

:raises NotImplementedError: The method has to be implemented by the subclass.

:returns: The fingerprint as a hexadecimal string.
:rtype: str
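
As an illustration of the contract above, the following sketch defines a local stand-in for the abstract base class and one concrete subclass. ``JsonFingerprinter`` is hypothetical and not part of mleko; the stand-in ``BaseFingerprinter`` only mirrors the documented interface so that the example runs without the library installed.

```python
import hashlib
import json
from abc import ABC, abstractmethod
from typing import Any


class BaseFingerprinter(ABC):
    """Local stand-in mirroring the documented interface (not imported from mleko)."""

    @abstractmethod
    def fingerprint(self, data: Any) -> str:
        raise NotImplementedError


class JsonFingerprinter(BaseFingerprinter):
    """Hypothetical fingerprinter for JSON-serializable data."""

    def fingerprint(self, data: Any) -> str:
        # sort_keys makes the serialization independent of dict insertion order,
        # so equal data always produces the same fingerprint.
        canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()


fingerprinter = JsonFingerprinter()
first = fingerprinter.fingerprint({"a": 1, "b": 2})
second = fingerprinter.fingerprint({"b": 2, "a": 1})
```

A content hash is used rather than Python's built-in ``hash()`` because string hashes are randomized per process by default, which would break the requirement that the fingerprint stay the same across runs.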



@@ -0,0 +1,47 @@
:py:mod:`mleko.cache.fingerprinters.callable_source_fingerprinter`
==================================================================

.. py:module:: mleko.cache.fingerprinters.callable_source_fingerprinter
.. autoapi-nested-parse::

The module containing the CallableSourceFingerprinter class.



Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

mleko.cache.fingerprinters.callable_source_fingerprinter.CallableSourceFingerprinter




.. py:class:: CallableSourceFingerprinter
Bases: :py:obj:`mleko.cache.fingerprinters.base_fingerprinter.BaseFingerprinter`

A fingerprinter for Callables.

.. py:method:: fingerprint(data: Callable) -> str
Generate a fingerprint for a Python Callable.

.. note::

The fingerprint is generated by hashing the source code of the Callable.
A side effect of this is that the fingerprint will change if the source code
of the Callable changes. However, any changes to variables outside of the
Callable's scope will not affect the fingerprint.

:param data: The Callable to be fingerprinted.

:returns: The fingerprint as a hexadecimal string.
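
The behavior described in the note can be approximated with the standard library alone. This sketch is an assumption about the general technique (hashing the text returned by ``inspect.getsource``), not a copy of the class's implementation; the ``fingerprint_callable`` name and the choice of MD5 are illustrative.

```python
import hashlib
import inspect
import json
from typing import Callable


def fingerprint_callable(func: Callable) -> str:
    """Return a hexadecimal digest of the callable's source code."""
    # getsource reads the source text from the file that defines the callable.
    source = inspect.getsource(func)
    return hashlib.md5(source.encode("utf-8")).hexdigest()


# Pure-Python standard library functions are used here because their
# source files are normally available on disk.
dumps_fingerprint = fingerprint_callable(json.dumps)
loads_fingerprint = fingerprint_callable(json.loads)
```

Because only the source text is hashed, rebinding a global that the function reads leaves the fingerprint unchanged. ``inspect.getsource`` raises ``OSError`` when the source cannot be retrieved and ``TypeError`` for built-ins implemented in C, so a production version would need to handle those cases.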


