Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
0 parents
commit 6749d6d
Showing 293 changed files with 70,917 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Sphinx build info version 1 | ||
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||
config: 195918d839e8cbd73b4d11462f31364e | ||
tags: 645f666f9bcd5a90fca523b33c5a78b7 |
Binary file added
BIN
+14 KB
.doctrees/autoapi/mleko/cache/fingerprinters/base_fingerprinter/index.doctree
Binary file not shown.
Binary file added
BIN
+13.4 KB
.doctrees/autoapi/mleko/cache/fingerprinters/callable_source_fingerprinter/index.doctree
Binary file not shown.
Binary file added
BIN
+27.3 KB
.doctrees/autoapi/mleko/cache/fingerprinters/csv_fingerprinter/index.doctree
Binary file not shown.
Binary file added
BIN
+16.2 KB
.doctrees/autoapi/mleko/cache/fingerprinters/json_fingerprinter/index.doctree
Binary file not shown.
Binary file added
BIN
+47.1 KB
.doctrees/autoapi/mleko/cache/fingerprinters/optuna_pruner_fingerprinter/index.doctree
Binary file not shown.
Binary file added
BIN
+76 KB
.doctrees/autoapi/mleko/cache/fingerprinters/optuna_sampler_fingerprinter/index.doctree
Binary file not shown.
Binary file added
BIN
+14.3 KB
.doctrees/autoapi/mleko/cache/fingerprinters/vaex_fingerprinter/index.doctree
Binary file not shown.
Binary file added
BIN
+16.1 KB
.doctrees/autoapi/mleko/cache/handlers/base_cache_handler/index.doctree
Binary file not shown.
Binary file added
BIN
+20.3 KB
.doctrees/autoapi/mleko/cache/handlers/joblib_cache_handler/index.doctree
Binary file not shown.
Binary file added
BIN
+21.1 KB
.doctrees/autoapi/mleko/cache/handlers/json_cache_handler/index.doctree
Binary file not shown.
Binary file added
BIN
+20.3 KB
.doctrees/autoapi/mleko/cache/handlers/pickle_cache_handler/index.doctree
Binary file not shown.
Binary file added
BIN
+20.2 KB
.doctrees/autoapi/mleko/cache/handlers/string_cache_handler/index.doctree
Binary file not shown.
Binary file added
BIN
+20.9 KB
.doctrees/autoapi/mleko/cache/handlers/vaex_cache_handler/index.doctree
Binary file not shown.
Binary file added
BIN
+125 KB
.doctrees/autoapi/mleko/dataset/convert/csv_to_vaex_converter/index.doctree
Binary file not shown.
Binary file added
BIN
+119 KB
.doctrees/autoapi/mleko/dataset/feature_select/base_feature_selector/index.doctree
Binary file not shown.
Binary file added
BIN
+110 KB
.doctrees/autoapi/mleko/dataset/feature_select/composite_feature_selector/index.doctree
Binary file not shown.
Binary file added
BIN
+54.9 KB
.doctrees/autoapi/mleko/dataset/feature_select/invariance_feature_selector/index.doctree
Binary file not shown.
Binary file added
BIN
+117 KB
.doctrees/autoapi/mleko/dataset/feature_select/missing_rate_feature_selector/index.doctree
Binary file not shown.
Binary file added
BIN
+57.4 KB
...s/autoapi/mleko/dataset/feature_select/pearson_correlation_feature_selector/index.doctree
Binary file not shown.
Binary file added
BIN
+117 KB
.doctrees/autoapi/mleko/dataset/feature_select/variance_feature_selector/index.doctree
Binary file not shown.
Binary file added
BIN
+39.2 KB
.doctrees/autoapi/mleko/dataset/filter/expression_filter/index.doctree
Binary file not shown.
Binary file added
BIN
+48.8 KB
.doctrees/autoapi/mleko/dataset/filter/imblearn_resampling_filter/index.doctree
Binary file not shown.
Binary file added
BIN
+39.2 KB
.doctrees/autoapi/mleko/dataset/split/expression_splitter/index.doctree
Binary file not shown.
Binary file added
BIN
+97.6 KB
.doctrees/autoapi/mleko/dataset/transform/base_transformer/index.doctree
Binary file not shown.
Binary file added
BIN
+54.6 KB
.doctrees/autoapi/mleko/dataset/transform/composite_transformer/index.doctree
Binary file not shown.
Binary file added
BIN
+64.5 KB
.doctrees/autoapi/mleko/dataset/transform/expression_transformer/index.doctree
Binary file not shown.
Binary file added
BIN
+45.3 KB
.doctrees/autoapi/mleko/dataset/transform/frequency_encoder_transformer/index.doctree
Binary file not shown.
Binary file added
BIN
+80.4 KB
.doctrees/autoapi/mleko/dataset/transform/label_encoder_transformer/index.doctree
Binary file not shown.
Binary file added
BIN
+40.8 KB
.doctrees/autoapi/mleko/dataset/transform/max_abs_scaler_transformer/index.doctree
Binary file not shown.
Binary file added
BIN
+44.6 KB
.doctrees/autoapi/mleko/dataset/transform/min_max_scaler_transformer/index.doctree
Binary file not shown.
Binary file added
BIN
+146 KB
.doctrees/autoapi/mleko/pipeline/steps/feature_select_step/index.doctree
Binary file not shown.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
API Reference | ||
============= | ||
|
||
This page contains auto-generated API reference documentation [#f1]_. | ||
|
||
.. toctree:: | ||
:titlesonly: | ||
|
||
/autoapi/mleko/index | ||
|
||
.. [#f1] Created with `sphinx-autoapi <https://github.com/readthedocs/sphinx-autoapi>`_ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,225 @@ | ||
:py:mod:`mleko.cache.cache_mixin` | ||
================================= | ||
|
||
.. py:module:: mleko.cache.cache_mixin | ||
.. autoapi-nested-parse:: | ||
|
||
This module contains the basic `CacheMixin` class for caching the results of method calls. | ||
|
||
This class can be used as a mixin to add caching functionality to a class. It provides the basic | ||
functionality for caching the results of method calls based on user-defined cache keys and fingerprints. | ||
|
||
Combining this class with the format mixins adds support for caching different data | ||
formats, such as Vaex DataFrames in Arrow format. | ||
|
||
|
||
|
||
Module Contents | ||
--------------- | ||
|
||
Classes | ||
~~~~~~~ | ||
|
||
.. autoapisummary:: | ||
|
||
mleko.cache.cache_mixin.CacheMixin | ||
|
||
|
||
|
||
Functions | ||
~~~~~~~~~ | ||
|
||
.. autoapisummary:: | ||
|
||
mleko.cache.cache_mixin.get_qualified_name_from_frame | ||
mleko.cache.cache_mixin.get_qualified_name_of_caller | ||
|
||
|
||
|
||
Attributes | ||
~~~~~~~~~~ | ||
|
||
.. autoapisummary:: | ||
|
||
mleko.cache.cache_mixin.logger | ||
|
||
|
||
.. py:data:: logger | ||
A module-level logger instance. | ||
|
||
.. py:function:: get_qualified_name_from_frame(frame: inspect.FrameInfo) -> str | ||
Gets the fully qualified name of the function or method associated with the provided frame. | ||
|
||
:param frame: A `FrameInfo` object containing the information of the function or method call. | ||
|
||
:returns: A string representing the fully qualified name, in the format "module.class.method" for class methods or | ||
"module.function" for functions. | ||
|
||
|
||
.. py:function:: get_qualified_name_of_caller(frame_depth: int) -> str | ||
Gets the fully qualified name of the calling function or method. | ||
|
||
The fully qualified name is in the format "module.class.method" for class methods or "module.function" for | ||
functions. | ||
|
||
:param frame_depth: The depth of the frame to inspect. A depth of 2 corresponds to the frame of the calling | ||
function or method. For each additional level of nesting, the frame depth should be increased by 1. | ||
|
||
:returns: A string representing the fully qualified name of the calling function or method. | ||
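The frame-depth convention described above can be illustrated with a minimal sketch built on the standard library's ``inspect.stack()``. This is not the mleko implementation: the function names, the default depth of 2, and the heuristic of detecting a method through a ``self`` local are assumptions made for illustration.

```python
import inspect


def qualified_name_from_frame(frame: inspect.FrameInfo) -> str:
    """Build "module.class.method" or "module.function" from a FrameInfo."""
    module = frame.frame.f_globals.get("__name__", "<unknown>")
    # Assumed heuristic: a bound method call has `self` among its local variables.
    owner = frame.frame.f_locals.get("self")
    if owner is not None:
        return f"{module}.{type(owner).__name__}.{frame.function}"
    return f"{module}.{frame.function}"


def qualified_name_of_caller(frame_depth: int = 2) -> str:
    # Index 0 is this function, 1 is the function that called it,
    # and 2 is the caller of that function.
    return qualified_name_from_frame(inspect.stack()[frame_depth])


def helper() -> str:
    # helper() sits at depth 1, so depth 2 resolves to whoever called helper().
    return qualified_name_of_caller(frame_depth=2)


class Pipeline:
    def run(self) -> str:
        return helper()


def standalone() -> str:
    return helper()
```

Calling ``Pipeline().run()`` yields a name ending in ``Pipeline.run``, while ``standalone()`` yields a name ending in ``standalone``. Wrapping ``helper()`` in one more function would require a depth of 3 to reach the same caller.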
|
||
|
||
.. py:class:: CacheMixin(cache_directory: str | pathlib.Path, disable_cache: bool) | ||
A mixin class for caching the results of method calls based on user-defined cache keys and fingerprints. | ||
|
||
The basic functionality of this class is to cache the results of method calls based on user-defined cache keys and | ||
fingerprints. The cache keys can be a mix of hashable values and tuples containing a value and a BaseFingerprinter | ||
instance for generating fingerprints. The `CacheMixin` class will save cache files in the specified cache directory | ||
using the cache key as the filename and the cache file suffix as the file extension. The cache files will be saved | ||
in the cache directory as pickle files. | ||
|
||
.. warning:: | ||
|
||
This class maintains an ever-growing cache, which means that the cache size may increase indefinitely | ||
with new method calls, possibly consuming a large amount of disk space. It does not implement any | ||
cache eviction strategy. It is recommended to either clear the cache manually when needed or | ||
use the LRUCacheMixin class, which extends this class to provide an LRU cache mechanism with | ||
eviction of least recently used cache entries based on a specified maximum number of cache entries. | ||
|
||
Initializes the `CacheMixin` with the provided cache directory. | ||
|
||
.. note:: The cache directory will be created if it does not exist. | ||
|
||
:param cache_directory: The directory where cache files will be stored. | ||
:param disable_cache: Whether to disable the cache. | ||
|
||
.. rubric:: Examples | ||
|
||
>>> from mleko.cache.cache_mixin import CacheMixin | ||
>>> class MyClass(CacheMixin): | ||
...     def __init__(self): | ||
...         super().__init__(".cache", False) | ||
... | ||
...     def my_method(self, x): | ||
...         return self._cached_execute(lambda: x ** 2, [x]) | ||
... | ||
>>> my_class = MyClass() | ||
>>> my_class.my_method(2) | ||
4 # This will be computed and cached | ||
>>> my_class.my_method(2) | ||
4 # This will be loaded from the cache | ||
>>> my_class.my_method(3) | ||
9 # This will be computed and cached under a new cache key | ||
|
||
.. py:method:: _cached_execute(lambda_func: Callable[[], Any], cache_key_inputs: list[Hashable | tuple[Any, mleko.cache.fingerprinters.base_fingerprinter.BaseFingerprinter]], cache_group: str | None = None, force_recompute: bool = False, cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler] | None = None, disable_cache: bool = False) -> Any | ||
Executes the given function, caching the results based on the provided cache keys and fingerprints. | ||
|
||
.. warning:: | ||
|
||
The cache group is used to group related cache keys together to prevent collisions between cache keys | ||
originating from the same method. For example, if a method is called during the training and testing | ||
phases of a machine learning pipeline, the cache keys for the training and testing phases should be | ||
using different cache groups to prevent collisions between the cache keys for the two phases. Otherwise, | ||
the later cache keys might overwrite the earlier cache entries. | ||
|
||
:param lambda_func: A lambda function to execute. | ||
:param cache_key_inputs: A list of cache keys that can be a mix of hashable values and tuples containing | ||
a value and a BaseFingerprinter instance for generating fingerprints. | ||
:param cache_group: A string representing the cache group, used to group related cache keys together when methods | ||
are called independently. | ||
:param force_recompute: A boolean indicating whether to force recompute the result and update the cache, even if a | ||
cached result is available. | ||
:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances. If None, the cache files will | ||
be read using pickle. If a single CacheHandler instance is provided, it will be used for all cache | ||
files. If a list of CacheHandler instances is provided, the handler at each index will be used for the | ||
cache file at the same index. | ||
:param disable_cache: Overrides the class-level `disable_cache` attribute. If set to True, disables the cache. | ||
|
||
:returns: A tuple containing a boolean indicating whether the cached result was used, and the result of executing the | ||
given function. If a cached result is available and `force_recompute` is False, the cached result will be | ||
returned instead of recomputing the result. | ||
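The role of the cache group is easier to see in a small, self-contained sketch. The code below is not mleko's ``_cached_execute``: ``cached_call``, its ``caller_name`` parameter, and the key layout are hypothetical, and pickle files in a temporary directory stand in for the real cache directory. It returns a ``(used_cache, result)`` pair, following the return description above.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, Hashable, Optional


def cached_call(
    func: Callable[[], Any],
    caller_name: str,
    key_inputs: list[Hashable],
    cache_dir: Path,
    cache_group: Optional[str] = None,
    force_recompute: bool = False,
) -> tuple[bool, Any]:
    """Return (used_cache, result), storing results as pickle files."""
    # Assumed key layout: caller name, group, and inputs joined and hashed with MD5.
    raw_key = "|".join([caller_name, str(cache_group), *map(str, key_inputs)])
    path = cache_dir / f"{hashlib.md5(raw_key.encode()).hexdigest()}.pkl"
    if path.exists() and not force_recompute:
        with path.open("rb") as handle:
            return True, pickle.load(handle)
    result = func()
    cache_dir.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(result, handle)
    return False, result


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    # Same method and same key inputs, but conceptually different data.
    train = cached_call(lambda: "train-stats", "Scaler.fit", ["v1"], cache_dir)
    # Without a group, the test-phase call collides with the training entry.
    test_collides = cached_call(lambda: "test-stats", "Scaler.fit", ["v1"], cache_dir)
    # A distinct group keeps the two phases apart.
    test_grouped = cached_call(
        lambda: "test-stats", "Scaler.fit", ["v1"], cache_dir, cache_group="test"
    )
```

In this sketch the ungrouped test-phase call silently reuses the training result, which is the kind of collision the warning describes.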
|
||
|
||
.. py:method:: _compute_cache_key(cache_key_inputs: list[Hashable | tuple[Any, mleko.cache.fingerprinters.base_fingerprinter.BaseFingerprinter]], cache_group: str | None = None, frame_depth: int = 3) -> str | ||
Computes the cache key based on the provided cache keys and the calling function's fully qualified name. | ||
|
||
:param cache_key_inputs: A list of cache keys that can be a mix of hashable values and tuples containing a | ||
value and a BaseFingerprinter instance for generating fingerprints. | ||
:param cache_group: A string representing the cache group. | ||
:param frame_depth: The depth of the frame to inspect. The default value is 3. For each additional nested | ||
function or method call, the frame depth should be increased by 1. | ||
|
||
:raises ValueError: If the computed cache key is too long. | ||
|
||
:returns: A string representing the computed cache key, which is the MD5 hash of the fully qualified name of the | ||
calling function or method, along with the fingerprints of the provided cache keys. | ||
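As a rough illustration of how such a key might be assembled, the sketch below hashes the caller's qualified name, an optional group, and one entry per cache key input. The helper ``compute_cache_key``, the ``ReprFingerprinter`` class, and the ``"|"`` separator are assumptions for illustration only; the actual key layout used by mleko is not shown in this reference.

```python
import hashlib
from typing import Any, Optional


class ReprFingerprinter:
    """Hypothetical stand-in for a BaseFingerprinter subclass."""

    def fingerprint(self, data: Any) -> str:
        return hashlib.md5(repr(data).encode()).hexdigest()


def compute_cache_key(
    qualified_name: str,
    cache_key_inputs: list,
    cache_group: Optional[str] = None,
) -> str:
    parts = [qualified_name]
    if cache_group is not None:
        parts.append(cache_group)
    for item in cache_key_inputs:
        # A (value, fingerprinter) tuple delegates to the fingerprinter.
        if isinstance(item, tuple) and len(item) == 2 and hasattr(item[1], "fingerprint"):
            value, fingerprinter = item
            parts.append(fingerprinter.fingerprint(value))
        else:
            # A plain hashable value contributes its string form.
            parts.append(str(item))
    return hashlib.md5("|".join(parts).encode()).hexdigest()


key_a = compute_cache_key("pkg.Model.fit", [0.1, ({"depth": 3}, ReprFingerprinter())])
key_b = compute_cache_key("pkg.Model.fit", [0.1, ({"depth": 3}, ReprFingerprinter())])
key_c = compute_cache_key("pkg.Model.fit", [0.1, ({"depth": 4}, ReprFingerprinter())])
key_grouped = compute_cache_key("pkg.Model.fit", [0.1], cache_group="train")
```

Identical inputs give identical keys (``key_a == key_b``), while a change in the fingerprinted value or in the cache group produces a different key.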
|
||
|
||
.. py:method:: _load_from_cache(cache_key: str, cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler]) -> Any | None | ||
Loads data from the cache based on the provided cache key. | ||
|
||
:param cache_key: A string representing the cache key. | ||
:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances. If a single CacheHandler | ||
instance is provided, it will be used for all cache files. If a list of CacheHandler instances is | ||
provided, the handler at each index will be used for the cache file at the same index. | ||
|
||
:returns: The cached data if it exists, or None if there is no data for the given cache key. | ||
|
||
|
||
.. py:method:: _get_handler(cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler], index: int = 0) -> mleko.cache.handlers.CacheHandler | ||
Gets the cache handler at the given index. | ||
|
||
:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances. | ||
:param index: The index of the cache handler to get. | ||
|
||
:returns: Handler at the given index. If a single CacheHandler instance is provided, it will be returned. | ||
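The selection rule described above amounts to a few lines. In this sketch, plain strings stand in for ``CacheHandler`` instances, and ``get_handler`` is a hypothetical free function rather than the documented method.

```python
from typing import Any, Union


def get_handler(cache_handlers: Union[Any, list], index: int = 0) -> Any:
    # A list is indexed per cache file; a single handler is shared by all files.
    if isinstance(cache_handlers, list):
        return cache_handlers[index]
    return cache_handlers


shared = get_handler("pickle-handler", index=3)
per_file = get_handler(["vaex-handler", "json-handler"], index=1)
```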
|
||
|
||
.. py:method:: _save_to_cache(cache_key: str, output: Any | Sequence[Any], cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler]) -> None | ||
Saves the given data to the cache using the provided cache key. | ||
|
||
If the output is a sequence, each element will be saved to a separate cache file. Otherwise, the output will be | ||
saved to a single cache file. The cache file will be saved in the cache directory with the cache key as the | ||
filename and the cache file suffix as the file extension. | ||
|
||
:param cache_key: A string representing the cache key. | ||
:param output: The data to be saved to the cache. | ||
:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances. If a single CacheHandler | ||
instance is provided, it will be used for all cache files. If a list of CacheHandler instances is | ||
provided, the handler at each index will be used for the cache file at the same index. | ||
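The split between single and sequence outputs can be sketched as a file-naming helper. The exact naming scheme is not given in this reference, so the ``_<index>`` separator, the default ``pkl`` suffix, and the choice to treat strings and bytes as single values are all assumptions; ``cache_file_names`` is a hypothetical name.

```python
from collections.abc import Sequence
from typing import Any


def cache_file_names(cache_key: str, output: Any, suffix: str = "pkl") -> list[str]:
    # Assumption: strings and bytes count as single values, not as sequences of items.
    is_sequence = isinstance(output, Sequence) and not isinstance(output, (str, bytes))
    if is_sequence:
        # One file per element, with the element index appended to the key.
        return [f"{cache_key}_{index}.{suffix}" for index in range(len(output))]
    return [f"{cache_key}.{suffix}"]


single = cache_file_names("3f2a", {"rows": 10})
multiple = cache_file_names("3f2a", ("train", "test"), suffix="arrow")
```

Here ``single`` holds one file name for the whole output, and ``multiple`` holds one file name per tuple element.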
|
||
|
||
.. py:method:: _write_to_cache_file(cache_key: str, output_item: Any, index: int, cache_handlers: mleko.cache.handlers.CacheHandler | list[mleko.cache.handlers.CacheHandler], is_sequence_output: bool) -> None | ||
Writes the given data to the cache file using the provided cache key. | ||
|
||
If the output is None and the cache handler cannot handle None, the output will be saved using the pickle | ||
cache handler. Otherwise, the output will be saved to a cache file using the provided cache handler. | ||
|
||
:param cache_key: A string representing the cache key. | ||
:param output_item: The data to be saved to the cache. | ||
:param index: The index of the cache handler to use. | ||
:param cache_handlers: A CacheHandler instance or a list of CacheHandler instances. | ||
:param is_sequence_output: Whether the output is a sequence or not. If True, the cache file will be saved with the | ||
index appended to the cache key. | ||
|
||
|
||
.. py:method:: _find_cache_type_name(cls: type) -> str | None | ||
Recursively searches the class hierarchy for the name of the class that inherits from `CacheMixin`. | ||
|
||
:param cls: The class to search. | ||
|
||
:returns: The name of the class that inherits from `CacheMixin`, or None if no such class exists. | ||
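One plausible reading of this search is sketched below: walk the base classes recursively and return the name of the first class that lists the mixin as a direct base. Both this interpretation and the free function ``find_cache_type_name`` with an explicit ``mixin`` argument are assumptions, not the documented implementation.

```python
from typing import Optional


def find_cache_type_name(cls: type, mixin: type) -> Optional[str]:
    # Stop as soon as a class lists the mixin among its direct bases.
    if mixin in cls.__bases__:
        return cls.__name__
    # Otherwise continue the search in each base class.
    for base in cls.__bases__:
        found = find_cache_type_name(base, mixin)
        if found is not None:
            return found
    return None


class Mixin:
    pass


class BaseStep(Mixin):
    pass


class ConcreteStep(BaseStep):
    pass


name = find_cache_type_name(ConcreteStep, Mixin)
missing = find_cache_type_name(int, Mixin)
```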
|
||
|
||
|
52 changes: 52 additions & 0 deletions
_sources/autoapi/mleko/cache/fingerprinters/base_fingerprinter/index.rst.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
:py:mod:`mleko.cache.fingerprinters.base_fingerprinter` | ||
======================================================= | ||
|
||
.. py:module:: mleko.cache.fingerprinters.base_fingerprinter | ||
.. autoapi-nested-parse:: | ||
|
||
This module contains the abstract base class for creating specialized fingerprinters. | ||
|
||
The fingerprinter is used to generate a unique identifier for the given data, which is used | ||
to detect changes in the data. The fingerprinter is used by the cache to determine whether | ||
the data has changed since the last time it was cached. | ||
|
||
|
||
|
||
Module Contents | ||
--------------- | ||
|
||
Classes | ||
~~~~~~~ | ||
|
||
.. autoapisummary:: | ||
|
||
mleko.cache.fingerprinters.base_fingerprinter.BaseFingerprinter | ||
|
||
|
||
|
||
|
||
.. py:class:: BaseFingerprinter | ||
Bases: :py:obj:`abc.ABC` | ||
|
||
Abstract base class for creating specialized fingerprinters. | ||
|
||
.. py:method:: fingerprint(data: Any) -> str | ||
:abstractmethod: | ||
|
||
Generate a fingerprint for the given data. | ||
|
||
The fingerprint should be a unique identifier for the given data, across different | ||
runs of the program, i.e. the fingerprint should be the same for the same data | ||
regardless of when the program is run. | ||
|
||
:param data: Data that should be fingerprinted. | ||
|
||
:raises NotImplementedError: The method has to be implemented by the subclass. | ||
|
||
:returns: The fingerprint as a hexadecimal string. | ||
:rtype: str | ||
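A concrete subclass might look like the following sketch. To stay self-contained it defines a local ``Fingerprinter`` base that mirrors the documented interface instead of importing mleko, and ``SortedJsonFingerprinter`` is a hypothetical example. It relies on ``hashlib`` rather than the built-in ``hash()``, because string hashes from ``hash()`` are salted per process by default and would not be stable across runs.

```python
import hashlib
import json
from abc import ABC, abstractmethod
from typing import Any


class Fingerprinter(ABC):
    """Stand-in mirroring the documented BaseFingerprinter interface."""

    @abstractmethod
    def fingerprint(self, data: Any) -> str:
        raise NotImplementedError


class SortedJsonFingerprinter(Fingerprinter):
    """Hypothetical fingerprinter for JSON-serializable data."""

    def fingerprint(self, data: Any) -> str:
        # Sorting keys makes the serialized form independent of dict insertion order.
        canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()


fingerprinter = SortedJsonFingerprinter()
first = fingerprinter.fingerprint({"a": 1, "b": [1, 2]})
second = fingerprinter.fingerprint({"b": [1, 2], "a": 1})
changed = fingerprinter.fingerprint({"a": 2, "b": [1, 2]})
```

The two dictionaries with the same content produce the same hexadecimal fingerprint regardless of key order, and changing a value changes the fingerprint.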
|
||
|
||
|
47 changes: 47 additions & 0 deletions
.../autoapi/mleko/cache/fingerprinters/callable_source_fingerprinter/index.rst.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
:py:mod:`mleko.cache.fingerprinters.callable_source_fingerprinter` | ||
================================================================== | ||
|
||
.. py:module:: mleko.cache.fingerprinters.callable_source_fingerprinter | ||
.. autoapi-nested-parse:: | ||
|
||
The module containing the CallableSourceFingerprinter class. | ||
|
||
|
||
|
||
Module Contents | ||
--------------- | ||
|
||
Classes | ||
~~~~~~~ | ||
|
||
.. autoapisummary:: | ||
|
||
mleko.cache.fingerprinters.callable_source_fingerprinter.CallableSourceFingerprinter | ||
|
||
|
||
|
||
|
||
.. py:class:: CallableSourceFingerprinter | ||
Bases: :py:obj:`mleko.cache.fingerprinters.base_fingerprinter.BaseFingerprinter` | ||
|
||
A fingerprinter for Callables. | ||
|
||
.. py:method:: fingerprint(data: Callable) -> str | ||
Generate a fingerprint for a Python Callable. | ||
|
||
.. note:: | ||
|
||
The fingerprint is generated by hashing the source code of the Callable. | ||
A side effect of this is that the fingerprint will change if the source code | ||
of the Callable changes. However, any changes to variables outside of the | ||
Callable's scope will not affect the fingerprint. | ||
|
||
:param data: The Callable to be fingerprinted. | ||
|
||
:returns: The fingerprint as a hexadecimal string. | ||
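The idea in the note above can be sketched with ``inspect.getsource`` and ``hashlib``. The helper name ``source_fingerprint`` and the choice of MD5 are assumptions; the reference only states that the source code is hashed into a hexadecimal string. Two standard-library functions are used as inputs so the example does not depend on how the snippet itself is executed.

```python
import hashlib
import inspect
import json
from typing import Callable


def source_fingerprint(func: Callable) -> str:
    # Only the source text of the callable is hashed, so values of variables
    # defined outside the callable never influence the result.
    source = inspect.getsource(func)
    return hashlib.md5(source.encode("utf-8")).hexdigest()


dumps_fingerprint = source_fingerprint(json.dumps)
loads_fingerprint = source_fingerprint(json.loads)
```

Because only the source text is hashed, a function that reads a module-level constant keeps the same fingerprint when that constant changes. Such values would need to be passed as separate cache key inputs to be taken into account.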
|
||
|
||
|