Mixed effects model to convert irregular data to basis expansion #618

Open
wants to merge 62 commits into
base: develop
62 commits
239bfd3
minimize converter
pcuestas Mar 4, 2024
4b917ec
Change 'conversion' module to 'representation' and reorganize interna…
pcuestas Mar 8, 2024
9d2a2d4
WIP: include EM converter
pcuestas Mar 8, 2024
c28f4bf
Add EM converter
pcuestas Mar 15, 2024
21c7f17
Naming and comments
pcuestas Mar 16, 2024
5166b5c
Fix docstring
pcuestas Mar 16, 2024
2b7fd7c
Comment and remove unused import
pcuestas Mar 16, 2024
2a63e9b
Merge branch 'develop' into feature/irregular_to_basis_mixed_effects_…
pcuestas Mar 18, 2024
ddadefc
Remove space and adapt FDataIrregular.integrate to correctly override…
pcuestas Mar 23, 2024
9f17ff3
Update signature of FDataIrregular.cov to match superclass'
pcuestas Apr 1, 2024
9730d51
Add FDataIrregular to skfda like FDataBasis and FDataGrid
pcuestas Apr 1, 2024
951dea3
Implement scores for `FDatairregular` objects as described in #609
pcuestas Apr 1, 2024
30c807f
Fix ugly comment
pcuestas Apr 1, 2024
a5b7617
Merge branch 'develop' into feature/scoring-fdatairregular
vnmabus Apr 4, 2024
30e11ce
conversion documentation
pcuestas Apr 7, 2024
d3b26b5
Rename classes to make them public and add `FDataIrregular.to_basis` …
pcuestas Apr 7, 2024
e18bc49
Fix tests
pcuestas Apr 12, 2024
5d3addc
Remove global variables from tests
pcuestas Apr 12, 2024
e91419d
Replace Union and Optional
pcuestas Apr 13, 2024
313a001
Fix imports
pcuestas Apr 13, 2024
8b0a972
Merge branch 'feature/scoring-fdatairregular' into feature/irregular_…
pcuestas Apr 13, 2024
4bd8713
Fix possible division by zero
pcuestas Apr 13, 2024
bb5e1cb
Merge branch 'feature/scoring-fdatairregular' into feature/irregular_…
pcuestas Apr 13, 2024
0bb641b
irregularly sample FData objects
pcuestas Apr 13, 2024
2030dd3
Examples
pcuestas Apr 13, 2024
36a8092
Fix irregular datasets slicing and add fetch_bone_density to document…
pcuestas Apr 13, 2024
2fa07cd
examples
pcuestas Apr 17, 2024
10ed91f
irregular_sample can receive an fdatairreuglar and a list of n_points…
pcuestas Apr 20, 2024
7104d2a
example removing points from fdatairregular
pcuestas Apr 20, 2024
5f58016
irregular to basis doctest example with plot
pcuestas Apr 20, 2024
2eff88e
update examples
pcuestas Apr 22, 2024
1fac99e
jupyter plot for to_basis
pcuestas Apr 22, 2024
c23064d
update converters
pcuestas Apr 24, 2024
f3663d8
remove commented code from tests
pcuestas Apr 24, 2024
e90eac7
design of mixed effects converters and bib references
pcuestas Apr 26, 2024
c5bd7a5
comments and remove unnecessary test
pcuestas Apr 26, 2024
a9c2ee4
example
pcuestas Apr 27, 2024
de2cf1e
.
pcuestas Apr 27, 2024
abb53aa
fix previous commit which was a mistake
pcuestas Apr 27, 2024
5cb6dbb
Merge branch 'fix/fdatairregular_getitem' into feature/irregular_to_b…
pcuestas Jun 8, 2024
702d47a
simple conversion with real data too in example/tutorial
pcuestas Jun 8, 2024
e2a4466
remove extra comment line
pcuestas Jun 8, 2024
e6f00da
isort
pcuestas Jun 8, 2024
ba34522
wip example of decimation
pcuestas Jun 8, 2024
31d69f2
decimation example
pcuestas Jun 9, 2024
6c5e622
decimation example
pcuestas Jun 9, 2024
16c513a
decimation example
pcuestas Jun 9, 2024
9e1a566
remove extra example
pcuestas Jun 11, 2024
509acd4
adapting mixed effects for higher dimensions
pcuestas Jun 11, 2024
9e4a612
isort
pcuestas Jun 12, 2024
f442a58
test sample_irregular
pcuestas Jun 12, 2024
84e92d0
irregular_sample for multidimensional datasets (domain and codomain)
pcuestas Jun 12, 2024
704d254
comment and clean the tests for irregular_sample
pcuestas Jun 12, 2024
c4151e9
Tests for the multidimensional case (_mixed_effects)
pcuestas Jun 12, 2024
4073501
documentation and review
pcuestas Jun 13, 2024
0b9d37f
simplify test for mixed_effets so that it runs faster
pcuestas Jun 14, 2024
523a9ea
documentation and small changes
pcuestas Jun 14, 2024
20bd812
change order of examples in documentation
pcuestas Jun 14, 2024
6ad8998
remove file that should not have been commited
pcuestas Jun 14, 2024
6ab190d
Merge branch 'develop' into feature/irregular_to_basis_mixed_effects_…
pcuestas Jun 16, 2024
1d696e7
Make example more visually interesting
pcuestas Jun 16, 2024
9e591d0
Merge branch 'develop' into feature/irregular_to_basis_mixed_effects_…
vnmabus Jun 21, 2024
1 change: 1 addition & 0 deletions docs/modules/datasets.rst
Expand Up @@ -22,6 +22,7 @@ The following functions are used to retrieve specific functional datasets:
skfda.datasets.fetch_phoneme
skfda.datasets.fetch_tecator
skfda.datasets.fetch_weather
skfda.datasets.fetch_bone_density

Those functions return a dictionary with at least a "data" field containing the
instance data, and a "target" field containing the class labels or regression values,
9 changes: 9 additions & 0 deletions docs/modules/representation.rst
Expand Up @@ -132,6 +132,15 @@ interval using extrapolation methods.

representation/extrapolation

Conversion
------------
Convert irregular data to basis representation using mixed effects models.

.. toctree::
   :maxdepth: 4

   representation/conversion

Deprecated Classes
----------------------

18 changes: 18 additions & 0 deletions docs/modules/representation/conversion.rst
@@ -0,0 +1,18 @@
Conversion between representations
==================================

This module contains classes (converters) for converting between different
representations. Currently only the conversion between :class:`FDataIrregular`
and :class:`FDataBasis` has been implemented via converters.

:class:`FDataIrregular` to :class:`FDataBasis`
----------------------------------------------

These are the submodules that contain the converters for the conversion between
:class:`FDataIrregular` and :class:`FDataBasis`:

.. toctree::
   :maxdepth: 2

   conversion/mixed_effects

16 changes: 16 additions & 0 deletions docs/modules/representation/conversion/mixed_effects.rst
@@ -0,0 +1,16 @@
Mixed effects converters
########################

The following classes can be used for converting irregular functional
data to basis representation using the mixed effects model.

.. autosummary::
   :toctree: autosummary

   skfda.representation.conversion.MinimizeMixedEffectsConverter
   skfda.representation.conversion.EMMixedEffectsConverter


.. automodule:: skfda.representation.conversion._mixed_effects
   :no-members:
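
As a rough sketch of what "mixed effects model" means in this setting (a common
textbook formulation; the notation is an assumption made here, and the exact
parametrization used by the classes above may differ), the vector of
measurements :math:`y_i` of the :math:`i`-th curve can be modelled as

.. math::
   y_i = \Phi_i \beta + \Phi_i \gamma_i + \epsilon_i, \qquad
   \gamma_i \sim \mathcal{N}(0, \Gamma), \qquad
   \epsilon_i \sim \mathcal{N}(0, \sigma^2 I_{n_i}),

where :math:`\Phi_i` is the :math:`n_i \times K` matrix of the :math:`K` basis
functions evaluated at the :math:`n_i` measurement points of the curve,
:math:`\beta` is the fixed effect shared by all curves and :math:`\gamma_i` is
the random effect of curve :math:`i`. Under these assumptions, the basis
coefficients of the converted curve would be :math:`\beta + \hat{\gamma}_i`,
with :math:`\hat{\gamma}_i` the predicted random effect.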

31 changes: 31 additions & 0 deletions docs/refs.bib
Expand Up @@ -659,3 +659,34 @@ @inbook{wasserman_2006_nonparametric
langid = {english}
}

@article{james_2018_sparsenessfda,
title = {Sparseness and functional data analysis},
author = {Gareth M. James},
journal = {Oxford Handbooks Online},
year = {2018},
url = {https://api.semanticscholar.org/CorpusID:14265225}
}

@article{Lindstrom_1988,
doi = {10.1080/01621459.1988.10478693},
title = {{N}ewton--{R}aphson and {EM} {A}lgorithms for {L}inear {M}ixed-{E}ffects {M}odels for {R}epeated-{M}easures {D}ata},
author = {Lindstrom, Mary J. and Bates, Douglas M.},
journal = {Journal of the American Statistical Association},
year = {1988},
month = {dec},
volume = {83},
number = {404},
pages = {1014--1022},
}

@article{laird+lange+stram_1987_emmixedeffects,
author = {Nan Laird and Nicholas Lange and Daniel Stram},
title = {Maximum Likelihood Computations with Repeated Measures: Application of the EM Algorithm},
journal = {Journal of the American Statistical Association},
volume = {82},
number = {397},
pages = {97--105},
year = {1987},
publisher = {Taylor \& Francis},
doi = {10.1080/01621459.1987.10478395},
}
233 changes: 233 additions & 0 deletions examples/plot_irregular_mixed_effects_robustness.py
@@ -0,0 +1,233 @@
"""
Mixed effects model for irregular data: robustness of the conversion by decimation
====================================================================================

This example converts irregular data to a basis representation using a mixed
effects model and checks the robustness of the method by fitting
the model with a decreasing number of measurement points per curve.
"""
# Author: Pablo Cuesta Sierra
# License: MIT

# sphinx_gallery_thumbnail_number = -1

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from skfda import FDataIrregular
from skfda.datasets import fetch_weather, irregular_sample
from skfda.misc.scoring import mean_squared_error, r2_score
from skfda.representation.basis import FourierBasis
from skfda.representation.conversion import EMMixedEffectsConverter

# %%
# For this example, we are going to check the robustness of
# the mixed effects method for converting irregular data to basis
# representation by removing some measurement points from the test and train
# sets and comparing the resulting conversions.
#
# The temperatures from the Canadian weather dataset are used to generate
# the irregular data.
# We use a Fourier basis due to the periodic nature of the data.
fd_temperatures = fetch_weather().data.coordinates[0]
basis = FourierBasis(n_basis=5, domain_range=fd_temperatures.domain_range)

# %%
# We plot the original data and the basis functions.
fig = plt.figure(figsize=(10, 4))

axes = plt.subplot(1, 2, 1)
fd_temperatures.plot(axes=axes)
ylim = axes.get_ylim()
xlabel = axes.get_xlabel()
plt.title(fd_temperatures.dataset_name)

axes = plt.subplot(1, 2, 2)
basis.plot(axes=axes)
axes.set_xlabel(xlabel)
plt.title("Basis functions")

plt.suptitle("")
plt.show()

# %%
# We split the data into train and test sets:
random_state = np.random.RandomState(seed=13627798)
train_original, test_original = train_test_split(
    fd_temperatures,
    test_size=0.3,
    random_state=random_state,
)

# %%
# Then, we create datasets with a decreasing number of measurement points per
# curve, by removing measurement points from the previous dataset iteratively.
n_points_list = [365, 40, 10, 7, 5, 4, 3]
train_irregular_datasets = {}
test_irregular_datasets = {}
current_train = train_original
current_test = test_original
for n_points in n_points_list:
    current_train = irregular_sample(current_train, n_points, random_state)
    current_test = irregular_sample(current_test, n_points, random_state)
    train_irregular_datasets[n_points] = current_train
    test_irregular_datasets[n_points] = current_test

# %%
# We convert the irregular data to basis representation and compute the scores.
# To do so, we fit the converter once per train set. After fitting
# the converter with a train set that has :math:`k` points per curve, we
# use it to transform that train set, the test set with :math:`k` points per
# curve and the original test set with 365 points per curve.
score_functions = {"R^2": r2_score, "MSE": mean_squared_error}
converted_data = {"Train-sparse": {}, "Test-sparse": {}, "Test-original": {}}
scores = {
    score_name: {
        "n_points_per_curve": n_points_list,
        **{data_name: [] for data_name in converted_data},
    }
    for score_name in score_functions
}
converter = EMMixedEffectsConverter(basis)
for n_points, train_irregular, test_irregular in zip(
    n_points_list,
    train_irregular_datasets.values(),
    test_irregular_datasets.values(),
):
    converter = converter.fit(train_irregular)
    transformed = {
        "Train-sparse": converter.transform(train_irregular),
        "Test-sparse": converter.transform(test_irregular),
        "Test-original": converter.transform(
            FDataIrregular.from_fdatagrid(test_original),
        ),
    }
    # Store the converted data
    for key, data in transformed.items():
        converted_data[key][n_points] = data
    # Calculate and store the scores
    for score_name, score_fun in score_functions.items():
        for key in converted_data:
            scores[score_name][key].append(score_fun(
                test_original if "Test" in key else train_original,
                transformed[key].to_grid(test_original.grid_points),
            ))

# %%
# Finally, we have the scores for the train and test sets with a decreasing
# number of measurement points per curve.
for score_name in scores.keys():
    print(f"{score_name} scores:")
    print("-" * 62)
    print((
        pd.DataFrame(scores[score_name])
        .round(3).set_index("n_points_per_curve").sort_index()
    ), end="\n\n\n")


# %%
# Plot the scores.
plt.figure(figsize=(12, 5))
for i, (score_name, values) in enumerate(scores.items()):
    df = (
        pd.DataFrame(values)
        .sort_values("n_points_per_curve").set_index("n_points_per_curve")
    )
    plt.subplot(1, 2, i + 1)
    label_start = r"Fit $\mathcal{D}_{train}^{\ j}$; "
    plt.plot(
        df.index,
        df["Train-sparse"],
        label=label_start + r"transform $\mathcal{D}_{train}^{\ j}$",
        marker=".",
    )
    plt.plot(
        df.index,
        df["Test-sparse"],
        label=label_start + r"transform $\mathcal{D}_{test}^{\ j}$",
        marker=".",
    )
    plt.plot(
        df.index,
        df["Test-original"],
        label=label_start + r"transform $\mathcal{D}_{test}^{\ 0}$",
        marker=".",
    )
    if score_name == "MSE":
        plt.yscale("log")
        plt.ylabel(f"${score_name}$ score (logscale)")
    else:
        plt.ylabel(f"${score_name}$ score")

    plt.xscale("log")
    plt.xlabel(r"Measurements per function (logscale)")
    plt.legend()

plt.show()


# %%
# Show the original curves along with the converted
# test curves for the conversions with 7, 5, 4 and 3 points per curve.
def plot_conversion_evolution(index: int):
    plt.figure(figsize=(8, 8.5))
    i = 0
    for n_points_per_curve in n_points_list[3:]:
        axes = plt.subplot(2, 2, i + 1)
        i += 1

        test_irregular_datasets[n_points_per_curve][index].scatter(
            axes=axes, color="C0",
        )
        fd_temperatures.mean().plot(
            axes=axes, color=[0.4] * 3, label="Original dataset mean",
        )
        fd_temperatures.plot(
            axes=axes, color=[0.7] * 3, linewidth=0.2,
        )
        test_original[index].plot(
            axes=axes, color="C0", linewidth=0.65, label="Original test curve",
        )
        converted_data["Test-sparse"][n_points_per_curve][index].plot(
            axes=axes,
            color="C0",
            linestyle="--",
            label="Test curve transformed",
        )
        plt.title(f"Transform of test curves with {n_points_per_curve} points")
        plt.ylim(ylim)

    plt.suptitle(
        "Evolution of the conversion of a curve with decreasing measurements "
        f"({test_original.sample_names[index]} station)"
    )

    # Add common legend at the bottom:
    handles, labels = plt.gca().get_legend_handles_labels()
    plt.tight_layout(h_pad=0, rect=[0, 0.1, 1, 1])
    plt.legend(
        handles=handles,
        loc="lower center",
        ncols=3,
        bbox_to_anchor=(-.1, -0.3),
    )

    plt.show()


# %%
# Toronto station's temperature curve conversion evolution:
plot_conversion_evolution(index=7)

# %%
# Iqaluit station's temperature curve conversion evolution:
plot_conversion_evolution(index=8)

# %%
# As can be seen in the figures, the fewer the measurements, the closer
# the converted curve is to the mean of the original dataset.
# This suggests that when the number of measurements is too low,
# the mixed-effects model still captures the general trend of the data,
# but it cannot properly capture the individual variation of each
# curve.
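
The shrinkage towards the mean described above can be reproduced with a small NumPy-only sketch. It does not use the converters from this PR: it assumes a linear mixed effects model `y = Phi (beta + gamma) + eps` with known covariances and applies the standard conditional-mean formula for the random effect. The helper names (`fourier_design`, `predict_random_effect`), the identity covariance, the unit noise variance and the noise-free observations are all illustrative assumptions, not part of skfda.

```python
import numpy as np


def fourier_design(t: np.ndarray) -> np.ndarray:
    """Evaluate an (unnormalized) 5-function Fourier basis of period 1 at t."""
    return np.column_stack([
        np.ones_like(t),
        np.sin(2 * np.pi * t),
        np.cos(2 * np.pi * t),
        np.sin(4 * np.pi * t),
        np.cos(4 * np.pi * t),
    ])


def predict_random_effect(
    phi: np.ndarray,
    residual: np.ndarray,
    gamma_cov: np.ndarray,
    sigma2: float,
) -> np.ndarray:
    """Conditional mean of the random effect given the observations.

    Computes G Phi^T (Phi G Phi^T + sigma2 I)^{-1} (y - Phi beta),
    where ``residual`` is y - Phi beta.
    """
    marginal_cov = phi @ gamma_cov @ phi.T + sigma2 * np.eye(phi.shape[0])
    return gamma_cov @ phi.T @ np.linalg.solve(marginal_cov, residual)


rng = np.random.default_rng(0)
t_full = np.arange(365) / 365           # one "day" per grid point
gamma_true = np.array([1.0, -2.0, 0.5, 1.5, -1.0])  # assumed random effect
gamma_cov = np.eye(5)                   # assumed covariance of the effects
sigma2 = 1.0                            # assumed noise variance

shrinkage = {}
indices = np.arange(365)
for n_points in [365, 40, 10, 7, 5, 4, 3]:
    # Nested decimation: sample from the points kept in the previous step.
    indices = np.sort(rng.choice(indices, size=n_points, replace=False))
    phi = fourier_design(t_full[indices])
    residual = phi @ gamma_true         # noise-free y - Phi beta
    gamma_hat = predict_random_effect(phi, residual, gamma_cov, sigma2)
    # Ratio of norms: 1 means no shrinkage, 0 means "predict the mean".
    shrinkage[n_points] = (
        np.linalg.norm(gamma_hat) / np.linalg.norm(gamma_true)
    )
    print(n_points, round(float(shrinkage[n_points]), 3))
```

With the full 365 points the predicted effect is almost the true one, while with 3 points a large part of the individual deviation is lost, so the reconstructed curve moves towards the mean curve `Phi beta`. In this simplified setting the ratio never exceeds 1, which matches the behaviour seen in the figures.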