Commit

Added additive vs. multiplicative model options to decomposition
[email protected] committed Nov 4, 2020
1 parent be089df commit f1fceab
Showing 4 changed files with 185 additions and 53 deletions.
63 changes: 60 additions & 3 deletions docs/HTML/DecomposeVisitor.html
@@ -26,6 +26,31 @@
<html>
<body>

<table align="center" border="1">

<tr bgcolor="lightblue">
<th>Signature</th> <th>Description</th>
</tr>
<tr bgcolor="lightgrey">
<td bgcolor="maroon"> <font color="white">
<PRE><B>
enum class decompose_type : unsigned char {
additive = 1, // Y<sub>t</sub> = Trend + Seasonal + Residual
multiplicative = 2, // Y<sub>t</sub> = Trend * Seasonal * Residual
};</B></PRE> </font>
</td>
<td>
The additive decomposition is the most appropriate if the magnitude of the seasonal fluctuations, or the variation around the trend-cycle, does not vary with the level of the time series. When the variation in the seasonal pattern, or the variation around the trend-cycle, appears to be proportional to the level of the time series, then a multiplicative decomposition is more appropriate. Multiplicative decompositions are common with economic time series.<BR>

An alternative to using a multiplicative decomposition is to first transform the data until the variation in the series appears to be stable over time, then use an additive decomposition. When a log transformation has been used, this is equivalent to using a multiplicative decomposition because:<BR>
<I>Y<sub>t</sub> = T * S * R is equivalent to log(Y<sub>t</sub>) = log(T) + log(S) + log(R)</I>
</td>
</tr>

</table>

<BR>

<table align="center" border="1">

<tr bgcolor="lightblue">
@@ -44,17 +69,19 @@
</td>
<td>
This is a “single action visitor”, meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.<BR><BR>
This visitor creates a seasonal-trend (with Loess, aka "STL") decomposition of observed time series data. This implementation is modeled after the <I>statsmodels.tsa.seasonal_decompose</I> method but substitutes a Lowess regression for a convolution in its trend estimation. This is an additive model, <I>Y[t] = T[t] + S[t] + e[t]</I><BR>
This visitor creates a seasonal-trend (with Loess, aka "STL") decomposition of observed time series data. This implementation is modeled after the <I>statsmodels.tsa.seasonal_decompose</I> method but substitutes a Lowess regression for a convolution in its trend estimation.<BR>
<I>get_result()</I> returns the <B>Trend</B> vector. There are three result getters, <I>get_trend(), get_seasonal(), get_residual()</I>, which return the corresponding vectors.
<PRE>
explicit
DecomposeVisitor (size_type s_period,
value_type frac,
value_type delta);
value_type delta,
decompose_type type = decompose_type::additive);
<B>s_period</B>: Length of a season in units of one observation.
There must be at least two seasons in the data
<B>frac</B>: Between 0 and 1. The fraction of the data used when estimating each y-value.
<B>delta</B>: Distance within which to use linear-interpolation instead of weighted regression.</PRE>
<B>delta</B>: Distance within which to use linear-interpolation instead of weighted regression.
<B>type</B>: Model type. See above.</PRE>
</td>
<td width="12%">
<B>T</B>: Column data type.<BR>
@@ -129,6 +156,36 @@
</pre>
<!--Created using ToHtml.com on 2020-11-02 18:13:51 UTC -->


<pre style='color:#000000;background:#ffffff;'>
<span style='color:#696969; '>// -----------------------------------------------------------------------------</span>

<span style='color:#800000; font-weight:bold; '>static</span> <span style='color:#800000; font-weight:bold; '>void</span> test_IBM_data<span style='color:#808030; '>(</span><span style='color:#808030; '>)</span> <span style='color:#800080; '>{</span>

<span style='color:#666616; '>std</span><span style='color:#800080; '>::</span><span style='color:#603000; '>cout</span> <span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> <span style='color:#800000; '>"</span><span style='color:#0f69ff; '>\n</span><span style='color:#0000e6; '>Testing IBM_data( ) ...</span><span style='color:#800000; '>"</span> <span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> <span style='color:#666616; '>std</span><span style='color:#800080; '>::</span><span style='color:#603000; '>endl</span><span style='color:#800080; '>;</span>

<span style='color:#800000; font-weight:bold; '>typedef</span> StdDataFrame<span style='color:#800080; '>&lt;</span><span style='color:#666616; '>std</span><span style='color:#800080; '>::</span><span style='color:#603000; '>string</span><span style='color:#800080; '>></span> StrDataFrame<span style='color:#800080; '>;</span>

StrDataFrame df<span style='color:#800080; '>;</span>

df<span style='color:#808030; '>.</span>read<span style='color:#808030; '>(</span><span style='color:#800000; '>"</span><span style='color:#0000e6; '>IBM.csv</span><span style='color:#800000; '>"</span><span style='color:#808030; '>,</span> io_format<span style='color:#800080; '>::</span>csv2<span style='color:#808030; '>)</span><span style='color:#800080; '>;</span>

DecomposeVisitor<span style='color:#800080; '>&lt;</span><span style='color:#800000; font-weight:bold; '>double</span><span style='color:#808030; '>,</span> <span style='color:#666616; '>std</span><span style='color:#800080; '>::</span><span style='color:#603000; '>string</span><span style='color:#800080; '>></span> d_v<span style='color:#808030; '>(</span><span style='color:#008c00; '>178</span><span style='color:#808030; '>,</span> <span style='color:#008000; '>2.0</span> <span style='color:#808030; '>/</span> <span style='color:#008000; '>3.0</span><span style='color:#808030; '>,</span> <span style='color:#008c00; '>0</span><span style='color:#808030; '>,</span> decompose_type<span style='color:#800080; '>::</span>multiplicative<span style='color:#808030; '>)</span><span style='color:#800080; '>;</span>
<span style='color:#696969; '>// decompose_type::additive);</span>

df<span style='color:#808030; '>.</span>single_act_visit<span style='color:#800080; '>&lt;</span><span style='color:#800000; font-weight:bold; '>double</span><span style='color:#800080; '>></span><span style='color:#808030; '>(</span><span style='color:#800000; '>"</span><span style='color:#0000e6; '>IBM_Adj_Close</span><span style='color:#800000; '>"</span><span style='color:#808030; '>,</span> d_v<span style='color:#808030; '>)</span><span style='color:#800080; '>;</span>

<span style='color:#800000; font-weight:bold; '>const</span> <span style='color:#800000; font-weight:bold; '>auto</span> <span style='color:#808030; '>&amp;</span>ibm_closes <span style='color:#808030; '>=</span> df<span style='color:#808030; '>.</span>get_column<span style='color:#800080; '>&lt;</span><span style='color:#800000; font-weight:bold; '>double</span><span style='color:#800080; '>></span><span style='color:#808030; '>(</span><span style='color:#800000; '>"</span><span style='color:#0000e6; '>IBM_Adj_Close</span><span style='color:#800000; '>"</span><span style='color:#808030; '>)</span><span style='color:#800080; '>;</span>

<span style='color:#800000; font-weight:bold; '>for</span> <span style='color:#808030; '>(</span><span style='color:#666616; '>std</span><span style='color:#800080; '>::</span><span style='color:#603000; '>size_t</span> i <span style='color:#808030; '>=</span> <span style='color:#008c00; '>0</span><span style='color:#800080; '>;</span> i <span style='color:#808030; '>&lt;</span> ibm_closes<span style='color:#808030; '>.</span>size<span style='color:#808030; '>(</span><span style='color:#808030; '>)</span><span style='color:#800080; '>;</span> <span style='color:#808030; '>+</span><span style='color:#808030; '>+</span>i<span style='color:#808030; '>)</span>
<span style='color:#666616; '>std</span><span style='color:#800080; '>::</span><span style='color:#603000; '>cout</span> <span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> ibm_closes<span style='color:#808030; '>[</span>i<span style='color:#808030; '>]</span> <span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> <span style='color:#800000; '>"</span><span style='color:#0000e6; '>,</span><span style='color:#800000; '>"</span>
<span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> d_v<span style='color:#808030; '>.</span>get_trend<span style='color:#808030; '>(</span><span style='color:#808030; '>)</span><span style='color:#808030; '>[</span>i<span style='color:#808030; '>]</span> <span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> <span style='color:#800000; '>"</span><span style='color:#0000e6; '>,</span><span style='color:#800000; '>"</span>
<span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> d_v<span style='color:#808030; '>.</span>get_seasonal<span style='color:#808030; '>(</span><span style='color:#808030; '>)</span><span style='color:#808030; '>[</span>i<span style='color:#808030; '>]</span> <span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> <span style='color:#800000; '>"</span><span style='color:#0000e6; '>,</span><span style='color:#800000; '>"</span>
<span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> d_v<span style='color:#808030; '>.</span>get_residual<span style='color:#808030; '>(</span><span style='color:#808030; '>)</span><span style='color:#808030; '>[</span>i<span style='color:#808030; '>]</span> <span style='color:#808030; '>&lt;</span><span style='color:#808030; '>&lt;</span> <span style='color:#666616; '>std</span><span style='color:#800080; '>::</span><span style='color:#603000; '>endl</span><span style='color:#800080; '>;</span>
<span style='color:#800080; '>}</span>
</pre>
<!--Created using ToHtml.com on 2020-11-04 17:49:01 UTC -->

</body>
</html>
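The documentation's note on log transforms can be checked numerically. The sketch below is standalone C++ and does not use the DataFrame library; `compose_multiplicative` and `max_log_gap` are hypothetical helper names introduced only for illustration. It builds a multiplicative series from strictly positive components and measures how far log(Y) is from log(T) + log(S) + log(R).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the DataFrame library): builds the
// multiplicative series y[i] = trend[i] * seasonal[i] * residual[i].
std::vector<double>
compose_multiplicative(const std::vector<double> &trend,
                       const std::vector<double> &seasonal,
                       const std::vector<double> &residual)  {

    assert(trend.size() == seasonal.size());
    assert(trend.size() == residual.size());

    std::vector<double> y (trend.size());

    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = trend[i] * seasonal[i] * residual[i];
    return (y);
}

// Hypothetical helper: returns the largest absolute difference between
// log(y[i]) and log(trend[i]) + log(seasonal[i]) + log(residual[i]).
// All components must be strictly positive for the logs to be defined.
double
max_log_gap(const std::vector<double> &trend,
            const std::vector<double> &seasonal,
            const std::vector<double> &residual)  {

    const std::vector<double> y =
        compose_multiplicative(trend, seasonal, residual);
    double                    gap = 0.0;

    for (std::size_t i = 0; i < y.size(); ++i)  {
        const double additive_form =
            std::log(trend[i]) + std::log(seasonal[i]) + std::log(residual[i]);

        gap = std::max(gap, std::fabs(std::log(y[i]) - additive_form));
    }
    return (gap);
}
```

For positive inputs the gap should be at the level of floating-point rounding, which is why an additive decomposition of the logged series can stand in for a multiplicative decomposition of the original one.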

141 changes: 99 additions & 42 deletions include/DataFrame/DataFrameStatsVisitors.h
@@ -112,16 +112,16 @@ struct GeometricMeanVisitor {

if (skip_nan_ && is_nan__(val)) return;

mean_ *= val;
mean_ += std::log(val);
cnt_ += 1;
}
inline void pre () { mean_ = value_type(1); cnt_ = 0; }
inline void pre () { mean_ = 0; cnt_ = 0; }
inline void post () { }
inline size_type get_count () const { return (cnt_); }
inline value_type get_sum () const { return (mean_); }
inline result_type get_result () const {

return (std::pow(mean_, value_type(1) / value_type(cnt_)));
return (std::exp(mean_ / value_type(cnt_)));
}
template <typename K, typename H>
inline void
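The hunk above switches GeometricMeanVisitor from accumulating a running product to accumulating a sum of logs. A plausible motivation (an assumption, since the commit message does not state it) is that a long product can overflow or underflow before the root is taken. The standalone sketch below compares the two forms; the function names are hypothetical and are not part of the library.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Product form: multiply everything, then take the n-th root.
// The intermediate product can overflow to infinity on long inputs.
double geometric_mean_product(const std::vector<double> &values)  {

    assert(! values.empty());

    double prod = 1.0;

    for (const double v : values)
        prod *= v;
    return (std::pow(prod, 1.0 / static_cast<double>(values.size())));
}

// Log-domain form: average the logs, then exponentiate.
// Requires strictly positive inputs; the running sum stays small.
double geometric_mean_log(const std::vector<double> &values)  {

    assert(! values.empty());

    double log_sum = 0.0;

    for (const double v : values)
        log_sum += std::log(v);
    return (std::exp(log_sum / static_cast<double>(values.size())));
}
```

On short inputs the two forms agree to rounding error. On a long column of large values the product form saturates at infinity while the log-domain form still returns a finite result.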
@@ -3642,41 +3642,35 @@ struct DecomposeVisitor {

DEFINE_VISIT_BASIC_TYPES_3

private:

template<typename K, typename H>
inline void
operator() (const K &idx_begin, const K &idx_end,
const H &y_begin, const H &y_end) {

const size_type col_s = std::distance(y_begin, y_end);

assert(s_period_ <= col_s / 2);

std::vector<value_type> xvals (col_s);
do_trend_(const K &idx_begin, const K &idx_end,
const H &y_begin, const H &y_end,
size_type col_s,
std::vector<value_type> &xvals) {

std::iota(xvals.begin(), xvals.end(), 0);

LowessVisitor<T, I> l_v (3, frac_, delta_ * value_type(col_s), true);

// Calculate trend and remove it from observations in y
// Calculate trend
l_v.pre();
l_v (idx_begin, idx_end, y_begin, y_end, xvals.begin(), xvals.end());
l_v.post();
trend_ = std::move(l_v.get_result());
}

// We want to resue the vector, so just rename it.
// This way nobody gets confused
std::vector<value_type> &detrended = xvals;

std::transform(y_begin, y_end,
trend_.begin(),
detrended.begin(),
std::minus<value_type>());
template<typename MEAN, typename K>
inline void
do_seasonal_(size_type col_s, const K &idx_begin, const K &idx_end,
const std::vector<value_type> &detrended) {

StepRollAdopter<MeanVisitor<T, I>, value_type, I> sr_mean (
MeanVisitor<T, I>(), s_period_);
StepRollAdopter<MEAN, value_type, I> sr_mean (MEAN(), s_period_);

seasonal_.resize(col_s, 0);
// Calculate one-period seasonality
seasonal_.resize(col_s, 0);
for (size_type i = 0; i < s_period_; ++i) {
sr_mean.pre();
sr_mean (idx_begin + i, idx_end,
@@ -3685,26 +3679,86 @@ struct DecomposeVisitor {
seasonal_[i] = sr_mean.get_result();
}

MeanVisitor<T, I> m_v;
// Center the period means on 0 (additive) or 1 (multiplicative)
MEAN m_v;

// 0-center the period means
m_v.pre();
m_v (idx_begin, idx_end + s_period_,
m_v (idx_begin, idx_begin + s_period_,
seasonal_.begin(), seasonal_.begin() + s_period_);
m_v.post();
for (size_type i = 0; i < s_period_; ++i)
seasonal_[i] -= m_v.get_result();

const value_type result = m_v.get_result();

if (type_ == decompose_type::additive) {
for (size_type i = 0; i < s_period_; ++i)
seasonal_[i] -= result;
}
else {
for (size_type i = 0; i < s_period_; ++i)
seasonal_[i] /= result;
}

// Tile the one-period season over the seasonal_ vector
for (size_type i = s_period_; i < col_s; ++i)
seasonal_[i] = seasonal_[i % s_period_];
}

inline void
do_residual_(const std::vector<value_type> &detrended, size_type col_s) {

// What is left is residual
residual_.resize(col_s, 0);
std::transform(detrended.begin(), detrended.end(),
seasonal_.begin(),
residual_.begin(),
std::minus<value_type>());
if (type_ == decompose_type::additive)
std::transform(detrended.begin(), detrended.end(),
seasonal_.begin(),
residual_.begin(),
std::minus<value_type>());
else
std::transform(detrended.begin(), detrended.end(),
seasonal_.begin(),
residual_.begin(),
std::divides<value_type>());
}

public:

template<typename K, typename H>
inline void
operator() (const K &idx_begin, const K &idx_end,
const H &y_begin, const H &y_end) {

const size_type col_s = std::distance(y_begin, y_end);

assert(s_period_ <= col_s / 2);

std::vector<value_type> xvals (col_s);

do_trend_(idx_begin, idx_end, y_begin, y_end, col_s, xvals);

// We want to reuse the vector, so just rename it.
// This way nobody gets confused
std::vector<value_type> &detrended = xvals;

// Remove trend from observations in y
if (type_ == decompose_type::additive)
std::transform(y_begin, y_end,
trend_.begin(),
detrended.begin(),
std::minus<value_type>());
else
std::transform(y_begin, y_end,
trend_.begin(),
detrended.begin(),
std::divides<value_type>());

if (type_ == decompose_type::additive)
do_seasonal_<MeanVisitor<T, I>>
(col_s, idx_begin, idx_end, detrended);
else
do_seasonal_<GeometricMeanVisitor<T, I>>
(col_s, idx_begin, idx_end, detrended);

do_residual_(detrended, col_s);
}

inline void pre () {
@@ -3727,25 +3781,28 @@ struct DecomposeVisitor {
inline const result_type &get_residual () const { return (residual_); }
inline result_type &get_residual () { return (residual_); }

explicit
DecomposeVisitor (size_type s_period, value_type frac, value_type delta)
: frac_(frac), s_period_(s_period), delta_(delta) { }
DecomposeVisitor (size_type s_period,
value_type frac,
value_type delta,
decompose_type t = decompose_type::additive)
: frac_(frac), s_period_(s_period), delta_(delta), type_(t) { }

private:

// Between 0 and 1. The fraction of the data used when estimating
// each y-value.
const value_type frac_;
const value_type frac_;
// Seasonal period in units of one observation. There must be at least
// two seasons in the data
const size_type s_period_;
const size_type s_period_;
// Distance within which to use linear-interpolation instead of weighted
// regression.
const value_type delta_;
// regression. 0 or small values cause longer/more accurate processing
const value_type delta_;
const decompose_type type_;

result_type trend_ { };
result_type seasonal_ { };
result_type residual_ { };
result_type trend_ { };
result_type seasonal_ { };
result_type residual_ { };
};

} // namespace hmdf
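The steps that follow the trend estimate in DecomposeVisitor (detrend, per-phase mean, centering, tiling, residual) can be sketched without the library. This is a simplified illustration under stated assumptions, not the library's code: the trend is passed in already estimated (the diff obtains it from LowessVisitor), and `model`, `components`, and `split_seasonal_residual` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class model { additive, multiplicative };  // hypothetical stand-in

struct components  {
    std::vector<double> seasonal;
    std::vector<double> residual;
};

// Splits detrended data into seasonal and residual parts.
// Multiplicative mode requires strictly positive y and trend.
components
split_seasonal_residual(const std::vector<double> &y,
                        const std::vector<double> &trend,
                        std::size_t period,
                        model m)  {

    const std::size_t n = y.size();

    assert(trend.size() == n);
    assert(period > 0 && period <= n / 2);  // at least two seasons

    const bool add = (m == model::additive);

    // 1. Remove the trend: subtract (additive) or divide (multiplicative)
    std::vector<double> detrended (n);

    for (std::size_t i = 0; i < n; ++i)
        detrended[i] = add ? y[i] - trend[i] : y[i] / trend[i];

    // 2. Per-phase mean over every period-th value:
    //    arithmetic (additive) or geometric via logs (multiplicative)
    components  out { std::vector<double>(n, 0.0),
                      std::vector<double>(n, 0.0) };

    for (std::size_t p = 0; p < period; ++p)  {
        double      acc = 0.0;
        std::size_t cnt = 0;

        for (std::size_t i = p; i < n; i += period, ++cnt)
            acc += add ? detrended[i] : std::log(detrended[i]);
        out.seasonal[p] = add ? acc / double(cnt) : std::exp(acc / double(cnt));
    }

    // 3. Center the phase means on 0 (additive) or 1 (multiplicative)
    double  acc = 0.0;

    for (std::size_t p = 0; p < period; ++p)
        acc += add ? out.seasonal[p] : std::log(out.seasonal[p]);

    const double    center =
        add ? acc / double(period) : std::exp(acc / double(period));

    for (std::size_t p = 0; p < period; ++p)
        out.seasonal[p] =
            add ? out.seasonal[p] - center : out.seasonal[p] / center;

    // 4. Tile the one-period season over the whole vector
    for (std::size_t i = period; i < n; ++i)
        out.seasonal[i] = out.seasonal[i % period];

    // 5. Whatever is left after removing the season is the residual
    for (std::size_t i = 0; i < n; ++i)
        out.residual[i] = add ? detrended[i] - out.seasonal[i]
                              : detrended[i] / out.seasonal[i];
    return (out);
}
```

With these definitions, trend + seasonal + residual reproduces the observations in additive mode and trend * seasonal * residual does so in multiplicative mode, up to rounding.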
21 changes: 21 additions & 0 deletions include/DataFrame/DataFrameTypes.h
@@ -297,6 +297,27 @@ enum class sigmoid_type : unsigned char {

// ----------------------------------------------------------------------------

// The additive decomposition is the most appropriate if the magnitude of the
// seasonal fluctuations, or the variation around the trend-cycle, does not
// vary with the level of the time series. When the variation in the seasonal
// pattern, or the variation around the trend-cycle, appears to be
// proportional to the level of the time series, then a multiplicative
// decomposition is more appropriate. Multiplicative decompositions are common
// with economic time series.

// An alternative to using a multiplicative decomposition is to first
// transform the data until the variation in the series appears to be stable
// over time, then use an additive decomposition. When a log transformation
// has been used, this is equivalent to using a multiplicative decomposition
// because:
// Y[t] = T * S * R is equivalent to log(Y[t]) = log(T) + log(S) + log(R)
enum class decompose_type : unsigned char {
additive = 1, // Y(t) = Trend + Seasonal + Residual
multiplicative = 2, // Y(t) = Trend * Seasonal * Residual
};

// ----------------------------------------------------------------------------

enum class box_cox_type : unsigned char {
// y(λ) = (y^λ - 1) / λ, if λ != 0
// y(λ) = log(y), if λ == 0
13 changes: 5 additions & 8 deletions test/dataframe_tester_2.cc
@@ -1904,7 +1904,7 @@ static void test_DecomposeVisitor() {
146.783, 147.173, 146.939, 147.290, 145.946, 142.624, 138.027,
136.118, 129.650, 126.767, 130.809, 125.550, 130.732, 126.183,
124.410, 114.748, 121.527, 114.904, 100.138, 105.144,
};
};
MyDataFrame df;

df.load_data(std::move(MyDataFrame::gen_sequence_index(0, y_vec.size(), 1)),
@@ -1985,14 +1985,11 @@ static void test_IBM_data() {

StrDataFrame df;

try {
df.read("IBM.csv", io_format::csv2);
}
catch (const DataFrameError &ex) {
std::cout << ex.what() << std::endl;
}
df.read("IBM.csv", io_format::csv2);

DecomposeVisitor<double, std::string> d_v (178, 2.0 / 3.0, 0);
DecomposeVisitor<double, std::string> d_v(178, 2.0 / 3.0, 0,
decompose_type::multiplicative);
// decompose_type::additive);

df.single_act_visit<double>("IBM_Adj_Close", d_v);
