Model type LassoModel doesn't support intercept #74

Open
ForceBru opened this issue Jan 1, 2023 · 1 comment

ForceBru commented Jan 1, 2023

Code that doesn't work

julia> using DataFrames, Lasso

julia> df = DataFrame(x=randn(100), y=3randn(100) .+ 1);

julia> fit(LassoModel, @formula(x ~ 1 + y), df)
ERROR: ArgumentError: Model type LassoModel doesn't support intercept specified in formula x ~ 1 + y
Stacktrace:
 [1] apply_schema(t::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, schema::StatsModels.Schema, Mod::Type{LassoModel})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/schema.jl:288
 [2] ModelFrame(f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}}; model::Type{LassoModel}, contrasts::Dict{Symbol, Any})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:84
 [3] kwcall(::NamedTuple{(:model, :contrasts), Tuple{UnionAll, Dict{Symbol, Any}}}, ::Type{ModelFrame}, f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:73
 [4] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame; contrasts::Dict{Symbol, Any}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:85
 [5] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame)
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:78
 [6] top-level scope
   @ REPL[7]:1

Why can I not manually specify an intercept like @formula(x ~ 1 + y)? The documentation ?@formula says:

1, 0, and -1 indicate the presence (for 1) or absence (for 0 and -1) of an intercept column.

So 1 is a valid intercept specification, like in R. This @formula also works in GLM.lm.

Code that works

If I write @formula(x ~ y), Lasso.jl will automatically fit a model with an intercept:

julia> fit(LassoModel, @formula(x ~ y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}

x ~ y

Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
──────────────
      Estimate
──────────────
x1  -0.132743
x2   0.0497596
──────────────

I assume the first coefficient is the intercept and the second one is multiplied by y, so the model is:

x = -0.132743 + 0.0497596 * y

So, intercepts are supported, but I can't manually specify that I want an intercept.

More code that doesn't work

Let's fit a model without an intercept. I specify this with the 0 in @formula(x ~ 0 + y).

julia> fit(LassoModel, @formula(x ~ 0 + y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}

x ~ 0 + y

Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
──────────────
      Estimate
──────────────
x1  -0.132743
x2   0.0497596
──────────────

It seems the package ignored the zero in the formula, fitted an intercept of -0.132743 anyway, and produced the same model as above, even though the @formula is different. R's glmnet has supported fitting without an intercept since 2013.


It would be nice if it were possible to specify the intercept in the formula.

Versions

  • Julia v1.9-beta2
  • Lasso v0.7.0
@patrickm663

Hi @ForceBru

When using Lasso.jl, I noticed that the intercept has to be excluded with a keyword argument to fit(), as in fit(LassoModel,...; intercept=false), rather than in @formula(...) as with GLM.jl. I haven't stepped through the source code to understand why.
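A minimal sketch of that keyword-argument approach, reusing the data frame from the original report. It assumes the formula-based fit passes `intercept` through to Lasso.jl unchanged. The fixed seed and the names `m_with` and `m_without` are illustrative only and do not come from this thread.

```julia
using DataFrames, Lasso, Random

Random.seed!(1)  # illustrative seed so repeated runs use the same data
df = DataFrame(x = randn(100), y = 3randn(100) .+ 1)

# Default behaviour: an intercept is fitted although the formula does not mention one.
m_with = fit(LassoModel, @formula(x ~ y), df)

# Keyword argument instead of `0 + y` in the formula (assumed to be forwarded to Lasso.jl).
m_without = fit(LassoModel, @formula(x ~ y), df; intercept=false)

# Compare the coefficient vectors; the second should lack the intercept entry.
println(coef(m_with))
println(coef(m_without))
```

If the keyword is honoured, the second coefficient vector should be one element shorter than the first, which is an easy way to confirm that the intercept was actually dropped.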

I hope this helps.

Regards
Patrick
