SVD Imputation - Inexact Error #135

Closed
RaSi96 opened this issue Feb 16, 2023 · 2 comments

Comments

RaSi96 commented Feb 16, 2023

Hi all,

Apologies for the bother but I was quite curious about the SVD method provided by this package and decided to try it out on a seemingly harmless dataframe:

julia> df = DataFrame(:a => [2, 3, missing, 5], :b => [missing, 9, 16, 25])
4×2 DataFrame
 Row │ a        b
     │ Int64?   Int64?
─────┼──────────────────
   1 │       2  missing
   2 │       3        9
   3 │ missing       16
   4 │       5       25

Where :b is intended to be :a.^2 (the square of :a). After getting a couple of errors trying to use the raw dataframe, I came across #128 and tried using a matrix-converted dataframe, but interestingly ended up with this error:

julia> Impute.svd(Matrix(df))
ERROR: InexactError: Int64(35.5287074347614)
Stacktrace:
  [1] Int64
    @ ./float.jl:788 [inlined]
  [2] convert
    @ ./number.jl:7 [inlined]
  [3] setindex!
    @ ./array.jl:966 [inlined]
  [4] setindex!
    @ ./subarray.jl:347 [inlined]
  [5] copyto_unaliased!(deststyle::IndexLinear, dest::SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}, srcstyle::IndexLinear, src::Vector{Float64})
    @ Base ./abstractarray.jl:1038
  [6] copyto!
    @ ./abstractarray.jl:1018 [inlined]
  [7] copyto!
    @ ./broadcast.jl:954 [inlined]
  [8] copyto!
    @ ./broadcast.jl:913 [inlined]
  [9] materialize!
    @ ./broadcast.jl:871 [inlined]
 [10] materialize!
    @ ./broadcast.jl:868 [inlined]
 [11] impute!(data::Matrix{Union{Missing, Int64}}, imp::Impute.SVD; dims::Nothing)
    @ Impute ~/.julia/packages/Impute/vw7rh/src/imputors/svd.jl:64
 [12] impute!
    @ ~/.julia/packages/Impute/vw7rh/src/imputors/svd.jl:33 [inlined]
 [13] #impute#32
    @ ~/.julia/packages/Impute/vw7rh/src/imputors/svd.jl:97 [inlined]
 [14] impute
    @ ~/.julia/packages/Impute/vw7rh/src/imputors/svd.jl:96 [inlined]
 [15] svd(data::Matrix{Union{Missing, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Impute ~/.julia/packages/Impute/vw7rh/src/functional.jl:76
 [16] svd(data::Matrix{Union{Missing, Int64}})
    @ Impute ~/.julia/packages/Impute/vw7rh/src/functional.jl:74
 [17] top-level scope
    @ REPL[8]:1

From what I gleaned from a quick Google-Fu (2 separate links), it looks like a round-off check might be missing somewhere in the chain of function calls? I think it might be due to the small size of my test dataframe, but I can't be sure, because I also tried Turing.jl's Bayesian Linear Regression for imputation on the same data and it worked (to some degree of accuracy). Is something wrong here, or have I overlooked something?

Thanks for your time!
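
For readers following along: frames [1] to [5] of the stack trace show a Float64 being stored into an Int64-backed array. The same kind of failure can be reproduced with Base Julia alone. This is only a minimal sketch under that reading of the trace; the vector `v` and the value `3.44377` are illustrative and are not taken from Impute.jl's internals.

```julia
# Base Julia only; Impute.jl is not needed for this sketch.
v = Union{Missing, Int64}[2, 3, missing, 5]

# Storing a Float64 with a fractional part triggers convert(Int64, x),
# which throws instead of silently rounding.
err = try
    v[3] = 3.44377
    nothing
catch e
    e
end
println(err isa InexactError)   # true

# A Float64 that is exactly an integer converts without error.
v[3] = 4.0
println(v[3])                   # 4
```

So there is no missing round-off check as such: Julia refuses to round implicitly when a non-integer float is written into an integer container.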

Member

rofinn commented Feb 16, 2023

This is likely because Impute.jl is careful not to change your element type for two reasons:

  1. Some imputation methods may fail to impute all values (LOCF/NOCB at the ends)
  2. We don't want to accidentally change the precision on you (Impute.jl should respect if you want to operate on Float32s)
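
Reason 1 can be illustrated with a hand-written last-observation-carried-forward pass. This is a sketch only: `locf_sketch` is a hypothetical helper written for this example, not Impute.jl's implementation.

```julia
# Hypothetical helper (not Impute.jl's code): carry the last observed
# value forward over missing entries.
function locf_sketch(xs::AbstractVector)
    out = copy(xs)
    for i in 2:length(out)
        if ismissing(out[i])
            out[i] = out[i - 1]   # stays missing if nothing was observed yet
        end
    end
    return out
end

xs = Union{Missing, Int64}[missing, 9, missing, 25]
ys = locf_sketch(xs)

println(ismissing(ys[1]))  # true: nothing precedes the first entry
println(eltype(ys))        # Union{Missing, Int64}
```

Because the leading entry cannot be filled, the result still has to allow `missing`, so the element type cannot simply be narrowed or swapped after imputation.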

My guess is that Turing.jl is converting your ints to floats for you. With Impute.jl you just need to be explicit about what you want.

julia> Impute.svd(Matrix{Union{Float64, Missing}}(df))
4×2 Matrix{Union{Missing, Float64}}:
 2.0       9.43342
 3.0       9.0
 3.44377  16.0
 5.0      25.0
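
The conversion itself does not depend on Impute.jl. A minimal sketch with a plain matrix standing in for `Matrix(df)` (the names `m`, `m64` and `m32` are illustrative) shows that the element type can be widened to whichever float precision is wanted, with missing entries left in place:

```julia
# Plain matrix standing in for Matrix(df); Base Julia only.
m = hcat([2, 3, missing, 5], [missing, 9, 16, 25])
println(eltype(m))             # Union{Missing, Int64}

# Explicit conversion: integers become floats, missing entries stay missing.
m64 = Matrix{Union{Missing, Float64}}(m)
m32 = Matrix{Union{Missing, Float32}}(m)

println(eltype(m64))           # Union{Missing, Float64}
println(eltype(m32))           # Union{Missing, Float32}
println(ismissing(m64[1, 2]))  # true
```

Either converted matrix can hold non-integer imputed values, which is what the integer-typed original could not do.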

Author

RaSi96 commented Feb 26, 2023

Hi @rofinn, thanks for responding! Apologies for the late reply, I somehow completely missed being notified about this. Your explanation makes a lot of sense; I tried it again using floats only and it worked:

julia> a = DataFrame(:a => [2.0, 3.0, missing, 5.0], :b => [missing, 9.0, 16, 25.0])
4×2 DataFrame
 Row │ a          b
     │ Float64?   Float64?
─────┼──────────────────────
   1 │      2.0   missing
   2 │      3.0       9.0
   3 │  missing      16.0
   4 │      5.0      25.0

julia> Impute.svd(Matrix(a))
4×2 Matrix{Union{Missing, Float64}}:
 2.0       9.43342
 3.0       9.0
 3.44377  16.0
 5.0      25.0

I also tried it on the other dataset that I originally intended to use this for, and it worked without a problem. Drawing on what you've said, that is probably because all of that data was read in as floats by default. Thanks for your help, I'll close this issue.

RaSi96 closed this as completed Feb 26, 2023