Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to Tables.jl API #20

Merged
merged 34 commits into from
Jul 15, 2019
Merged

Switch to Tables.jl API #20

merged 34 commits into from
Jul 15, 2019

Conversation

rofinn
Copy link
Member

@rofinn rofinn commented Jul 2, 2019

Looks like we needed to do some cleanup of the Context type, but otherwise this transition was pretty smooth. It also looks like using the Tables interface has actually improved performance by minimizing data copies. Closes #21 #22 #24 #6

Dataset:

julia> data = dataset("boot", "neuro")
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ missing  │ -203.7   │ -84.1   │ 18.5     │ missing  │ missing  │
│ 2   │ missing  │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ missing  │
│ 3   │ missing  │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ missing  │
│ 4   │ missing  │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ missing  │
│ 5   │ missing  │ missing  │ -130.1  │ 25.8     │ 160.0    │ missing  │
...
│ 465 │ missing  │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ missing  │
│ 466 │ missing  │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ missing  │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ missing  │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │

Original:

julia> result = chain(
           data,
           Impute.Interpolate(),
           Impute.LOCF(),
           Impute.NOCB();
           limit=1.0
       )
┌ Warning: `colwise(f, d::AbstractDataFrame)` is deprecated, use `[f(col) for col = eachcol(d)]` instead.
│   caller = impute!(::Impute.Interpolate, ::Impute.Context, ::DataFrame) at imputors.jl:70
└ @ Impute ~/.playground/share/tmp-impute/depot/packages/Impute/UX99F/src/imputors.jl:70
┌ Warning: `colwise(f, d::AbstractDataFrame)` is deprecated, use `[f(col) for col = eachcol(d)]` instead.
│   caller = impute!(::Impute.LOCF, ::Impute.Context, ::DataFrame) at imputors.jl:70
└ @ Impute ~/.playground/share/tmp-impute/depot/packages/Impute/UX99F/src/imputors.jl:70
┌ Warning: `colwise(f, d::AbstractDataFrame)` is deprecated, use `[f(col) for col = eachcol(d)]` instead.
│   caller = impute!(::Impute.NOCB, ::Impute.Context, ::DataFrame) at imputors.jl:70
└ @ Impute ~/.playground/share/tmp-impute/depot/packages/Impute/UX99F/src/imputors.jl:70
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ -233.6   │ -203.7   │ -84.1   │ 18.5     │ 134.7    │ 222.7    │
│ 2   │ -233.6   │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ 222.7    │
│ 3   │ -233.6   │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ 222.7    │
│ 4   │ -233.6   │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ 222.7    │
│ 5   │ -233.6   │ -227.3   │ -130.1  │ 25.8     │ 160.0    │ 222.7    │
...
│ 465 │ -239.8   │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ 236.375  │
│ 466 │ -243.7   │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ -247.6   │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ -247.6   │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │

julia> @benchmark chain(
           $data,
           Impute.Interpolate(),
           Impute.LOCF(),
           Impute.NOCB();
           limit=1.0
       )
BenchmarkTools.Trial:
  memory estimate:  241.88 KiB
  allocs estimate:  11304
  --------------
  minimum time:     753.368 μs (0.00% GC)
  median time:      767.009 μs (0.00% GC)
  mean time:        815.656 μs (2.97% GC)
  maximum time:     46.291 ms (98.26% GC)
  --------------
  samples:          6114
  evals/sample:     1

New:

julia> result = chain(
           data,
           Impute.Interpolate(),
           Impute.LOCF(),
           Impute.NOCB();
           limit=1.0
       )
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ -233.6   │ -203.7   │ -84.1   │ 18.5     │ 134.7    │ 222.7    │
│ 2   │ -233.6   │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ 222.7    │
│ 3   │ -233.6   │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ 222.7    │
│ 4   │ -233.6   │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ 222.7    │
│ 5   │ -233.6   │ -227.3   │ -130.1  │ 25.8     │ 160.0    │ 222.7    │
...
│ 465 │ -239.8   │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ 236.375  │
│ 466 │ -243.7   │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ -247.6   │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ -247.6   │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │

julia> @benchmark chain(
           $data,
           Impute.Interpolate(),
           Impute.LOCF(),
           Impute.NOCB();
           limit=1.0
       )
BenchmarkTools.Trial:
  memory estimate:  196.75 KiB
  allocs estimate:  10853
  --------------
  minimum time:     307.765 μs (0.00% GC)
  median time:      318.367 μs (0.00% GC)
  mean time:        347.685 μs (4.29% GC)
  maximum time:     45.272 ms (98.96% GC)
  --------------
  samples:          10000
  evals/sample:     1

TODO: Tag new releases of

@rofinn rofinn mentioned this pull request Jul 3, 2019
8 tasks
src/Impute.jl Outdated Show resolved Hide resolved
Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty cool.

I have not reviewed the tests.
Also it is quiet long so I started to flack out towards the end.

I think we probably should support taking the obsdim as a kwarg.
It is what StatsBase is moving towards if i recall discussions with @nalimilan
correctly

src/Impute.jl Outdated Show resolved Hide resolved
src/Impute.jl Outdated Show resolved Hide resolved
src/Impute.jl Outdated Show resolved Hide resolved
src/Impute.jl Outdated Show resolved Hide resolved
src/Impute.jl Outdated Show resolved Hide resolved
src/imputors/locf.jl Outdated Show resolved Hide resolved
src/imputors/locf.jl Show resolved Hide resolved
src/imputors/interp.jl Outdated Show resolved Hide resolved
src/imputors/nocb.jl Outdated Show resolved Hide resolved
src/imputors/nocb.jl Outdated Show resolved Hide resolved
@nalimilan
Copy link
Contributor

I think we probably should support taking the obsdim as a kwarg.
It is what StatsBase is moving towards if i recall discussions with @nalimilan
correctly

Unfortunately I don't think we have clearly agreed on the standard keyword argument for this. pairwise in Distances uses dims, like cov and co. in Statistics, but that's not super obvious. So we could use obsdim or vardim, but we need to discuss that somewhere. One issue is that for pairwise the question isn't really where are observations and where are variables, but rather what you want to compute (both could make sense). But maybe that's a special function and in most cases obsdim or vardim is OK.

@oxinabox
Copy link
Member

oxinabox commented Jul 9, 2019

but we have at least settled that where possible it should be a kwarg
and not just baked into the function itself. (and mentioned in doc string).

@nickrobinson251
Copy link
Contributor

fwiw CovarianceEstimation.jl also takes a dims kwarg for specifying observations

@rofinn
Copy link
Member Author

rofinn commented Jul 9, 2019

That's true, but the annoying thing is that dims only makes sense when the input is a matrix. I'd also rather not need to pass kwargs to the internal impute functions because our handy functions extract the kwargs into an imputor type internally. If there was some nice pattern for consuming and forwarding kwargs then that might work... even if it is a little inconsistent.

@rofinn
Copy link
Member Author

rofinn commented Jul 9, 2019

Based on a recommendation from Curtis I've introduced a vardim kwarg to the Imputor constructors and imputation convenience functions. The vardim field in each Imputor is ignored unless we're operating on matrices.

@invenia invenia deleted a comment from codecov bot Jul 9, 2019
@rofinn
Copy link
Member Author

rofinn commented Jul 11, 2019

Okay, I've resolved and applied most of the recommendations. I've commented on things that should be handled in a separate PR and responded if I disagree (e.g., I think |> makes sense here and you can always use explicit function calls if you don't like that).

@codecov
Copy link

codecov bot commented Jul 11, 2019

Codecov Report

Merging #20 into master will increase coverage by 5.73%.
The diff coverage is 98.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #20      +/-   ##
==========================================
+ Coverage   92.07%   97.81%   +5.73%     
==========================================
  Files           9       10       +1     
  Lines         101      183      +82     
==========================================
+ Hits           93      179      +86     
+ Misses          8        4       -4
Impacted Files Coverage Δ
src/imputors/nocb.jl 100% <100%> (ø) ⬆️
src/imputors/interp.jl 100% <100%> (ø) ⬆️
src/Impute.jl 90% <100%> (+23.33%) ⬆️
src/imputors/chain.jl 100% <100%> (ø) ⬆️
src/deprecated.jl 100% <100%> (ø)
src/context.jl 100% <100%> (+6.25%) ⬆️
src/imputors/locf.jl 100% <100%> (ø) ⬆️
src/imputors.jl 100% <100%> (ø) ⬆️
src/imputors/fill.jl 100% <100%> (ø) ⬆️
src/imputors/drop.jl 92.5% <92.5%> (-7.5%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 255ac57...051d6ce. Read the comment docs.

wv::AbstractWeights;
limit::Float64=1.0,
is_missing::Function=ismissing,
on_complete::Function=complete
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we decided we were getting rid of this?

Copy link
Member Author

@rofinn rofinn Jul 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'll be useful for handling custom imputation failure logic without needing a new context type. For example, you could change the on_complete function to throw a warning instead of needing a try/catch blocks everywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not extatic about it about it, but Ok

Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, cool
round 2 of review done.

This PR is big enough it certainly deserves multiple rounds of through review.

src/context.jl Outdated Show resolved Hide resolved

for imputor in imputors
imp = typeof(imputor)(
(isa(x, AbstractContext) ? ctx : x for x in fieldvalues(imputor))...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say

Lets define:
Base.similar on imputors,
that takes an Imputor,
and a new context
and does this.

But since this is bring removed 🤷‍♂

Base.depwarn(
"""
chain(data, args...) is deprecated.
Please use result = imp1(data) |> imp2 |> imp3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An aside: does work on these?

Can we do:
(imp3 ∘ imp2 ∘ imp1)(data)
?

If one must use |> then at least use it fully

Suggested change
Please use result = imp1(data) |> imp2 |> imp3
Please use result = data |> imp1 |> imp2 |> imp3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes works. I'm not a fan of bare piping of data.

julia> df = dataset("boot", "neuro")
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ missing  │ -203.7   │ -84.1   │ 18.5     │ missing  │ missing  │
│ 2   │ missing  │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ missing  │
│ 3   │ missing  │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ missing  │
│ 4   │ missing  │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ missing  │
│ 5   │ missing  │ missing  │ -130.1  │ 25.8     │ 160.0    │ missing  │
│ 6   │ missing  │ -223.1   │ -70.7   │ 62.1     │ 197.5    │ missing  │
│ 7   │ missing  │ -164.8   │ -12.2   │ 76.8     │ 202.8    │ missing  │
│ 8   │ missing  │ -221.6   │ -81.9   │ 27.5     │ 144.5    │ missing  │
│ 9   │ missing  │ -153.7   │ -17.0   │ 76.1     │ 222.4    │ missing  │
│ 10  │ missing  │ -184.7   │ -47.3   │ 74.4     │ 208.9    │ missing  │
│ 11  │ missing  │ missing  │ -148.8  │ 11.4     │ 137.7    │ missing  │
│ 12  │ missing  │ -197.6   │ -6.4    │ 137.1    │ missing  │ missing  │
│ 13  │ missing  │ -247.8   │ -35.4   │ 80.9     │ 229.5    │ missing  │
│ 14  │ missing  │ -227.0   │ -104.7  │ 20.2     │ 140.2    │ missing  │
│ 15  │ -233.6   │ -115.9   │ -10.5   │ 70.0     │ 202.6    │ missing  │
│ 16  │ missing  │ -232.4   │ -100.6  │ 16.8     │ 145.1    │ missing  │
│ 17  │ missing  │ -199.4   │ -58.2   │ 29.1     │ 184.4    │ missing  │
│ 18  │ missing  │ -195.7   │ -89.5   │ 26.4     │ 142.7    │ missing  │
│ 19  │ missing  │ -180.1   │ -65.0   │ 27.3     │ 171.1    │ missing  │
│ 20  │ missing  │ missing  │ -85.2   │ 27.1     │ missing  │ missing  │
│ 21  │ missing  │ -217.3   │ -77.1   │ 27.6     │ 151.5    │ missing  │
│ 22  │ missing  │ -139.7   │ -15.8   │ 83.0     │ 215.5    │ missing  │
│ 23  │ -249.6   │ -132.8   │ -14.1   │ 78.1     │ 205.7    │ missing  │
│ 24  │ missing  │ -152.7   │ -36.9   │ 29.7     │ 149.8    │ missing  │
│ 25  │ missing  │ -224.1   │ -81.9   │ 29.1     │ 172.2    │ missing  │
│ 26  │ missing  │ missing  │ -235.8  │ 6.0      │ 144.4    │ missing  │
│ 27  │ missing  │ -202.8   │ -45.1   │ 84.0     │ 227.3    │ missing  │
│ 28  │ -240.9   │ -138.4   │ -21.5   │ 73.4     │ 210.6    │ missing  │
│ 29  │ -247.1   │ -128.2   │ -31.3   │ 29.2     │ 143.1    │ missing  │
│ 30  │ missing  │ -185.4   │ -80.3   │ 23.9     │ 115.8    │ 222.7    │
│ 31  │ missing  │ -182.5   │ -75.8   │ 27.5     │ 165.2    │ missing  │
│ 32  │ missing  │ -202.2   │ -99.1   │ 23.8     │ 136.3    │ 242.5    │
│ 33  │ missing  │ -193.3   │ -82.6   │ 26.3     │ 160.5    │ missing  │
│ 34  │ missing  │ -189.4   │ -63.3   │ 27.6     │ 136.8    │ missing  │
│ 35  │ missing  │ -149.0   │ -31.0   │ 73.5     │ 187.8    │ missing  │
│ 36  │ missing  │ -162.4   │ -26.5   │ 72.6     │ missing  │ missing  │
⋮
│ 433 │ missing  │ -220.6   │ -114.2  │ 9.7      │ 106.4    │ 227.9    │
│ 434 │ -219.9   │ -120.9   │ -1.3    │ 99.5     │ 207.6    │ missing  │
│ 435 │ missing  │ -240.5   │ -110.3  │ 26.1     │ 142.8    │ missing  │
│ 436 │ missing  │ -239.6   │ -121.4  │ 2.9      │ 124.9    │ missing  │
│ 437 │ missing  │ -139.8   │ -7.3    │ 121.0    │ missing  │ missing  │
│ 438 │ missing  │ -212.0   │ -66.2   │ 50.4     │ 178.2    │ missing  │
│ 439 │ missing  │ -232.7   │ -109.2  │ 18.4     │ 127.5    │ missing  │
│ 440 │ missing  │ -236.3   │ -115.1  │ 5.1      │ 109.0    │ 212.0    │
│ 441 │ -241.2   │ -107.1   │ -9.1    │ 95.1     │ 198.6    │ missing  │
│ 442 │ -226.7   │ -143.8   │ -30.4   │ 75.8     │ 196.6    │ missing  │
│ 443 │ missing  │ -131.8   │ -26.5   │ 64.7     │ 177.2    │ missing  │
│ 444 │ missing  │ -144.9   │ -0.9    │ 105.3    │ 230.9    │ missing  │
│ 445 │ missing  │ -214.0   │ -81.8   │ 66.1     │ 191.3    │ missing  │
│ 446 │ missing  │ -210.6   │ -94.3   │ 16.7     │ 125.5    │ 239.7    │
│ 447 │ -215.8   │ -114.8   │ -18.4   │ 65.3     │ 171.6    │ 249.7    │
│ 448 │ missing  │ -156.0   │ -14.0   │ 113.7    │ 249.3    │ missing  │
│ 449 │ missing  │ -210.5   │ -41.9   │ missing  │ missing  │ missing  │
│ 450 │ missing  │ -189.2   │ -72.0   │ 56.8     │ 133.8    │ 246.7    │
│ 451 │ missing  │ -214.2   │ -102.2  │ 5.5      │ 75.6     │ 154.3    │
│ 452 │ -219.6   │ -107.9   │ -16.0   │ 101.7    │ 186.0    │ missing  │
│ 453 │ missing  │ -153.0   │ -38.0   │ 61.3     │ 144.4    │ 245.9    │
│ 454 │ missing  │ -179.8   │ -63.4   │ 56.0     │ 157.5    │ missing  │
│ 455 │ missing  │ -174.5   │ -44.8   │ 73.3     │ 179.7    │ missing  │
│ 456 │ missing  │ -206.8   │ -108.9  │ 3.7      │ 102.1    │ 210.3    │
│ 457 │ missing  │ -169.5   │ -79.7   │ 27.9     │ 129.4    │ 242.8    │
│ 458 │ -222.2   │ -104.6   │ -2.4    │ 84.3     │ 204.7    │ missing  │
│ 459 │ -236.3   │ -124.0   │ -6.8    │ 95.7     │ 196.0    │ missing  │
│ 460 │ missing  │ -216.5   │ -90.2   │ 27.8     │ 138.9    │ missing  │
│ 461 │ missing  │ -163.2   │ -43.6   │ 69.5     │ 173.9    │ missing  │
│ 462 │ missing  │ -207.3   │ -88.3   │ 9.6      │ 104.1    │ 218.0    │
│ 463 │ -242.6   │ -142.0   │ -21.8   │ 69.8     │ 148.7    │ missing  │
│ 464 │ -235.9   │ -128.8   │ -33.1   │ 68.8     │ 177.1    │ missing  │
│ 465 │ missing  │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ missing  │
│ 466 │ missing  │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ missing  │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ missing  │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │

julia> imp = Impute.interp() ∘ Impute.locf() ∘ Impute.nocb()
#52 (generic function with 1 method)

julia> imp(df)
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ -233.6   │ -203.7   │ -84.1   │ 18.5     │ 134.7    │ 222.7    │
│ 2   │ -233.6   │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ 222.7    │
│ 3   │ -233.6   │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ 222.7    │
│ 4   │ -233.6   │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ 222.7    │
│ 5   │ -233.6   │ -223.1   │ -130.1  │ 25.8     │ 160.0    │ 222.7    │
│ 6   │ -233.6   │ -223.1   │ -70.7   │ 62.1     │ 197.5    │ 222.7    │
│ 7   │ -233.6   │ -164.8   │ -12.2   │ 76.8     │ 202.8    │ 222.7    │
│ 8   │ -233.6   │ -221.6   │ -81.9   │ 27.5     │ 144.5    │ 222.7    │
│ 9   │ -233.6   │ -153.7   │ -17.0   │ 76.1     │ 222.4    │ 222.7    │
│ 10  │ -233.6   │ -184.7   │ -47.3   │ 74.4     │ 208.9    │ 222.7    │
│ 11  │ -233.6   │ -197.6   │ -148.8  │ 11.4     │ 137.7    │ 222.7    │
│ 12  │ -233.6   │ -197.6   │ -6.4    │ 137.1    │ 229.5    │ 222.7    │
│ 13  │ -233.6   │ -247.8   │ -35.4   │ 80.9     │ 229.5    │ 222.7    │
│ 14  │ -233.6   │ -227.0   │ -104.7  │ 20.2     │ 140.2    │ 222.7    │
│ 15  │ -233.6   │ -115.9   │ -10.5   │ 70.0     │ 202.6    │ 222.7    │
│ 16  │ -249.6   │ -232.4   │ -100.6  │ 16.8     │ 145.1    │ 222.7    │
│ 17  │ -249.6   │ -199.4   │ -58.2   │ 29.1     │ 184.4    │ 222.7    │
│ 18  │ -249.6   │ -195.7   │ -89.5   │ 26.4     │ 142.7    │ 222.7    │
│ 19  │ -249.6   │ -180.1   │ -65.0   │ 27.3     │ 171.1    │ 222.7    │
│ 20  │ -249.6   │ -217.3   │ -85.2   │ 27.1     │ 151.5    │ 222.7    │
│ 21  │ -249.6   │ -217.3   │ -77.1   │ 27.6     │ 151.5    │ 222.7    │
│ 22  │ -249.6   │ -139.7   │ -15.8   │ 83.0     │ 215.5    │ 222.7    │
│ 23  │ -249.6   │ -132.8   │ -14.1   │ 78.1     │ 205.7    │ 222.7    │
│ 24  │ -240.9   │ -152.7   │ -36.9   │ 29.7     │ 149.8    │ 222.7    │
│ 25  │ -240.9   │ -224.1   │ -81.9   │ 29.1     │ 172.2    │ 222.7    │
│ 26  │ -240.9   │ -202.8   │ -235.8  │ 6.0      │ 144.4    │ 222.7    │
│ 27  │ -240.9   │ -202.8   │ -45.1   │ 84.0     │ 227.3    │ 222.7    │
│ 28  │ -240.9   │ -138.4   │ -21.5   │ 73.4     │ 210.6    │ 222.7    │
│ 29  │ -247.1   │ -128.2   │ -31.3   │ 29.2     │ 143.1    │ 222.7    │
│ 30  │ -247.0   │ -185.4   │ -80.3   │ 23.9     │ 115.8    │ 222.7    │
│ 31  │ -247.0   │ -182.5   │ -75.8   │ 27.5     │ 165.2    │ 242.5    │
│ 32  │ -247.0   │ -202.2   │ -99.1   │ 23.8     │ 136.3    │ 242.5    │
│ 33  │ -247.0   │ -193.3   │ -82.6   │ 26.3     │ 160.5    │ 237.9    │
│ 34  │ -247.0   │ -189.4   │ -63.3   │ 27.6     │ 136.8    │ 237.9    │
│ 35  │ -247.0   │ -149.0   │ -31.0   │ 73.5     │ 187.8    │ 237.9    │
│ 36  │ -247.0   │ -162.4   │ -26.5   │ 72.6     │ 158.5    │ 237.9    │
⋮
│ 433 │ -219.9   │ -220.6   │ -114.2  │ 9.7      │ 106.4    │ 227.9    │
│ 434 │ -219.9   │ -120.9   │ -1.3    │ 99.5     │ 207.6    │ 212.0    │
│ 435 │ -241.2   │ -240.5   │ -110.3  │ 26.1     │ 142.8    │ 212.0    │
│ 436 │ -241.2   │ -239.6   │ -121.4  │ 2.9      │ 124.9    │ 212.0    │
│ 437 │ -241.2   │ -139.8   │ -7.3    │ 121.0    │ 178.2    │ 212.0    │
│ 438 │ -241.2   │ -212.0   │ -66.2   │ 50.4     │ 178.2    │ 212.0    │
│ 439 │ -241.2   │ -232.7   │ -109.2  │ 18.4     │ 127.5    │ 212.0    │
│ 440 │ -241.2   │ -236.3   │ -115.1  │ 5.1      │ 109.0    │ 212.0    │
│ 441 │ -241.2   │ -107.1   │ -9.1    │ 95.1     │ 198.6    │ 239.7    │
│ 442 │ -226.7   │ -143.8   │ -30.4   │ 75.8     │ 196.6    │ 239.7    │
│ 443 │ -215.8   │ -131.8   │ -26.5   │ 64.7     │ 177.2    │ 239.7    │
│ 444 │ -215.8   │ -144.9   │ -0.9    │ 105.3    │ 230.9    │ 239.7    │
│ 445 │ -215.8   │ -214.0   │ -81.8   │ 66.1     │ 191.3    │ 239.7    │
│ 446 │ -215.8   │ -210.6   │ -94.3   │ 16.7     │ 125.5    │ 239.7    │
│ 447 │ -215.8   │ -114.8   │ -18.4   │ 65.3     │ 171.6    │ 249.7    │
│ 448 │ -219.6   │ -156.0   │ -14.0   │ 113.7    │ 249.3    │ 246.7    │
│ 449 │ -219.6   │ -210.5   │ -41.9   │ 56.8     │ 133.8    │ 246.7    │
│ 450 │ -219.6   │ -189.2   │ -72.0   │ 56.8     │ 133.8    │ 246.7    │
│ 451 │ -219.6   │ -214.2   │ -102.2  │ 5.5      │ 75.6     │ 154.3    │
│ 452 │ -219.6   │ -107.9   │ -16.0   │ 101.7    │ 186.0    │ 245.9    │
│ 453 │ -222.2   │ -153.0   │ -38.0   │ 61.3     │ 144.4    │ 245.9    │
│ 454 │ -222.2   │ -179.8   │ -63.4   │ 56.0     │ 157.5    │ 210.3    │
│ 455 │ -222.2   │ -174.5   │ -44.8   │ 73.3     │ 179.7    │ 210.3    │
│ 456 │ -222.2   │ -206.8   │ -108.9  │ 3.7      │ 102.1    │ 210.3    │
│ 457 │ -222.2   │ -169.5   │ -79.7   │ 27.9     │ 129.4    │ 242.8    │
│ 458 │ -222.2   │ -104.6   │ -2.4    │ 84.3     │ 204.7    │ 218.0    │
│ 459 │ -236.3   │ -124.0   │ -6.8    │ 95.7     │ 196.0    │ 218.0    │
│ 460 │ -242.6   │ -216.5   │ -90.2   │ 27.8     │ 138.9    │ 218.0    │
│ 461 │ -242.6   │ -163.2   │ -43.6   │ 69.5     │ 173.9    │ 218.0    │
│ 462 │ -242.6   │ -207.3   │ -88.3   │ 9.6      │ 104.1    │ 218.0    │
│ 463 │ -242.6   │ -142.0   │ -21.8   │ 69.8     │ 148.7    │ 242.5    │
│ 464 │ -235.9   │ -128.8   │ -33.1   │ 68.8     │ 177.1    │ 242.5    │
│ 465 │ -247.6   │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ 242.5    │
│ 466 │ -247.6   │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ -247.6   │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ -247.6   │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │

src/imputors.jl Show resolved Hide resolved
src/imputors.jl Outdated Show resolved Hide resolved
end

table = Tables.select(table, cnames...) |> materializer(table)
return table
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See again, I agree with Nick but will not block the PR over it

wv::AbstractWeights;
limit::Float64=1.0,
is_missing::Function=ismissing,
on_complete::Function=complete
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not extatic about it about it, but Ok

test/runtests.jl Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
@rofinn
Copy link
Member Author

rofinn commented Jul 12, 2019

@oxinabox Except having lots of comments makes it hard for github to load the page. If your comments can be added to a separate PR I'd appreciate if you made an issue. I also wasn't expecting people to review.

@oxinabox oxinabox mentioned this pull request Jul 12, 2019
* `on_complete::Function`: a function to run when imputation is complete
"""
function Context(;
limit::Float64=0.1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the default here 0.1 (for WeightedContext it is 1.0)?

They should be consistent, at least

Suggested change
limit::Float64=0.1,
limit::Float64=1.0,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that change is breaking.
so it can't be changed til the next major release

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having them be inconsistent seems uncomfortable, but then again having the default be not 1.0 is also weird... so 🤷‍♂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default behaviour should be not to error since the threshold is somewhat arbitrary and data dependent.

end

"""
Context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be consistent with the docstring for WeightedContext

Suggested change
Context
Context(; limit=1.0, is_missing=ismissing, on_complete=complete)

Althought presuming complete is not exported, this should be Impute.complete in the docstring

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not. I might be inclined to add an @ref once I'm happy with this, but I'm not sure it's worth investing a lot of time to make the API nice and well documented before we decide that we want to use it.

# since Tables.rows is just an iterator
table = Iterators.filter(rows) do r
!any(x -> ismissing(c, x), propertyvalues(r))
end |> materializer(table)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it's "de facto" the style, because it's not used anywhere (becuase a bunch of people dislike it) -- with that in mind it would be a kindness to just move the materializer call to the next line

(but obviously not gonna not-approve over this)

try
imp.context() do c
for x in var
ismissing(c, x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am also confused by this

test/deprecated.jl Show resolved Hide resolved
src/context.jl Outdated Show resolved Hide resolved
rofinn and others added 2 commits July 15, 2019 11:34
Co-Authored-By: Nick Robinson <[email protected]>
Co-Authored-By: Nick Robinson <[email protected]>
@nickrobinson251
Copy link
Contributor

I do not know how to approve on Github but if someone points me to the button i'll approve :)

@rofinn
Copy link
Member Author

rofinn commented Jul 15, 2019

Alright, since I don't have as strong of an argument for using a pipe to the materializer I've opted to change that. I still have a strong preference for using julia's |> and operators for composing imputation pipelines though.

@rofinn rofinn merged commit 77b9fa3 into master Jul 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tables.jl
4 participants