separate pl.list() and pl.concat_list #17307

Closed
mcrumiller opened this issue Jun 30, 2024 · 4 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@mcrumiller
Contributor

mcrumiller commented Jun 30, 2024

Description

Edit: I just noticed that the 1.0 documentation has more detail than the current 0.20 documentation: it clarifies that non-list dtypes are cast to lists prior to concatenation. My proposal still stands.

Edit 2: found #8510 which seems to be the same issue/request. I'll wait for @stinodego's feedback before closing.


The description of pl.concat_list is:

Horizontally concatenate columns into a single list column.

This is confusing, as discussed in #17294, since the name might imply concatenating existing lists into a single list. This is the current behavior on lists:

import polars as pl

df = pl.DataFrame({
    "a": [[1]],  # pl.List(pl.Int64)
    "b": [[2]],
})

df.select(pl.concat_list("a", "b"))
# shape: (1, 1)
# ┌───────────┐
# │ a         │
# │ ---       │
# │ list[i64] │
# ╞═══════════╡
# │ [1, 2]    │  <-- lists concatenated together
# └───────────┘

However, concat_list also accepts non-list columns and combines their values into a list:

import polars as pl

df = pl.DataFrame({
    "a": [1],  # pl.Int64
    "b": [2],
})

df.select(pl.concat_list("a", "b"))
# shape: (1, 1)
# ┌───────────┐
# │ a         │
# │ ---       │
# │ list[i64] │
# ╞═══════════╡
# │ [1, 2]    │  <-- columns concatenated together
# └───────────┘

Note that the result of the operation in both cases is identical. Instead, I propose that we have:

  • pl.list(a, b, ...) which creates a new pl.List column out of the expressions a, b, .... The dtypes must have a common supertype.
  • pl.concat_list(a, b, ...) where a, b, ... must all be pl.List columns, and they are concatenated into a single list. The inner dtypes must have a common supertype.
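
The split proposed above can be modeled on plain Python values for a single row. This is only a sketch of the intended semantics, not polars code: `make_list` and `concat_lists` are hypothetical names standing in for the proposed pl.list and the stricter pl.concat_list, and supertype resolution is left out entirely.

```python
def make_list(*values):
    """Model of the proposed pl.list: each input becomes one element."""
    return list(values)


def concat_lists(*values):
    """Model of the proposed strict pl.concat_list: every input must be a list."""
    for v in values:
        if not isinstance(v, list):
            raise TypeError(f"expected a list, got {type(v).__name__}")
    # Concatenate the inputs, removing exactly one level of nesting.
    return [item for v in values for item in v]


scalars = make_list(1, 2)      # scalar inputs become elements: [1, 2]
nested = make_list([1], [2])   # list inputs are kept as elements: [[1], [2]]
flat = concat_lists([1], [2])  # list inputs are joined: [1, 2]
```

In this model the two operations no longer overlap: `make_list` never flattens, and `concat_lists` rejects non-list inputs instead of silently wrapping them.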
@mcrumiller mcrumiller added the enhancement New feature or an improvement of an existing feature label Jun 30, 2024
@NickCrews

This is the exact semantics I want!

Except may I recommend that the signature of list() be

def list(vals: Iterable, *, dtype=None)

This would match the API of Python's list(), it would allow for idempotency, and the dtype lets you pass in a zero-length array and still know the type.

We are solving this exact same thing in ibis-project/ibis#9473

@stinodego
Member

This was discussed in the linked issue. I don't believe adding a separate list function adds much, besides some additional input validation. I'll close this as not planned for now.

@stinodego stinodego closed this as not planned Jun 30, 2024
@mcrumiller
Contributor Author

mcrumiller commented Jun 30, 2024

@stinodego how would one collect several pl.List(...) columns into an outer list? As in, take:

df = pl.DataFrame({
    "a": [[1, 2, 3]],
    "b": [[4, 5, 6]],
})

and combine the columns into an outer list:

shape: (1, 1)
┌────────────────────────┐
│ literal                │
│ ---                    │
│ list[list[i64]]        │
╞════════════════════════╡
│ [[1, 2, 3], [4, 5, 6]] │
└────────────────────────┘

AFAIK there's no way to do this, because concat_list is designed to flatten lists, and there is no general method for converting columns into list elements.
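
To make the gap concrete, here is a pure-Python model of the behavior described in this thread, where non-list inputs are cast to lists and everything is then concatenated. `current_concat_list` is a hypothetical stand-in for illustration, not the polars implementation.

```python
def current_concat_list(*values):
    """Model: wrap non-list inputs in a list, then concatenate one level."""
    as_lists = [v if isinstance(v, list) else [v] for v in values]
    return [item for lst in as_lists for item in lst]


a, b = [1, 2, 3], [4, 5, 6]
flattened = current_concat_list(a, b)  # list inputs lose one level of nesting
wanted = [a, b]                        # the nested result asked for above
```

Under this model, list inputs are always flattened by one level, so the nested result is only reachable if each input is first wrapped in an extra list, which is exactly the step that has no general column-level equivalent.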

@NickCrews
Copy link

NickCrews commented Jun 30, 2024

See my comment in the linked issue; it describes exactly what @mcrumiller is talking about. We would need to add special-casing code to ibis in order to support this.
