feat: draft allow data_color to take a palette function #192

Draft: wants to merge 2 commits into base: main

Conversation

machow
Collaborator

@machow machow commented Feb 13, 2024

This is a very rough draft of what data_color might look like, if the palette argument accepted an arbitrary function, mapping values -> hex colors.

Example:

from great_tables import GT, exibble

small_exibble = exibble[["date", "currency"]]
GT(small_exibble).data_color("currency", palette=lambda vals: ["#FFFFFF" if x > 0 else "#000000" for x in vals])

Note two important features of the current implementation:

  • the function takes a list of values, and returns a list of strings
  • the function must return hex strings (e.g. #FFFFFF)
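As a rough illustration of that contract, here is the lambda from the example written as a named function. The name sign_palette and the choice to return None for missing values are assumptions made for this sketch, not behavior confirmed by the draft.

```python
import math


def sign_palette(values):
    """Take a list of values and return a list of hex color strings.

    Positive values map to white, everything else to black.
    Assumption: missing values (None or NaN) map to None, meaning "no fill".
    """
    colors = []
    for x in values:
        if x is None or (isinstance(x, float) and math.isnan(x)):
            colors.append(None)
        elif x > 0:
            colors.append("#FFFFFF")
        else:
            colors.append("#000000")
    return colors


print(sign_palette([13.26, -2.5, None, 0.0]))
```

A function like this could presumably be passed as palette=sign_palette in place of the lambda.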

I wonder if there's a nice way to work in polars expressions? The challenge is that polars has no "this" expression. We could add a surrogate column, like _this_, which would allow people to use pl.when(pl.col("_this_") ...).then(...). However, that doesn't seem ideal.

An alternative might be for users to call GT.tab_style and pass a polars expression to style.fill(), etc.

Edit: I wonder if a nice move could be...

  • Whenever palette is a polars expression, then
  • Select the columns and rows specified to data_color (currently, only columns are supported) as a polars DataFrame
  • Run the expression on the DataFrame. The result must be...
    • a DataFrame of the same dimensions.
    • each value is a hex string or null
  • Use the result as the color values

(In a sense, this means that polars.selectors.all() is equivalent to a this construct)

Big questions

  • Does this play well with existing color palette tools? matplotlib.cm.coolwarm() and friends return an array of N_obs x 4 (rgba values). We don't necessarily need to support it, but I'm curious what else is out there!
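For reference, turning N_obs x 4 RGBA rows into hex strings is a small step. The sketch below is pure Python and assumes each channel is a float in [0, 1]; the name rgba_rows_to_hex is made up for illustration, and matplotlib.colors.to_hex already does the per-color conversion if matplotlib is available.

```python
def rgba_rows_to_hex(rows, keep_alpha=False):
    """Convert rows of (r, g, b, a) floats in [0, 1] to hex color strings."""

    def channel(x):
        # Scale to 0..255 and clamp values that fall slightly outside [0, 1].
        return max(0, min(255, round(float(x) * 255)))

    hex_colors = []
    for row in rows:
        r, g, b, a = (channel(c) for c in row)
        color = f"#{r:02x}{g:02x}{b:02x}"
        if keep_alpha:
            color += f"{a:02x}"
        hex_colors.append(color)
    return hex_colors


print(rgba_rows_to_hex([(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.0)]))
```

Because it only iterates over rows, the same function should also accept a 2-D NumPy array such as the output of a matplotlib colormap.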

@rich-iannone
Member

I think we ought to support at least hex colors (in all their variations, I think we have regex functions to check for the different representations) and rgba. I'm seeing a lot more of the latter on GitHub mostly in palette repos. Then normalize to hex (I don't believe any of this is lossy).
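A sketch of what that normalization step might look like, assuming "rgba" means CSS-style rgb()/rgba() strings. The function name to_hex and the regular expressions are written for this example and are not the existing great_tables helpers. One caveat: the alpha value is rounded to 8 bits, so that part of the conversion is not perfectly exact.

```python
import re

_SHORT_HEX = re.compile(r"#([0-9a-fA-F]{3,4})")
_LONG_HEX = re.compile(r"#([0-9a-fA-F]{6}|[0-9a-fA-F]{8})")
_RGBA = re.compile(
    r"rgba?\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*"
    r"(?:,\s*([0-9]*\.?[0-9]+)\s*)?\)"
)


def to_hex(color):
    """Normalize #RGB, #RGBA, #RRGGBB, #RRGGBBAA, rgb() and rgba() to upper-case hex."""
    color = color.strip()
    m = _SHORT_HEX.fullmatch(color)
    if m:
        # Short forms repeat each digit: #abc -> #AABBCC.
        return "#" + "".join(ch * 2 for ch in m.group(1)).upper()
    m = _LONG_HEX.fullmatch(color)
    if m:
        return "#" + m.group(1).upper()
    m = _RGBA.fullmatch(color)
    if m:
        r, g, b = (int(m.group(i)) for i in (1, 2, 3))
        if max(r, g, b) > 255:
            raise ValueError(f"channel out of range: {color!r}")
        out = f"#{r:02X}{g:02X}{b:02X}"
        if m.group(4) is not None:
            alpha = float(m.group(4))
            if alpha > 1:
                raise ValueError(f"alpha out of range: {color!r}")
            out += f"{round(alpha * 255):02X}"
        return out
    raise ValueError(f"unrecognized color: {color!r}")


print([to_hex(c) for c in ["#abc", "#ff000080", "rgb(0, 128, 255)", "rgba(255, 0, 0, 1)"]])
```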

@rich-iannone
Member

Also, I love the surrogate _this_ column idea.

@rich-iannone
Member

rich-iannone commented Feb 14, 2024

OTOH, thinking about the requirement for hex colors in the return of the callable, it would be interesting to have the option to perform the validation as a default, but also have the other option to turn that validation off. Only because you can do some pretty sophisticated things with color in HTML like define gradients and even use animation (and this would definitely fail the proposed validation check). Just more food for thought.
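One way this could be shaped, purely as a sketch: a validate flag that defaults to on and can be switched off for arbitrary CSS values. The names apply_palette and validate are hypothetical and are not part of the draft.

```python
import re

HEX_RE = re.compile(r"#(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})")


def apply_palette(values, palette, validate=True):
    """Call `palette` on `values`; optionally check that every result is a hex color."""
    colors = list(palette(values))
    if len(colors) != len(values):
        raise ValueError("palette must return one color per value")
    if validate:
        for c in colors:
            if c is not None and not (isinstance(c, str) and HEX_RE.fullmatch(c)):
                raise ValueError(f"not a hex color: {c!r}")
    return colors


def sign(vals):
    return ["#FFFFFF" if x > 0 else "#000000" for x in vals]


def gradient(vals):
    # A CSS value that is valid in HTML but is not a hex color.
    return ["linear-gradient(90deg, #FFFFFF, #FF0000)" for _ in vals]


print(apply_palette([1, -1], sign))
print(apply_palette([1], gradient, validate=False))
```

With validation on, the gradient palette raises a ValueError; with it off, the string is passed through untouched.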

@ChristopherRussell

Re: what else is out there

colorcet (seems to also be available via matplotlib; claims to provide better colormaps for continuous data)
https://colorcet.holoviz.org/user_guide/index.html

@ChristopherRussell

ChristopherRussell commented Mar 9, 2024

I think it would be really great to be able to do things like this:

https://matplotlib.org/stable/users/explain/colors/colormapnorms.html

One could do that by passing a callable that handles both normalization (mapping values to [0,1]) and the color map (numeric -> hex). Maybe we want to allow that, but in general it is nice if these things are separate, so you can outsource one or the other to pre-existing collections of color maps and normalization tools!

If you are interested in going that route, it seems like the current domain argument overlaps with what matplotlib normalization is trying to do.
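To make the separation concrete, here is a small pure-Python sketch in which a normalizer (values to [0, 1]) and a color map ([0, 1] to hex) are built independently and then composed into a single palette callable. All three helper names are invented for this example; in practice either half could be swapped for a matplotlib or mizani equivalent.

```python
def linear_norm(domain):
    """Return a function mapping a value to [0, 1] over `domain`, clamping outside it."""
    lo, hi = domain

    def norm(x):
        t = (x - lo) / (hi - lo)
        return min(1.0, max(0.0, t))

    return norm


def two_color_map(start, end):
    """Return a function mapping t in [0, 1] to a hex color between two #RRGGBB colors."""

    def parse(h):
        return [int(h[i : i + 2], 16) for i in (1, 3, 5)]

    s, e = parse(start), parse(end)

    def cmap(t):
        rgb = [round(a + (b - a) * t) for a, b in zip(s, e)]
        return "#{:02x}{:02x}{:02x}".format(*rgb)

    return cmap


def make_palette(norm, cmap):
    """Compose a normalizer and a color map into a list -> list palette function."""

    def palette(values):
        return [cmap(norm(v)) for v in values]

    return palette


palette = make_palette(linear_norm((0, 100)), two_color_map("#000000", "#ff0000"))
print(palette([0, 20, 100, 250]))
```

Here the (0, 100) pair plays roughly the role of the domain argument: values outside it are clamped before the color map sees them.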

Edit: I guess this is similar to transforms in mizani, which I see you are using already.
https://mizani.readthedocs.io/en/stable/transforms.html

@ChristopherRussell

ChristopherRussell commented Mar 9, 2024

This is how I would use matplotlib normalizers + a cmap to get hex values:


from typing import Callable

import matplotlib.colors as mcolors
import pandas as pd

def _color_series(
    values: pd.Series,
    colormap: mcolors.Colormap,
    norm: mcolors.Normalize | Callable | None = None,
    na_color: str | None = None,
) -> pd.Series:
    """
    Color a Series of numeric values using a Matplotlib colormap and normalization.

    Parameters:
    values : pd.Series
        The Series to color.
    colormap : mcolors.Colormap
        The Matplotlib colormap to use (e.g., plt.cm.viridis).
    norm : mcolors.Normalize | Callable | None
        A matplotlib normalization object (e.g., Normalize, LogNorm) or a function that returns
        normalized values within the range [0, 1]. If None, defaults to linear normalization
        between the minimum and maximum values.
    na_color : str, optional
        The color to use for NaN values.

    Returns:
    pd.Series
        A new Series with hex color strings.
    """
    # Normalize the data
    if norm is None:  # Defaults to linear normalization between min and max
        norm = mcolors.Normalize(values.min(), values.max())
    normalized = norm(values.values)
    colors = colormap(normalized)

    # Convert RGBA colors to hex. A colormap returns its "bad" color for NaN input
    # rather than NaN, so missing values are detected on the input Series instead.
    hex_colors = [
        na_color if is_na else mcolors.to_hex(color)
        for color, is_na in zip(colors, values.isna())
    ]

    return pd.Series(hex_colors, index=values.index)

I have also attached some example use cases to show off where they can be useful.

color_series.pdf

EDIT: you can add alpha like this, by setting the last column of the RGBA array and passing keep_alpha=True to to_hex.

    normalized = norm(values.values)
    colors = colormap(normalized)
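    colors[:, -1] = alpha  # RGBA format, so the last column is alpha (alpha: float in [0, 1])

    # Convert RGBA colors to hex (keeping alpha); detect NaN on the input values,
    # since the colormap output itself never contains NaN
    hex_colors = [
        na_color if is_na else mcolors.to_hex(color, keep_alpha=True)
        for color, is_na in zip(colors, values.isna())
    ]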
