With Ibis I could write more consistent dataframe code regardless of the backend (e.g. Polars, MSSQL, or PostgreSQL) while getting better performance than Pandas, and also solve the parametrization problem that comes with integrating Python and SQL. With Kedro I get consistent data science project structures. With Pandera I get dataframe data validation. Anyone who cares about those things would similarly benefit from Kedro-Pandera integration with Ibis.
I would like something highly similar to what I see in the Kedro-Pandera plugin's documentation, except to also support Ibis datasets.
Possible Implementation
I'm not currently familiar with the internals of Kedro-Pandera, so my suggestion is limited by that lack of understanding.
Because Kedro-Pandera is responsible for integrating Kedro and Pandera, the implementation should depend on the current behaviour of Kedro, Pandera, and Ibis rather than modifying their behaviour.
I've noted that Pandera supports Polars in addition to Pandas. However, Ibis has its own classes, which I do not expect Pandera to support. Instead, the implementation could take advantage of the fact that Ibis table objects provide to_pandas and to_polars methods.
Here is a summary of the logic I have in mind:
If a dataset is annotated to be one of the already-supported datasets, proceed as usual.
If a dataset is a kedro_datasets.ibis.TableDataset, load it, convert the result to Polars or Pandas, then run the Pandera validator on it.
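To make the two cases above concrete, here is a rough sketch of the dispatch I have in mind. It is not based on Kedro-Pandera's actual internals: the function names (looks_like_ibis_table, to_validatable, validate_dataset) are hypothetical, the check on the class's module name is just one assumed way to detect an Ibis object without importing Ibis, and the schema is assumed to be any object with a Pandera-style validate method.

```python
from typing import Any


def looks_like_ibis_table(data: Any) -> bool:
    """Guess whether ``data`` is an Ibis object without importing Ibis.

    Assumption: Ibis expression classes are defined in modules whose
    names start with "ibis.". This avoids a hard dependency on Ibis.
    """
    module = type(data).__module__ or ""
    return module == "ibis" or module.startswith("ibis.")


def to_validatable(data: Any, prefer: str = "polars") -> Any:
    """Materialise Ibis tables; pass every other object through unchanged."""
    if not looks_like_ibis_table(data):
        # Already-supported dataframes (Pandas, Polars, ...) proceed as usual.
        return data
    # Note: this pulls the whole table into memory before validation.
    method_name = "to_polars" if prefer == "polars" else "to_pandas"
    return getattr(data, method_name)()


def validate_dataset(data: Any, schema: Any, prefer: str = "polars") -> Any:
    """Run a Pandera-style schema (any object with ``.validate``) on ``data``."""
    return schema.validate(to_validatable(data, prefer=prefer))
```

In a real hook, data would be whatever kedro_datasets.ibis.TableDataset loads and schema would be the Pandera schema attached to the catalog entry. One thing to keep in mind is that converting to Polars or Pandas executes the query and loads the full result, which may not be desirable for large backend tables.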
Possible Alternatives
Another option is for me to build a Kedro pipeline for this type of validation instead. This would involve casting the Ibis table dataset to a Polars dataframe myself, loading the schema as a YAML Kedro dataset, and running the Pandera validator against the Polars dataframe.
galenseilis changed the title from "[QUESTION] Is integration with Ibis supported?" to "[QUESTION] Could integration with Ibis be supported?" on Sep 19, 2024
This is definitely valuable and should be added to the roadmap.
TBH I've had a hard time maintaining the plugins recently, and kedro-pandera is quite inactive. I plan to resume working on it one day, but I can't say when that will be.
Description
I am exploring using a combo of Ibis, Kedro, and Pandera if that's possible.