Automatic retry with larger schema inference length if errors occur #789

lars-reimann · 2024-05-18T17:49:14Z

Is your feature request related to a problem?

The issue with lazy evaluation of data is that errors only surface when we collect the data. At that point, it's no longer possible to fix errors that were caused by previous steps.

For example, if later rows don't match the inferred schema, an error is thrown. Users must then change, for example, their call to Table.from_csv_file to set the inference length (#749) or override parts of the schema (#754).

Ideally, we should automatically recover from such errors.

Desired solution

In Table, don't store a lazy frame directly. Instead, store a factory function that produces a lazy frame. This allows

- passing arguments from later steps to produce the lazy frame,
- trying again (with different arguments).
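To make the factory idea concrete, here is a minimal sketch using only the standard library. All names (FactoryBackedTable, make_rows, limit) are hypothetical, and a plain list of rows stands in for a lazy frame; it is not the actual Table implementation.

```python
from typing import Any, Callable

# Hypothetical stand-in for a lazy frame: here simply a list of rows.
Rows = list[dict[str, Any]]


class FactoryBackedTable:
    """Holds a recipe for building the data rather than the data itself."""

    def __init__(self, factory: Callable[..., Rows]) -> None:
        self._factory = factory

    def build(self, **options: Any) -> Rows:
        # Options chosen by later steps are forwarded to the factory,
        # so the same table can be rebuilt with different arguments.
        return self._factory(**options)


def make_rows(limit: int = 2) -> Rows:
    # Hypothetical source: pretend `limit` controls how much is read.
    data = [{"a": 1}, {"a": 2}, {"a": 3}]
    return data[:limit]


table = FactoryBackedTable(make_rows)
first = table.build()          # built with default arguments
second = table.build(limit=3)  # rebuilt with a different argument
```

Because the table keeps the factory and not its result, a failed build does not leave it in a broken state; it can simply be built again.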

When the lazy frame is collected, catch relevant errors and rebuild the lazy frame

- with a larger schema inference length,
- if that fails, with some columns forced to string type.
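The escalation could look roughly like the sketch below. It simulates schema inference with the standard library only: infer_schema, parse, and collect_with_retry are hypothetical helpers, not functions of the library or of its backend, and the rule for picking which columns to force to string (those that still fail to parse) is an assumption.

```python
from typing import Union

Cell = Union[int, str]
RawRows = list[list[str]]


def _is_int(text: str) -> bool:
    try:
        int(text)
    except ValueError:
        return False
    return True


def infer_schema(rows: RawRows, infer_length: int) -> list[type]:
    """Guess int or str per column from the first `infer_length` rows only."""
    sample = rows[:infer_length]
    return [
        int if all(_is_int(row[col]) for row in sample) else str
        for col in range(len(rows[0]))
    ]


def parse(rows: RawRows, schema: list[type]) -> list[list[Cell]]:
    """Convert every cell; raises ValueError if a row violates the schema."""
    return [[cast(value) for cast, value in zip(schema, row)] for row in rows]


def collect_with_retry(
    rows: RawRows, infer_lengths: tuple[int, ...]
) -> list[list[Cell]]:
    schema: list[type] = []
    # First, rebuild with increasingly large inference lengths.
    for infer_length in infer_lengths:
        schema = infer_schema(rows, infer_length)
        try:
            return parse(rows, schema)
        except ValueError:
            continue
    # Last resort (assumed rule): force only the columns that still fail to str.
    for col, cast in enumerate(schema):
        if cast is int and not all(_is_int(row[col]) for row in rows):
            schema[col] = str
    return parse(rows, schema)


rows = [["1", "x", "10"], ["2", "y", "20"], ["n/a", "z", "30"]]
# The larger inference length (3) sees the "n/a" row and succeeds.
recovered = collect_with_retry(rows, infer_lengths=(1, 3))
# Neither length sees the "n/a" row, so column 0 is forced to str.
forced = collect_with_retry(rows, infer_lengths=(1, 2))
```

In both calls the third column stays an integer column; only the column that actually conflicts with the inferred schema ends up as strings.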

We need to make sure that this works properly with memoization, though.

Possible alternatives (optional)

No response

Screenshots (optional)

No response

Additional Context (optional)

No response

lars-reimann added the enhancement 💡 New feature or request label May 18, 2024
