Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolved error handling in sql_translator #697 #703

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

NEC-Vishal
Copy link
Contributor

Proposed changes

#697 improve error handling in sql_translator.py

Types of changes

What types of changes does your code introduce to the project?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Maintenance (update of libraries or other dependences)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • I have added tests that prove my fix is effective or that my feature works
  • I have run all the existing tests locally (not just those related to my feature) and there are no errors
  • After the last push to the PR branch, I have run the lint script locally and there are no changes to the code base
  • I have updated the RELEASE NOTES
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions
Copy link
Contributor

github-actions bot commented Nov 21, 2022

CLA Assistant Lite bot All contributors have signed the CLA ✍️

Copy link
Member

@c0c0n3 c0c0n3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought your approach was great, but then I realised that unfortunately b/c of the spaghetti code we've got, we'll need a more involved approach.

# using batches, we don't need to fail the whole set
# but only failing batches.
res_list.append(res)
for res in res_list:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach will only work for Crate, not Timescale which I think throws an exception instead. In general, what to do here will depend on the database at hand, so we should actually push this code down to the translator subclasses.

@c0c0n3
Copy link
Member

c0c0n3 commented Jan 5, 2023

@NEC-Vishal this TODO is harder to get right than I initially thought.

There are two problems with the code as it stands in the master branch:

  1. If any batch fails, all the following ones won't be inserted. This is what the TODO mentioned.
  2. The error handling block below the TODO is Crate-specific.

I propose we do the following. (Please feel free to propose a different approach if you don't agree.)

First off, we need an abstract method in the SQLTranslator class to insert each and every batch regardless of whether any previous inserts failed.

class SQLTranslator:

    def _insert_row_batches(stmt: str, batches: [[Tuple]]):
        raise NotImplementedError

    def _insert_entity_rows(...
        ...
        try:
            batches = to_insert_batches(rows)
            self._insert_row_batches(stmt, batches)
        except Exception as e:
            ...

In the case of one or multiple batches failing, _insert_row_batches will throw an exception. If there are multiple failures, they all get wrapped up in a single exception. This is obviously far from optimal, but at least the existing error handling code in _insert_entity_rows will keep on working as expected, in particular the last-ditch attempt to save entity data in the original format.

Then this should be the Crate-specific implementation.

class CrateTranslator:

    def _insert_row_batches(stmt: str, batches: [[Tuple]]):
        has_error = False
        for batch in batches:
            res = self.cursor.executemany(stmt, batch)
            # recent Crate versions don't bomb out anymore when
            # something goes wrong with a multi-row insert, instead
            # the driver returns -2 for each row that has an issue
            if isinstance(res, list):
                for i in range(len(res)):
                    if res[i]['rowcount'] < 0:
                        has_error = True
        if has_error:
            raise Exception('An insert failed')

Whereas for Timescale, we could have something like this

class PostgresTranslator:

    def _insert_row_batches(stmt: str, batches: [[Tuple]]):
        error = None
        for batch in batches:
            try:
                self.cursor.executemany(stmt, batch)
            except Exception as e:
                error = e
        if error:
            raise error

@c0c0n3
Copy link
Member

c0c0n3 commented Jan 5, 2023

@NEC-Vishal please keep in mind I haven't tested any of this and the pseudo-code I sketched out earlier may need some tweaking. But the biggest chunk of work in this PR will be the testing. This is a critical section of the code base, so you'll need to implement test cases to cover all possible scenarios---no failure, partial failure, total failure. Both for Crate and Timescale...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants