forked from cloudera/impyla
Sync with Cloudera Impyla #2
Draft
thyarles
wants to merge
27
commits into
smartlab-br:master
from
cloudera:master
…515) that are applicable to username/password auth and JWT auth are not mixed together on the same call to the connect method. These additional checks prevent confusion about which authentication method is actually used for the connection. New tests were added to cover the new checks.
Impyla gets cookies from an HTTPMessage object formed from a response to an HTTP request. The format of cookies in the message differs across Python versions. In Python 2 the HTTPMessage is a mimetools.Message object, and the Set-Cookie values all appear in a single header, separated by newlines. In Python 3 the HTTPMessage is an email.message.Message, and the Set-Cookie values appear as duplicate headers. Add platform-dependent code to get_all_matching_cookies() that loads cookies from all the Set-Cookie headers.

TESTING: Changed test_get_all_matching_cookies() to build the HTTPMessage using a new utility method that creates Set-Cookie headers in the appropriate format for the platform. I hand-tested with a proxy that inserted 3 cookies into HTTP responses. I added the 3 cookie names to the list of default cookies. I ran TestHttpConnect.test_simple_connect() connecting to Impala through the proxy and verified with the debugger that the cookies were returned correctly from get_all_matching_cookies() in both Python 2 and Python 3.
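The platform difference described above can be sketched as follows. This is only an illustration of the idea, not the code from the commit: `set_cookie_values` and `matching_cookies` are hypothetical names, and the Python 2 branch simply follows the commit message's description of a single newline-separated header.

```python
import sys
from email.message import Message

try:
    from http.cookies import SimpleCookie  # Python 3
except ImportError:
    from Cookie import SimpleCookie  # Python 2


def set_cookie_values(headers):
    """Return every Set-Cookie value found in a response's headers (hypothetical helper)."""
    if sys.version_info[0] == 2:
        # Python 2, per the commit message: one header, values joined by newlines.
        joined = headers.get('Set-Cookie')
        return joined.split('\n') if joined else []
    # Python 3: the values arrive as duplicate headers.
    return headers.get_all('Set-Cookie') or []


def matching_cookies(headers, wanted_names):
    """Collect (name, value) pairs for the cookies whose names are wanted."""
    found = []
    for raw in set_cookie_values(headers):
        jar = SimpleCookie()
        jar.load(raw)
        for name, morsel in jar.items():
            if name in wanted_names:
                found.append((name, morsel.value))
    return found


# Build a Python 3 style message with three duplicate Set-Cookie headers.
demo = Message()
demo.add_header('Set-Cookie', 'a=1; Path=/')
demo.add_header('Set-Cookie', 'b=2')
demo.add_header('Set-Cookie', 'c=3')
demo_result = matching_cookies(demo, {'a', 'c'})
```

Reading only the first Set-Cookie header would miss the later cookies on Python 3, which is the failure mode the commit describes.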
Co-authored-by: cravani <[email protected]>
Current Usage part works well for Impala users but will fail for Hive users because of the `auth_mechanism` default value. This adds a comment targeted towards Hive users so they can quick start too.
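As a rough illustration of what such a note for Hive users might point at, the sketch below passes `auth_mechanism` explicitly instead of relying on the default. The helper names are hypothetical, and the port 10000 and the value `'PLAIN'` are assumptions about a typical HiveServer2 setup; the right value depends on how the server is configured.

```python
def hive_connect_kwargs(host, port=10000, auth_mechanism='PLAIN'):
    """Build keyword arguments for a Hive connection (hypothetical helper).

    The port and auth_mechanism defaults are assumptions, not values
    taken from the commit.
    """
    return {'host': host, 'port': port, 'auth_mechanism': auth_mechanism}


def connect_to_hive(host, **overrides):
    """Open a connection with an explicit auth_mechanism."""
    # Imported lazily so the sketch can be read and loaded without impyla installed.
    from impala.dbapi import connect
    kwargs = hive_connect_kwargs(host)
    kwargs.update(overrides)
    return connect(**kwargs)
```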
ImpalaService.thrift is updated to contain CloseImpalaOperation, which can be used to get the number of modified rows in DMLs. This is not just a copy: some parts of ImpalaService.thrift are not included, to avoid pulling in more Thrift files as dependencies. Also updated process_thrift.sh to work with current Impala env vars.
sqlalchemy 2 (now default on pip in Python 3) removed some functions used in tests. Updated these to work both with sqlalchemy 2.* and 1.* (>=1.2).
* Support Cursor.rowcount and close finished queries

With the current Impala server, rowcount support needs DMLs to be closed with CloseImpalaOperation(), as there is no simpler way to get the number of modified rows. See https://issues.apache.org/jira/browse/IMPALA-12647 for alternatives.

This change adds the option close_finished_queries for cursors, with default True. Setting it to False brings back the old behavior.

If queries are closed after they finish, calling the get_log RPC is no longer possible. If close_finished_queries is true, the logs are fetched and stored before closing the query, so that get_log can return the saved results. Generally get_log shouldn't be too expensive an RPC.

Another potential side effect is that get_profile may fail, as Impala can discard the runtime profile after the query is closed (see Impala flag query_log_size).

Despite the above side effects, closing the queries seems a better default behavior, as it helps avoid queries hanging in the "waiting to be closed" state and provides a reliable rowcount. This is also consistent with the way impala-shell works.

Testing:
- rowcount already had good coverage in DBAPI2 compliance tests (e.g. test_mixedfetch)
- new tests were added for some missing rowcount cases and for getting the warning/error log for closed queries

* Fix review comments
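The "fetch the log, then close" ordering described in this commit can be modelled with a small toy. Everything here is hypothetical (`FakeServer`, `SketchCursor`, `on_query_finished` are not impyla names); it only shows why the log has to be saved before the operation is closed and how the old behavior differs.

```python
class FakeServer:
    """Hypothetical stand-in for the server side of one finished DML."""

    def __init__(self, log_text, modified_rows):
        self._log_text = log_text
        self._modified_rows = modified_rows
        self.closed = False

    def get_log(self):
        # Once the operation is closed, the log can no longer be requested.
        if self.closed:
            raise RuntimeError('operation already closed')
        return self._log_text

    def close_operation(self):
        # Models the idea that closing a DML reports the modified row count.
        self.closed = True
        return self._modified_rows


class SketchCursor:
    """Toy cursor showing the effect of a close_finished_queries option."""

    def __init__(self, server, close_finished_queries=True):
        self._server = server
        self._close_finished_queries = close_finished_queries
        self._saved_log = None
        self.rowcount = -1

    def on_query_finished(self):
        if not self._close_finished_queries:
            return  # old behavior: leave the query open
        # Save the log first, because get_log fails after closing.
        self._saved_log = self._server.get_log()
        self.rowcount = self._server.close_operation()

    def get_log(self):
        if self._saved_log is not None:
            return self._saved_log
        return self._server.get_log()
```

With the option enabled, `rowcount` is populated and `get_log()` is served from the saved copy; with it disabled, the query stays open and `rowcount` stays at -1 in this toy model.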
The old version used deprecated functions that were removed in Python 3.12. The change only contains code generated by: versioneer install
Co-authored-by: David Hulsman <[email protected]>
thyarles
commented
Mar 25, 2024
LGTM.
This function is called for every query during normal execution, making this info level too verbose.
* Add text() wrapper for metadata queries. Remove tablename from retrieve columnname results.
* Update sqlalchemy.py: remove tablename from get_columns result.
* replace 'r' in re.sub argument
* Avoid retrying non-idempotent RPCs in binary connections (#549)

See #549 for the detailed analysis of the issue. The fix works similarly to the existing solution for HTTP connections:
- each RPC knows whether it is idempotent
- if the error comes from establishing the connection, then retry
- if the error comes from executing the RPC, only retry if the RPC is idempotent

A test is added that relies on slow metadata handling in the Impala cluster to trigger timeouts. It would be nice to add wider and more reliable tests in the future, similar to the HTTP tests in test_hs2_fault_injection.py.

* Fix review comments
* Fix review comment
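The retry rules listed in this commit can be sketched as a small wrapper. This is a minimal illustration under assumptions, not the impyla implementation: `ConnectError`, `RpcError`, `call_with_retry` and the `max_tries` parameter are all hypothetical names.

```python
class ConnectError(Exception):
    """Hypothetical: raised while establishing the connection."""


class RpcError(Exception):
    """Hypothetical: raised while executing the RPC itself."""


def call_with_retry(open_connection, rpc, idempotent, max_tries=3):
    """Run rpc(conn), retrying only when it is safe to do so."""
    last_error = None
    for _ in range(max_tries):
        try:
            conn = open_connection()
        except ConnectError as err:
            # Nothing was sent to the server yet, so retrying is always safe.
            last_error = err
            continue
        try:
            return rpc(conn)
        except RpcError as err:
            # The server may already have acted on the request; retry only
            # when repeating the RPC cannot change the outcome.
            if not idempotent:
                raise
            last_error = err
    raise last_error
```

The point of the split is that a failure after the request was sent is ambiguous: the server may have executed it, so a blind retry of something like a DML could apply it twice.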
The goal is to support "long poll" (IMPALA-13294). When the query option long_polling_time_ms is set, the Impala server will wait in GetOperationStatus for this time (or until the query status changes). This allows detecting earlier that a query has finished without making GetOperationStatus RPCs more frequent. If long_polling_time_ms is not used, the effect should be minor: GetOperationStatus is a quick RPC, so the time it takes should mainly come from network delay. _get_sleep_interval() is not changed (min 0.01s, max 1s) to avoid regression in existing use cases. It could be useful to override this in a later patch based on the value of long_polling_time_ms.
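A toy polling loop can show how a blocking status call and a bounded client-side sleep fit together. Only the 0.01s and 1s bounds come from the commit message; the function names, the status strings, and the rule that grows the sleep with elapsed time are assumptions made for this sketch and do not reproduce _get_sleep_interval().

```python
import time

MIN_SLEEP_S = 0.01  # lower bound mentioned in the commit message
MAX_SLEEP_S = 1.0   # upper bound mentioned in the commit message


def clamp_sleep(seconds):
    """Keep a proposed sleep interval within the stated bounds."""
    return max(MIN_SLEEP_S, min(MAX_SLEEP_S, seconds))


def wait_for_completion(get_status, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it stops returning 'RUNNING' (hypothetical)."""
    start = clock()
    polls = 0
    while True:
        # With long polling, this call itself may block on the server side
        # until the status changes or the long-poll time expires.
        status = get_status()
        polls += 1
        if status != 'RUNNING':
            return status, polls
        # Assumed growth rule: sleep longer the longer the query has run.
        sleep(clamp_sleep((clock() - start) / 10.0))
```

When the status call already waits on the server, the client-side sleep mostly adds latency, which is presumably why the commit suggests overriding the interval in a later patch.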
Supported Python versions are also updated in setup.py.
The issue was introduced in #542. Caught by Impala's LdapImpylaHttpTest.
No description provided.