WIP(iox-11398): patched df upgrade 2024-07-08 #33

Closed
wants to merge 8 commits into from

Conversation


@appletreeisyellow appletreeisyellow commented Jul 12, 2024

⚠️ This will not be merged. ⚠️

  1. Brings us up to date with DataFusion as of 2024-07-08, apache@4123ad6

  2. This PR is based on 2024-07-02 apache@3421b52

    git checkout -b chunchun/update-df-july-week-1-2 3421b52605b00cd2e5a6498ea210cce196a19496
    
  3. Cherry-picked the following commits:

    1. fix: Incorrect LEFT JOIN evaluation result on OR conditions apache/datafusion#11203 / apache@03c8db0

      commit 03c8db0a988f75446719bf77535076854c97b220
      Author: Liang-Chi Hsieh <[email protected]>
      Date:   Wed Jul 3 08:01:26 2024 -0700
      
          fix: Incorrect LEFT JOIN evaluation result on OR conditions (#11203)
    2. feat: add UDF to_local_time() apache/datafusion#11347 / apache@f284e3b

      commit f284e3bb73e089abc0c06b3314014522411bf1da
      Author: Chunchun Ye <[email protected]>
      Date:   Thu Jul 11 11:17:09 2024 -0500
      
          feat: add UDF to_local_time() (#11347)
    3. Track parquet writer encoding memory usage on MemoryPool apache/datafusion#11345 / apache@6038f4c

      commit 6038f4cfac536dbb54ea2761828f7344a23b94f0
      Author: wiedld <[email protected]>
      Date:   Wed Jul 10 11:21:01 2024 -0700
      
          Track parquet writer encoding memory usage on MemoryPool (#11345)
    4. fix(11397): surface proper errors in ParquetSink apache/datafusion#11399 / apache@1dfac86

      commit 1dfac86a89750193491cf3e04917e37b92c64ffa
      Author: wiedld <[email protected]>
      Date:   Fri Jul 12 04:04:42 2024 -0700
      
          fix(11397): surface proper errors in ParquetSink (#11399)
    5. temporary workaround: Test + workaround for SanityCheckPlan error apache/datafusion#11493

      commit 73196fdb7ef4dbb6e24d653cd18d4b0cc70a3474
      Author: Andrew Lamb <[email protected]>
      Date:   Tue Jul 16 12:14:19 2024 -0400
      
          Test + workaround for SanityCheck plan

viirya and others added 3 commits July 12, 2024 17:10
…1203)

* fix: Incorrect LEFT JOIN evaluation result on OR conditions

* Add a few more test cases

* Don't push join filter predicates into join_conditions

* Add test case and fix typo

* Add test case

---------

Co-authored-by: Andrew Lamb <[email protected]>
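
For reviewers who want the expected semantics in front of them: the commit above concerns LEFT JOIN results when the ON clause contains an OR. The following is a minimal reference sketch in plain Rust, not DataFusion code. The `left_join` helper and the sample condition are hypothetical, and the exact query from the upstream issue is not reproduced here. The point it illustrates is that every left row must survive, padded with NULL (`None`) when the full ON condition never holds for it.

```rust
/// Reference nested-loop LEFT JOIN over in-memory rows (hypothetical helper).
/// `on` is the complete ON condition and may freely mix AND and OR.
fn left_join<L: Clone, R: Clone>(
    left: &[L],
    right: &[R],
    on: impl Fn(&L, &R) -> bool,
) -> Vec<(L, Option<R>)> {
    let mut out = Vec::new();
    for l in left {
        let mut matched = false;
        for r in right {
            if on(l, r) {
                out.push((l.clone(), Some(r.clone())));
                matched = true;
            }
        }
        if !matched {
            // Preserved side: keep the left row and pad the right side with NULL.
            out.push((l.clone(), None));
        }
    }
    out
}
```

Calling `left_join(&left, &right, |l, r| l.0 == r.0 || l.1 == r.1)` evaluates the OR per row pair. An optimizer that turned one branch of the OR into a filter on either input would drop or mis-pad rows relative to this reference.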
* feat: add UDF `to_local_time()`

* chore: support column value in array

* chore: lint

* chore: fix conversion for us, ms, and s

* chore: add more tests for daylight savings time

* chore: add function description

* refactor: update tests and add examples in description

* chore: add description and example

* chore: doc

chore: doc

chore: doc

chore: doc

chore: doc

* chore: stop copying

* chore: fix typo

* chore: mention that the offset varies based on daylight savings time

* refactor: parse timezone once and update examples in description

* refactor: replace map..concat with flat_map

* chore: add hard code timestamp value in test

chore: doc

chore: doc

* chore: handle errors and remove panics

* chore: move some test to slt

* chore: clone time_value

* chore: typo

---------

Co-authored-by: Andrew Lamb <[email protected]>
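
The commit bullets above mention per-unit conversion (s, ms, us) and an offset that varies with daylight saving time. As a rough sketch of that arithmetic only, assuming the UTC offset for the instant has already been looked up from the timezone: the `TimeUnit` enum and `to_local_naive` function below are hypothetical and are not the UDF's actual implementation.

```rust
/// Hypothetical timestamp units, mirroring the s / ms / us / ns cases.
#[derive(Clone, Copy)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Number of timestamp ticks in one second for the given unit.
fn units_per_second(unit: TimeUnit) -> i64 {
    match unit {
        TimeUnit::Second => 1,
        TimeUnit::Millisecond => 1_000,
        TimeUnit::Microsecond => 1_000_000,
        TimeUnit::Nanosecond => 1_000_000_000,
    }
}

/// Shift a UTC timestamp by the zone's offset in effect at that instant,
/// producing a zone-less "wall clock" value in the same unit.
/// The offset must be scaled to the timestamp's unit before adding.
fn to_local_naive(utc_value: i64, unit: TimeUnit, utc_offset_secs: i32) -> i64 {
    utc_value + i64::from(utc_offset_secs) * units_per_second(unit)
}
```

For a zone such as Europe/Brussels the caller would pass 3600 in winter and 7200 in summer, which is why the offset cannot be a per-column constant.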
* feat(11344): track memory used for non-parallel writes

* feat(11344): track memory usage during parallel writes

* test(11344): create bounded stream for testing

* test(11344): test ParquetSink memory reservation

* feat(11344): track bytes in file writer

* refactor(11344): tweak the ordering to add col bytes to rg_reservation, before selecting shrinking for data bytes flushed

* refactor: move each col_reservation and rg_reservation to match the parallelized call stack for col vs rg

* test(11344): add memory_limit enforcement test for parquet sink

* chore: cleanup to remove unnecessary reservation management steps

* fix: fix CI test failure due to file extension rename
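
To make the reservation bookkeeping in the bullets above concrete, here is a small self-contained sketch of a bounded pool with per-consumer reservations (for example one per column writer and one per row group). `Pool` and `Reservation` are hypothetical stand-ins written for this illustration; they are not DataFusion's `MemoryPool` or `MemoryReservation` types, and the real ParquetSink wiring is more involved.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical bounded memory pool shared by all writers.
struct Pool {
    limit: usize,
    used: Mutex<usize>,
}

impl Pool {
    fn new(limit: usize) -> Arc<Self> {
        Arc::new(Pool { limit, used: Mutex::new(0) })
    }

    fn used(&self) -> usize {
        *self.used.lock().unwrap()
    }
}

/// Bytes currently held by one consumer of the pool.
struct Reservation {
    pool: Arc<Pool>,
    size: usize,
}

impl Reservation {
    fn new(pool: &Arc<Pool>) -> Self {
        Reservation { pool: Arc::clone(pool), size: 0 }
    }

    /// Grow by `bytes`, failing if the pool limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let mut used = self.pool.used.lock().unwrap();
        if *used + bytes > self.pool.limit {
            return Err(format!(
                "cannot reserve {bytes} bytes: {} of {} already used",
                *used, self.pool.limit
            ));
        }
        *used += bytes;
        self.size += bytes;
        Ok(())
    }

    /// Return up to `bytes` to the pool, e.g. after encoded data is flushed.
    fn shrink(&mut self, bytes: usize) {
        let bytes = bytes.min(self.size);
        *self.pool.used.lock().unwrap() -= bytes;
        self.size -= bytes;
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        // Release whatever is still held so the pool's accounting stays balanced.
        let size = self.size;
        self.shrink(size);
    }
}
```

With a 100-byte pool, a column reservation holding 60 bytes causes a 50-byte row-group request to fail until the column bytes are released, which is the kind of limit enforcement the memory_limit test in the commit is described as covering.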
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced

* fix(11397): terminate early on channel send failure
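
The two bullets above describe an error-handling pattern: when a send fails because the receiving task has gone away, stop sending and report the receiver's own error from the join rather than a generic "channel closed" error. A minimal sketch of that pattern follows, using standard-library threads and channels as a stand-in for the async tasks in the real code. The `write_all` function and its poison-value consumer are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Send batches to a consumer. On a closed channel, stop early and surface
/// the consumer's own error (obtained by joining it) instead of a send error.
fn write_all(batches: Vec<u32>, fail_on: Option<u32>) -> Result<u32, String> {
    let (tx, rx) = mpsc::channel::<u32>();

    // Hypothetical consumer: sums batches and fails on a chosen value.
    let consumer = thread::spawn(move || -> Result<u32, String> {
        let mut total = 0;
        for batch in rx {
            if Some(batch) == fail_on {
                // Returning drops `rx`, which closes the channel.
                return Err(format!("consumer failed on batch {batch}"));
            }
            total += batch;
        }
        Ok(total)
    });

    for batch in batches {
        if tx.send(batch).is_err() {
            // Receiver is gone: the root cause lives in the consumer, so stop sending.
            break;
        }
    }
    // Close the channel so a healthy consumer can finish its loop.
    drop(tx);

    // Surface the consumer's result; a panic would show up as a join error.
    consumer.join().map_err(|_| "consumer panicked".to_string())?
}
```

Whether or not the producer observes the closed channel before it finishes sending, the caller sees the consumer's error message, which is the behavior the fix is described as aiming for.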

Add Optimizer Sanity Checker, improve sortedness equivalence properties (apache#11196)

* Initial optimizer sanity checker.

Only includes sort reqs, docs will be added.

* Add distro and pipeline friendly checks

* Also check the plans we create are correct.

* Add distribution test cases using global limit exec.

* Add test for multiple children using SortMergeJoinExec.

* Move PipelineChecker to SanityCheckPlan

* Fix some tests and add docs

* Add some test docs and fix clippy diagnostics.

* Fix some failing tests

* Replace PipelineChecker with SanityChecker in .slt files.

* Initial commit

* Slt tests pass

* Resolve linter errors

* Minor changes

* Minor changes

* Minor changes

* Minor changes

* Sort PreservingMerge clear per partition

* Minor changes

* Update output_requirements.rs

* Address reviews

* Update datafusion/core/src/physical_optimizer/optimizer.rs

Co-authored-by: Mehmet Ozan Kabak <[email protected]>

* Update datafusion/core/src/physical_optimizer/sanity_checker.rs

Co-authored-by: Mehmet Ozan Kabak <[email protected]>

* Address reviews

* Minor changes

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <[email protected]>

* Update comment

* Add map implementation

---------

Co-authored-by: Erman Yafay <[email protected]>
Co-authored-by: berkaysynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
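
As a rough illustration of what a sort-requirement sanity check does, the sketch below walks a toy single-input plan and reports the first node whose input ordering does not meet its requirement. Everything here is a simplifying assumption: the `Node` type, the `sanity_check` function, and the "requirement is a prefix of the input ordering" rule are hypothetical and are not taken from the upstream `sanity_checker.rs`.

```rust
/// Hypothetical, much-simplified plan node with at most one input.
struct Node {
    name: &'static str,
    /// Sort order this node produces.
    output_order: Vec<&'static str>,
    /// Sort order this node requires from its input.
    required_input_order: Vec<&'static str>,
    input: Option<Box<Node>>,
}

/// Walk the plan top-down and report the first violated ordering requirement.
/// A requirement counts as satisfied when it is a prefix of the input's
/// output ordering (a simplification of real ordering-equivalence checks).
fn sanity_check(node: &Node) -> Result<(), String> {
    if let Some(input) = &node.input {
        if !input.output_order.starts_with(&node.required_input_order) {
            return Err(format!(
                "{} requires input ordering {:?}, but {} produces {:?}",
                node.name, node.required_input_order, input.name, input.output_order
            ));
        }
        sanity_check(input)?;
    }
    Ok(())
}
```

A merge node requiring `["a", "b"]` placed directly over an input sorted only on `["a"]` fails this check, while the same node over an input sorted on `["a", "b"]` passes.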
@appletreeisyellow appletreeisyellow changed the title from "WIP(iox-11398): patched df upgrade 2024-07-TBD" to "WIP(iox-11398): patched df upgrade 2024-07-08" on Jul 17, 2024
@appletreeisyellow
Author

Closing since upgrade is done

@appletreeisyellow appletreeisyellow deleted the chunchun/update-df-july-week-1-2 branch July 31, 2024 19:48
4 participants