Upgrade DataFusion to latest, to include fixes for aggregation (#216) · coralogix/arrow-datafusion@ca4b6ee

Upgrade DataFusion to latest, to include fixes for aggregation (#216)

* Clean up logical optimizer rules (apache#7919)

* Initial commit

* Address todos

* Update comments

* Simplifications

* Minor simplifications

* Address reviews

* Add TableScan constructor

* Minor changes

* make try_new_with_schema method of Aggregate private

* Use projection try_new instead of try_new_schema

* Simplifications, add comment

* Review changes

* Improve comments

* Move get_wider_type to type_coercion module

* Clean up type coercion file

---------

Co-authored-by: berkaysynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>

* Parallelize Serialization of Columns within Parquet RowGroups (apache#7655)

* merge main

* fixes and cmt

* review comments, tuning parameters, updating docs

* cargo fmt

* reduce default buffer size to 2 and update docs

* feat: Use bloom filter when reading parquet to skip row groups (apache#7821)

* feat: implement read bloom filter support

* test: add unit test for read bloom filter

* Simplify bloom filter application

* test: add unit test for bloom filter with sql `in`

* fix: improve bloom filter match expression

* fix: add more test for bloom filter

* ci: roll back dependencies

* ci: merge main branch

* fix: unit tests for bloom filter

* ci: cargo clippy

* ci: cargo clippy

---------

Co-authored-by: Andrew Lamb <[email protected]>
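The bloom-filter change lets the Parquet reader skip a row group when that row group's bloom filter rules out every literal in an equality or `IN` predicate. A minimal sketch of that decision, assuming a hypothetical `MightContain` trait and an exact `HashSet` stand-in for a real bloom filter (a real one may report false positives but never false negatives); this is not DataFusion's pruning code:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a per-row-group bloom filter.
trait MightContain {
    fn might_contain(&self, value: &str) -> bool;
}

/// Toy "filter" that is exact; enough to show the pruning decision.
struct ExactFilter(HashSet<String>);

impl MightContain for ExactFilter {
    fn might_contain(&self, value: &str) -> bool {
        self.0.contains(value)
    }
}

/// Indices of row groups that must still be read for `col = v` (one literal)
/// or `col IN (v1, v2, ...)` (several literals). A row group is skipped only
/// when its filter rules out every literal.
fn row_groups_to_read<F: MightContain>(filters: &[Option<F>], literals: &[&str]) -> Vec<usize> {
    filters
        .iter()
        .enumerate()
        .filter(|(_, f)| match f {
            // No bloom filter for this row group: it cannot be pruned.
            None => true,
            Some(f) => literals.iter().any(|v| f.might_contain(v)),
        })
        .map(|(i, _)| i)
        .collect()
}

/// Hypothetical helper to build a toy filter from literal values.
fn filter_of(values: &[&str]) -> ExactFilter {
    ExactFilter(values.iter().map(|s| s.to_string()).collect())
}
```

Row groups without a bloom filter are always kept, since nothing proves they lack the value.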

* fix: don't push down volatile predicates in projection (apache#7909)

* fix: don't push down volatile predicates in projection

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <[email protected]>

* add suggestions

* fix

* fix doc

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Jonah Gao <[email protected]>
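As we read the title and sub-commits above, a filter must stay above a projection when it reads a column that the projection computes with a volatile expression such as `random()`: pushing it down would inline that expression and evaluate it a second time, with a different result. A sketch of the check over a hypothetical, heavily simplified `Expr` type (not DataFusion's `Expr` or its `push_down_filter` rule):

```rust
/// Hypothetical simplified expression type for illustration only.
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(i64),
    /// A function call, e.g. `random()`; `volatile` marks functions whose
    /// result can differ on every evaluation.
    Call { volatile: bool, args: Vec<Expr> },
    Gt(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// True if evaluating this expression involves any volatile function.
    fn is_volatile(&self) -> bool {
        match self {
            Expr::Column(_) | Expr::Literal(_) => false,
            Expr::Call { volatile, args } => *volatile || args.iter().any(Expr::is_volatile),
            Expr::Gt(l, r) => l.is_volatile() || r.is_volatile(),
        }
    }
}

/// Collect the names of all columns an expression refers to.
fn referenced_columns<'a>(e: &'a Expr, out: &mut Vec<&'a str>) {
    match e {
        Expr::Column(c) => out.push(c.as_str()),
        Expr::Literal(_) => {}
        Expr::Call { args, .. } => args.iter().for_each(|a| referenced_columns(a, out)),
        Expr::Gt(l, r) => {
            referenced_columns(l, out);
            referenced_columns(r, out);
        }
    }
}

/// `projection` lists (output name, defining expression) pairs. The filter may
/// move below the projection only if none of the projected expressions it
/// reads is volatile.
fn can_push_below_projection(predicate: &Expr, projection: &[(String, Expr)]) -> bool {
    let mut cols = Vec::new();
    referenced_columns(predicate, &mut cols);
    cols.iter().all(|c| {
        projection
            .iter()
            .find(|(name, _)| name.as_str() == *c)
            .map_or(true, |(_, expr)| !expr.is_volatile())
    })
}
```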

* Add `parquet` feature flag, enabled by default, and make parquet conditional (apache#7745)

* Make parquet an option by adding multiple cfg attributes without significant code changes.

* Extract parquet logic into submodule from execution::context

* Extract parquet logic into submodule from datafusion_core::dataframe

* Extract more logic into submodule from execution::context

* Move tests from execution::context

* Rename submodules

* [MINOR]: Simplify enforce_distribution, minor changes (apache#7924)

* Initial commit

* Simplifications

* Cleanup imports

* Review

---------

Co-authored-by: Mehmet Ozan Kabak <[email protected]>

* Add simple window query to sqllogictest (apache#7928)

* ci: upgrade node to version 20 (apache#7918)

* Change input for `to_timestamp` function to be seconds rather than nanoseconds, add `to_timestamp_nanos` (apache#7844)

* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns
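Because `to_timestamp` now reads an integer argument as seconds while `to_timestamp_nanos` keeps the nanosecond interpretation, the same number maps to very different instants. A minimal sketch of that unit conversion, using hypothetical helper names and plain `i64` values (this is not DataFusion's implementation, and the overflow handling is an assumption of the sketch):

```rust
/// Hypothetical helper: treat an integer argument as seconds since the Unix
/// epoch (the new `to_timestamp` reading) and return nanoseconds since the
/// epoch. Returns `None` if the result would overflow an `i64`.
fn seconds_arg_to_nanos(seconds: i64) -> Option<i64> {
    seconds.checked_mul(1_000_000_000)
}

/// Hypothetical helper: treat an integer argument as nanoseconds since the
/// epoch (the `to_timestamp_nanos` reading); no scaling is needed.
fn nanos_arg_to_nanos(nanos: i64) -> i64 {
    nanos
}
```

For example, `1699488000` is midnight UTC on 9 November 2023 when read as seconds, but only about 1.7 seconds after the epoch when read as nanoseconds.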

* Minor: Document `parquet` crate feature (apache#7927)

* Minor: reduce some #cfg(feature = "parquet") (apache#7929)

* Minor: reduce use of cfg(parquet) in tests (apache#7930)

* Fix CI failures on `to_timestamp()` calls (apache#7941)

* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns

* Fix CI `to_timestamp()` failed

* Update datafusion/expr/src/built_in_function.rs

Co-authored-by: Andrew Lamb <[email protected]>

* fix typo

* fix

---------

Co-authored-by: Andrew Lamb <[email protected]>

* minor: add a datatype casting for the updated value (apache#7922)

* minor: cast the updated value to the data type of target column

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <[email protected]>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <[email protected]>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <[email protected]>

* fix tests

---------

Co-authored-by: Alex Huang <[email protected]>

* fix (apache#7946)

* Add simple exclude all columns test to sqllogictest (apache#7945)

* Add simple exclude all columns test to sqllogictest

* Add more exclude test cases

* Support Partitioning Data by Dictionary Encoded String Array Types (apache#7896)

* support dictionary encoded string columns for partition cols

* remove debug prints

* cargo fmt

* generic dictionary cast and dict encoded test

* updates from review

* force retry checks

* try checks again

* Minor: Remove array() in array_expression (apache#7961)

* remove array

Signed-off-by: jayzhan211 <[email protected]>

* cleanup others

Signed-off-by: jayzhan211 <[email protected]>

* clippy

Signed-off-by: jayzhan211 <[email protected]>

* cleanup cast

Signed-off-by: jayzhan211 <[email protected]>

* fmt

Signed-off-by: jayzhan211 <[email protected]>

* cleanup cast

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>

* Minor: simplify update code (apache#7943)

* Add some initial content about creating logical plans (apache#7952)

* Minor: Change from `&mut SessionContext` to `&SessionContext` in substrait (apache#7965)

* Lower &mut SessionContext in substrait

* rm mut ctx in tests

* Fix crate READMEs (apache#7964)

* Minor: Improve `HashJoinExec` documentation (apache#7953)

* Minor: Improve `HashJoinExec` documentation

* Apply suggestions from code review

Co-authored-by: Liang-Chi Hsieh <[email protected]>

---------

Co-authored-by: Liang-Chi Hsieh <[email protected]>

* chore: clean up useless clones based on clippy (apache#7973)

* Add README.md to `core`, `execution` and `physical-plan` crates (apache#7970)

* Add README.md to `core`, `execution` and `physical-plan` crates

* prettier

* Update datafusion/physical-plan/README.md

* Update datafusion/wasmtest/README.md

---------

Co-authored-by: Daniël Heres <[email protected]>

* Move source repartitioning into `ExecutionPlan::repartition` (apache#7936)

* Move source repartitioning into ExecutionPlan::repartition

* cleanup

* update test

* update test

* refine docs

* fix merge

* minor: fix broken links in README.md (apache#7986)

* minor: fix broken links in README.md

* fix proto link

* Minor: Update the `sqllogictest` crate README (apache#7971)

* Minor: Update the sqllogictest crate README

* prettier

* Apply suggestions from code review

Co-authored-by: Jonah Gao <[email protected]>
Co-authored-by: jakevin <[email protected]>

---------

Co-authored-by: Jonah Gao <[email protected]>
Co-authored-by: jakevin <[email protected]>

* Improve MemoryCatalogProvider default impl block placement (apache#7975)

* Fix `ScalarValue` handling of NULL values for ListArray (apache#7969)

* Fix try_from_array data type for NULL value in ListArray

* Fix

* Explicitly assert the datatype

* For review

* Refactor of Ordering and Prunability Traversals and States (apache#7985)

* simplify ExprOrdering

* Comment improvements

* Move map/transform comment up

---------

Co-authored-by: Mehmet Ozan Kabak <[email protected]>

* Keep output as scalar for scalar function if all inputs are scalar (apache#7967)

* Keep output as scalar for scalar function if all inputs are scalar

* Add end-to-end tests
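Keeping the output scalar means that when every argument of a scalar function is a single constant value, the result can stay a single value instead of being expanded to one copy per row. DataFusion represents this distinction with `ColumnarValue` (`Array` vs `Scalar`); the sketch below instead uses a hypothetical simplified `Value` enum over `i64` with a binary function, and assumes array inputs already have `num_rows` elements:

```rust
/// Hypothetical simplified columnar value: a full column (one value per row)
/// or a single scalar that stands for every row.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Array(Vec<i64>),
    Scalar(i64),
}

/// Apply a binary function. If both inputs are scalars, the result stays a
/// scalar instead of being expanded to `num_rows` copies.
fn apply_binary(a: &Value, b: &Value, num_rows: usize, f: impl Fn(i64, i64) -> i64) -> Value {
    match (a, b) {
        (Value::Scalar(x), Value::Scalar(y)) => Value::Scalar(f(*x, *y)),
        _ => {
            // At least one input is an array: expand any scalar to a column.
            let expand = |v: &Value| -> Vec<i64> {
                match v {
                    Value::Array(xs) => xs.clone(),
                    Value::Scalar(x) => vec![*x; num_rows],
                }
            };
            let (xs, ys) = (expand(a), expand(b));
            Value::Array(xs.iter().zip(ys.iter()).map(|(x, y)| f(*x, *y)).collect())
        }
    }
}
```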

* Fix crate READMEs for core, execution, physical-plan (apache#7990)

* Update sqlparser requirement from 0.38.0 to 0.39.0 (apache#7983)

* chore: Update sqlparser requirement from 0.38.0 to 0.39.0

* support FILTER Aggregates

* Fix panic in multiple distinct aggregates by fixing `ScalarValue::new_list` (apache#7989)

* Fix panic in multiple distinct aggregates by fixing ScalarValue::new_list

* Update datafusion/common/src/scalar.rs

Co-authored-by: Daniël Heres <[email protected]>

---------

Co-authored-by: Daniël Heres <[email protected]>

* MemoryReservation exposes MemoryConsumer (apache#8000)

... as a getter method.

* fix: generate logical plan for `UPDATE SET FROM` statement (apache#7984)

* Create temporary files for reading or writing (apache#8005)

* Create temporary files for reading or writing

* nit

* addr comment

---------

Co-authored-by: zhongjingxiong <[email protected]>

* doc: minor fix to SortExec::with_fetch comment (apache#8011)

* Fix: dataframe_subquery example Optimizer rule `common_sub_expression_eliminate` failed (apache#8016)

* Fix: Optimizer rule 'common_sub_expression_eliminate' failed

* nit

* nit

* nit

---------

Co-authored-by: zhongjingxiong <[email protected]>

* Percent Decode URL Paths (apache#8009) (apache#8012)

* Treat ListingTableUrl as URL-encoded (apache#8009)

* Update lockfile

* Review feedback
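Treating `ListingTableUrl` input as URL-encoded implies percent-decoding the path, so under that reading `my%20file.parquet` would refer to a file named `my file.parquet`. A hand-rolled decoder illustrating the idea (hypothetical; production code would more typically rely on a crate such as `percent-encoding`, and this is not necessarily how DataFusion does it):

```rust
/// Minimal percent-decoder: turns `%XX` escapes (two hex digits) back into
/// bytes and copies every other byte unchanged. Returns `None` for a
/// malformed escape or if the decoded bytes are not valid UTF-8.
fn percent_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // A `%` must be followed by exactly two hex digits.
            let hex = input.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    // Decoded bytes may form multi-byte UTF-8 characters, e.g. "%C3%A9".
    String::from_utf8(out).ok()
}
```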

* Minor: Extract common deps into workspace (apache#7982)

* Improve datafusion-*

* More common crates

* Extract async-trait

* Extract more

* Fix cli

---------

Co-authored-by: Andrew Lamb <[email protected]>

* minor: change some plan_err to exec_err (apache#7996)

* minor: change some plan_err to exec_err

Signed-off-by: Ruihang Xia <[email protected]>

* change unreachable code to internal error

Signed-off-by: Ruihang Xia <[email protected]>

---------

Signed-off-by: Ruihang Xia <[email protected]>

* Minor: error on unsupported RESPECT NULLs syntax (apache#7998)

* Minor: error on unsupported RESPECT NULLs syntax

* fix clippy

* Update datafusion/sql/tests/sql_integration.rs

Co-authored-by: Liang-Chi Hsieh <[email protected]>

---------

Co-authored-by: Liang-Chi Hsieh <[email protected]>

* GroupedHashAggregateStream breaks spill batch (apache#8004)

... into smaller chunks to decrease memory required for merging.
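Splitting a large spill batch bounds how much data each piece contributes when the spilled runs are merged back. A minimal sketch of the slicing arithmetic, assuming chunks of at most a hypothetical `batch_size` rows (the chunk size actually used by `GroupedHashAggregateStream` is not stated here). The `(offset, len)` pairs have the shape that arrow's `RecordBatch::slice(offset, length)` takes:

```rust
/// Hypothetical helper: split `num_rows` rows into consecutive
/// `(offset, len)` slices of at most `batch_size` rows each.
fn chunk_ranges(num_rows: usize, batch_size: usize) -> Vec<(usize, usize)> {
    assert!(batch_size > 0, "batch_size must be positive");
    (0..num_rows)
        .step_by(batch_size)
        // The last chunk may be shorter than `batch_size`.
        .map(|offset| (offset, batch_size.min(num_rows - offset)))
        .collect()
}
```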

* Minor: Add implementation examples to ExecutionPlan::execute (apache#8013)

* Add implementation examples to ExecutionPlan::execute

* Review feedback

* address comment (apache#7993)

Signed-off-by: jayzhan211 <[email protected]>

* GroupedHashAggregateStream should register spillable consumer (apache#8002)

* fix: single_distinct_aggretation_to_group_by fail (apache#7997)

* fix: single_distinct_aggretation_to_group_by fail

* fix

* move test to groupby.slt

* Read only enough bytes to infer Arrow IPC file schema via stream (apache#7962)

* Read only enough bytes to infer Arrow IPC file schema via stream

* Error checking for collect bytes func

* Update datafusion/core/src/datasource/file_format/arrow.rs

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
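Inferring the schema only needs the bytes up to the end of the first (schema) message, not the whole file. A sketch of computing that length from a prefix, assuming the Arrow IPC file layout described in the comments; the function name and error strings are hypothetical and this is not DataFusion's code:

```rust
/// Given the first bytes of an Arrow IPC *file*, return how many bytes must be
/// read before the schema message's metadata is complete.
///
/// Assumed layout: the 6-byte magic `ARROW1` padded to 8 bytes, then the first
/// encapsulated message (the schema). Current writers start that message with
/// the 0xFFFFFFFF continuation marker and a little-endian i32 metadata length;
/// legacy writers omit the marker and write only the length.
///
/// Returns `Ok(None)` when `prefix` is too short to decide yet.
fn bytes_needed_for_schema(prefix: &[u8]) -> Result<Option<usize>, String> {
    const MAGIC: &[u8] = b"ARROW1";
    let read_i32 = |at: usize| -> Option<i32> {
        prefix
            .get(at..at + 4)
            .map(|b| i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    };
    if prefix.len() < MAGIC.len() {
        return Ok(None);
    }
    if &prefix[..MAGIC.len()] != MAGIC {
        return Err("missing ARROW1 magic".to_string());
    }
    // Bytes 6..8 are padding; the first message starts at offset 8.
    let (metadata_start, metadata_len) = match read_i32(8) {
        None => return Ok(None),
        // -1 is 0xFFFFFFFF: the continuation marker, followed by the length.
        Some(-1) => match read_i32(12) {
            None => return Ok(None),
            Some(len) => (16, len),
        },
        // Legacy layout: these four bytes are the length itself.
        Some(len) => (12, len),
    };
    if metadata_len < 0 {
        return Err("negative metadata length".to_string());
    }
    Ok(Some(metadata_start + metadata_len as usize))
}
```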

* Minor: remove a strange char (apache#8030)

* Minor: Improve documentation for Filter Pushdown (apache#8023)

* Minor: Improve documentation for Filter Pushdown

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: jakevin <[email protected]>

* Apply suggestions from code review

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Alex Huang <[email protected]>

---------

Co-authored-by: jakevin <[email protected]>
Co-authored-by: Alex Huang <[email protected]>

* Minor: Improve `ExecutionPlan` documentation (apache#8019)

* Minor: Improve `ExecutionPlan` documentation

* Add link to Partitioning

* fix: clippy warnings from nightly rust 1.75 (apache#8025)

Signed-off-by: Ruihang Xia <[email protected]>

* Minor: Avoid recomputing compute_array_ndims in align_array_dimensions (apache#7963)

* Refactor align_array_dimensions

Signed-off-by: jayzhan211 <[email protected]>

* address comment

Signed-off-by: jayzhan211 <[email protected]>

* remove unwrap

Signed-off-by: jayzhan211 <[email protected]>

* address comment

Signed-off-by: jayzhan211 <[email protected]>

* fix rebase

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>

* Minor: fix doc check (apache#8037)

* Minor: remove unnecessary #cfg test (apache#8036)

* Minor: remove unnecessary #cfg test

* fmt

* Update datafusion/core/src/datasource/file_format/arrow.rs

Co-authored-by: Ruihang Xia <[email protected]>

---------

Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: Ruihang Xia <[email protected]>

* Minor: Improve documentation for `PartitionStream` and `StreamingTableExec` (apache#8035)

* Minor: Improve documentation for `PartitionStream` and `StreamingTableExec`

* fmt

* fmt

* Combine Equivalence and Ordering equivalence to simplify state (apache#8006)

* combine equivalence and ordering equivalence

* Remove EquivalenceProperties struct

* Minor changes

* all tests pass

* Refactor oeq

* Simplifications

* Resolve linter errors

* Minor changes

* Minor changes

* Add new tests

* Simplifications window mode selection

* Simplifications

* Use set_satisfy api

* Use utils for aggregate

* Minor changes

* Minor changes

* Minor changes

* All tests pass

* Simplifications

* Simplifications

* Minor changes

* Simplifications

* All tests pass, fix bug

* Remove unnecessary code

* Simplifications

* Minor changes

* Simplifications

* Move oeq join to methods

* Simplifications

* Remove redundant code

* Minor changes

* Minor changes

* Simplifications

* Simplifications

* Simplifications

* Move window to util from method, simplifications

* Simplifications

* Propagate meet in the union

* Simplifications

* Minor changes, rename

* Address berkay reviews

* Simplifications

* Add new buggy test

* Add data test for sort requirement

* Add experimental check

* Add random test

* Minor changes

* Random test gives error

* Fix missing test case

* Minor changes

* Minor changes

* Simplifications

* Minor changes

* Add new test case

* Minor changes

* Address reviews

* Minor changes

* Increase coverage of random tests

* Remove redundant code

* Simplifications

* Simplifications

* Refactor on tests

* Solving clippy errors

* prune_lex improvements

* Fix failing tests

* Update get_finer and get_meet

* Fix window lex ordering implementation

* Buggy state

* Do not use output ordering in the aggregate

* Add union test

* Update comment

* Fix bug, when batch_size is small

* Review Part 1

* Review Part 2

* Change union meet implementation

* Update comments

* Remove redundant check

* Simplify project out_expr function

* Remove Option<Vec<_>> API.

* Do not use project_out_expr

* Simplifications

* Review Part 3

* Review Part 4

* Review Part 5

* Review Part 6

* Review Part 7

* Review Part 8

* Update comments

* Add new unit tests, simplifications

* Resolve linter errors

* Simplify test codes

* Review Part 9

* Add unit tests for remove_redundant entries

* Simplifications

* Review Part 10

* Fix test

* Add new test case, fix implementation

* Review Part 11

* Review Part 12

* Update comments

* Review Part 13

* Review Part 14

* Review Part 15

* Review Part 16

* Review Part 17

* Review Part 18

* Review Part 19

* Review Part 20

* Review Part 21

* Review Part 22

* Review Part 23

* Review Part 24

* Do not construct idx and sort_expr unnecessarily, Update comments, Union meet single entry

* Review Part 25

* Review Part 26

* Name Changes, comment updates

* Review Part 27

* Add issue links

* Address reviews

* Fix failing test

* Update comments

* SortPreservingMerge, SortPreservingRepartition only preserves given expression ordering among input ordering equivalences

---------

Co-authored-by: metesynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>

* Encapsulate `ProjectionMapping` as a struct (apache#8033)

* Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, ... (apache#8040)

* Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, etc

* prettier

* Update docs/source/user-guide/sql/scalar_functions.md

Co-authored-by: comphead <[email protected]>

* Update docs/source/user-guide/sql/scalar_functions.md

Co-authored-by: comphead <[email protected]>

---------

Co-authored-by: comphead <[email protected]>

* Improve comments for `PartitionSearchMode` struct (apache#8047)

* Improve comments

* Make comments partition/group agnostic

* General approach for Array replace (apache#8050)

* checkpoint

Signed-off-by: jayzhan211 <[email protected]>

* optimize non-list

Signed-off-by: jayzhan211 <[email protected]>

* replace list ver

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* rename

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
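A general approach to array replacement can cover `array_replace` (first match), `array_replace_n` (first `n` matches) and `array_replace_all` (every match) with one routine by varying a maximum replacement count. A hypothetical sketch over a plain Rust slice, ignoring nulls and the Arrow list layout the real kernels operate on:

```rust
/// Replace at most `max` occurrences of `from` with `to`, scanning left to
/// right. `max = 1` behaves like `array_replace`, `max = n` like
/// `array_replace_n`, and `usize::MAX` like `array_replace_all`.
fn replace_up_to<T: PartialEq + Clone>(list: &[T], from: &T, to: &T, max: usize) -> Vec<T> {
    let mut remaining = max;
    list.iter()
        .map(|x| {
            if remaining > 0 && x == from {
                remaining -= 1;
                to.clone()
            } else {
                x.clone()
            }
        })
        .collect()
}
```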

* Minor: Remove the irrelevant note from the Expression API doc (apache#8053)

* Minor: Add more documentation about Partitioning (apache#8022)

* Minor: Add more documentation about Partitioning

* fix typo

* Apply suggestions from code review

Co-authored-by: comphead <[email protected]>

* Add more diagrams, improve text

* undo unintended changes

* undo unintended changes

* fix links

* Try and clarify

---------

Co-authored-by: comphead <[email protected]>

* Minor: improve documentation for IsNotNull, DISTINCT, etc (apache#8052)

* Minor: improve documentation for IsNotNull, DISTINCT, etc

* fix

* Prepare 33.0.0 Release (apache#8057)

* changelog

* update version

* update changelog

* Minor: improve error message by adding types to message (apache#8065)

* Minor: improve error message

* add test

* Minor: Remove redundant BuiltinScalarFunction::supports_zero_argument() (apache#8059)

* deprecate BuiltinScalarFunction::supports_zero_argument()

* unify old supports_zero_argument() impl

* Add example to ci (apache#8060)

* feat: add example to ci

* nit

* addr comments

---------

Co-authored-by: zhongjingxiong <[email protected]>

* Update substrait requirement from 0.18.0 to 0.19.0 (apache#8076)

Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version.
- [Release notes](https://github.com/substrait-io/substrait-rs/releases)
- [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md)
- [Commits](substrait-io/substrait-rs@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: substrait
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix incorrect results in COUNT(*) queries with LIMIT (apache#8049)

Co-authored-by: Mark Sirek <[email protected]>

* feat: Support determining extensions from names like `foo.parquet.snappy` as well as `foo.parquet` (apache#7972)

* feat: read files based on the file extension

* fix: some file extensions might start with `.` and some might not

* fix: rename extention to extension

* chore: use exec_err

* chore: rename extention to extension

* chore: rename extention to extension

* chore: simplify the code

* fix: check table is empty

* ci: fix test

* fix: add err info

* refactor: extract the logic to infer_types

* fix: add tests for different extensions

* fix: ci clippy

* fix: add more tests

* fix: simplify the logic

* fix: ci
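One way to accept both `foo.parquet` and `foo.parquet.snappy` is to take everything after the first dot of the file name and check that it begins with the format extension, normalizing an optional leading `.`. A hypothetical sketch of that check (helper names are invented here; the logic in the PR may differ):

```rust
/// Everything after the first '.' in the last path component,
/// e.g. "data/foo.parquet.snappy" -> Some("parquet.snappy").
fn full_extension(path: &str) -> Option<&str> {
    let file_name = path.rsplit('/').next()?;
    file_name.split_once('.').map(|(_, ext)| ext)
}

/// Does the file's extension start with `format_ext` (given with or without
/// a leading '.'), optionally followed by further suffixes such as `.snappy`?
fn has_format_extension(path: &str, format_ext: &str) -> bool {
    let wanted = format_ext.trim_start_matches('.');
    match full_extension(path) {
        Some(ext) => ext == wanted || ext.starts_with(&format!("{wanted}.")),
        None => false,
    }
}
```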

* Use FairSpillPool for TaskContext with spillable config (apache#8072)

* Minor: Improve HashJoinStream docstrings (apache#8070)

* Minor: Improve HashJoinStream docstrings

* fix comments

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <[email protected]>

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <[email protected]>

---------

Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: comphead <[email protected]>

* Fixing broken link (apache#8085)

* Fixing broken link

* Update docs/source/contributor-guide/index.md

Thanks for spotting this as well

Co-authored-by: Liang-Chi Hsieh <[email protected]>

---------

Co-authored-by: Liang-Chi Hsieh <[email protected]>

* fix: DataFusion suggests invalid functions (apache#8083)

* fix: DataFusion suggests invalid functions

* update test

* Add test for BuiltInWindowFunction

* Replace macro with function for `array_repeat` (apache#8071)

* General array repeat

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* add test

Signed-off-by: jayzhan211 <[email protected]>

* add test

Signed-off-by: jayzhan211 <[email protected]>

* done

Signed-off-by: jayzhan211 <[email protected]>

* remove test

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* fmt

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
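`array_repeat(element, count)` builds a list holding `element` repeated `count` times; replacing a per-type macro with a function amounts to writing that logic once, generically. A hypothetical per-row sketch using a generic Rust function (treating a negative count as an empty list is an assumption of this sketch, not documented behavior):

```rust
/// Build a list containing `element` repeated `count` times. Because `T` is
/// generic, the same code handles numbers, strings, or lists (the nested case).
fn array_repeat<T: Clone>(element: &T, count: i64) -> Vec<T> {
    // Negative counts become zero repetitions in this sketch.
    let n = usize::try_from(count).unwrap_or(0);
    vec![element.clone(); n]
}
```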

* Minor: remove unnecessary projection in `single_distinct_to_group_by` rule (apache#8061)

* Minor: remove unnecessary projection

* fix ci

* minor: Remove duplicate version numbers for arrow, object_store, and parquet dependencies (apache#8095)

* remove duplicate version numbers for arrow, object_store, and parquet dependencies

* cargo update

* use default features in parquet crate

* disable default parquet features in wasmtest

* fix: add match encode/decode scalar function type (apache#8089)

* feat: Protobuf serde for Json file sink (apache#8062)

* Protobuf serde for Json file sink

* Fix tests

* Fix test

* Minor: use `Expr::alias` in a few places to make the code more concise (apache#8097)

* Minor: Cleanup BuiltinScalarFunction::return_type() (apache#8088)

* Expose metrics from FileSinkExec impl of ExecutionPlan

---------

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Mustafa Akur <[email protected]>
Co-authored-by: berkaysynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Co-authored-by: Devin D'Angelo <[email protected]>
Co-authored-by: Hengfei Yang <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Huaijin <[email protected]>
Co-authored-by: Jonah Gao <[email protected]>
Co-authored-by: Chih Wang <[email protected]>
Co-authored-by: Jeffrey <[email protected]>
Co-authored-by: Marco Neumann <[email protected]>
Co-authored-by: comphead <[email protected]>
Co-authored-by: Alex Huang <[email protected]>
Co-authored-by: Jay Zhan <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: yi wang <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: jakevin <[email protected]>
Co-authored-by: 张林伟 <[email protected]>
Co-authored-by: Berkay Şahin <[email protected]>
Co-authored-by: Marko Milenković <[email protected]>
Co-authored-by: jokercurry <[email protected]>
Co-authored-by: zhongjingxiong <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
Co-authored-by: Raphael Taylor-Davies <[email protected]>
Co-authored-by: Ruihang Xia <[email protected]>
Co-authored-by: metesynnada <[email protected]>
Co-authored-by: Yongting You <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: Edmondo Porcu <[email protected]>
Co-authored-by: Syleechan <[email protected]>
Co-authored-by: Dan Harris <[email protected]>

35 people committed Nov 9, 2023

1 parent f430805 commit ca4b6ee

.github/pull_request_template.md

     <!--
     If there are any breaking changes to public APIs, please add the `api change` label.
-    -->
+    -->

.github/workflows/dev.yml

@@ -43,7 +43,7 @@ jobs:
           - uses: actions/checkout@v4
           - uses: actions/setup-node@v4
             with:
-              node-version: "14"
+              node-version: "20"
           - name: Prettier check
             run: |
               # if you encounter error, rerun the command below and commit the changes

.github/workflows/rust.yml

@@ -139,19 +139,7 @@ jobs:
               # test datafusion-sql examples
               cargo run --example sql
               # test datafusion-examples
-              cargo run --example avro_sql --features=datafusion/avro
-              cargo run --example csv_sql
-              cargo run --example custom_datasource
-              cargo run --example dataframe
-              cargo run --example dataframe_in_memory
-              cargo run --example deserialize_to_struct
-              cargo run --example expr_api
-              cargo run --example parquet_sql
-              cargo run --example parquet_sql_multiple_files
-              cargo run --example memtable
-              cargo run --example rewrite_expr
-              cargo run --example simple_udf
-              cargo run --example simple_udaf
+              ci/scripts/rust_example.sh
           - name: Verify Working Directory Clean
             run: git diff --exit-code
@@ -527,7 +515,7 @@ jobs:
               rust-version: stable
           - uses: actions/setup-node@v4
             with:
-              node-version: "14"
+              node-version: "20"
           - name: Check if configs.md has been modified
             run: |
               # If you encounter an error, run './dev/update_config_docs.sh' and commit

Cargo.toml

@@ -32,6 +32,7 @@ members = [
         "datafusion/substrait",
         "datafusion/wasmtest",
         "datafusion-examples",
+        "docs",
         "test-utils",
         "benchmarks",
     ]
@@ -45,17 +46,50 @@ license = "Apache-2.0"
     readme = "README.md"
     repository = "https://github.com/apache/arrow-datafusion"
     rust-version = "1.70"
-    version = "32.0.0"
+    version = "33.0.0"
     [workspace.dependencies]
     arrow = { version = "48.0.0", features = ["prettyprint"] }
     arrow-array = { version = "48.0.0", default-features = false, features = ["chrono-tz"] }
     arrow-buffer = { version = "48.0.0", default-features = false }
     arrow-flight = { version = "48.0.0", features = ["flight-sql-experimental"] }
+    arrow-ord = { version = "48.0.0", default-features = false }
     arrow-schema = { version = "48.0.0", default-features = false }
-    parquet = { version = "48.0.0", features = ["arrow", "async", "object_store"] }
-    sqlparser = { version = "0.38.0", features = ["visitor"] }
+    async-trait = "0.1.73"
+    bigdecimal = "0.4.1"
+    bytes = "1.4"
+    ctor = "0.2.0"
+    datafusion = { path = "datafusion/core" }
+    datafusion-common = { path = "datafusion/common" }
+    datafusion-expr = { path = "datafusion/expr" }
+    datafusion-sql = { path = "datafusion/sql" }
+    datafusion-optimizer = { path = "datafusion/optimizer" }
+    datafusion-physical-expr = { path = "datafusion/physical-expr" }
+    datafusion-physical-plan = { path = "datafusion/physical-plan" }
+    datafusion-execution = { path = "datafusion/execution" }
+    datafusion-proto = { path = "datafusion/proto" }
+    datafusion-sqllogictest = { path = "datafusion/sqllogictest" }
+    datafusion-substrait = { path = "datafusion/substrait" }
+    dashmap = "5.4.0"
+    doc-comment = "0.3"
+    env_logger = "0.10"
+    futures = "0.3"
+    half = "2.2.1"
+    indexmap = "2.0.0"
+    itertools = "0.11"
+    log = "^0.4"
+    num_cpus = "1.13.0"
+    object_store = { version = "0.7.0", default-features = false }
+    parking_lot = "0.12"
+    parquet = { version = "48.0.0", default-features = false, features = ["arrow", "async", "object_store"] }
+    rand = "0.8"
+    rstest = "0.18.0"
+    serde_json = "1"
+    sqlparser = { version = "0.39.0", features = ["visitor"] }
+    tempfile = "3"
+    thiserror = "1.0.44"
     chrono = { version = "0.4.31", default-features = false }
+    url = "2.2"
     [profile.release]
     codegen-units = 1
@@ -74,3 +108,4 @@ opt-level = 3
     overflow-checks = false
     panic = 'unwind'
     rpath = false

README.md

@@ -47,6 +47,7 @@ Default features:
     - `compression`: reading files compressed with `xz2`, `bzip2`, `flate2`, and `zstd`
     - `crypto_expressions`: cryptographic functions such as `md5` and `sha256`
     - `encoding_expressions`: `encode` and `decode` functions
+    - `parquet`: support for reading the [Apache Parquet] format
     - `regex_expressions`: regular expression functions, such as `regexp_match`
     - `unicode_expressions`: Include unicode aware functions such as `character_length`
@@ -59,6 +60,7 @@ Optional features:
     - `simd`: enable arrow-rs's manual `SIMD` kernels (requires Rust `nightly`)
     [apache avro]: https://avro.apache.org/
+    [apache parquet]: https://parquet.apache.org/
     ## Rust Version Compatibility

benchmarks/Cargo.toml

            
@@ -18,7 +18,7 @@
    [package]
    name = "datafusion-benchmarks"
    description = "DataFusion Benchmarks"
-    version = "32.0.0"
+    version = "33.0.0"
    edition = { workspace = true }
    authors = ["Apache Arrow <[email protected]>"]
    homepage = "https://github.com/apache/arrow-datafusion"
@@ -34,20 +34,20 @@ snmalloc = ["snmalloc-rs"]
    [dependencies]
    arrow = { workspace = true }
-    datafusion = { path = "../datafusion/core", version = "32.0.0" }
-    datafusion-common = { path = "../datafusion/common", version = "32.0.0" }
-    env_logger = "0.10"
-    futures = "0.3"
-    log = "^0.4"
+    datafusion = { path = "../datafusion/core", version = "33.0.0" }
+    datafusion-common = { path = "../datafusion/common", version = "33.0.0" }
+    env_logger = { workspace = true }
+    futures = { workspace = true }
+    log = { workspace = true }
    mimalloc = { version = "0.1", optional = true, default-features = false }
-    num_cpus = "1.13.0"
-    parquet = { workspace = true }
+    num_cpus = { workspace = true }
+    parquet = { workspace = true, default-features = true }
    serde = { version = "1.0.136", features = ["derive"] }
-    serde_json = "1.0.78"
+    serde_json = { workspace = true }
    snmalloc-rs = { version = "0.3", optional = true }
    structopt = { version = "0.3", default-features = false }
    test-utils = { path = "../test-utils/", version = "0.1.0" }
    tokio = { version = "^1.0", features = ["macros", "rt", "rt-multi-thread", "parking_lot"] }
    [dev-dependencies]
-    datafusion-proto = { path = "../datafusion/proto", version = "32.0.0" }
+    datafusion-proto = { path = "../datafusion/proto", version = "33.0.0" }

ci/scripts/rust_example.sh

@@ -0,0 +1,35 @@
+    #!/usr/bin/env bash
+    #
+    # Licensed to the Apache Software Foundation (ASF) under one
+    # or more contributor license agreements.  See the NOTICE file
+    # distributed with this work for additional information
+    # regarding copyright ownership.  The ASF licenses this file
+    # to you under the Apache License, Version 2.0 (the
+    # "License"); you may not use this file except in compliance
+    # with the License.  You may obtain a copy of the License at
+    #
+    #   http://www.apache.org/licenses/LICENSE-2.0
+    #
+    # Unless required by applicable law or agreed to in writing,
+    # software distributed under the License is distributed on an
+    # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    # KIND, either express or implied.  See the License for the
+    # specific language governing permissions and limitations
+    # under the License.
+    set -ex
+    cd datafusion-examples/examples/
+    cargo fmt --all -- --check
+    files=$(ls .)
+    for filename in $files
+    do
+      example_name=`basename $filename ".rs"`
+      # Skip tests that rely on external storage and flight
+      # todo: Currently, catalog.rs is placed in the external-dependence directory because there is a problem parsing
+      # the parquet file of the external parquet-test that it currently relies on.
+      # We will wait for this issue[https://github.com/apache/arrow-datafusion/issues/8041] to be resolved.
+      if [ ! -d $filename ]; then
+         cargo run --example $example_name
+      fi
+    done
