Apache arrow 14.0.2 hotfix #1

Closed
wants to merge 73 commits into from

Commits on Oct 11, 2023

  1. apacheGH-24868: [C++] Add a Tensor logical value type with varying dimensions, implemented using ExtensionType (apache#37166)
    
    ### Rationale for this change
    
    For use cases where tensors share the same underlying data type and number of dimensions but differ in their actual shape, we want to add a `VariableShapeTensorType`.
    See apache#24868 and huggingface/datasets#5272
    
    ### What changes are included in this PR?
    
    This introduces the definition of the `arrow.variable_shape_tensor` extension, its C++ implementation, and a Python wrapper.
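
As a rough illustration of the idea (not the PR's implementation), each row of a variable-shape tensor column can be thought of as a flattened list of values plus a shape whose length is the fixed number of dimensions. The helper names `pack_tensor` and `unpack_tensor` are hypothetical, and the sketch uses plain Python lists rather than Arrow arrays. It assumes every dimension is non-empty.

```python
from math import prod


def pack_tensor(nested, ndim):
    """Flatten a nested-list tensor into a row with `data` and `shape` (hypothetical helper)."""
    # Walk down the first element of each level to read the shape.
    shape = []
    level = nested
    for _ in range(ndim):
        shape.append(len(level))
        level = level[0]
    # Flatten one nesting level at a time until only scalars remain.
    data = list(nested)
    for _ in range(ndim - 1):
        data = [value for sub in data for value in sub]
    return {"data": data, "shape": shape}


def unpack_tensor(row):
    """Rebuild the nested-list tensor from a packed row (hypothetical helper)."""
    data, shape = row["data"], row["shape"]
    assert len(data) == prod(shape), "data length must match the shape"
    out = list(data)
    # Regroup from the innermost dimension outwards.
    for dim in reversed(shape[1:]):
        out = [out[i:i + dim] for i in range(0, len(out), dim)]
    return out


# Two tensors with the same number of dimensions but different shapes.
column = [pack_tensor(t, ndim=2) for t in ([[1, 2, 3], [4, 5, 6]], [[7], [8], [9], [10]])]
```

Every row carries its own shape, so tensors of shape 2x3 and 4x1 can sit in the same column as long as the element type and `ndim` agree.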
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    This introduces a new extension type to users.
    * Closes: apache#24868
    
    Lead-authored-by: Rok Mihevc <[email protected]>
    Co-authored-by: Joris Van den Bossche <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    3 people authored and raulcd committed Oct 11, 2023
    1e417cb
  2. apacheGH-37880: [CI][Python][Packaging] Add support for Python 3.12 (apache#37901)
    
    ### Rationale for this change
    
    Python 3.12 will be released in the next couple of weeks. We should add the wheels for pyarrow on our 14.0.0 release.
    
    ### What changes are included in this PR?
    
    This PR adds jobs to build pyarrow wheels for Python 3.12.
    
    ### Are these changes tested?
    
    They will be tested via archery tasks
    
    ### Are there any user-facing changes?
    
    No, but users will be able to use pyarrow with Python 3.12.
    
    * Closes: apache#37880
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    raulcd committed Oct 11, 2023
    4db1888
  3. apacheGH-38201: [CI][Packaging] Pin zlib 1.2.13 when using thrift on conan (apache#38202)
    
    ### Rationale for this change
    There is a conflict between the required Zlib version when using both thrift and GRPC.
    
    ### What changes are included in this PR?
    
    Pinning zlib when using thrift.
    
    ### Are these changes tested?
    
    Via archery
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38201
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    raulcd committed Oct 11, 2023
    88fa946

Commits on Oct 12, 2023

  1. apacheGH-38142: [R] Add NEWS for 14.0.0 (apache#38143)

    ### Rationale for this change
    
    The NEWS file needs updating for 14.0.0.
    
    ### What changes are included in this PR?
    
    The NEWS file is updated with commits since 13.0.0.
    
    ### Are these changes tested?
    
    N/A
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38142
    
    Lead-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Nic Crane <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    2 people authored and raulcd committed Oct 12, 2023
    f8259c9
  2. apacheGH-38209: [Docs] Reduce width of header items and keep header height default (small) on smaller screens (apache#38148)
    
    ### Rationale for this change
    
    The Sphinx theme we use (PyData Sphinx Theme) had been pinned to an older version for a while. With apache#36591 we updated the code and now use version 0.14.0 for the dev docs.

    This PR fixes bugs we encountered after the theme update was merged.
    
    ### What changes are included in this PR?
    
    - Use the default header size on smaller screens and keep the increased size on bigger screens.
    
    * Closes: apache#38209
    
    Authored-by: AlenkaF <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    AlenkaF authored and raulcd committed Oct 12, 2023
    b06cfd1
  3. apacheGH-37510: [C++] Don't install bundled Azure SDK for C++ (apache#38176)
    
    ### Rationale for this change
    
    It's an internal bundled library. We should not install it as a part of Arrow.
    
    ### What changes are included in this PR?
    
    Exclude all Azure SDK for C++ jobs so that they, including install jobs, aren't executed by default. Build jobs are still executed because they are required to build Arrow.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    Yes.
    * Closes: apache#37510
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
    kou authored and raulcd committed Oct 12, 2023
    2fad064
  4. apacheGH-38200: [CI][Release][Go] Ensure removing all module caches (apache#38222)
    
    ### Rationale for this change
    
    Module caches don't grant write permission to the owner, so we can't remove them with `rm -rf`.
    
    ### What changes are included in this PR?
    
    Run `go clean -modcache` after all builds.
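
To show the underlying problem outside of Go tooling, here is a minimal Python sketch that deletes a tree whose directories are read-only by first restoring the owner's write permission. The helper name `remove_readonly_tree` is hypothetical. The PR itself relies on `go clean -modcache`, not on code like this.

```python
import os
import shutil
import stat


def remove_readonly_tree(root):
    """Delete a directory tree even if its directories lack owner write permission."""
    for dirpath, _dirnames, _filenames in os.walk(root):
        # Entries can only be unlinked from a directory the owner may write to,
        # so restore owner read/write/execute before deleting.
        os.chmod(dirpath, os.stat(dirpath).st_mode | stat.S_IRWXU)
    shutil.rmtree(root)
```

On POSIX systems, deleting a file depends on the permissions of its parent directory, which is why the sketch only changes directory modes.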
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38200
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
    kou authored and raulcd committed Oct 12, 2023
    40e1513
  5. apacheGH-38226: [R] Remove R 3.5 from test-r-versions (apache#38230)

    ### Rationale for this change
    
    The test-r-versions job is failing because not all of our dependencies support R 3.5. We follow the tidyverse support policy where possible, which means we only support R 3.6 and above. Thus, we can drop the test for R 3.5.
    
    ### What changes are included in this PR?
    
    R 3.5 was removed from the test matrix for test-r-versions
    
    ### Are these changes tested?
    
    Yes
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38226
    
    Authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    paleolimbot authored and raulcd committed Oct 12, 2023
    7c9ad8f

Commits on Oct 13, 2023

  1. apacheGH-38243: [CI][Python] Add missing dataset marker for dataset encryption tests (apache#38244)
    
    * Closes: apache#38243
    
    Authored-by: Joris Van den Bossche <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    jorisvandenbossche authored and raulcd committed Oct 13, 2023
    bf7cc7e
  2. apacheGH-38228: [R] Fence examples that need dataset with examplesIf (apache#38229)
    
    ### Rationale for this change
    
    The minimal nightly builds are failing on examples that won't run without the dataset feature.
    
    ### What changes are included in this PR?
    
    - Added `examplesIf` where needed
    - Redocumented
    
    ### Are these changes tested?
    
    Yes, by all R CMD check jobs
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38228
    
    Authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    paleolimbot authored and raulcd committed Oct 13, 2023
    e1c66c0
  3. apacheGH-38197: [R] Update actions that used setup-r@v1 to use setup-r@v2 (apache#38218)
    
    ### Rationale for this change
    
    CI jobs that used setup-r@v1 no longer run without error.
    
    ### What changes are included in this PR?
    
    - Updated the rchk job to use `setup-r@v2`
    - Updated the devdocs job to use `setup-r@v2`. To make this work, we needed to remove the Windows build because it was installing an old version of R. It seems that the job has been running an outdated and unusable (for most users) setup for a very long time.
    
    ### Are these changes tested?
    
    Will be covered by crossbow jobs submitted below.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38197
    
    Lead-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    paleolimbot authored and raulcd committed Oct 13, 2023
    cd900bc
  4. apacheGH-38227: [R] Fix non-unicode character errors in nightly builds (apache#38232)
    
    ### Rationale for this change
    
    We have several nightly builds failing with errors building the manual as a result of unicode characters. The unicode characters aren't new, so I'm not sure why this happened now.
    
    ### What changes are included in this PR?
    
    Install a distribution of latex that supports unicode characters (maybe)?
    
    ### Are these changes tested?
    
    Yes
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38227
    
    Lead-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    2 people authored and raulcd committed Oct 13, 2023
    3c2de94
  5. apacheGH-37907: [R] Setting rosetta variable is missing (apache#37961)

    ### Rationale for this change
    
    The latest version of `r/R/install-arrow.R`  was not working properly, since it was relying on the `on_rosetta()` function, which is not defined elsewhere. I just fixed the identification of rosetta in the script.
    
    With the current code, the following gives an error
    
    ````r
    > source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R") 
    > install_arrow()
    Error in on_rosetta() : could not find function "on_rosetta"
    ````
    
    ### What changes are included in this PR?
    
    It only removed the `on_rosetta()` function, which was not defined elsewhere, and reverted back to the `rosetta` object to identify if rosetta is present or not on a user's system.
    
    ### Are these changes tested?
    
    Yes. It was tested with the current code and the proposed PR. The proposed PR works as expected.
    
    ### Are there any user-facing changes?
    
    No.
    
    * Closes: apache#37907
    
    Lead-authored-by: Fernando Mayer <[email protected]>
    Co-authored-by: Jonathan Keane <[email protected]>
    Signed-off-by: Nic Crane <[email protected]>
    2 people authored and raulcd committed Oct 13, 2023
    677b42f
  6. apacheGH-37945: [R] Update developer documentation (apache#38220)

    ### Rationale for this change
    
    Several PRs over the last few months have updated the build system to be more friendly for developers. During this process it has also come to light that we haven't supported the Windows development setup documented here since R 4.1 (released in spring 2021). I had to remove Windows from the test-r-devdocs job because the approach used there was not compatible with the `setup-r@v2` action, and the job was failing with the `@v1` action.
    
    ### What changes are included in this PR?
    
    - Updated the sections on using pre-built static libraries and bundled builds
    - Removed the Windows section regarding the bundled build. This section would need rewriting to support the last two minor releases of R but in the meantime I think it is mostly confusing.
    
    ### Are these changes tested?
    
    They are documentation changes. They are also slightly optimistic: we can fix problems with the developer setup incrementally between releases, but it's more difficult to update our documentation. This PR documents the intended behaviour after apache#38236.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#37945
    
    Lead-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Jacob Wujciak-Jens <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    2 people authored and raulcd committed Oct 13, 2023
    e5bcfd3
  7. apacheGH-38043: [R] Enable all features by default on macOS (apache#38195)
    
    ### Rationale for this change
    
    Previously, GCS/S3 support needed to be explicitly enabled in source builds (when they are built without `NOT_CRAN`). As we want the macOS binaries to be fully featured, we should turn the features on when the dependencies exist.
    
    ### What changes are included in this PR?
    
    This PR enables this behavior for macOS only; on Linux, setting `NOT_CRAN` or `LIBARROW_MINIMAL=false` is still required.
    
    ### Are these changes tested?
    
    Crossbow and locally (thanks @paleolimbot)
    * Closes: apache#38043
    
    Lead-authored-by: Jacob Wujciak-Jens <[email protected]>
    Co-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Jonathan Keane <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    3 people authored and raulcd committed Oct 13, 2023
    6cc8214

Commits on Oct 16, 2023

  1. apacheGH-38239: [CI][Python] Disable -W error on Python CI jobs temporarily (apache#38238)
    
    * Closes: apache#38239
    
    Lead-authored-by: Joris Van den Bossche <[email protected]>
    Co-authored-by: Raúl Cumplido <[email protected]>
    Co-authored-by: Jacob Wujciak-Jens <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    3 people committed Oct 16, 2023
    2fa6812
  2. apacheGH-38074: [C++] Fix Offset Size Calculation for Slicing Large String and Binary Types in Hash Join (apache#38147)
    
    ### Rationale for this change
    
    We found that the wrong results in inner joins during hash join operations were caused by a problem with how large strings and binary types were handled. The `Slice` function was not calculating their sizes correctly.
    
    To fix this, I changed the `Slice` function to calculate the sizes correctly, based on the type of data for large string and binary. 
    
    * Issue raised: apache#37729 
    
    ### What changes are included in this PR?
    
    * The `Slice` function has been updated to correctly calculate the offset for Large String and Large Binary types, and assertion statements have been added to improve maintainability.
    * Unit tests (`TEST(KeyColumnArray, SliceBinaryTest)`) for the Slice function have been added.
    * During random tests for Hash Join (`TEST(HashJoin, Random)`), modifications were made to allow the creation of Large String as key column values.
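
To illustrate why the offset width matters: string and binary arrays use 32-bit offsets, while large string and large binary arrays use 64-bit offsets, so the byte position of a sliced row inside the offsets buffer depends on the type. The following is a minimal sketch in plain Python, not Acero's `Slice`. The function name `slice_offsets` and its parameters are hypothetical.

```python
import struct


def slice_offsets(offsets_buffer, start, length, large):
    """Read the offsets of rows [start, start + length) from a raw offsets buffer."""
    # 64-bit offsets for large types, 32-bit offsets otherwise.
    width = 8 if large else 4
    code = "q" if large else "i"
    # A slice of `length` rows needs `length + 1` offsets (start of each row plus the end).
    fmt = "<" + str(length + 1) + code
    return list(struct.unpack_from(fmt, offsets_buffer, start * width))


# Offsets of four values stored with 64-bit offsets, as in a large string array.
large_offsets = struct.pack("<5q", 0, 3, 5, 9, 12)
rows_2_and_3 = slice_offsets(large_offsets, start=2, length=2, large=True)
```

Reading the same buffer with `large=False` starts at the wrong byte and interprets half of each 64-bit offset as its own value, which is the kind of miscalculation that produces incorrect slices.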
    
    ### Are these changes tested?
    
    Yes
    
    ### Are there any user-facing changes?
    
    Acero might not have a large user base as it is an experimental feature, but I deemed the issue of incorrect join results as critical and have addressed the bug.
    
    * Closes: apache#38074
    
    Authored-by: Hyunseok Seo <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    llama90 authored and raulcd committed Oct 16, 2023
    bb3fe3f
  3. apacheGH-38263 [C++]: Prefer to call string_view::data() instead of begin() where a char pointer is expected (apache#38265)
    
    ### Rationale for this change
    
    The MSVC compiler doesn't seem to allow user code to assume `std::string_view::const_iterator` is `const char*`, so using only `re2::StringPiece` and preferring to call `.data()` instead of `.begin()` should make things more uniform across different compilers and STL implementations.
    
    ### What changes are included in this PR?
    
     - Using `re2::StringPiece` instead of `std::string_view` to interact with `re2`
     - Use `data()` instead of `begin()` where a `char*` is expected
    
    ### Are these changes tested?
    
    Yes, by existing tests.
    * Closes: apache#38263
    
    Authored-by: Felipe Oliveira Carvalho <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    felipecrv authored and raulcd committed Oct 16, 2023
    41bb5e4
  4. apacheGH-38206: [CI] Remove more pre-installed files (apache#38233)

    ### Rationale for this change
    
    We need more disk space...
    
    ### What changes are included in this PR?
    
    Remove more pre-installed files.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38206
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    kou authored and raulcd committed Oct 16, 2023
    ef21f3d
  5. MINOR: [Go] Bump golang.org/x/net from 0.15.0 to 0.17.0 in /go (apache#38225)
    
    Bumps [golang.org/x/net](https://github.com/golang/net) from 0.15.0 to 0.17.0.
    <details>
    <summary>Commits</summary>
    <ul>
    <li><a href="https://github.com/golang/net/commit/b225e7ca6dde1ef5a5ae5ce922861bda011cfabd"><code>b225e7c</code></a> http2: limit maximum handler goroutines to MaxConcurrentStreams</li>
    <li><a href="https://github.com/golang/net/commit/88194ad8ab44a02ea952c169883c3f57db6cf9f4"><code>88194ad</code></a> go.mod: update golang.org/x dependencies</li>
    <li><a href="https://github.com/golang/net/commit/2b60a61f1e4cf3a5ecded0bd7e77ea168289e6de"><code>2b60a61</code></a> quic: fix several bugs in flow control accounting</li>
    <li><a href="https://github.com/golang/net/commit/73d82efb96cacc0c378bc150b56675fc191894b9"><code>73d82ef</code></a> quic: handle DATA_BLOCKED frames</li>
    <li><a href="https://github.com/golang/net/commit/5d5a036a503f8accd748f7453c0162115187be13"><code>5d5a036</code></a> quic: handle streams moving from the data queue to the meta queue</li>
    <li><a href="https://github.com/golang/net/commit/350aad2603e57013fafb1a9e2089a382fe67dc80"><code>350aad2</code></a> quic: correctly extend peer's flow control window after MAX_DATA</li>
    <li><a href="https://github.com/golang/net/commit/21814e71db756f39b69fb1a3e06350fa555a79b1"><code>21814e7</code></a> quic: validate connection id transport parameters</li>
    <li><a href="https://github.com/golang/net/commit/a600b3518eed7a9a4e24380b4b249cb986d9b64d"><code>a600b35</code></a> quic: avoid redundant MAX_DATA updates</li>
    <li><a href="https://github.com/golang/net/commit/ea633599b58dc6a50d33c7f5438edfaa8bc313df"><code>ea63359</code></a> http2: check stream body is present on read timeout</li>
    <li><a href="https://github.com/golang/net/commit/ddd8598e5694aa5e966e44573a53e895f6fa5eb2"><code>ddd8598</code></a> quic: version negotiation</li>
    <li>Additional commits viewable in <a href="https://github.com/golang/net/compare/v0.15.0...v0.17.0">compare view</a></li>
    </ul>
    </details>
    <br />
    
    
    Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    Signed-off-by: Matt Topol <[email protected]>
    dependabot[bot] authored and raulcd committed Oct 16, 2023
    bc5b518
  6. apacheGH-38285: [Go] Slight deps and docs update (apache#38284)

    ### Rationale for this change
    Making sure the documentation that shows up on pkg.go.dev will show that the package is compatible with go1.19+
    
    ### What changes are included in this PR?
    slight patch/minor version updates of some dependencies along with a documentation update in `doc.go`.
    * Closes: apache#38285
    
    Authored-by: Matt Topol <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    zeroshade authored and raulcd committed Oct 16, 2023
    55adf78
  7. apacheGH-38282 [C++]: Implement ReplaceString with the right type signature (apache#38283)
    
    ### Rationale for this change
    
    The type signature of `ReplaceString` should be identical when arrow is compiled with or without `ARROW_WITH_RE2`.
    
    ### What changes are included in this PR?
    
    The right signature + delegating to the implementation that takes `re2::StringPiece`. The conversion should be a no-op when compiled and optimized.
    
    ### Are these changes tested?
    
    By existing tests and CI checks.
    
    * Closes: apache#38282
    
    Authored-by: Felipe Oliveira Carvalho <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    felipecrv authored and raulcd committed Oct 16, 2023
    de7d3c6

Commits on Oct 17, 2023

  1. MINOR: [Go] Bump go versions for testing nightly tasks (apache#38289)

    ### What changes are included in this PR?
    
    Bump versions of Go for our nightly tests to match supported Go versions
    
    ### Are these changes tested?
    Via archery
    
    ### Are there any user-facing changes?
    
    No
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    raulcd committed Oct 17, 2023
    9f7d6c4
  2. apacheGH-38286: [CI][R] Clean GitHub runner disk for ubuntu-r-only-r images (apache#38287)
    
    ### Rationale for this change
    Fix CI failures for a job that is running out of disk space.
    
    ### What changes are included in this PR?
    
    Using our free disk space script to add space for the ubuntu-r-only-r images.
    
    ### Are these changes tested?
    
    On CI
    
    ### Are there any user-facing changes?
    No
    * Closes: apache#38286
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    raulcd committed Oct 17, 2023
    e59abda
  3. apacheGH-38293: [R] Fix non-deterministic duckdb test (apache#38294)

    ### Rationale for this change
    
    The test fails with the latest version of duckdb (0.9.1).
    
    ### What changes are included in this PR?
    
    The test was changed so that it did not depend on non-deterministic behaviour. We sort all of the other expectations involving a group_by to avoid this problem...we hadn't changed this one yet because it didn't fail in any previous version of duckdb.
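
The general pattern, sorting grouped results before comparing them, can be sketched in Python as follows. This is an illustration only and not the R test from the PR. The helper names `group_counts` and `same_groups` are hypothetical.

```python
from collections import Counter


def group_counts(values):
    """Return (key, count) pairs; callers should not rely on the order of groups."""
    return list(Counter(values).items())


def same_groups(actual, expected):
    """Compare grouped results after sorting, so the order of groups is irrelevant."""
    return sorted(actual) == sorted(expected)


# The groups match regardless of the order in which the engine returned them.
result_matches = same_groups(group_counts(["b", "a", "b"]), [("a", 1), ("b", 2)])
```

Sorting both sides makes the expectation independent of whichever group order a particular engine version happens to produce.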
    
    ### Are these changes tested?
    
    Yes
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38293
    
    Authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    paleolimbot authored and raulcd committed Oct 17, 2023
    676b2dd
  4. apacheGH-38240: [Docs] version_match should match the version from versions.json (apache#38241)
    
    This PR corrects the version for the `version_match` to be equal to the version defined in versions.json. This way the text is correctly displayed in the version switcher button.
    * Closes: apache#38240
    
    Authored-by: AlenkaF <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    AlenkaF authored and raulcd committed Oct 17, 2023
    f208e7f
  5. apacheGH-38295: [CI][R] Free up disk space for Azure Pipelines jobs (apache#38302)
    
    ### Rationale for this change
    
    test-r-rhub-ubuntu-gcc-release-latest doesn't have enough disk space.
    
    ### What changes are included in this PR?
    
    Remove pre-installed files on Azure Pipelines too.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38295
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    kou authored and raulcd committed Oct 17, 2023
    ac48f57
  6. apacheGH-36994: [Java] Use JDK 21 in CI (apache#38219)

    ### Rationale for this change
    
    Verify JDK 21 in CI in time for the Arrow v14 release.
    
    ### What changes are included in this PR?
    
    * Bump latest Java version from 20 -> 21 in CI
    
    ### Are these changes tested?
    
    Yes, via CI.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#36994
    
    Authored-by: Dane Pitkin <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    danepitkin authored and raulcd committed Oct 17, 2023
    df0b700
  7. MINOR: [R] Avoid stray output from expr when checking for 10.13 (apache#38303)
    
    ### Rationale for this change
    
    `expr` was printing the number of matching chars which showed up as noise in the log (which we want to avoid as much as possible to avoid any false positive checks)
    See apache#38236 (comment) for @jonkeane's investigation.
    
    ### What changes are included in this PR?
    
    Replace use of expr with test.
    
    ### Are these changes tested?
    Crossbow
    
    Lead-authored-by: Jacob Wujciak-Jens <[email protected]>
    Co-authored-by: Jonathan Keane <[email protected]>
    Signed-off-by: Jonathan Keane <[email protected]>
    2 people authored and raulcd committed Oct 17, 2023
    b65c4f5

Commits on Oct 18, 2023

  1. apacheGH-38312: [Docs] Add the Arrow C Device data interface page to the sidebar TOC (apache#38313)
    
    * Closes: apache#38312
    
    Authored-by: Joris Van den Bossche <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    jorisvandenbossche authored and raulcd committed Oct 18, 2023
    1b3f498
  2. apacheGH-35531: [Python] C Data Interface PyCapsule Protocol (apache#37797)
    
    ### Rationale for this change
    
    ### What changes are included in this PR?
    
    * A new specification for Arrow PyCapsules and related dunder methods
    * Implementing the dunder methods for `DataType`, `Field`, `Schema`, `Array`, `RecordBatch`, `Table`, and `RecordBatchReader`.
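
A consumer can detect support for the protocol by duck-typing on the dunder methods instead of checking for concrete pyarrow classes. The sketch below assumes the method names from the Arrow PyCapsule interface specification (`__arrow_c_schema__`, `__arrow_c_array__`, `__arrow_c_stream__`). The helper `arrow_capsule_interfaces` and the `FakeStream` class are hypothetical and only for illustration.

```python
ARROW_CAPSULE_METHODS = (
    "__arrow_c_schema__",
    "__arrow_c_array__",
    "__arrow_c_stream__",
)


def arrow_capsule_interfaces(obj):
    """Return the names of the PyCapsule dunder methods that `obj` provides."""
    return [name for name in ARROW_CAPSULE_METHODS if callable(getattr(obj, name, None))]


class FakeStream:
    """Stand-in producer used only to exercise the check; it exports no real capsule."""

    def __arrow_c_stream__(self, requested_schema=None):
        raise NotImplementedError("illustration only")


supported = arrow_capsule_interfaces(FakeStream())
```

Checking for the methods rather than the types lets a library accept Arrow data from any producer that implements the protocol.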
    
    ### Are these changes tested?
    
    Yes, I've added various roundtrip tests for each of the types.
    
    ### Are there any user-facing changes?
    
    This introduces some new APIs and documents them.
    
    * Closes: apache#34031
    * Closes: apache#35531
    
    Authored-by: Joris Van den Bossche <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    wjones127 authored and raulcd committed Oct 18, 2023
    bd61239

Commits on Oct 19, 2023

  1. apacheGH-38332: [CI][Release] Resolve symlinks in RAT lint (apache#38337

    )
    
    ### Rationale for this change
    
    Our release script (`dev/release/02-source.sh`) resolves symlinks in source archive but our lint script (`dev/archery/archery/utils/source.py`) doesn't resolve symlinks. So we may detect RAT problem by our CI.
    
    ### What changes are included in this PR?
    
    Resolve symlinks in our lint script too.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38332
    
    Lead-authored-by: Sutou Kouhei <[email protected]>
    Co-authored-by: Sutou Kouhei <[email protected]>
    Co-authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    3 people committed Oct 19, 2023
    9f90995
  2. 3e9734f
  3. 297428c
  4. 2dcee3f

Commits on Nov 6, 2023

  1. apacheGH-38431: [Python][CI] Update fs.type_name checks for s3fs tests (

    apache#38455)
    
    ### Rationale for this change
    
    Appveyor CI is failing https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/48347810. It seems the reason for the test failures is a change in the `type_name` (from `"py::fsspec+s3"` to `"py::fsspec+('s3', 's3a')"`) and due to it tests are not being skipped.
    
    ### What changes are included in this PR?
    
    Update the check for `type_name` in case of `PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))`.
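
A minimal sketch of a check that tolerates both spellings of `type_name` quoted above. The helper name `is_s3fs_type_name` is hypothetical, and the actual test change in the PR may be written differently.

```python
# Both spellings are quoted in the commit message above.
OLD_S3FS_TYPE_NAME = "py::fsspec+s3"
NEW_S3FS_TYPE_NAME = "py::fsspec+('s3', 's3a')"


def is_s3fs_type_name(type_name):
    """Hypothetical check: True for either s3fs-backed type_name."""
    return type_name in (OLD_S3FS_TYPE_NAME, NEW_S3FS_TYPE_NAME)


print(is_s3fs_type_name("py::fsspec+s3"))
print(is_s3fs_type_name("py::fsspec+('s3', 's3a')"))
```
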
    * Closes: apache#38431
    
    Authored-by: AlenkaF <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
    AlenkaF authored and raulcd committed Nov 6, 2023
    5a37e74
  2. apacheGH-38607: [Python] Disable PyExtensionType autoload (apache#38608)

    ### Rationale for this change
    
    PyExtensionType autoload is really a misfeature. It creates PyArrow-specific extension types, though using ExtensionType is almost the same complexity while allowing deserialization from non-PyArrow software.
    
    ### What changes are included in this PR?
    
    * Disable PyExtensionType autoloading and deprecate PyExtensionType instantiation.
    * Update the docs to emphasize ExtensionType.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    Yes.
    
    * Closes: apache#38607
    
    Authored-by: Antoine Pitrou <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    pitrou authored and raulcd committed Nov 6, 2023
    f141709
  3. b84bbca
  4. 529f376
  5. ba53748

Commits on Nov 28, 2023

  1. apacheGH-38364: [Python] Initialize S3 on first use (apache#38375)

    ### Rationale for this change
    
    In accordance with apache#38364, we believe that for various reasons (shortening import time, preventing unnecessary resource consumption and potential bugs with the S3 library) it is appropriate to avoid initializing S3 resources at import time and to move that step to first use.
    
    ### What changes are included in this PR?
    
    - Remove calls to `ensure_s3_initialized()` that were up until now executed during `import pyarrow.fs`;
    - Move `ensure_s3_initialized()` calls to the `python/pyarrow/_s3fs.pyx` module;
    - Add global flag to mark whether S3 has been previously initialized and `atexit` handlers registered.
    
    ### Are these changes tested?
    
    Yes, existing S3 tests check whether it has been initialized, otherwise failing with a C++ exception.
    
    ### Are there any user-facing changes?
    
    No, the behavior is now slightly different with S3 initialization not happening immediately after `pyarrow.fs` is imported, but no changes are expected from a user perspective relying on the public API alone.
    
    **This PR contains a "Critical Fix".**
    A bug in aws-sdk-cpp reported in aws/aws-sdk-cpp#2681 causes segmentation faults under specific circumstances when Python processes shut down, observed specifically with Dask+GPUs (so far we have been unable to pinpoint the exact correlation of Dask+GPUs+S3). While this doesn't seem to affect all users and does not originate in Arrow, it may affect use cases that do not depend on S3 at all, which is particularly problematic in CI, where all tests pass but the process crashes at shutdown.
    * Closes: apache#38364
    
    Lead-authored-by: Peter Andreas Entschev <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    2 people authored and raulcd committed Nov 28, 2023
    4ed68c0
  2. apacheGH-27839: [R] Fetch latest nightly binary for arrow R dev versi…

    …ons. (apache#38236)
    
    ### Rationale for this change
    
    We currently need to manually download the latest nightly build as the ".9000" dev version never matches any nightly builds.
    
    ### What changes are included in this PR?
    
    Check the nightly repo HTML listing for the latest version matching the local major version.
    Refactor nixlibs.R and integrate the functionality of winlibs.R into it to simplify maintenance and prepare for the future of in-line Windows libarrow builds.
    
    These changes are built on apache#38195, so once that is merged I will rebase this.
    
    ### Are these changes tested?
    
    Crossbow
    * Closes: apache#27839
    
    Lead-authored-by: Jacob Wujciak-Jens <[email protected]>
    Co-authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    2 people authored and raulcd committed Nov 28, 2023
    66ca387
  3. apacheGH-38570: [R] Ensure that test-nix-libs is warning free (apache…

    …#38571)
    
    ### Rationale for this change
    
    Although we stop / fail the CI job for warnings elsewhere, we don't seem to be doing that for our buildsystem tests. Let's change that so warnings + other output doesn't creep in.
    
    - [x] Prevent `*** Failed to find latest nightly for 8.0.0.9000` from showing up when the file is sourced under test
    - [x] fix `Warning ('test-nixlibs.R:140:3'): select_binary() with test program 'x' is NULL so the result will be NULL`
    - [x] update version strings to be characters to avoid r-devel warnings
    
    ### What changes are included in this PR?
    
    Added `stop_on_warning` to the test call, and fixes to make that pass
    
    ### Are these changes tested?
    
    Yes, they are all part of the test suite 
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38570
    
    Authored-by: Jonathan Keane <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    jonkeane authored and raulcd committed Nov 28, 2023
    312b927
  4. apacheGH-38591: [Parquet][C++] Remove redundant open calls in `Parque…

    …tFileFormat::GetReaderAsync` (apache#38621)
    
    ### Rationale for this change
    There were duplicate method calls causing extra I/O operations, apparently unintentional from apache@0793432.
    
    ### What changes are included in this PR?
    Remove the extra method calls.
    
    ### Are these changes tested?
    
    ### Are there any user-facing changes?
    
    * Closes: apache#38591
    
    Authored-by: Eero Lihavainen <[email protected]>
    Signed-off-by: mwish <[email protected]>
    eeroel authored and raulcd committed Nov 28, 2023
    ecfab3f
  5. apacheGH-38430: [R] Add test + fix corner cases after nixlibs.R refac…

    …tor (apache#38534)
    
    ### Rationale for this change
    
    A few rough edges exist after apache#38236 including:
    
    - When zero or 1 nightly with the matching major version exist, detection of the latest nightly might fail
    - At least one CI job is pulling nightlies instead of using the version from the current commit
    
    ### What changes are included in this PR?
    
    - Clean up `find_latest_nightly()` + add test
    - Ensure all CI jobs are not using `find_latest_nightly()`
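
To make the corner cases concrete, here is a Python analogue of the selection logic (the real `find_latest_nightly()` is R code in `nixlibs.R` and is not reproduced here). The dotted version strings and the function signature are assumptions made for illustration.

```python
def find_latest_nightly(local_version, available):
    """Return the newest version in `available` sharing the local major
    version, or None if there is no match (hypothetical Python analogue)."""
    def parse(version):
        return tuple(int(part) for part in version.split("."))

    major = parse(local_version)[0]
    matches = [v for v in available if parse(v)[0] == major]
    # max() over an empty list raises ValueError, so handle zero matches
    # explicitly; a single match needs no special casing.
    return max(matches, key=parse) if matches else None
```

Comparing parsed tuples rather than raw strings avoids ordering "14.0.0.20" after "14.0.0.100".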
    
    ### Are these changes tested?
    
    Yes (test added)
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38430
    
    Lead-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Jonathan Keane <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    2 people authored and raulcd committed Nov 28, 2023
    b0b4eb2
  6. apacheGH-38626: [Python] Fix segfault when PyArrow is imported at shu…

    …tdown (apache#38637)
    
    ### Rationale for this change
    
    Some C++ destructors may be called after the Python interpreter has ceased to exist.
    If such a destructor tries to call back in the Python interpreter, for example by calling `Py_DECREF`, we get a crash.
    
    ### What changes are included in this PR?
    
    Protect `OwnedRef` and `OwnedRefNoGIL` destructors against decref'ing a Python object after Python finalization.
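
The same guard can be sketched at the Python level with `sys.is_finalizing()`. This is only an analogue of the idea; the C++ change itself is not shown in this message. `OwnedResource` and its callback are hypothetical.

```python
import sys


class OwnedResource:
    """Hypothetical Python analogue of an owning reference wrapper."""

    def __init__(self, release_callback):
        self._release = release_callback
        self.released = False

    def close(self):
        # Skip the callback once the interpreter is shutting down: the
        # objects it would touch may already be gone.
        if self.released or sys.is_finalizing():
            return
        self._release()
        self.released = True
```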
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38626
    
    Authored-by: Antoine Pitrou <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    pitrou authored and raulcd committed Nov 28, 2023
    039582d
  7. apacheGH-38715: [R] Fix possible bashism in configure script (apache#…

    …38716)
    
    ### Rationale for this change
    
    The CRAN incoming check for 14.0.0 is failing with a NOTE about a possible bashism
    
    ### What changes are included in this PR?
    
    One `test -a` usage was replaced with `&&`.
    
    ### Are these changes tested?
    
    Yes (via crossbow, below)
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38715
    
    Authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    paleolimbot authored and raulcd committed Nov 28, 2023
    1e05986
  8. apacheGH-38676: [Python] Fix potential deadlock when CSV reading erro…

    …rs out (apache#38713)
    
    ### Rationale for this change
    
    A deadlock can happen in a C++ destructor in the following case:
    * the C++ destructor is called from Python, holding the GIL
    * the C++ destructor waits for a threaded task to finish
    * the threaded task has invoked some Python code which is waiting to acquire the GIL
    
    ### What changes are included in this PR?
    
    To reliably prevent such a deadlock, introduce `std::shared_ptr` and `std::unique_ptr` wrappers that release the GIL when deallocating the embedded pointer.
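
The deadlock and its fix can be modelled in a few lines of Python, with an ordinary `threading.Lock` standing in for the GIL. This is a sketch of the pattern only, not the C++ wrappers from the PR; all names are hypothetical.

```python
import threading

gil = threading.Lock()          # stand-in for the GIL, not the real one
log = []


def worker():
    # The threaded task needs the "GIL" to run its Python callback.
    with gil:
        log.append("worker ran callback")


def destroy_releasing_lock(thread):
    """Wait for `thread`, releasing the held lock for the wait's duration."""
    gil.release()               # without this, join() would never return
    try:
        thread.join()
    finally:
        gil.acquire()           # restore the caller's expectation


gil.acquire()                   # the "destructor" is called holding the GIL
task = threading.Thread(target=worker)
task.start()
destroy_releasing_lock(task)
log.append("destructor finished")
gil.release()
print(log)
```

If `destroy_releasing_lock` called `thread.join()` while still holding the lock, the worker could never acquire it and both sides would wait forever, which is the situation described in the rationale.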
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38676
    
    Authored-by: Antoine Pitrou <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    pitrou authored and raulcd committed Nov 28, 2023
    17dc523
  9. apacheGH-38752: [R] Wrap rosetta detection in tryCatch (apache#38754)

    ### Rationale for this change
    
    We should never allow Rosetta checking to cause an error.
    
    ### What changes are included in this PR?
    
    ~Wrap rosetta checking in a tryCatch~ Our use of `try()` wasn't doing what we thought; it actually needs to have `silent = TRUE` specified to _not_ error.
    
    ### Are these changes tested?
    
    I tested them locally by manipulating the system call to a mangled command that doesn't exist, observing the error on load, then wrapping in trycatch. We might consider adding a test in CI, though there would be considerable complexity for something like that
    
    ### Are there any user-facing changes?
    
    No, though we will need to pull it into any point release
    * Closes: apache#38752
    
    Authored-by: Jonathan Keane <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    jonkeane authored and raulcd committed Nov 28, 2023
    490b7c2
  10. apacheGH-38438: [C++] Dataset: Trying to fix the async bug in Parquet…

    … dataset (apache#38466)
    
    ### Rationale for this change
    
    Originally mentioned in apache#38438.
    
    1. When PreBuffer is enabled by default, the code in `RowGroupGenerator::FetchNext` would switch to async mode. This makes the state handling more complex.
    2. In `RowGroupGenerator::FetchNext`, `[this]` is captured without `shared_from_this`. This is not bad in itself; however, `this->executor_` may point to an invalid address if `this` has been destroyed.
    
    This patch also fixes a lifetime issue I found in CSV handling.
    
    ### What changes are included in this PR?
    
    1. Fix handling in `cpp/src/parquet/arrow/reader.cc` as described above
    2. Fix a lifetime problem in CSV
    
    ### Are these changes tested?
    
    I tested it locally, but I don't know how to write a unit test here. Feel free to help.
    
    ### Are there any user-facing changes?
    
    Bugfix
    
    * Closes: apache#38438
    
    Authored-by: mwish <[email protected]>
    Signed-off-by: Benjamin Kietzman <[email protected]>
    mapleFU authored and raulcd committed Nov 28, 2023
    17e9099
  11. apacheGH-38766: [R] Add timeout option to try_download (apache#38767)

    ### Rationale for this change
    
    The download of static libraries during installation might be causing an install failure: https://www.r-project.org/nosvn/R.check/r-devel-windows-x86_64/arrow-00install.html
    
    ### What changes are included in this PR?
    
    The timeout value is temporarily increased according to guidance in the help for `download.file()`
    
    ### Are these changes tested?
    
    Yes, this code runs during install for at least some CI jobs (also used to download cmake)
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38766
    
    Lead-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    2 people authored and raulcd committed Nov 28, 2023
    d660c9f
  12. apacheGH-38756: [R] More debug output for r/configure and nixlibs.R (a…

    …pache#38819)
    
    ### Rationale for this change
    
    It hinders debuggability for users if the failing log doesn't include all info by default.
    
    ### What changes are included in this PR?
    
    Add debug output to test compile command in r/configure and always display output with regards to the binary download.
    
    ### Are these changes tested?
    crossbow, locally
    * Closes: apache#38756
    
    Lead-authored-by: Jacob Wujciak-Jens <[email protected]>
    Co-authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    2 people authored and raulcd committed Nov 28, 2023
    6803a0b
  13. apacheGH-38861: [C++] Add missing "-framework Security" to Libs.priva…

    …te in arrow.pc (apache#38869)
    
    ### Rationale for this change
    
    It's required only when:
    
    * We use bundled aws-sdk-cpp.
    * We use static library for Apache Arrow C++.
    
    Because bundled aws-sdk-cpp uses Security framework.
    
    ### What changes are included in this PR?
    
    Add `-framework Security` to `Libs.private` only on the condition.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    Yes.
    * Closes: apache#38861
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    kou authored and raulcd committed Nov 28, 2023
    c969534
  14. apacheGH-38432: [C++][Parquet] Try to fix performance regression in t…

    …he DictByteArrayDecoderImpl (apache#38784)
    
    ### Rationale for this change
    
    Make some of the changes mentioned in apache#38432.
    
    I believe this might fix apache#38577
    
    Problem 1:
    
    The `BinaryHelper` might call `Prepare()` and `Prepare(estimated-output-binary-length)` for data. This might be because:
    
    1. For Plain Encoding ByteArray, the `len_` is similar to the data-page size, so `Reserve` is related.
    2. For Dict Encoding, the data page is just an RLE-encoded page, so its `len_` might not be directly related to the output binary.
    
    Problem 2:
    
    `Prepare` uses `::arrow::kBinaryMemoryLimit` as the min-value; we should use `this->chunk_space_remaining_`.
    
    Problem 3:
    
    `std::optional<int64_t>` is hard to optimize for some compilers.
    
    ### What changes are included in this PR?
    
    Mention the behavior of `BinaryHelper` and try to fix it.
    
    ### Are these changes tested?
    
    No
    
    ### Are there any user-facing changes?
    
    Regression fixes
    
    * Closes: apache#38432
    
    Lead-authored-by: mwish <[email protected]>
    Co-authored-by: mwish <[email protected]>
    Co-authored-by: Gang Wu <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    3 people authored and raulcd committed Nov 28, 2023
    218e3d8
  15. apacheGH-38893: [R] Fix printf syntax in altrep.cpp (apache#38894)

    ### Rationale for this change
    
    We have CI errors and CRAN check errors on R-devel, where the appropriate attribute for printf format checking was just added.
    
    ### What changes are included in this PR?
    
    The appropriate types are now used for printf parameters.
    
    ### Are these changes tested?
    
    Covered by existing tests
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38893
    
    Authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    paleolimbot authored and raulcd committed Nov 28, 2023
    1484635
  16. apacheGH-38864: [R] Update NEWS.md for 14.0.0.1 (apache#38866)

    ### What changes are included in this PR?
    
    Update NEWS file in R package for 14.0.0.1
    
    ### Are these changes tested?
    
    No
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38864
    
    Authored-by: Nic Crane <[email protected]>
    Signed-off-by: Nic Crane <[email protected]>
    thisisnic authored and raulcd committed Nov 28, 2023
    7f6a813

Commits on Dec 4, 2023

  1. apacheGH-38779: [R][CI] Use devtools on self-hosted machines and use …

    …macos-11 for intel package build (apache#38974)
    
    ### Rationale for this change
    
    The action does not work smoothly on the self-hosted runners.
    
    ### What changes are included in this PR?
    
    Use devtools instead.
    
    ### Are these changes tested?
    crossbow
    * Closes: apache#38779
    
    Authored-by: Jacob Wujciak-Jens <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    assignUser authored and raulcd committed Dec 4, 2023
    a979718
  2. apacheGH-38984: [Python][Packaging] Verification of wheels on AlmaLin…

    …ux 8 are failing due to missing pip (apache#38985)
    
    ### Rationale for this change
    
    AlmaLinux 8 has been updated from 8.8 to 8.9. With 8.9, python3 seems to be shipped without pip, as the command `python3 -m pip install -U pip` fails to find pip.
    
    ### What changes are included in this PR?
    
    Use the [ensurepip package](https://docs.python.org/3/library/ensurepip.html) which provides support for bootstrapping the pip installer into an existing Python installation.
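
A small sketch of the bootstrapping idea, assuming one wants to decide from Python whether `ensurepip` is needed. The helper `pip_bootstrap_command` is hypothetical; it only builds the command line and does not run it. The PR itself presumably invokes `ensurepip` from the verification setup rather than through a helper like this.

```python
import importlib.util
import sys


def pip_bootstrap_command():
    """Return the command that bootstraps pip for the running interpreter,
    or None if pip is already importable.

    Hypothetical helper: it only builds the command and does not run it.
    """
    if importlib.util.find_spec("pip") is not None:
        return None
    return [sys.executable, "-m", "ensurepip", "--upgrade"]


print(pip_bootstrap_command())
```

A non-`None` result can be passed to `subprocess.run(cmd, check=True)` before any `python3 -m pip ...` call.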
    
    ### Are these changes tested?
    
    Yes via archery.
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#38984
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
    raulcd committed Dec 4, 2023
    ef2f46b
  3. apacheGH-38342: [Python] Update to_pandas to use non-deprecated DataF…

    …rame constructor (apache#38374)
    
    ### Rationale for this change
    
    Avoiding a deprecation warning from pandas
    
    * Closes: apache#38342
    
    Authored-by: Joris Van den Bossche <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    jorisvandenbossche authored and raulcd committed Dec 4, 2023
    8f50fe5
  4. apacheGH-38902: [R] Handle failing library detection with pkg-config (a…

    …pache#38970)
    
    ### Rationale for this change
    
    We can get into a broken state with a working test compile in `nixlibs.R` but empty `PKG_LIBS` when pkg-config fails to find some libraries (e.g. libcurl on mac due to missing system stubs) in `configure`. This leads to a failed test compile in configure with pc errors silenced.
    
    ### What changes are included in this PR?
    
    Catch this and rerun the pkg-config-less library detection that should fix this in most cases.
    
    ### Are these changes tested?
    
    locally and on cran (where this error first surfaced)
    * Closes: apache#38902
    
    Lead-authored-by: Jacob Wujciak-Jens <[email protected]>
    Co-authored-by: Jonathan Keane <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    2 people authored and raulcd committed Dec 4, 2023
    e825485
  5. apacheGH-38904: [R] Update news.md for 14.0.0.2 (apache#39022)

    ### Rationale for this change
    
    Update news.md
    
    ### Are these changes tested?
    no
    * Closes: apache#38904
    
    Authored-by: Jacob Wujciak-Jens <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    assignUser authored and raulcd committed Dec 4, 2023
    1fe7cd2

Commits on Dec 6, 2023

  1. apacheGH-38345: [Release] Use local test data for verification if pos…

    …sible (apache#38362)
    
    ### Rationale for this change
    
    We have external test data repositories, apache/arrow-testing and apache/parquet-testing. We use them as submodules. apache/arrow may not use the latest test data repositories, but our verification script always uses the latest ones, which may cause test failures.
    
    ### What changes are included in this PR?
    
    Use local test data if they exist.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38345
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    kou authored and raulcd committed Dec 6, 2023
    bc1ea6f
  2. apacheGH-39003: [CI][macOS] Don't update Homebrew (apache#39016)

    ### Rationale for this change
    
    It would be better to always use the latest Homebrew, so that we test against what most users have, but that is difficult to maintain.
    
    ### What changes are included in this PR?
    
    We don't update Homebrew manually. GitHub-hosted GitHub Actions runners update Homebrew periodically, and we rely on that instead of a manual `brew update`.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#39003
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
    kou authored and raulcd committed Dec 6, 2023
    44238dd
  3. apacheGH-38618: [C++] S3FileSystem: fix regression in deleting explic…

    …itly created sub-directories (apache#38845)
    
    ### Rationale for this change
    
    See apache#38618 (comment) and below for the analysis. When deleting the dir contents, we use a GetFileInfo with recursive FileSelector to list all objects to delete, but when doing that the file paths for directories don't end in a trailing `/`, so for deleting explicitly created directories we need to add the `kSep` here as well to properly delete the object.
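
The key-fixing step can be sketched as follows. This is a Python illustration of the idea, not the C++ change: `keys_to_delete` and the `(path, is_directory)` pairs are hypothetical stand-ins for the results of the recursive listing.

```python
SEP = "/"   # plays the role of `kSep` in the commit message


def keys_to_delete(entries):
    """Map (path, is_directory) pairs from a recursive listing to the
    object keys to delete; directory keys get a trailing separator."""
    keys = []
    for path, is_directory in entries:
        if is_directory and not path.endswith(SEP):
            path += SEP
        keys.append(path)
    return keys


print(keys_to_delete([("bucket/dir", True), ("bucket/dir/file.txt", False)]))
```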
    
    ### Are these changes tested?
    
    I tested them manually with an actual S3 bucket. The problem is that MinIO doesn't have the same problem, and so it's not actually tested with the test I added using our MinIO testing setup.
    
    ### Are there any user-facing changes?
    
    Fixes the regression
    * Closes: apache#38618
    
    Lead-authored-by: Joris Van den Bossche <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    2 people authored and raulcd committed Dec 6, 2023
    ae8ea4d
  4. apacheGH-39041: [R] Improve update-checksum.R output (apache#39042)

    ### Rationale for this change
    
    The script was too quiet.
    
    ### What changes are included in this PR?
    
    Fix regex and add some output: 
    ```
    Rscript tools/update-checksums.R 14.0.0
    [1] "Extracting libarrow binary paths from tasks.yml"
    [1] "Downloading windows/arrow-14.0.0.zip.sha512"
    [1] "Converting windows/arrow-14.0.0.zip to windows style line endings"
    [1] "Downloading linux-openssl-1.0/arrow-14.0.0.zip.sha512"
    [1] "Downloading linux-openssl-1.1/arrow-14.0.0.zip.sha512"
    [1] "Downloading linux-openssl-3.0/arrow-14.0.0.zip.sha512"
    [1] "Downloading darwin-arm64-openssl-1.1/arrow-14.0.0.zip.sha512"
    [1] "Downloading darwin-arm64-openssl-3.0/arrow-14.0.0.zip.sha512"
    [1] "Downloading darwin-x86_64-openssl-1.1/arrow-14.0.0.zip.sha512"
    [1] "Downloading darwin-x86_64-openssl-3.0/arrow-14.0.0.zip.sha512"
    [1] "Checksums updated successfully!"
    ```
    
    ### Are these changes tested?
    locally 
    
    ### Are there any user-facing changes?
    no
    * Closes: apache#39041
    
    Authored-by: Jacob Wujciak-Jens <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    assignUser authored and raulcd committed Dec 6, 2023
    84e72b1
  5. apacheGH-39076: [R] Fix tests that trigger confusing dplyr warnings (a…

    …pache#39077)
    
    ### Rationale for this change
    
    Running our test suite results in many spurious warnings being printed that make it difficult to spot actual warnings.
    
    ### What changes are included in this PR?
    
    The data used for specific tests involving `summarise()` was updated to not trigger the warnings.
    
    ### Are these changes tested?
    
    Yes
    
    ### Are there any user-facing changes?
    
    No
    * Closes: apache#39076
    
    Authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    paleolimbot authored and raulcd committed Dec 6, 2023
    7802c03
  6. apacheGH-39072: [Release][CI] Python3.11-devel is required for the ve…

    …rification job on AlmaLinux 8 (apache#39073)
    
    ### Rationale for this change
    
    The verification task for Almalinux 8 was failing.
    
    ### What changes are included in this PR?
    
    Add required python3.11-devel to the Docker image.
    
    ### Are these changes tested?
    
    Yes via archery task.
    
    ### Are there any user-facing changes?
    
    No
    
    * Closes: apache#39072
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
    raulcd committed Dec 6, 2023
    9123615

Commits on Dec 9, 2023

  1. apacheGH-39074: [Release][Packaging] Use UTF-8 explicitly for KEYS (a…

    …pache#39082)
    
    ### Rationale for this change
    
    `KEYS` may have UTF-8 (non ASCII) characters. Ruby chooses the default encoding based on `LANG`. If `LANG=C`, Ruby uses the `US-ASCII` encoding as the default encoding. If Ruby uses the `US-ASCII` encoding, we can't process `KEYS` because it has non ASCII characters.
    
    ### What changes are included in this PR?
    
    Use the `UTF-8` encoding explicitly for `KEYS`. With the encoding specified explicitly, our `KEYS` processing doesn't depend on `LANG`.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#39074
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
    kou authored and raulcd committed Dec 9, 2023
    7a54881
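
The idea behind apacheGH-39074 (read `KEYS` with an explicit encoding instead of relying on the environment) can be illustrated with a small sketch. The actual change is in Ruby; the Python below is only an analogy, and the `read_keys` helper and the sample file contents are hypothetical, not taken from the commit.

```python
import os
import tempfile


def read_keys(path):
    """Read a KEYS-like file as UTF-8, regardless of LANG/LC_ALL."""
    # Passing encoding= explicitly means the locale's default encoding
    # is never consulted. (Hypothetical helper, not from the commit.)
    with open(path, encoding="utf-8") as f:
        return f.read()


# Hypothetical sample data: one name containing a non-ASCII character.
sample = "Raúl Cumplido\n"

with tempfile.TemporaryDirectory() as tmp:
    keys_path = os.path.join(tmp, "KEYS")
    with open(keys_path, "wb") as f:
        f.write(sample.encode("utf-8"))

    # Reading with an explicit UTF-8 encoding works in any environment.
    text = read_keys(keys_path)

    # Decoding the same bytes as US-ASCII fails, which is roughly the
    # situation the commit describes for Ruby under LANG=C.
    with open(keys_path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("ascii")
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False
```

The point of the design is that the encoding becomes a property of the file being read rather than of whoever happens to run the script.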

Commits on Dec 12, 2023

  1. apacheGH-38449: [Release][Go][macOS] Use local test data if possible (apache#38450)
    
    ### Rationale for this change
    
    On macOS, "cp -a source/ destination/" copies the contents of "source/" into "destination/" (for example, "source/a" is copied to "destination/a") rather than copying "source/" itself into "destination/" (which would put "source/a" at "destination/source/a").
    
    ### What changes are included in this PR?
    
    We need to remove the trailing "/" from "source/" to copy "source/" itself to "destination/source/".
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: apache#38449
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    kou authored and raulcd committed Dec 12, 2023
    8fc81ce
  2. b3b5307
  3. 6dcedc9
  4. 740889f
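
The trailing-slash fix described in apacheGH-38449 above can be sketched as follows. The commit itself changes a `cp -a` invocation in a shell script; this Python version is only an illustration of normalizing the path before copying, and `copy_dir_into` is a hypothetical helper name.

```python
import os
import shutil
import tempfile


def copy_dir_into(source, destination):
    """Copy the directory `source` itself into `destination`."""
    # Drop any trailing separator so "source/" and "source" behave alike.
    # (Hypothetical helper, not from the commit.)
    name = os.path.basename(source.rstrip(os.sep))
    target = os.path.join(destination, name)
    shutil.copytree(source, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source")
    destination = os.path.join(tmp, "destination")
    os.makedirs(source)
    os.makedirs(destination)
    with open(os.path.join(source, "a"), "w") as f:
        f.write("test data\n")

    # Pass the path with a trailing slash, as in the commit's example.
    target = copy_dir_into(source + os.sep, destination)

    # The file ends up at destination/source/a, not destination/a.
    copied = os.path.exists(os.path.join(destination, "source", "a"))

    # Without stripping, the trailing slash hides the directory name.
    basename_with_slash = os.path.basename(source + os.sep)
```

Stripping the separator up front keeps the result the same whether or not the caller includes the trailing slash, which is the platform-independent behavior the commit is after.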

Commits on Jun 24, 2024

  1. e234e04