Releases: GoogleCloudDataproc/spark-bigquery-connector

0.30.0

11 Apr 16:12
  • The new connectors are out of preview and are now generally available! spark-2.4-bigquery,
    spark-3.1-bigquery, spark-3.2-bigquery and spark-3.3-bigquery are GA and ready to be used in
    all workloads. Please refer to the compatibility matrix when using them.
  • Direct write method is out of preview and is now generally available!
  • spark-bigquery-with-dependencies_2.11 is no longer published. If a recent version of the Scala
    2.11 connector is needed, it can be built by checking out the code and running
    ./mvnw install -Pdsv1_2.11.
  • Issue #522: Supporting Spark's Map type. Note that there are a few restrictions, as this is not a
    BigQuery native type.
  • Added support for reading BigQuery table snapshots.
  • BigQuery API has been upgraded to version 2.24.4
  • BigQuery Storage API has been upgraded to version 2.34.2
  • GAX has been upgraded to version 2.24.0
  • gRPC has been upgraded to version 1.54.0
  • Netty has been upgraded to version 4.1.90.Final
  • PR #944: Added support for setting the query job priority
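
As a rough illustration of how the direct write method and the query job priority from this release might be selected, the sketch below builds the option maps a PySpark job could pass to the connector. The option names `writeMethod` and `queryJobPriority` and their values are assumed from the connector's README, and the function names are hypothetical; check the documentation for the version you run.

```python
# Sketch only: plain dicts, so it runs without Spark installed.
# "writeMethod" and "queryJobPriority" are option names assumed from the
# connector README; the function names below are hypothetical.

ALLOWED_PRIORITIES = ("BATCH", "INTERACTIVE")


def direct_write_options():
    """Select the direct write method (BigQuery Storage Write API, no GCS staging)."""
    return {"writeMethod": "direct"}


def query_priority_options(priority):
    """Return the option that sets the priority of the query job used for reads."""
    value = priority.upper()
    if value not in ALLOWED_PRIORITIES:
        raise ValueError(
            f"priority must be one of {ALLOWED_PRIORITIES}, got {priority!r}"
        )
    return {"queryJobPriority": value}


if __name__ == "__main__":
    print(direct_write_options())
    print(query_priority_options("batch"))
```

In a PySpark job these dicts could be passed along the lines of `df.write.format("bigquery").options(**direct_write_options()).save("dataset.table")`, where `dataset.table` is a placeholder.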

0.29.0

03 Mar 19:54
  • Added two new connectors, spark-3.2-bigquery and spark-3.3-bigquery, intended for Spark 3.2 and 3.3
    respectively. These connectors implement new APIs and capabilities provided by the Spark Data Source V2 API. Both
    connectors are in preview mode.
  • Dynamic partition pruning is supported in preview mode by spark-3.2-bigquery and spark-3.3-bigquery.
  • This is the last version of the Spark BigQuery connector for Scala 2.11. The code will remain in the repository and
    can be compiled into a connector if needed.
  • PR #857: Fixing autovalue shaded classes repackaging
  • BigQuery API has been upgraded to version 2.22.0
  • BigQuery Storage API has been upgraded to version 2.31.0
  • GAX has been upgraded to version 2.23.0
  • gRPC has been upgraded to version 1.53.0
  • Netty has been upgraded to version 4.1.89.Final

0.28.1

28 Feb 00:46

  • PR #904: Fixed premature client closing in certain cases, which caused a RejectedExecutionException to be thrown

0.28.0

10 Jan 01:15
  • Adding support for the JSON data type.
    Thanks to @abhijeet-lele and @jonathan-ostrander for their contributions!
  • Issue #821: Fixing direct write of empty DataFrames
  • PR #832: Fixed client closing
  • Issue #838: Fixing unshaded artifacts
  • PR #848: Making schema comparison on write less strict
  • PR #852: Fixed enableListInference usage when using the default intermediate format
  • Jackson has been upgraded to version 2.14.1, addressing CVE-2022-42003
  • BigQuery API has been upgraded to version 2.20.0
  • BigQuery Storage API has been upgraded to version 2.27.0
  • GAX has been upgraded to version 2.20.1
  • Guice has been upgraded to version 5.1.0
  • gRPC has been upgraded to version 1.51.1
  • Netty has been upgraded to version 4.1.86.Final
  • Protocol Buffers has been upgraded to version 3.21.12

0.27.1

18 Oct 22:09
  • PR #792: Added ability to set table labels while writing to a BigQuery table
  • PR #796: Allowing custom BigQuery API endpoints
  • PR #803: Removed grpc-netty-shaded from the connector jar
  • Protocol Buffers has been upgraded to version 3.21.7, addressing CVE-2022-3171
  • BigQuery API has been upgraded to version 2.16.1
  • BigQuery Storage API has been upgraded to version 2.21.0
  • gRPC has been upgraded to version 1.49.1
  • Netty has been upgraded to version 4.1.82.Final
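
The table labels and custom endpoints from PR #792 and PR #796 are configured through connector options. The sketch below shows one way the option maps might be assembled. The names `bigQueryTableLabel.<key>`, `bigQueryHttpEndpoint` and `bigQueryStorageGrpcEndpoint` are assumed from the connector's README, while the helper names and the example endpoint URL are hypothetical.

```python
# Sketch only: builds option dicts without needing a Spark session.
# Option names are assumed from the connector README; helper names and the
# example endpoint are hypothetical placeholders.

def table_label_options(labels):
    """Map {"team": "data"} to {"bigQueryTableLabel.team": "data"}."""
    return {f"bigQueryTableLabel.{key}": value for key, value in labels.items()}


def endpoint_options(http_endpoint=None, storage_grpc_endpoint=None):
    """Return only the endpoint overrides that were actually provided."""
    options = {}
    if http_endpoint is not None:
        options["bigQueryHttpEndpoint"] = http_endpoint
    if storage_grpc_endpoint is not None:
        options["bigQueryStorageGrpcEndpoint"] = storage_grpc_endpoint
    return options


if __name__ == "__main__":
    write_options = {
        **table_label_options({"team": "analytics"}),
        **endpoint_options(http_endpoint="https://bigquery.example.test"),
    }
    print(write_options)
```

Because label keys are embedded in the option name, each label becomes its own option, so several labels can be set on the same write.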

0.27.0

21 Sep 20:08
  • Added a new Scala 2.13 connector, aimed at Spark versions 3.2 and above
  • PR #750: Adding support for custom access token creation. See more here.
  • PR #745: Supporting load from query in spark-3.1-bigquery.
  • PR #767: Adding the option createReadSessionTimeoutInSeconds, to override the timeout for CreateReadSession.
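
To illustrate the `createReadSessionTimeoutInSeconds` option from PR #767, here is a minimal sketch that validates a timeout and turns it into a connector option. The option name comes from the note above; the helper name and the choice to reject non-positive values are assumptions made for this example.

```python
# Sketch only: the helper name and the validation rule are assumptions.

def read_session_timeout_options(timeout_seconds):
    """Build the option that overrides the CreateReadSession timeout."""
    is_int = isinstance(timeout_seconds, int) and not isinstance(timeout_seconds, bool)
    if not is_int or timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be a positive integer")
    # Values are passed as strings here, which Spark reader options accept.
    return {"createReadSessionTimeoutInSeconds": str(timeout_seconds)}


if __name__ == "__main__":
    print(read_session_timeout_options(1200))
```

With a live session, the result could be applied as `spark.read.format("bigquery").options(**read_session_timeout_options(1200))` before calling `load`.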

0.26.0

18 Jul 17:44
  • All connectors support the DIRECT write method, using the BigQuery Storage Write API,
    without first writing the data to GCS. DIRECT write method is in preview mode.
  • spark-3.1-bigquery has been released in preview mode. This is a Java-only library,
    implementing the Spark 3.1 DataSource v2 APIs.
  • BigQuery API has been upgraded to version 2.13.8
  • BigQuery Storage API has been upgraded to version 2.16.0
  • gRPC has been upgraded to version 1.47.0
  • Netty has been upgraded to version 4.1.79.Final

0.25.2

23 Jun 00:55
  • PR #673: Added integration tests for BigLake external tables.
  • PR #674: Increasing default maxParallelism to 10K for BigLake external tables

0.25.1

13 Jun 21:33
  • Issue #651: Fixing the write back to BigQuery.
  • PR #664: Add support for BigLake external tables.
  • PR #667: Allowing clustering on unpartitioned tables.
  • PR #668: Using spark default parallelism as default.
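
PR #667 means clustering columns can be set on a write without also setting a partition field. As a hedged sketch, the helper below builds the `clusteredFields` option, whose name and comma-separated format are assumed from the connector's README. The helper name is hypothetical, and the limit of four columns reflects BigQuery's documented maximum for clustered tables.

```python
# Sketch only: "clusteredFields" is assumed from the connector README;
# the helper name is hypothetical.

MAX_CLUSTERING_COLUMNS = 4  # BigQuery allows at most four clustering columns


def clustering_options(columns):
    """Build the clustering option from a list of top-level column names."""
    if not columns:
        raise ValueError("at least one clustering column is required")
    if len(columns) > MAX_CLUSTERING_COLUMNS:
        raise ValueError(
            f"at most {MAX_CLUSTERING_COLUMNS} clustering columns are allowed"
        )
    return {"clusteredFields": ",".join(columns)}


if __name__ == "__main__":
    # No partitionField is included: the table is clustered but unpartitioned.
    print(clustering_options(["country", "customer_id"]))
```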

0.25.0

31 May 21:57
  • Issue #593: Allow users to disable cache when loading data via SQL query,
    by setting cacheExpirationTimeInMinutes=0
  • PR #613: Added field level schema checks. This can be disabled by setting
    enableModeCheckForSchemaFields=false
  • PR #618: Added support for the enableListInference option. This allows using
    Parquet as an intermediate format for arrays as well, without adding the
    list element to the resulting schema, as described
    here
  • PR #641: Removed Conscrypt from the shaded artifact in order to improve
    compatibility with Dataproc Serverless and with clusters where Conscrypt is
    disabled.
  • BigQuery API has been upgraded to version 2.10.6
  • BigQuery Storage API has been upgraded to version 2.12.0
  • gRPC has been upgraded to version 1.46.0
  • Netty has been upgraded to version 4.1.75.Final
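
The three options introduced in this release can be sketched as plain option maps. `cacheExpirationTimeInMinutes` and `enableModeCheckForSchemaFields` are named in the notes above; `enableListInference` follows the spelling used in the 0.28.0 notes (PR #852), and `intermediateFormat` is assumed from the connector's README. The function names are hypothetical.

```python
# Sketch only: function names are hypothetical; "intermediateFormat" is an
# option name assumed from the connector README.

def query_cache_options(cache_minutes):
    """Cache lifetime for data loaded via SQL query; 0 disables the cache."""
    if cache_minutes < 0:
        raise ValueError("cache_minutes must be zero or positive")
    return {"cacheExpirationTimeInMinutes": str(cache_minutes)}


def relaxed_schema_check_options():
    """Turn off the field-level schema checks added in PR #613."""
    return {"enableModeCheckForSchemaFields": "false"}


def parquet_array_write_options():
    """Write arrays through Parquet with list inference enabled."""
    return {"intermediateFormat": "parquet", "enableListInference": "true"}


if __name__ == "__main__":
    print(query_cache_options(0))
    print(relaxed_schema_check_options())
    print(parquet_array_write_options())
```

Keeping each setting in its own small function makes it easy to merge only the ones a job needs, for example `{**query_cache_options(0), **relaxed_schema_check_options()}`.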