Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare Unified codebase for Spark ( Iceberg ) Support #77

Merged
merged 16 commits into from
Aug 30, 2024

Conversation

ilias1111
Copy link
Collaborator

@ilias1111 ilias1111 commented Aug 14, 2024

Description

  1. Universal Changes
    • Replaced the QUALIFY clause with PostgreSQL-compatible logic to ensure consistent behavior across different target types.
    • Modified the casting syntax to use CAST(x AS data_type) for Spark compatibility.
    • Adapted integration tests by creating Spark-specific source files to accommodate Spark's syntax requirements and limitations.
    • Configured the incremental strategy based on the target type, using 'delete+insert' for Postgres and Redshift, and 'merge' for Spark.
    • Adjusted the handling of ROW_NUMBER() in tests to account for Spark's non-deterministic behavior.
  2. Snowplow Utils Integration
    • Snowplow Unified now leverages the updated utility functions, macros, and configurations provided by Snowplow Utils to ensure compatibility and optimal performance when running on Spark with Iceberg.

What type of PR is this? (check all applicable)

  • 🍕 Feature
  • 🐛 Bug Fix
  • 📝 Documentation Update
  • 🎨 Style
  • 🧑‍💻 Code Refactor
  • 🔥 Performance Improvements
  • ✅ Test
  • 🤖 Build
  • 🔁 CI
  • 📦 Chore (Release)
  • ⏩ Revert

Related Tickets & Documents

Checklist

  • 💣 Is your change a breaking change?
  • 📖 I have updated the CHANGELOG.md

Added tests?

  • 👍 yes
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need help

Added to documentation?

  • 📓 internal package docs (ymls, macros, readme, if applicable)
  • 📕 I have raised a Snowplow documentation PR if applicable (Link here)
  • 🙅 no documentation needed

[optional] What gif best describes this PR or how it makes you feel?

I like big data

@ilias1111 ilias1111 requested a review from a team as a code owner August 14, 2024 12:22
@ilias1111 ilias1111 changed the title Full changes Spark PR Aug 14, 2024
@ilias1111 ilias1111 changed the title Spark PR Prepare Unified codebase for Spark ( Iceberg ) Support Aug 19, 2024
Copy link
Collaborator

@agnessnowplow agnessnowplow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like for utils, leaving some comments for clarification, nothing else sticks out for now.

dbt_project.yml Outdated Show resolved Hide resolved
integration_tests/dbt_project.yml Outdated Show resolved Hide resolved
macros/field_extractions/get_cwv_fields.sql Outdated Show resolved Hide resolved
integration_tests/.scripts/integration_test.sh Outdated Show resolved Hide resolved
dbt_project.yml Outdated Show resolved Hide resolved
dbt_project.yml Show resolved Hide resolved
Copy link
Collaborator

@agnessnowplow agnessnowplow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tentative approval, same as with utils, let's agree on the incremental_stategy (be explicit or permissive) for the release and also investigate why bigquery / redshift started failing. Most likely not related to the spark prep (utils was not linked to the branch for redshift at least) hence the ok for now.

@ilias1111 ilias1111 merged commit b53d0a7 into Release/snowplow-unified/0.5.0 Aug 30, 2024
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants