Prepare Unified codebase for Spark ( Iceberg ) Support #77

ilias1111 · 2024-08-14T12:22:45Z

Description

Universal Changes
- Replaced the QUALIFY clause, which PostgreSQL does not support, with PostgreSQL-compatible logic to ensure consistent behavior across target types.
- Modified the casting syntax to use CAST(x AS data_type) for Spark compatibility.
- Adapted integration tests by creating Spark-specific source files to accommodate Spark's syntax requirements and limitations.
- Configured the incremental strategy based on the target type, using 'delete+insert' for Postgres and Redshift, and 'merge' for Spark.
- Adjusted the handling of ROW_NUMBER() in tests to account for Spark's non-deterministic behavior.
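The sketch below shows three of these changes together: the QUALIFY replacement, the CAST(x AS data_type) syntax, and deterministic ROW_NUMBER() ordering. It uses Python's built-in sqlite3 module as a stand-in engine because SQLite, like PostgreSQL, has no QUALIFY clause. This is a minimal sketch under stated assumptions and is not taken from the package's models. The `events` table, its columns (`event_id`, `collector_tstamp`, `source_id`, `page_views`) and the `dedupe()` helper are hypothetical. The sketch also assumes a SQLite build with window-function support (3.25+).

```python
import sqlite3

# Hypothetical toy data: "e1" is duplicated and both copies share the same
# timestamp, so ordering on the timestamp alone would not pick a stable winner.
ROWS = [
    ("e1", "2024-08-14 10:00:00", "a", "10"),
    ("e1", "2024-08-14 10:00:00", "b", "12"),
    ("e2", "2024-08-14 11:00:00", "c", "7"),
]

# Portable alternative to:  ... QUALIFY ROW_NUMBER() OVER (...) = 1
# The window function is computed in a subquery and filtered in the outer query.
DEDUPE_SQL = """
SELECT
    event_id,
    collector_tstamp,
    source_id,
    CAST(page_views AS INTEGER) AS page_views   -- CAST(x AS type), not x::type
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY event_id
            ORDER BY collector_tstamp, source_id  -- source_id breaks ties
        ) AS row_num
    FROM events
) AS ranked
WHERE row_num = 1
ORDER BY event_id
"""


def dedupe(rows):
    """Load rows into an in-memory table and keep one row per event_id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events ("
            "event_id TEXT, collector_tstamp TEXT, source_id TEXT, page_views TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return conn.execute(DEDUPE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in dedupe(ROWS):
        print(row)
```

Adding a secondary ORDER BY key such as `source_id` is one way to keep ROW_NUMBER() stable when the primary ordering key ties. Without it, which duplicate survives may differ between runs on engines such as Spark.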
Snowplow Utils Integration
- Snowplow Unified now uses the updated utility functions, macros, and configurations from Snowplow Utils so that it is compatible with, and runs efficiently on, Spark with Iceberg.

What type of PR is this? (check all applicable)

Related Tickets & Documents

Checklist

💣 Is your change a breaking change?
📖 I have updated the CHANGELOG.md

Added tests?

👍 yes
🙅 no, because they aren't needed
🙋 no, because I need help

Added to documentation?

📓 internal package docs (ymls, macros, readme, if applicable)
📕 I have raised a Snowplow documentation PR if applicable (Link here)
🙅 no documentation needed

[optional] What gif best describes this PR or how it makes you feel?

agnessnowplow

As with utils, I'm leaving some comments for clarification; nothing else stands out for now.

dbt_project.yml

integration_tests/dbt_project.yml

macros/field_extractions/get_cwv_fields.sql

integration_tests/.scripts/integration_test.sh

dbt_project.yml

agnessnowplow

Tentative approval, same as with utils. Let's agree on the incremental_strategy approach (explicit or permissive) for the release, and also investigate why BigQuery / Redshift started failing. This is most likely unrelated to the Spark prep (at least for Redshift, utils was not linked to the branch), hence the OK for now.
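The explicit-versus-permissive choice can be illustrated with a small Python sketch of a per-target lookup. The mapping follows the PR description: 'delete+insert' for Postgres and Redshift, and 'merge' for Spark. The function name `incremental_strategy()` and its `explicit` flag are hypothetical. The package itself configures this in dbt, not in Python. Returning `None` stands for leaving incremental_strategy unset, so the adapter's default strategy would apply.

```python
from typing import Optional

# Hypothetical mapping taken from the PR description; not the package's code.
STRATEGY_BY_TARGET = {
    "postgres": "delete+insert",
    "redshift": "delete+insert",
    "spark": "merge",
}


def incremental_strategy(target_type: str, explicit: bool = False) -> Optional[str]:
    """Return the incremental strategy for a dbt target type.

    explicit=True:  raise for any target without a configured strategy.
    explicit=False: return None (i.e. leave the setting unset) so the
                    adapter's default strategy is used instead.
    """
    strategy = STRATEGY_BY_TARGET.get(target_type.lower())
    if strategy is None and explicit:
        raise ValueError(f"No incremental strategy configured for target '{target_type}'")
    return strategy


if __name__ == "__main__":
    for target in ("postgres", "redshift", "spark", "bigquery"):
        print(target, incremental_strategy(target))
```

The explicit mode surfaces a missing configuration immediately, while the permissive mode lets unlisted targets keep working with whatever their adapter does by default.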

Full changes

f78cff9

ilias1111 requested a review from a team as a code owner August 14, 2024 12:22

Merge branch 'Release/snowplow-unified/0.5.0' into support_spark_vol_1

1b52257

ilias1111 changed the title ~~Full changes~~ Spark PR Aug 14, 2024

ilias1111 added 7 commits August 16, 2024 15:27

Spark Implementation

e1482ac

Move folder

b80eec9

Update pr_tests.yml

62d0019

Update on timestamp_diff

60a0224

Update project to return to production level

4324998

Update snowplow_unified_sessions_this_run.sql

b29e436

Unified updates

4d4a8a1

ilias1111 changed the title ~~Spark PR~~ Prepare Unified codebase for Spark ( Iceberg ) Support Aug 19, 2024

agnessnowplow reviewed Aug 20, 2024

ilias1111 added 7 commits August 22, 2024 16:57

Revert comments

d721b77

Pass the last test

aa56e9e

Check incremental strategy change

14c05b4

incremental_strategy changes

b6d5954

Update dbt_project.yml

8b6e67d

Update dbt_project.yml

c3a430f

Update packages.yml

0e4dfad

agnessnowplow approved these changes Aug 30, 2024

ilias1111 merged commit b53d0a7 into Release/snowplow-unified/0.5.0 Aug 30, 2024
5 of 6 checks passed
