Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When using force_batch=true with incremental models, it will fail if there is no data to write #519

Open
parsable-alex-antonison opened this issue Nov 21, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@parsable-alex-antonison
Copy link

parsable-alex-antonison commented Nov 21, 2023

dbt version: 1.7.0
use case: I am processing CDC data from a database into Athena Iceberg tables.
issue: While I can successfully run the first load of the query, when I go to run an incremental model without any data, it will fail when it is attempting to query a table that does not exist.

  1. Attempt to create a tmp table
create table "awsdatacatalog"."databasename"."tablename__dbt_tmp__tmp_not_partitioned"
.... (query that is looking for new records in the source table from the target table. it will return nothing because no additional data has been added to the source table)
  1. Attempt to query partitions from the temp table
select distinct partition_1, partition_2 from "awsdatacatalog"."databasename"."tablename__dbt_tmp__tmp_not_partitioned" 
order by team_partition, team_id
  1. And this last step is where it fails
select distinct partition_1, partition_2 from ("awsdatacatalog"."databasename"."tablename__dbt_tmp")
order by team_partition, team_id
@nicor88
Copy link
Contributor

nicor88 commented Nov 21, 2023

Adding a bit more context.
The issue occur also for simple models like:

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='user_id',
    table_type='iceberg',
   force_batch='true'
) }}

select 1 as user_id, 'a' as user_name

The issue seems to be in get_partition_batches and create_table_as_with_partitions

Here are some possible solutions:

  • if the underlying dataset is empty - create_table_as_with_partitions can create an empty table from SQL - no hurt/no pain - it's possible to check if __tmp_not_partitioned is empty and if it's the case, we just create an empty relation as tmp table.
  • if the user try to use force_batch with not partitioned tables -> we have 2 options:
    • all works normally and force_batch it's just ignored, e.g. pass partitioning(bool) to safe_create_table_as
    • we raise an error and ask the user to remove force_batch from not partitioned tables
  • if the user works with partitioned tables by the data is not more than 100 partitions partition batches everything should still work. - must be tested

@svdimchenko FYI.

@nicor88 nicor88 added the bug Something isn't working label Nov 21, 2023
@antonysouthworth-halter
Copy link
Contributor

Oh, I didn't see this when I searched before opening #656 , will close in favour of this one.

@antonysouthworth-halter
Copy link
Contributor

I have a fix in progress on #658

4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants