Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HashJoinBatch does not support schema changes #2941

Open
egasimov opened this issue Sep 10, 2024 · 0 comments
Open

HashJoinBatch does not support schema changes #2941

egasimov opened this issue Sep 10, 2024 · 0 comments
Labels

Comments

@egasimov
Copy link

egasimov commented Sep 10, 2024

Hello Drill community,
Recently, we have encountered with the following issue( UNSUPPORTED_OPERATION ERROR: HashJoinBatch does not support schema changes in build side. Prior schema), when joining two CTEs.

There two CTEs which having filter conditions from list of items. and need to join based on given condition.
it seems, after join operation, schema changes for field filtered__items1
[filtered__items1 (NULL:OPTIONAL)]]
[filtered__items1 (MAP:REPEATED)

Executed queries:

WITH cte_1 AS (
  SELECT t.customer_id as customer_id1, t.customer_name as customer_name1,
         COLLECT_LIST('val', t.p_item) filtered__items1
  FROM (
    SELECT d.customer_id, d.customer_name, flatten(d.purchased_items) AS p_item 
    FROM dfs.root.`/datas3/customers/*` d
  ) t
  WHERE t.p_item['product_id'] IN (777, 888)
  GROUP BY t.customer_id, t.customer_name
),

cte_2 AS (
  SELECT t.customer_id as customer_id2, t.customer_name as customer_name2,
         COLLECT_LIST('val', t.p_item) filtered__items2
  FROM (
    SELECT d.customer_id, d.customer_name, flatten(d.purchased_items) AS p_item 
    FROM dfs.root.`/datas3/customers/*` d
  ) t
  WHERE t.p_item['product_id'] IN (999)
  GROUP BY t.customer_id, t.customer_name
)

SELECT cte_1.*
FROM cte_1
JOIN cte_2 ON cte_1.customer_id1 = cte_2.customer_id2;

Error details:

-------------------------------------------------------------------------------------------------------
org.apache.drill.common.exceptions.UserRemoteException: UNSUPPORTED_OPERATION ERROR: HashJoinBatch does not support schema changes in build side.
Prior schema : 
BatchSchema [fields=[[`customer_id1` (BIGINT:OPTIONAL)], [`customer_name1` (VARCHAR:OPTIONAL)], [`filtered__items1` (NULL:OPTIONAL)]], selectionVector=NONE]
New schema : 
BatchSchema [fields=[[`customer_id1` (BIGINT:OPTIONAL)], [`customer_name1` (VARCHAR:OPTIONAL)], [`filtered__items1` (MAP:REPEATED), children=([`val` (MAP:REQUIRED), children=([`item_id` (BIGINT:OPTIONAL)], [`item_class` (VARCHAR:OPTIONAL)], [`product_id` (BIGINT:OPTIONAL)], [`created_at` (VARCHAR:OPTIONAL)])])]], selectionVector=NONE]

Unsupported schema change
Fragment: 0:0

[Error Id: 6b857b70-583e-4fb0-ab02-d1e6b536c569 on 8a288c59e4a7:31010]
------------------------------------------------------------------------

Drill version
Apache drill version: 1.21.2

Additional context
You may download as parquet file from here
Test data

[
    {
    "customer_id": 1000001,
    "customer_name": "John Doe",
    "purchased_items": [
        {
            "item_id": 2000001,
            "item_class": "A",
            "product_id": 777,
            "created_at": "2024-06-12T11:36:37.751Z"
        },
        {
            "item_id": 2000002,
            "item_class": "B",
            "product_id": 888,
            "created_at": "2024-06-12T08:46:37.751Z"
        },
        {
            "item_id": 2000003,  # Corrected item_id for uniqueness
            "item_class": "C",
            "product_id": 999,
            "created_at": "2024-06-12T11:56:00.751Z"
        }
    ]
    },
    {
    "customer_id": 1000002,
    "customer_name": "Black Smith",
    "purchased_items": [
        {
            "item_id": 2000004,
            "item_class": "A",
            "product_id": 777,
            "created_at": "2024-06-12T11:36:33.751Z"
        },
        {
            "item_id": 2000006,  # Corrected item_id for uniqueness
            "item_class": "C",
            "product_id": 999,
            "created_at": "2024-08-12T11:56:37.751Z"
        }
    ]
    },
        {
    "customer_id": 1000003,
    "customer_name": "Alice Doe",
    "purchased_items": [
        {
            "item_id": 2000010,
            "item_class": "A",
            "product_id": 777,
            "created_at": "2024-01-12T11:36:37.751Z"
        },
        {
            "item_id": 2000011,  # Corrected item_id for uniqueness
            "item_class": "C",
            "product_id": 888,
            "created_at": "2024-04-12T11:56:37.751Z"
        }
    ]
    }
]

@egasimov egasimov added the bug label Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant