[BUG] S3 Deltastreamer: Block has already been inflated #6428
Comments
@yihua can you take a look, seems like a metadata related issue? |
Update: I got it working on an older version of Hudi 0.10.1, so seems like a regression |
@dyang108 : is this happening infrequently, or is your pipeline just stuck? We have unit tests and integration tests for the metadata table and we haven't hit this issue yet. Trying to gauge what's different in your env or setup. |
This happened consistently with the command above every time I ran it on Hudi 0.12.0 (also tried 0.11.1). The pipeline failed and exited when I saw this issue. |
We have the same stacktrace when running on hudi version 0.11.0, spark 3.2.1, EMR 6.7. We have metadata service enabled and our Spark Streaming Query fails each time. This is a COW table |
@nsivabalan What might be a general workaround in that situation to unblock the processing? It depends on the root cause, of course, but will deleting and recreating the metadata table from hudi-cli help? One other option might be to disable metadata on the current table and proceed. |
Got it. Did you mean you are using EMR's Spark or OSS Spark? I understand it's an EMR cluster. |
@kasured to unblock the processing, could you try disabling and deleting the metadata table by setting |
@yihua Yes, that helped. However, I assume the same can be done with hudi-cli as I wrote before: metadata delete and metadata create. @nsivabalan Yes, we are using the Amazon bundle for Spark 3.2.1 which is provided by EMR 6.7 |
yes, you are right. you can disable via hudi-cli as well. |
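For readers hitting the same error, the workaround discussed above can be sketched as writer configuration (the property name is a standard Hudi config; whether it applies to your setup is an assumption):

```properties
# Workaround, not a fix: disable the metadata table on the writer so the
# problematic metadata blocks are no longer read during upserts/compaction.
hoodie.metadata.enable=false
```

Alternatively, as noted in the comments, the metadata table can be rebuilt from hudi-cli with `metadata delete` followed by `metadata create`.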
I saw this issue with Spark on mesos (on EC2), not EMR Spark |
Hi, is there any resolution for this issue yet, or any idea of which release it will be fixed in?
When metadata is enabled, Bulk Insert works fine, but Upsert aborts with "Caused by: java.lang.IllegalStateException: Block has already been inflated". My test cases mostly depend on metadata, so I need it to be enabled. Please let me know if there is any other workaround. Thank you! |
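To make the report above concrete, the failing vs. working combinations map to these standard Hudi datasource options (a sketch of the scenario, not the reporter's exact configuration):

```properties
hoodie.metadata.enable=true
# Reported to abort with "Block has already been inflated":
hoodie.datasource.write.operation=upsert
# Reported to work fine with metadata enabled:
# hoodie.datasource.write.operation=bulk_insert
```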
cc @nsivabalan to look into this issue, thanks. |
Is a fix for this issue planned to land in 13.0 or a 12.x patch release? |
I got the same issue. |
Hit this when using Flink 1.16 and Hudi bdb50dd on EKS. Metadata table was |
Right, this seems clearly flawed: it is hiding the actual IOException and instead throwing an irrelevant "block has already been inflated" error. |
@zinking Can you file a fix for it? |
Hey folks, we hit this issue: https://gist.github.com/envomp/268bdd35a3b3399db59583c0e159c229#file-cover-logs It was in turn caused by TIMELINE_SERVER_BASED marker types being unusable when using Spark structured streaming. The workaround was to disable the metadata table. |
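A minimal sketch of the workaround just described, assuming standard Hudi config keys (whether DIRECT markers actually avoid the trigger in a given setup is an assumption):

```properties
# Workaround: disable the metadata table entirely.
hoodie.metadata.enable=false
# Alternatively, if TIMELINE_SERVER_BASED markers are the trigger,
# fall back to direct markers:
hoodie.write.markers.type=DIRECT
```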
@envomp Are you setting |
Hey @ad1happy2go We have the following s3a configurations:
Also tried setting fs.s3a.connection.maximum to 8096 but the issue persisted. EDIT: For a table with smaller volume, this is how disabling the metadata table affected the app duration time: |
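For reference, the S3A pool tuning mentioned above is set via Hadoop configuration; the value shown is the one tried in the comment (and reported not to help here), and any additional keys would be setup-specific assumptions:

```properties
# Raise the S3A HTTP connection pool limit (tried above; issue persisted).
fs.s3a.connection.maximum=8096
```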
Same problem on version 0.14.1, on HDFS. |
The issue of |
Describe the problem you faced
Deltastreamer with write output to S3 exits unexpectedly when running in continuous mode.
To Reproduce
Steps to reproduce the behavior:
I ran the following:
the /etc/spark/work-dir/ looks like this:
aws-java-sdk-bundle-1.12.283.jar hadoop-aws-2.6.5.jar hudi-utilities-bundle_2.11-0.12.0.jar scala-library-2.11.12.jar spark-streaming-kafka-0-10_2.11-2.4.8.jar
Expected behavior
I don't expect there to be issues on compaction here.
Environment Description
Hudi version : 0.12.0 (also tried 0.11.1)
Spark version : 2.4.8
Hive version :
Hadoop version : 2.6.5
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : Yes, docker on Mesos
I'm reading from an Avro kafka topic
Additional context
Reading Avro record from Kafka
Stacktrace