[INLONG-10463][SDK] Optimization of ultra-long field processing in InlongSDK #11119

qy-liuhuo · 2024-09-15T12:11:46Z

Motivation

Because of DataProxy limitations, sending too much data through the SDK causes a CONNECTION_BREAK error. This PR addresses that with the following changes:

Support automatic ultra-long data truncation in SDK
User-configurable automatic truncation
Provide the default value of the allowed data length

Modifications

Added a MAX_MESSAGE_LENGTH constant in inlong-sdk/dataproxy-sdk/src/main/java/org/apache/inlong/sdk/dataproxy/ConfigConstants.java.
Provided a DataTruncationUtil to implement data truncation.
Added the enableDataTruncation property to DefaultMessageSender.java to control whether automatic truncation is enabled, along with a configuration method for users to call.
Added data truncation logic to all sendMessage and asyncSendMessage interfaces.

You can now enable automatic truncation by calling sender.enableDataTruncation(true);

You can also change the default MAX_MESSAGE_LENGTH constant in ConfigConstants.java.
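
As a rough illustration of the truncation described above, here is a minimal sketch. It is not the PR's DataTruncationUtil: the class name `TruncationSketch`, the method `truncate`, and the limit of 8 bytes are all hypothetical, since the actual MAX_MESSAGE_LENGTH value and helper signature are not shown in this description.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the PR's DataTruncationUtil.
final class TruncationSketch {

    // Assumed limit, chosen small for the example. The real MAX_MESSAGE_LENGTH is not shown here.
    static final int ASSUMED_MAX_MESSAGE_LENGTH = 8;

    private TruncationSketch() {
    }

    // Returns the body unchanged when it fits, otherwise a copy cut to maxLength bytes.
    static byte[] truncate(byte[] body, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must not be negative");
        }
        if (body == null || body.length <= maxLength) {
            return body;
        }
        return Arrays.copyOf(body, maxLength);
    }
}
```

Note that a byte-level cut like this can split a multi-byte character or a structured record, so the receiver may be unable to parse the result.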

Verifying this change

(Please pick either of the following options)

This change is a trivial rework/code cleanup without any test coverage.
This change is already covered by existing tests, such as:
(please describe tests)
This change added tests and can be verified as follows:

(example:)
- Added integration tests for end-to-end deployment with large payloads (10MB)
- Extended integration test for recovery after broker failure

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
If a feature is not applicable for documentation, explain why?
If a feature is not documented yet in this PR, please create a follow-up issue for adding the documentation

gosonzhang · 2024-09-19T03:13:48Z

...ng-sdk/dataproxy-sdk/src/main/java/org/apache/inlong/sdk/dataproxy/DefaultMessageSender.java

@@ -218,6 +226,9 @@ public SendResult sendMessage(byte[] body, String groupId, String streamId, long
     */
    public SendResult sendMessage(byte[] body, String groupId, String streamId, long dt, String msgUUID,
            long timeout, TimeUnit timeUnit, boolean isProxySend) {
+        if (enableDataTruncation) {
+            body = truncateData(body);


If a truncation is made, how can we ensure that the transmitted data is the user-reported data?

Sorry, do you mean adding a data verification function?

The data content reported by the user cannot be modified

gosonzhang

How to successfully parse incomplete data?

qy-liuhuo · 2024-09-19T03:42:50Z

> How to successfully parse incomplete data?

In my opinion, if the transmitted content cannot be parsed after truncation, then the user should not configure the function to allow truncation

qy-liuhuo · 2024-09-19T05:53:56Z

> How to successfully parse incomplete data?

Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.

Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.
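
One way to read "truncated at the body granularity" is sketched below, under assumptions: the limit applies to the summed length of the bodies, and the list is cut at the first body that no longer fits. The names `BodyListSketch` and `keepWholeBodies` are hypothetical and do not come from the SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the SDK's real handling of List<byte[]> bodies may differ.
final class BodyListSketch {

    private BodyListSketch() {
    }

    // Keeps whole bodies, in order, and stops before the first body
    // that would push the running total past maxTotalLength.
    static List<byte[]> keepWholeBodies(List<byte[]> bodies, int maxTotalLength) {
        List<byte[]> kept = new ArrayList<>();
        long total = 0;
        for (byte[] body : bodies) {
            if (total + body.length > maxTotalLength) {
                break;
            }
            kept.add(body);
            total += body.length;
        }
        return kept;
    }
}
```

Each body that is kept is passed through untouched, so no individual record is altered. Bodies after the cut point are dropped from the batch.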

gosonzhang · 2024-09-19T09:29:19Z

>> How to successfully parse incomplete data?
>
> Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.
>
> Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.

In any case, the SDK should not set truncation, otherwise this operation will modify the data content reported by the business and cause errors

qy-liuhuo · 2024-09-19T13:35:01Z

>>> How to successfully parse incomplete data?
>>
>> Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.
>> Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.
>
> In any case, the SDK should not set truncation, otherwise this operation will modify the data content reported by the business and cause errors

If it cannot be truncated, is this issue unnecessary?

gosonzhang · 2024-09-20T08:19:52Z

>>>> How to successfully parse incomplete data?
>>>
>>> Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.
>>> Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.
>>
>> In any case, the SDK should not set truncation, otherwise this operation will modify the data content reported by the business and cause errors
>
> If it cannot be truncated, is this issue unnecessary?

It cannot be truncated, this is the premise. How to check whether it is too long requires analysis to see how to modify it appropriately.

qy-liuhuo · 2024-09-21T03:16:42Z

>>>>> How to successfully parse incomplete data?
>>>>
>>>> Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.
>>>> Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.
>>>
>>> In any case, the SDK should not set truncation, otherwise this operation will modify the data content reported by the business and cause errors
>>
>> If it cannot be truncated, is this issue unnecessary?
>
> It cannot be truncated, this is the premise. How to check whether it is too long requires analysis to see how to modify it appropriately.

I have fixed it. Now, if the data is too long, the SDK directly returns a BODY_EXCEED_MAX_LEN error.
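
The reject-only behavior can be sketched as a simple guard, shown below. This is an assumption-laden illustration rather than the merged code: `LengthCheckSketch`, `check`, and the local `Result` enum are hypothetical stand-ins, and only the name BODY_EXCEED_MAX_LEN is taken from this thread.

```java
// Hypothetical sketch of a length guard; not the code merged in this PR.
final class LengthCheckSketch {

    // Local stand-in for the SDK's SendResult, reduced to the two values needed here.
    enum Result {
        OK,
        BODY_EXCEED_MAX_LEN
    }

    private LengthCheckSketch() {
    }

    // Rejects an over-long body instead of modifying it.
    // A null body is left for the caller's other validation to handle.
    static Result check(byte[] body, int maxLength) {
        if (body != null && body.length > maxLength) {
            return Result.BODY_EXCEED_MAX_LEN;
        }
        return Result.OK;
    }
}
```

A sender would run such a check before transmitting and return the error to the caller, so the reported data is either sent unchanged or not sent at all.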

qy-liuhuo added 2 commits September 15, 2024 19:52

[INLONG-10463][SDK] Optimization of ultra-long field processing in InlongSDK

99d2bd1

[INLONG-10463][SDK] Rename the name of truncation config function

90186c0

github-actions bot added the component/sdk label Sep 15, 2024

dockerzhang requested review from luchunliang, vernedeng and aloyszhang September 17, 2024 02:58

vernedeng previously approved these changes Sep 18, 2024

dockerzhang requested a review from gosonzhang September 19, 2024 03:03

luchunliang previously approved these changes Sep 19, 2024

gosonzhang reviewed Sep 19, 2024

gosonzhang requested changes Sep 19, 2024

luchunliang reviewed Sep 19, 2024

luchunliang self-requested a review September 19, 2024 03:33

[INLONG-10463][SDK] Add rejection logic

431ee81

qy-liuhuo dismissed stale reviews from luchunliang and vernedeng via 431ee81 September 19, 2024 05:48

aloyszhang assigned qy-liuhuo Sep 20, 2024

[INLONG-10463][SDK] Remove the logic of truncating ultra-long data

dffb135
