[INLONG-10463][SDK] Optimization of ultra-long field processing in InlongSDK #11119

Open
wants to merge 4 commits into
base: master
from

Conversation

qy-liuhuo
Contributor

Fixes #10463

Motivation

Because of DataProxy limits, sending an oversized message through the SDK fails with a CONNECTION_BREAK error. This PR addresses that as follows:

  1. Support automatic truncation of ultra-long data in the SDK
  2. Make automatic truncation user-configurable
  3. Provide a default value for the allowed data length

Modifications

  1. Added a MAX_MESSAGE_LENGTH constant in inlong-sdk/dataproxy-sdk/src/main/java/org/apache/inlong/sdk/dataproxy/ConfigConstants.java.
  2. Provided a DataTruncationUtil to implement data truncation.
  3. Added the enableDataTruncation property to DefaultMessageSender.java to control whether automatic truncation is enabled, along with a setter that users can call.
  4. Added data truncation logic to all sendMessage and asyncSendMessage interfaces.

Automatic truncation can now be turned on with sender.enableDataTruncation(true);

The default limit can be changed through the MAX_MESSAGE_LENGTH constant in ConfigConstants.java.
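
For illustration, the opt-in truncation described above might look roughly like the following. This is a minimal sketch, not the code in this PR: the class name TruncationSketch, the prepare helper, and the 16-byte limit are assumptions made for the example, and the real DataTruncationUtil and default MAX_MESSAGE_LENGTH may differ.

```java
import java.util.Arrays;

// Hypothetical stand-in for the sender; only the truncation switch is modeled.
final class TruncationSketch {

    // Assumed limit for illustration only; not the PR's actual default.
    public static final int MAX_MESSAGE_LENGTH = 16;

    private boolean enableDataTruncation = false;

    // Mirrors the setter named in the description.
    public void enableDataTruncation(boolean enable) {
        this.enableDataTruncation = enable;
    }

    // Returns the body unchanged if it fits, otherwise a copy cut to maxLen bytes.
    public static byte[] truncate(byte[] body, int maxLen) {
        if (body == null || body.length <= maxLen) {
            return body;
        }
        return Arrays.copyOf(body, maxLen);
    }

    // Hypothetical helper: what a send path would do before handing the body on.
    public byte[] prepare(byte[] body) {
        return enableDataTruncation ? truncate(body, MAX_MESSAGE_LENGTH) : body;
    }
}
```

Note that cutting at a byte offset can split a multi-byte UTF-8 character or a structured record, so the truncated body is not guaranteed to be parseable.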

Verifying this change

(Please pick either of the following options)

  • This change is a trivial rework/code cleanup without any test coverage.

  • This change is already covered by existing tests, such as:
    (please describe tests)

  • This change added tests and can be verified as follows:

    (example:)

    • Added integration tests for end-to-end deployment with large payloads (10MB)
    • Extended integration test for recovery after broker failure

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
  • If a feature is not applicable for documentation, explain why?
  • If a feature is not documented yet in this PR, please create a follow-up issue for adding the documentation

vernedeng
vernedeng previously approved these changes Sep 18, 2024
luchunliang
luchunliang previously approved these changes Sep 19, 2024
@@ -218,6 +226,9 @@ public SendResult sendMessage(byte[] body, String groupId, String streamId, long
     */
    public SendResult sendMessage(byte[] body, String groupId, String streamId, long dt, String msgUUID,
            long timeout, TimeUnit timeUnit, boolean isProxySend) {
        if (enableDataTruncation) {
            body = truncateData(body);
Contributor

If the body is truncated, how can we ensure that the transmitted data is still the data the user reported?

Contributor Author

Sorry, do you mean adding a data verification function?

Contributor

The data content reported by the user must not be modified.

Contributor

@gosonzhang gosonzhang left a comment

How to successfully parse incomplete data?

@luchunliang luchunliang self-requested a review September 19, 2024 03:33
@qy-liuhuo
Contributor Author

> How to successfully parse incomplete data?

In my opinion, if the transmitted content cannot be parsed after truncation, the user should not enable truncation.

@qy-liuhuo
Contributor Author

> How to successfully parse incomplete data?

Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.

Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.
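
One possible reading of "truncated at the body granularity" is that whole bodies are kept while the running total still fits and the rest are dropped, so no individual body is ever cut. The sketch below illustrates that reading only; the class and method names and the cumulative-length rule are assumptions, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of list-level truncation; not the PR's implementation.
final class BodyGranularitySketch {

    // Keeps leading bodies while their combined length stays within maxTotalLen.
    // Each kept body is the original array, so no body is modified.
    public static List<byte[]> keepWholeBodies(List<byte[]> bodies, int maxTotalLen) {
        List<byte[]> kept = new ArrayList<>();
        long total = 0;
        for (byte[] body : bodies) {
            if (total + body.length > maxTotalLen) {
                break; // this body and everything after it are dropped
            }
            kept.add(body);
            total += body.length;
        }
        return kept;
    }
}
```

Under this reading each surviving body stays intact, but trailing bodies are still silently discarded.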

@gosonzhang
Contributor

> > How to successfully parse incomplete data?
>
> Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.
>
> Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.

In any case, the SDK should not truncate data; doing so would modify the content reported by the business and cause errors.

@qy-liuhuo
Contributor Author

> > > How to successfully parse incomplete data?
> >
> > Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.
> > Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.
>
> In any case, the SDK should not truncate data; doing so would modify the content reported by the business and cause errors.

If it cannot be truncated, is this issue unnecessary?

@gosonzhang
Contributor

> > > > How to successfully parse incomplete data?
> > >
> > > Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.
> > > Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.
> >
> > In any case, the SDK should not truncate data; doing so would modify the content reported by the business and cause errors.
>
> If it cannot be truncated, is this issue unnecessary?

No truncation is the premise. How to detect over-long data still needs analysis, so that the code can be changed appropriately.

@qy-liuhuo
Contributor Author

> > > > > How to successfully parse incomplete data?
> > > >
> > > > Now when the configuration does not allow truncation, if the body exceeds the limit, it will be rejected directly.
> > > > Specifically for a body of type byte[], SendResult.BODY_EXCEED_MAX_LEN will be returned directly. And for List<byte[]> type bodies, they are truncated at the body granularity, which ensures the integrity of each body.
> > >
> > > In any case, the SDK should not truncate data; doing so would modify the content reported by the business and cause errors.
> >
> > If it cannot be truncated, is this issue unnecessary?
>
> No truncation is the premise. How to detect over-long data still needs analysis, so that the code can be changed appropriately.

I have fixed it. Now, if the data is too long, the SDK directly returns a BODY_EXCEED_MAX_LEN error.
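
As a rough sketch of that reject-instead-of-truncate behavior: the body is checked against the limit and, if it is too long, an error result is returned without touching the data. The Result enum, the send method, and the 16-byte limit below are stand-ins invented for this example; only the BODY_EXCEED_MAX_LEN name comes from this thread.

```java
// Hypothetical illustration of the length check; not the PR's implementation.
final class LengthCheckSketch {

    // Stand-in for the SDK's SendResult; only two values are modeled here.
    public enum Result { OK, BODY_EXCEED_MAX_LEN }

    // Assumed limit for illustration only.
    public static final int MAX_MESSAGE_LENGTH = 16;

    public static Result send(byte[] body) {
        if (body.length > MAX_MESSAGE_LENGTH) {
            // Reject the oversized body; the caller's data is never modified.
            return Result.BODY_EXCEED_MAX_LEN;
        }
        // A real sender would pass the untouched body to the network layer here.
        return Result.OK;
    }
}
```

With this shape the caller decides how to handle an oversized message (split it, drop it, or report it), which keeps the SDK from changing user-reported content.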

Development

Successfully merging this pull request may close these issues.

[Improve][SDK] Optimization of ultra-long field processing in InlongSDK
4 participants