Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add blobKeepAliveTimeout, queueKeepAliveTimeout, tableKeepAliveTimeout #2443

Closed
wants to merge 3 commits into from

Conversation

Slach
Copy link

@Slach Slach commented Aug 1, 2024

fix #2053
workaround for ClickHouse/ClickHouse#60447

we use azurite for Azure BLOB storage in our integration tests
Azure CPP Sdk (or it usage on clickhouse-server side) don't respect Keep-Alive: 5 header which sent from azurite side, and continue to make request to already closed connection after 5 seconds, this header sent from standard nodejs http library

We are going to fix this behavior in the next version of clickhouse-server
but currently to allow pass e2e tests, we would like to propose adding cli option and config settings to allow increase value for Keep-Alive header

we tested these changes and it actually works, hope your CI/CD pipelines also will pass

@Slach
Copy link
Author

Slach commented Aug 1, 2024

@microsoft-github-policy-service agree

Slach added a commit to Altinity/clickhouse-backup that referenced this pull request Aug 1, 2024
@blueww
Copy link
Member

blueww commented Aug 2, 2024

/azp run

Copy link

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@blueww
Copy link
Member

blueww commented Aug 2, 2024

@Slach

Thanks for the contribution!
Would you please rebase the code on the latest main to make PR validation can run?

And could you give more details on why the change can fix the issue?
And is the current default value aligned with product server behavior?
BTW, could you please also add test cases to make sure the new parameters really work?

@Slach
Copy link
Author

Slach commented Aug 2, 2024

rebased

@Slach
Copy link
Author

Slach commented Aug 2, 2024

PR description updated

@blueww
Copy link
Member

blueww commented Aug 2, 2024

@Slach

Besides test class "FSExtentStore" failure on BlobTest_Win_Node* is know issue which is in fixing, other test failures are new and related with this PR change.
Please help to fix the test failures.

Most of the new failure looks like :

  1) Blob Cors requests test
       Request Match rule exists for exact origin @loki @sql:
     FetchError: request to http://127.0.0.1:11000/devstoreaccount1?restype=service&comp=properties failed, reason: read ECONNRESET
      at ClientRequest.<anonymous> (node_modules/node-fetch/lib/index.js:1491:11)
      at ClientRequest.emit (node:events:519:28)
      at ClientRequest.emit (node:domain:488:12)
      at Socket.socketErrorListener (node:_http_client:500:9)
      at Socket.emit (node:events:519:28)
      at Socket.emit (node:domain:488:12)
      at emitErrorNT (node:internal/streams/destroy:169:8)
      at emitErrorCloseNT (node:internal/streams/destroy:128:3)
      at processTicksAndRejections (node:internal/process/task_queues:82:21)

@@ -186,10 +186,13 @@ Following extension configurations are supported:

- `azurite.blobHost` Blob service listening endpoint, by default 127.0.0.1
- `azurite.blobPort` Blob service listening port, by default 10000
- `azurite.blobKeepAliveTimeout` Blob service keep alive timeout, by default 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value before this PR?
Could we use the value before this PR is merged to avoid regression?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was 5 by default

Copy link
Member

@blueww blueww Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks might not be 5 for all scenarios, since so many test cases failed with "read ECONNRESET", looks the network break since keep live timeout.

@Slach
Copy link
Author

Slach commented Aug 2, 2024

sorry
i don't understand how to interpet stacktrace in #2443 (comment)

how to match which test exactly doesn't work? And how to run only this failed test?

i tried to run npm run test:blob locally on node 20

and got lot of failures can't figure out how to debug it

@blueww
Copy link
Member

blueww commented Aug 5, 2024

@Slach

If you click the "Details" of the failed jobs, you should can see the failure log, and find the failed test class/case name.
image

You might can first focus to make the following 2 classes pass with node 20 on win or Ubuntu:

  • BlockBlobHighlevel
  • ServiceAPIs

BTW, the failed test cases are all for "read ECONNRESET", it might caused by the keepalive timeout change might be too short.

@blueww
Copy link
Member

blueww commented Aug 5, 2024

@Slach

This RP looks target for a temp fix of CPP SDK issue (you said : "We are going to fix this behavior in the next version of clickhouse-server"), and this looks a high risk change since there are so many test cases failed.
So to avoid blocking other Azurite user, we might don't prefer to take this change.

I would suggest you contact CPP SDK team if this is a CPP SDK issue.
If you would like to use Azurite with the customized keeplivetimeout, I would suggest you build your own Azurite package from your code, instead push the change to Azurite public repo.

@blueww
Copy link
Member

blueww commented Aug 8, 2024

Close this PR as its looks a temp fix for client SDK issue, and take many Azurite Unit test fail.

@blueww blueww closed this Aug 8, 2024
@Slach
Copy link
Author

Slach commented Aug 26, 2024

@blueww
I figure out with tests, now it passed in my local machine could we reopen PR? or i shall create new?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unexptected TCP RST from azurite side
2 participants