not sure why i'm seeing grpc blocking the event loop so often #2753

Open
earonesty opened this issue May 21, 2024 · 4 comments

@earonesty

earonesty commented May 21, 2024

Problem description

I tried turning on event-loop block detection, and all I see are gRPC calls (we use Google Pub/Sub and other gRPC-based tools). That doesn't make sense to me, since I thought the system would yield during those calls.
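
For reference, the detection comes from the blocked-at package (its frames appear in the traces below); a minimal sketch of how it is wired up, with the threshold value here being illustrative rather than our exact config:

    // Illustrative setup: report any event-loop block longer than 100 ms,
    // together with the stack captured where the blocking operation started.
    const blockedAt = require('blocked-at');

    blockedAt((time, stack) => {
      console.log(`Event loop blocked for ${time}ms, operation started here:`, stack);
    }, { threshold: 100 });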

Reproduction steps

Use gRPC on a busy server (we use Google Secret Manager, Google Pub/Sub, etc.).

Environment

  • Linux Ubuntu 18.04 amd64
  • Node 18
  • nvm/yarn

Additional context

Examples:

7: "    at /app/node_modules/@grpc/grpc-js/build/src/resolving-call.js:192:22"
8: "    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
message: "Event loop blocked for 600.4265230000019ms, operation started here:"
        "4": "    at Subchannel.startConnecting (/app/node_modules/@grpc/grpc-js/build/src/subchannel.js:278:17)",
        "5": "    at ChannelSubchannelWrapper.startConnecting (/app/node_modules/@grpc/grpc-js/build/src/subchannel-interface.js:34:20)",
        "6": "    at PickFirstLoadBalancer.exitIdle (/app/node_modules/@grpc/grpc-js/build/src/load-balancer-pick-first.js:341:24)",
        "7": "    at ChildLoadBalancerHandler.exitIdle (/app/node_modules/@grpc/grpc-js/build/src/load-balancer-child-handler.js:112:31)",
        "8": "    at ResolvingLoadBalancer.exitIdle (/app/node_modules/@grpc/grpc-js/build/src/resolving-load-balancer.js:235:32)",
        "9": "    at InternalChannel.getConfig (/app/node_modules/@grpc/grpc-js/build/src/internal-channel.js:313:36)",
        "10": "    at ResolvingCall.getConfig (/app/node_modules/@grpc/grpc-js/build/src/resolving-call.js:121:43)",
        "11": "    at /app/node_modules/@grpc/grpc-js/build/src/internal-channel.js:200:26",
        "12": "    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)",
        "message": "Event loop blocked for 99.32906599998473ms, operation started here:"

Basically, most of my event-loop blocking is attributed to gRPC. Is there some configuration or setting to make it yield properly?

@earonesty
Author

earonesty commented May 21, 2024

This is a common one; it blocks for nearly 700 ms:

        "1": "    at TLSWrap.emitInitNative (node:internal/async_hooks:200:43)",
        "2": "    at TLSSocket._wrapHandle (node:_tls_wrap:696:24)",
        "3": "    at new TLSSocket (node:_tls_wrap:577:18)",
        "4": "    at Object.connect (node:_tls_wrap:1752:19)",
        "5": "    at Object.connect (node:internal/http2/core:3298:22)",
        "6": "    at /app/node_modules/@grpc/grpc-js/build/src/transport.js:504:35",
        "7": "    at new Promise (<anonymous>)",
        "8": "    at Http2SubchannelConnector.createSession (/app/node_modules/@grpc/grpc-js/build/src/transport.js:417:16)",
        "9": "    at /app/node_modules/@grpc/grpc-js/build/src/transport.js:557:112",
        "10": "    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)",
        "message": "Event loop blocked for 699.5128900003433ms, operation started here:"

@murgatroid99
Member

That last example shows the creation of a TLS socket. It's probably doing some encryption work, which takes time. This should only happen infrequently: one of the basic design principles of gRPC is that you establish a connection once and then use it to make several requests, so the time taken to establish the connection is amortized over all of the requests.

The other examples seem to have incomplete stack traces, so I don't think I can evaluate what happened with them.
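
As an illustration of that amortization point, a minimal sketch (assuming a recent @google-cloud/pubsub and a hypothetical topic name) of constructing the client once so the gRPC channel, and its TLS handshake, is reused across requests:

    const { PubSub } = require('@google-cloud/pubsub');

    // Construct the client once at startup; it owns the underlying gRPC channel,
    // so the TLS/HTTP2 connection cost is paid once and amortized over all calls.
    const pubsub = new PubSub();
    const topic = pubsub.topic('example-topic'); // hypothetical topic name

    async function publish(payload) {
      // Reuses the existing channel; no new connection per publish.
      await topic.publishMessage({ data: Buffer.from(JSON.stringify(payload)) });
    }

    // Anti-pattern: calling `new PubSub()` inside a request handler forces a
    // fresh connection (and the TLS setup seen in the traces) on every request.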

@earonesty
Author

earonesty commented May 21, 2024

I can show the full traces. The TLS establishment one happens a lot: in a 10-minute log from a single instance running on Google Cloud, it shows up 5000 times.

Here's a full trace showing ~400 ms of non-I/O time attributed to "sendMessageOnChild":

      "data": [
        {
          "message": "Event loop blocked for 399.0506949999332ms, operation started here:"
        },
        [
          "    at AsyncHook.init (/app/node_modules/blocked-at/index.js:31:11)",
          "    at emitInitNative (node:internal/async_hooks:200:43)",
          "    at emitInitScript (node:internal/async_hooks:503:3)",
          "    at promiseInitHook (node:internal/async_hooks:322:3)",
          "    at promiseInitHookWithDestroyTracking (node:internal/async_hooks:326:3)",
          "    at Promise.then (<anonymous>)",
          "    at ResolvingCall.sendMessageOnChild (/app/node_modules/@grpc/grpc-js/build/src/resolving-call.js:104:99)",
          "    at /app/node_modules/@grpc/grpc-js/build/src/resolving-call.js:192:22"
        ]

Heck, here's the full log:

https://text.is/W88JY

You can see that almost all of the block logs are "gRPC doing stuff", and not always TLS. I'm not sure why TLS needs so much CPU with gRPC but not with our plain https:// REST requests, which never block.

@murgatroid99
Member

I don't entirely understand how to interpret these logs. It's not clear what exactly qualifies as "blocking" the event loop. If it's any time spent in JS code, why are these stack traces called out specifically? And what exactly in the stack trace is the blocking time attributed to?

Also, if you have 5000 instances of blocking 400-600ms in 10 minutes, that's over half an hour of blocking total. I assume that means that there are multiple separate client instances. How many times does each client experience this in a 10 minute period?
