This PR introduces a new feature for the Reactor Pool, aimed at preventing resource starvation in highly concurrent scenarios. Currently, when multiple threads simultaneously acquire and release resources, the `SimpleDequePool`'s `drainLoop` method can monopolize a CPU core, creating a bottleneck while other threads remain under-utilized.

There are of course existing ways to prevent one thread from executing the `drainLoop` method for too long: for instance, an acquisition scheduler can be used to offload the delivery of borrowers onto a configured scheduler, at the cost of extra threads.

This PR experiments with another way to offload the delivery of borrowers. Instead of using an acquisition scheduler, a concurrent `InstrumentedPoolDecorator` can be used. It allows creating a pool composed of multiple sub-pools, each managing a portion of the resources. Application executors (e.g. Netty event loops) can then be reused and assigned to each sub-pool, where acquisition tasks will be scheduled; no extra threads are created. This design enables concurrent distribution of resource acquisitions across the sub-pools, using a work-stealing approach: a free sub-pool can help a busy one by stealing acquisition tasks from it. In Reactor Netty, for instance, each Netty event loop thread would have its own sub-pool with its assigned HTTP/2 connection resources.
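As an illustration only (this is not the PR's code, and the class below is hypothetical), the work-stealing idea described above can be sketched with plain `java.util.concurrent` primitives: each sub-pool owns a deque of pending acquisition tasks, and a worker that finds its own deque empty steals from the tail of another sub-pool's deque.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkStealingSketch {

    // Runs `workers` threads over per-sub-pool deques; an idle worker steals
    // from the tail of other deques. Returns the number of tasks executed.
    static int run(int workers, int tasks) throws InterruptedException {
        List<ConcurrentLinkedDeque<Runnable>> queues = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            queues.add(new ConcurrentLinkedDeque<>());
        }

        AtomicInteger executed = new AtomicInteger();
        // Skewed load: every task lands on sub-pool 0, making it "busy"
        // while the other sub-pools start out idle.
        for (int i = 0; i < tasks; i++) {
            queues.get(0).addLast(executed::incrementAndGet);
        }

        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int self = w;
            Thread t = new Thread(() -> {
                while (true) {
                    // Prefer our own queue (head), then steal from others (tail).
                    Runnable task = queues.get(self).pollFirst();
                    for (int o = 0; task == null && o < workers; o++) {
                        if (o != self) {
                            task = queues.get(o).pollLast();
                        }
                    }
                    if (task == null) {
                        return; // nothing left anywhere: stop this worker
                    }
                    task.run();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return executed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("executed=" + run(4, 10_000));
    }
}
```

All tasks are enqueued before the workers start, so each task is taken exactly once (the deque operations are atomic) and the skewed load on sub-pool 0 is drained by all four workers rather than by one.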
The work is still in progress (I'm not even sure about the API), and some issues remain, so this PR could be merged into a temporary branch in order to make it easier to continue working on it. I haven't created a branch for the moment; to be confirmed by the Reactor team.
Attached to this PR is a JMH project which compares the different approaches. The `WorkStealingPoolBenchmark` simulates a Netty application which runs hundreds of thousands of tasks within an event loop group; each task then acquires and releases some resources. The benchmark needs this PR to be compiled and installed in the local M2 repository. See the attached reactor-pool-jmh.tgz.

In the `WorkStealingPoolBenchmark` class:
- `benchWithSimplePool`: simulates activities running within a Netty event loop group. Tasks are scheduled on event loops, and each one acquires and releases a `PiCalculator` resource. Once a task has acquired a `PiCalculator`, the PI number is computed and the `PiCalculator` is returned to the pool. When running this method, there is effectively a single running thread: the one executing the `drainLoop` method, which spends its life delivering `PiCalculator` resources to all borrowers from all event loop threads. Since no acquisition scheduler is used, the `drainLoop` consumes one core forever, and the other tasks merely schedule acquisition tasks onto the pool. Checking with `top`, the process only consumes 125% of the total CPU. The test runs in about 18 seconds on a Mac M1 with 10 CPUs.
- `benchWithAcquisitionSchedulerEventLoop`: tries to avoid the starvation problem by using an acquisition scheduler, where all borrowers are delivered using the `EventLoopGroup` of the simulated application. This significantly improves performance (see the results below: about 10.78 secs).
- `benchWithAcquisitionSchedulerFJP`: this time, the common system `ForkJoinPool` is used to offload the delivery of borrowers. This improves performance even more, at the cost of extra threads (ideally, in Reactor Netty for example, we would like to avoid extra threads and use only the event loop threads): 5.11 secs.
- `benchWithConcurrentPools`: finally, this method benchmarks this PR: a "concurrent" `InstrumentedPool` composed of multiple sub-pools, each one assigned to one of the event loop executors: 2.75 secs.
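The acquire/compute/release cycle that each benchmark task performs can be modeled with a minimal stand-in (a sketch only: the `PiCalculator` below is a simplified hypothetical, and a `BlockingQueue` takes the place of the reactor-pool API used by the real benchmark):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PiPoolModel {

    // Simplified stand-in for the benchmark's PiCalculator resource:
    // a few Leibniz-series terms burn some CPU per acquisition.
    static final class PiCalculator {
        double compute(int terms) {
            double pi = 0;
            for (int k = 0; k < terms; k++) {
                pi += (k % 2 == 0 ? 4.0 : -4.0) / (2 * k + 1);
            }
            return pi;
        }
    }

    // Each task acquires a calculator, computes PI, and releases it.
    // A bounded BlockingQueue models the pool: take() = acquire, put() = release.
    static double runTasks(int poolSize, int tasks) throws InterruptedException {
        BlockingQueue<PiCalculator> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.put(new PiCalculator());
        }

        ExecutorService loops = Executors.newFixedThreadPool(4); // event-loop stand-in
        double[] last = new double[1];
        for (int i = 0; i < tasks; i++) {
            loops.execute(() -> {
                try {
                    PiCalculator calc = pool.take();   // acquire
                    last[0] = calc.compute(10_000);    // use the resource
                    pool.put(calc);                    // release
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        loops.shutdown();
        loops.awaitTermination(1, TimeUnit.MINUTES);
        return last[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pi ~= " + runTasks(2, 1_000));
    }
}
```

With fewer calculators than event-loop threads, tasks contend on `take()`, which is the contention the pool's delivery strategies above are trying to distribute efficiently.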
The results are the following (tested with JDK 21; a lower score is better):
Remaining issues: see the `InstrumentedPoolDecorators.concurrentPools` factory method.