An algorithm for new target allocation, based on the average job algorithm. #3128
Comments
@zh-w what do the metrics from the target allocator look like for the targets-per-collector metric? It's possible that the distribution is even, but the memory usage is not, in a scenario where some of your scrape targets emit more metrics than others.
I wouldn't really mind if we just made the Both
My bad, I meant
I retested in my k8s cluster, which consists of about 1000 nodes. The configuration of my collector CR is as follows (simplified):
When I use the consistent-hashing algorithm, the number of targets per collector is as follows: and the datapoints received by the collectors are distributed as follows (the statistics were collected by a 'stats' processor I implemented): When I use the new job-average algorithm, the distribution of target numbers per collector is as follows: and the datapoints received by the collectors are distributed as follows:
I use the metric "opentelemetry_allocator_targets_per_collector" from the target allocator to count the number of targets per collector, and I added a "job_name" label to this metric to tally the target counts of different jobs more accurately.
The 'stats' processor in the collector is implemented as follows:
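The processor's actual code is not visible in this extract (it was likely attached as a collapsed snippet). As a rough, stdlib-only sketch of the counting idea it describes — the real processor would operate on the collector's pmetric types, and all names below are hypothetical:

```go
package main

import "fmt"

// Minimal stand-ins for the collector's pdata types; the real 'stats'
// processor described in this thread would work with pmetric.Metrics.
type DataPoint struct{ Value float64 }

type Metric struct {
	Name       string
	DataPoints []DataPoint
}

// countDatapoints tallies how many datapoints pass through the
// processor, keyed by scrape job, which is what the thread's
// per-collector distribution charts appear to plot.
func countDatapoints(byJob map[string][]Metric) map[string]int {
	counts := make(map[string]int)
	for job, metrics := range byJob {
		for _, m := range metrics {
			counts[job] += len(m.DataPoints)
		}
	}
	return counts
}

func main() {
	batch := map[string][]Metric{
		"kube-state-metrics": {{Name: "kube_pod_info", DataPoints: make([]DataPoint, 1000)}},
		"node-exporter":      {{Name: "node_cpu_seconds_total", DataPoints: make([]DataPoint, 100)}},
	}
	fmt.Println(countDatapoints(batch))
}
```

A real implementation would expose these counts as a Prometheus metric from the collector, which is how the distribution screenshots in this thread were produced.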
The exposed metric of the 'stats' processor is as follows:
I think there are different usage situations. One situation is when targets are relatively stable: for example, the collector is deployed to collect stable jobs like cAdvisor, kube-state-metrics, node-exporter, etc., whose target counts are usually related to the number of nodes. In this situation, where targets are not frequently added or deleted, this new algorithm works well.
Use
I tested the least-weighted algorithm @swiatekm; the number of targets per collector is as follows: and the datapoints received by the collectors are distributed as follows: It seems that a small number of collectors got just one target.
Component(s)
target allocator
Is your feature request related to a problem? Please describe.
In the scenario where I use target-allocator, there are usually different jobs, and the metric datapoints of targets for each job vary significantly. For example, there are five types of collection jobs: A, B, C, D, and E. Suppose each job has the same collection interval and each job has 10 targets. The number of datapoints pulled by each target of job A is 1000 (e.g., KSM), for job B it is 100, for jobs C and D it is 50, and for job E it is 10.
At the same time, assume I have 5 collector instances deployed in StatefulSets. When using consistent-hashing or least-weighted algorithms, the targets for each job are not evenly distributed across each collector instance. In the assumed collection scenario, it is possible that collector-0 is assigned 3 targets of job A, while collector-4 is assigned 0 targets of job A. This can result in a significant disparity in the number of datapoints collected by each collector, leading to an imbalance in load.
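Plugging the example numbers above into a quick calculation shows how large the skew can get. This is just illustrative arithmetic based on the scenario described, not measured data:

```go
package main

import "fmt"

// loadFor returns the total datapoints scraped by one collector,
// given how many targets of each job it was assigned.
func loadFor(assigned, perTarget map[string]int) int {
	total := 0
	for job, n := range assigned {
		total += n * perTarget[job]
	}
	return total
}

func main() {
	// Datapoints per target for each job, from the example above.
	perTarget := map[string]int{"A": 1000, "B": 100, "C": 50, "D": 50, "E": 10}

	// Even per-job split: each of 5 collectors gets 2 targets of every job.
	even := map[string]int{"A": 2, "B": 2, "C": 2, "D": 2, "E": 2}

	// Hash-based allocation can give one collector 3 job-A targets and
	// another none, even if the remaining jobs happen to split evenly.
	heavy := map[string]int{"A": 3, "B": 2, "C": 2, "D": 2, "E": 2}
	light := map[string]int{"A": 0, "B": 2, "C": 2, "D": 2, "E": 2}

	fmt.Println(loadFor(even, perTarget), loadFor(heavy, perTarget), loadFor(light, perTarget))
	// prints: 2420 3420 420
}
```

Under a per-job-even split every collector handles 2420 datapoints per interval, while the skewed assignment leaves one collector with 3420 and another with only 420 — roughly an 8x spread from job A alone.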
In my actual use case, this situation occurs quite frequently. Below is a diagram showing the load distribution of collectors in a large cluster I deployed (using the consistent-hashing algorithm), illustrating the extreme imbalance in resource utilization across each collector.
Describe the solution you'd like
I have implemented a load-balancing algorithm based on jobs. The algorithm is designed as follows:
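The algorithm's code itself is not included in this extract. As a rough, stdlib-only sketch of the per-job averaging idea — distribute each job's targets round-robin so every collector ends up with roughly the same number of targets *per job*, not just in total — here is a hypothetical illustration (not the actual target allocator implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// allocatePerJob spreads each job's targets evenly across collectors,
// so every collector receives about len(targets)/len(collectors)
// targets of every job.
func allocatePerJob(jobs map[string][]string, collectors []string) map[string][]string {
	assignment := make(map[string][]string, len(collectors))

	jobNames := make([]string, 0, len(jobs))
	for name := range jobs {
		jobNames = append(jobNames, name)
	}
	sort.Strings(jobNames) // deterministic job order

	for _, job := range jobNames {
		// Sort collectors by current load so leftover targets (when a
		// job's count is not divisible) go to the least-loaded ones.
		sort.SliceStable(collectors, func(i, j int) bool {
			return len(assignment[collectors[i]]) < len(assignment[collectors[j]])
		})
		for i, target := range jobs[job] {
			c := collectors[i%len(collectors)]
			assignment[c] = append(assignment[c], target)
		}
	}
	return assignment
}

func main() {
	jobs := map[string][]string{
		"A": {"a1", "a2", "a3", "a4", "a5"},
		"B": {"b1", "b2", "b3", "b4", "b5"},
	}
	collectors := []string{"collector-0", "collector-1", "collector-2", "collector-3", "collector-4"}
	for c, ts := range allocatePerJob(jobs, collectors) {
		fmt.Println(c, len(ts))
	}
}
```

The trade-off relative to consistent hashing is churn: because assignment depends on per-job counts rather than a stable hash, adding or removing a target can reshuffle other assignments, which is why the discussion above notes it works best when targets are stable.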
Describe alternatives you've considered
No response
Additional context
No response