
[WIP] Prometheus Metrics #1461

Open · wants to merge 23 commits into main

Conversation

@binarycrayon commented Sep 19, 2024

Motivation

Initial integration with the Prometheus client to facilitate metrics logging and expose the metrics at /metrics, where they can be scraped by external collectors such as Grafana Alloy.

Uses the Prometheus Python client (prometheus_client).

This is a draft; how do I mark this PR as a draft?

Modifications

  • Add a basic metrics collector and metric types
  • Lazily initialize the Prometheus client in multiprocess mode (see the sketch after this list)
  • Log stats in TPServer
  • Expose stats at /metrics
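
A minimal sketch of what such a collector could look like, assuming the prometheus_client package with PROMETHEUS_MULTIPROC_DIR set before import so multiprocess mode is active; the class and method names are illustrative, not necessarily the ones in this PR:

```python
# Illustrative sketch only; not the exact implementation in this PR.
from prometheus_client import Gauge, Histogram


class SGLangMetricsCollector:
    def __init__(self, model_name: str):
        # Every metric carries a "name" label with the served model,
        # matching the sample output further down in this thread.
        self.labels = {"name": model_name}
        self.num_requests_running = Gauge(
            "sglang:num_requests_running",
            "Number of requests currently running on GPU",
            labelnames=["name"],
            multiprocess_mode="all",  # aggregate values across worker processes
        )
        self.request_prompt_tokens = Histogram(
            "sglang:request_prompt_tokens",
            "Number of prefill tokens processed",
            labelnames=["name"],
            buckets=[1, 2, 5, 10, 20, 50, 100, 200, 500, 1000,
                     2000, 5000, 10000, 20000, 50000, 100000],
        )

    def log_stats(self, num_running: int, prompt_tokens: int) -> None:
        self.num_requests_running.labels(**self.labels).set(num_running)
        self.request_prompt_tokens.labels(**self.labels).observe(prompt_tokens)
```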

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@blacker521 (Contributor) commented Sep 19, 2024

```
File "/python/sglang/srt/managers/tp_worker.py", line 142, in __init__
    from python.sglang.srt.metrics.metrics_collector import SGLangMetricsCollector
```

You should delete the `python.` prefix.
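
In other words, drop the `python.` prefix so the import resolves against the installed sglang package:

```python
# before (fails: "python" is not a package on sys.path)
from python.sglang.srt.metrics.metrics_collector import SGLangMetricsCollector

# after
from sglang.srt.metrics.metrics_collector import SGLangMetricsCollector
```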

@binarycrayon changed the title from Metrics to [WIP] Prometheus Metrics on Sep 19, 2024
@Ying1123 mentioned this pull request on Sep 22, 2024
@merrymercy mentioned this pull request on Sep 22, 2024
@binarycrayon (Author) commented:

Sample of the currently logged metrics:

```
# HELP sglang:new_seq Number of new sequences
# TYPE sglang:new_seq gauge
sglang:new_seq{name="google/gemma-2-9b-it"} 50.0
# HELP sglang:new_token Number of new token
# TYPE sglang:new_token gauge
sglang:new_token{name="google/gemma-2-9b-it"} 153.0
# HELP sglang:cached_token Number of cached token
# TYPE sglang:cached_token gauge
sglang:cached_token{name="google/gemma-2-9b-it"} 6252.0
# HELP sglang:cache_hit_rate Cache hit rate
# TYPE sglang:cache_hit_rate gauge
sglang:cache_hit_rate{name="google/gemma-2-9b-it"} 87.58777633289988
# HELP sglang:queue_req Number of queue requests
# TYPE sglang:queue_req gauge
sglang:queue_req{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:num_requests_running Number of requests currently running on GPU
# TYPE sglang:num_requests_running gauge
sglang:num_requests_running{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:num_requests_waiting Number of requests waiting to be processed.
# TYPE sglang:num_requests_waiting gauge
sglang:num_requests_waiting{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:gen_throughput Gen token throughput (token/s)
# TYPE sglang:gen_throughput gauge
sglang:gen_throughput{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:token_usage Total token usage
# TYPE sglang:token_usage gauge
sglang:token_usage{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:max_total_num_tokens Maximum total number of tokens
# TYPE sglang:max_total_num_tokens gauge
sglang:max_total_num_tokens{name="google/gemma-2-9b-it"} 161723.0
# HELP sglang:max_prefill_tokens Maximum prefill tokens
# TYPE sglang:max_prefill_tokens gauge
sglang:max_prefill_tokens{name="google/gemma-2-9b-it"} 16384.0
# HELP sglang:max_running_requests Maximum running requests
# TYPE sglang:max_running_requests gauge
sglang:max_running_requests{name="google/gemma-2-9b-it"} 4097.0
# HELP sglang:context_len Context length
# TYPE sglang:context_len gauge
sglang:context_len{name="google/gemma-2-9b-it"} 8192.0
# HELP sglang:request_prompt_tokens Number of prefill tokens processed
# TYPE sglang:request_prompt_tokens histogram
sglang:request_prompt_tokens_sum{name="google/gemma-2-9b-it"} 1129.0
sglang:request_prompt_tokens_bucket{le="1.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_prompt_tokens_bucket{le="2.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_prompt_tokens_bucket{le="5.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_prompt_tokens_bucket{le="10.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_prompt_tokens_bucket{le="20.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_prompt_tokens_bucket{le="50.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="100.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="200.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="500.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="1000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="2000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="5000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="10000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="20000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="50000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="100000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="+Inf",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_count{name="google/gemma-2-9b-it"} 51.0
# HELP sglang:request_generation_tokens Number of generation tokens processed.
# TYPE sglang:request_generation_tokens histogram
sglang:request_generation_tokens_sum{name="google/gemma-2-9b-it"} 5341.0
sglang:request_generation_tokens_bucket{le="1.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_generation_tokens_bucket{le="2.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_generation_tokens_bucket{le="5.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_generation_tokens_bucket{le="10.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_generation_tokens_bucket{le="20.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_generation_tokens_bucket{le="50.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_generation_tokens_bucket{le="100.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_generation_tokens_bucket{le="200.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="500.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="1000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="2000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="5000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="10000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="20000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="50000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="100000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="+Inf",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_count{name="google/gemma-2-9b-it"} 51.0
```
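
For reference, a hedged sketch of how a /metrics endpoint producing output like the above could be exposed from the FastAPI server using the multiprocess collector; the route wiring here is illustrative and not necessarily what this PR does:

```python
# Illustrative /metrics endpoint; assumes PROMETHEUS_MULTIPROC_DIR is set so
# each worker process writes its samples to files that the collector merges.
from fastapi import FastAPI, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    generate_latest,
    multiprocess,
)

app = FastAPI()


@app.get("/metrics")
def metrics() -> Response:
    # Build a fresh registry per scrape and aggregate all worker processes.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return Response(generate_latest(registry), media_type=CONTENT_TYPE_LATEST)
```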
