fix(vllm metrics): error stack trace #3200

Open
wants to merge 1 commit into
base: main
from


gitdallas
Contributor

@gitdallas gitdallas commented Sep 12, 2024

closes: https://issues.redhat.com/browse/RHOAIENG-11522

Without this change, the following situation results in an error stack and a UI crash:
[image]

Description

Prevent the UI from crashing. If a query doesn't exist, leave it undefined; this results in empty data and no errors. Vince said he did not want an error message at all, since it might suggest to the user that the problem could resolve with a refresh or something similar.
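
A minimal sketch of the idea described above, not the actual diff: every name here (`QueryDefinition`, `findQuery`, `queriesByChart`, and the four chart titles) is hypothetical and only illustrates leaving a missing query as `undefined` instead of throwing.

```typescript
// Hypothetical shape of one query definition; the real dashboard type is not shown in this PR.
type QueryDefinition = { title: string; query: string };

// Hypothetical chart titles, standing in for the four charts on the metrics page.
const CHART_TITLES = ['Requests', 'Mean latency', 'CPU usage', 'Memory usage'];

// Look up a query by title. Optional chaining leaves the result undefined
// when no definition matches, instead of throwing on a missing entry.
const findQuery = (defs: QueryDefinition[], title: string): string | undefined =>
  defs.find((def) => def.title === title)?.query;

// Build one entry per chart, so every chart is still rendered even when
// its query is missing (that chart simply has no query to run).
const queriesByChart = (defs: QueryDefinition[]): Record<string, string | undefined> => {
  const result: Record<string, string | undefined> = {};
  for (const title of CHART_TITLES) {
    result[title] = findQuery(defs, title);
  }
  return result;
};
```

With only one definition supplied, the other three charts get `undefined` queries rather than causing an exception.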

How Has This Been Tested?

Tested the code on a previous deploy that would crash the UI on the metrics page; it no longer crashes. Used the MR cluster to test. Existing tests still pass.

Test Impact

Added a new test using mock data that contains only 1 query and made sure that the 4 charts show up (instead of an error stack page). I also updated the test mock for prometheus/serving to return empty results if the request body includes query=undefined\b, as the real endpoint would. Here's a screenshot from a test with a missing query resulting in no data for one of the serving endpoints (it still shows the data):
[image: screenshot]
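
The mock behaviour described above could be sketched roughly as follows. This is an illustration under assumptions, not the test code from this PR: `mockServingResponse` and `PrometheusResponse` are hypothetical names, and the response shape follows the general Prometheus HTTP API format, which may differ from what the dashboard mock actually returns.

```typescript
// Assumed response shape, modelled on the Prometheus HTTP API range-query format.
type PrometheusResponse = {
  status: 'success';
  data: {
    resultType: 'matrix';
    result: Array<{ metric: Record<string, string>; values: Array<[number, string]> }>;
  };
};

// Matches a body whose query parameter is literally "undefined".
// The \b keeps it from matching longer names such as "undefined_total".
const UNDEFINED_QUERY = /query=undefined\b/;

// Hypothetical mock handler: return empty results for an undefined query,
// otherwise return the populated mock data unchanged.
const mockServingResponse = (
  requestBody: string,
  populated: PrometheusResponse,
): PrometheusResponse =>
  UNDEFINED_QUERY.test(requestBody)
    ? { status: 'success', data: { resultType: 'matrix', result: [] } }
    : populated;
```

In a Cypress test, a handler like this would sit inside the interceptor for the serving endpoint, so a chart with a missing query receives an empty result set while the other charts keep their data.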

Request review criteria:

test a vllm deploy, view the metrics. also view metrics of other types.

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit or cypress tests for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

Contributor

openshift-ci bot commented Sep 12, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign manosnoam for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gitdallas gitdallas force-pushed the bug/11522-vllm-metrics branch 2 times, most recently from 2e62654 to 9f81631 Compare September 12, 2024 15:16
@vconzola

LGTM.


codecov bot commented Sep 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.44%. Comparing base (b5351a7) to head (8df986a).

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3200      +/-   ##
==========================================
+ Coverage   85.39%   85.44%   +0.04%     
==========================================
  Files        1277     1277              
  Lines       28082    28088       +6     
  Branches     7487     7495       +8     
==========================================
+ Hits        23980    23999      +19     
+ Misses       4102     4089      -13     
Files with missing lines Coverage Δ
...end/src/api/prometheus/kservePerformanceMetrics.ts 98.03% <100.00%> (+0.16%) ⬆️
.../metrics/kserve/content/KserveMeanLatencyGraph.tsx 100.00% <100.00%> (ø)

... and 8 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b5351a7...8df986a.

@gitdallas gitdallas force-pushed the bug/11522-vllm-metrics branch 3 times, most recently from cfd4539 to faab5b4 Compare September 17, 2024 20:10
3 participants