Separated control and compute loop, shorten the critical path, and enable more complicated policies #1287

Open: wants to merge 39 commits into main


Ying1123 (Member) commented Sep 1, 2024

Moved from #1182. @xiezhq-hermann @hnyls2002
(Sorry, I accidentally closed the original PR.)

Motivation

The existing scheduler design couples control logic with model computation. While this simplifies the implementation, scheduling overhead can be non-negligible, especially when interacting with the radix tree.
This PR decouples the control and compute loops into separate threads, overlapping model computation with tasks that do not need to be on the critical path. This reduces scheduling overhead and makes it possible to implement more complicated policies on the control plane in the future.
Based on the latest CI benchmark results, the PR yields about a 10% throughput gain by reducing overhead on the critical path.

Modifications

  1. Serialize access to shared resources, including the radix tree and memory pools, for safety with minimal overhead.
  2. Add a control loop that, concurrently with the compute loop, prepares the next prefill batch, handles finished requests, and updates the radix tree.
  3. Make the policy scheduler compatible with the new design (corner cases to be refined further).
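The split described in the modifications can be sketched in Python as two threads connected by queues. This is a minimal illustration under stated assumptions, not the PR's implementation: `SharedCache` is a hypothetical stand-in for the radix tree and memory pools, the "forward pass" is a placeholder string operation, and `serve`, `control_loop`, and `compute_loop` are hypothetical names.

```python
import queue
import threading

STOP = object()  # sentinel that tells a loop to exit


class SharedCache:
    """Hypothetical stand-in for the radix tree / memory pools.

    All access is serialized through one lock. In this sketch only the
    control loop writes to it; the lock matters once the compute side
    also reads it (for example, for prefix matching).
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def insert(self, key, value):
        with self._lock:
            self._entries[key] = value

    def snapshot(self):
        with self._lock:
            return dict(self._entries)


def compute_loop(ready, finished):
    """Critical path: run each prepared batch and hand results back."""
    while True:
        batch = ready.get()
        if batch is STOP:
            finished.put(STOP)
            return
        # Placeholder for the model forward pass.
        finished.put([(req, req.upper()) for req in batch])


def _handle_finished(outputs, cache):
    # Post-processing kept off the critical path (e.g. a radix tree update).
    for req, out in outputs:
        cache.insert(req, out)


def control_loop(requests, batch_size, ready, finished, cache):
    """Prepare batches and process results concurrently with compute."""
    for i in range(0, len(requests), batch_size):
        # Blocks while a prepared batch is still waiting for compute.
        ready.put(requests[i:i + batch_size])
        # Opportunistically handle results that are already available.
        while True:
            try:
                outputs = finished.get_nowait()
            except queue.Empty:
                break
            _handle_finished(outputs, cache)
    ready.put(STOP)
    # Drain the remaining results until compute signals it is done.
    while (outputs := finished.get()) is not STOP:
        _handle_finished(outputs, cache)


def serve(requests, batch_size=2):
    ready = queue.Queue(maxsize=1)  # at most one prepared batch waiting
    finished = queue.Queue()
    cache = SharedCache()
    worker = threading.Thread(target=compute_loop, args=(ready, finished))
    worker.start()
    control_loop(requests, batch_size, ready, finished, cache)
    worker.join()
    return cache.snapshot()
```

For example, `serve(["a", "b", "c"], batch_size=2)` returns `{'a': 'A', 'b': 'B', 'c': 'C'}`. The one-slot `ready` queue keeps at most one prepared batch waiting, which loosely mirrors preparing the next prefill batch while the current one runs, and result handling happens on the control thread instead of the compute thread.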

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

hnyls2002 force-pushed the xiezhq-hermann/main branch 2 times, most recently from 66b98d5 to f4be82c on September 3, 2024 19:22
merrymercy (Contributor) commented:

What is the status of this PR? Do we still plan to merge it?

xiezhq-hermann (Collaborator) commented Sep 4, 2024


I think so. @hnyls2002 added a fix for the tp_size > 1 case, which should behave the same as before and serves as a temporary fix. In the single-GPU setting, the PR achieves a 10%-15% throughput gain. We could wait for a more complete PR before merging, but I would suggest merging incrementally before it diverges too much.
We may still need some style changes before merging, so please help review the PR. @merrymercy @Ying1123

merrymercy mentioned this pull request on Sep 22, 2024