add cached generation buffer #1685

Open: wants to merge 1 commit into base branch main


michael200892458

In the Python `GenerationSession.setup`, `torch.empty` is called to allocate new GPU memory each time setup runs. This memory should instead be allocated once in `GenerationSession.__init__` by calling the method `_init_cache_buffer`. Doing so avoids memory fragmentation and speeds up memory allocation.
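A minimal sketch of the pattern described above, assuming every buffer can be sized up front for a maximum batch size and sequence length. `ToySession`, `max_batch_size`, `max_seq_len`, `hidden`, and the buffer names are hypothetical and are not the project's real `GenerationSession` API; only the idea of allocating once in `__init__` through `_init_cache_buffer` and reusing that memory in `setup` comes from the PR description.

```python
import torch


class ToySession:
    """Hypothetical stand-in for a generation session (not the real GenerationSession)."""

    def __init__(self, max_batch_size: int, max_seq_len: int, hidden: int,
                 device: str = "cuda" if torch.cuda.is_available() else "cpu"):
        self.max_batch_size = max_batch_size
        self.max_seq_len = max_seq_len
        self.hidden = hidden
        self.device = torch.device(device)
        self._init_cache_buffer()

    def _init_cache_buffer(self) -> None:
        # Allocate flat 1-D buffers large enough for the biggest request, exactly once.
        max_tokens = self.max_batch_size * self.max_seq_len
        self._cache = {
            "output_ids": torch.empty(max_tokens, dtype=torch.int32, device=self.device),
            "hidden_states": torch.empty(max_tokens * self.hidden,
                                         dtype=torch.float16, device=self.device),
        }

    def setup(self, batch_size: int, seq_len: int) -> None:
        if batch_size > self.max_batch_size or seq_len > self.max_seq_len:
            raise ValueError("request exceeds preallocated buffer capacity")
        n = batch_size * seq_len
        # Instead of calling torch.empty on every setup, reshape a prefix of the
        # cached flat buffer; slicing + view allocates no new device memory and
        # yields a contiguous tensor.
        self.output_ids = self._cache["output_ids"][:n].view(batch_size, seq_len)
        self.hidden_states = self._cache["hidden_states"][: n * self.hidden].view(
            batch_size, seq_len, self.hidden)
```

Flat buffers are used here (rather than slicing a 2-D `[max_batch, max_seq]` tensor) so that every view handed out by `setup` stays contiguous, which kernels typically expect. The trade-off is that memory for the maximum shape is held for the lifetime of the session even when requests are small.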

byshiue self-assigned this on Jun 6, 2024
byshiue added the triaged label (Issue has been triaged by maintainers) on Jun 6, 2024
byshiue (Collaborator) commented on Jun 6, 2024

Sorry for the delayed response. Could you explain in more detail how this change "avoids memory fragmentation and accelerates memory allocation"? Shouldn't the PyTorch memory pool already prevent such issues?
Could you also share performance numbers to help us understand the gap this improvement closes?
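One way to produce the requested numbers is a micro-benchmark that times a fresh `torch.empty` call per iteration against reshaping a view of one preallocated buffer. This is a hypothetical harness, not a reported result; the function names and sizes are illustrative assumptions. On CUDA, freed tensors go back to PyTorch's caching allocator, which can serve repeated same-size requests from its pool, so the measured gap may be small; that is exactly what the question above probes.

```python
import time

import torch


def _sync(device: torch.device) -> None:
    # CUDA work is asynchronous; synchronize so timings cover completed work.
    if device.type == "cuda":
        torch.cuda.synchronize(device)


def time_fresh_alloc(shape, iters: int, device: torch.device) -> float:
    """Seconds spent calling torch.empty on every iteration (per-setup allocation)."""
    _sync(device)
    start = time.perf_counter()
    for _ in range(iters):
        buf = torch.empty(shape, dtype=torch.float16, device=device)
        # On CUDA the freed block returns to PyTorch's caching allocator, not the driver.
        del buf
    _sync(device)
    return time.perf_counter() - start


def time_reused_buffer(shape, iters: int, device: torch.device) -> float:
    """Seconds spent viewing one preallocated flat buffer on every iteration."""
    numel = 1
    for d in shape:
        numel *= d
    cache = torch.empty(numel, dtype=torch.float16, device=device)
    _sync(device)
    start = time.perf_counter()
    for _ in range(iters):
        buf = cache[:numel].view(shape)  # no new allocation, only a view
        del buf
    _sync(device)
    return time.perf_counter() - start


if __name__ == "__main__":
    dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    shape = (8, 1024, 1024)
    n = 200
    print(f"fresh torch.empty:  {time_fresh_alloc(shape, n, dev):.6f} s")
    print(f"reused buffer view: {time_reused_buffer(shape, n, dev):.6f} s")
    if dev.type == "cuda":
        reserved_mib = torch.cuda.memory_reserved(dev) / 2**20
        print(f"reserved by caching allocator: {reserved_mib:.1f} MiB")
```

Comparing `torch.cuda.memory_reserved` before and after a realistic mix of request shapes would also speak to the fragmentation claim more directly than wall-clock timing alone.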

Labels: triaged (Issue has been triaged by maintainers), waiting for feedback

3 participants