Hot fix for the repetition penalty and topp topk #214

@sunggg sunggg commented Feb 16, 2024

There was an issue when greedy and random sampling with top-p/top-k are mixed.
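
For readers unfamiliar with the scenario, here is a minimal, framework-free sketch of what a mixed batch might look like. It is not the code from this PR. It assumes that a temperature of 0 marks a request as greedy, and the names `sample_batch`, `top_k_top_p_filter`, and the `temperature`/`top_k`/`top_p` dictionary keys are hypothetical. The point it illustrates is that greedy rows should bypass the top-p/top-k path entirely, while random rows are filtered and renormalized before drawing.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens (0 disables top-k), then the
    smallest prefix whose cumulative mass reaches top_p. Returns a dict
    mapping token id to renormalized probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_batch(batch_logits, params, rng):
    """Sample one token per request; params[i] holds hypothetical
    per-request settings. Assumption: temperature == 0 means greedy."""
    out = []
    for logits, p in zip(batch_logits, params):
        if p["temperature"] == 0.0:
            # Greedy request: argmax, no top-p/top-k filtering.
            out.append(max(range(len(logits)), key=lambda i: logits[i]))
            continue
        # Random request: filter, renormalize, then draw.
        probs = softmax(logits, p["temperature"])
        filtered = top_k_top_p_filter(probs, p["top_k"], p["top_p"])
        tokens = list(filtered)
        weights = [filtered[t] for t in tokens]
        out.append(rng.choices(tokens, weights=weights, k=1)[0])
    return out
```

A real implementation would likely do this on batched tensors by splitting the batch into greedy and random index sets rather than looping per request; the loop is only here to keep the sketch readable.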

Hopefully #200 will help test more comprehensive cases across a number of different combinations.

Also, I made a couple of changes to the repetition_mask: mainly minor changes to the data structure and a simplification of the mask computation.
Due to the urgency, I will merge this PR first, so please follow up, @vvchernov.
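
As a rough illustration of how a repetition mask can drive a repetition penalty, here is a minimal sketch. It is not the implementation in this PR. It assumes the common formulation in which logits of already-seen tokens are divided by the penalty when positive and multiplied by it when negative, and it represents the mask as a boolean list over the vocabulary. The function names `build_repetition_mask` and `apply_repetition_penalty` are hypothetical.

```python
def build_repetition_mask(token_ids, vocab_size):
    """Return a boolean list over the vocabulary marking tokens that
    have already appeared (in the prompt or in generated output)."""
    mask = [False] * vocab_size
    for t in token_ids:
        mask[t] = True
    return mask


def apply_repetition_penalty(logits, mask, penalty):
    """Penalize masked tokens: positive logits are divided by penalty,
    non-positive logits are multiplied by it. Unmasked logits pass
    through unchanged. Assumed formulation, not confirmed by the PR."""
    return [
        (x / penalty if x > 0 else x * penalty) if seen else x
        for x, seen in zip(logits, mask)
    ]


# Usage: tokens 0 and 1 were seen, so only their logits change.
mask = build_repetition_mask([0, 1, 1], vocab_size=4)
penalized = apply_repetition_penalty([2.0, -1.0, 0.5, 3.0], mask, penalty=2.0)
```

Building the mask once and reusing it across decoding steps avoids rescanning the token history each time; on a GPU the same idea would map to a boolean tensor and a masked elementwise update.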

@sunggg sunggg merged commit 7495bd0 into octoml:batch-serving Feb 16, 2024
1 check passed
@vvchernov vvchernov mentioned this pull request Feb 16, 2024
sunggg pushed a commit that referenced this pull request Feb 16, 2024
* transfer prompt mask from sampling params to request state. use torch tensor instead of list

* fix prompt mask for EvalMultiQueryRequest

* clean code

* update sampler tests

* fix after rebase

* add device to comment
@vvchernov vvchernov mentioned this pull request Feb 22, 2024
2 tasks