
fix: add support for passing calib sequence length and num samples + fix use of custom calibration dataset for SmoothQuant in llama #2243

Open · wants to merge 2 commits into main

Conversation

Bhuvanesh09 (Contributor)

Issues in the previous code:

  • SmoothQuant could not be used with a custom calibration dataset in the current convert_checkpoint.py, because the load_calib_dataset function in tensorrt_llm/models/convert_utils.py defaults both split and key to None. Previously the function worked only for cnn_dailymail and lambada; with better defaults of the "train" split and the "text" key, it can load any correctly configured dataset (see the sketch after this list).
  • The number of calibration samples for SmoothQuant was hardcoded to 512, as was the maximum sequence length of 512, which led to errors and incorrect calibration with custom calibration datasets.
  • This PR fixes the above with small changes to:
    • tensorrt_llm/models/llama/convert.py
    • examples/llama/convert_checkpoint.py
    • tensorrt_llm/models/llama/model.py
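
To illustrate the first fix, here is a minimal sketch of load_calib_dataset with the proposed defaults. It assumes the Hugging Face datasets API; the exact signature and return handling in convert_utils.py may differ.

```python
from datasets import load_dataset


def load_calib_dataset(dataset_name_or_dir,
                       config_name=None,
                       split="train",  # was None; only worked for cnn_dailymail/lambada
                       key="text",     # was None; likewise
                       **kwargs):
    """Load a calibration dataset and return its raw text samples.

    With "train" and "text" as defaults, any correctly configured
    dataset that exposes a text column can be used for calibration.
    """
    dataset = load_dataset(dataset_name_or_dir, name=config_name,
                           split=split, **kwargs)
    return [sample[key] for sample in dataset]
```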

Results:

With the above changes applied to release v0.12.0, I verified that calibration works well and yields a noticeably higher-quality quantized model than the default calibration dataset. For SmoothQuant in particular, when per_token is set to true, it is important that the calibration sequence length matches the distribution of the samples used during deployment/production.
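
For reference, the second fix amounts to exposing the previously hardcoded constants as command-line arguments. The sketch below shows what the additions to examples/llama/convert_checkpoint.py could look like; the flag names --calib_size and --max_seq_length are assumptions and may differ from those in the PR.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--calib_dataset", type=str, default="ccdv/cnn_dailymail",
    help="Dataset name or local path used for SmoothQuant calibration.")
parser.add_argument(
    "--calib_size", type=int, default=512,
    help="Number of calibration samples (previously hardcoded to 512).")
parser.add_argument(
    "--max_seq_length", type=int, default=512,  # hypothetical flag name
    help="Maximum calibration sequence length; with per_token SmoothQuant "
         "this should match the sequence lengths seen in deployment.")
args = parser.parse_args()
```

These values can then be threaded through to the SmoothQuant calibration step in tensorrt_llm/models/llama/convert.py in place of the hardcoded constants.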

Bhuvanesh09 (Contributor, Author)

@Barry-Delaney: Kindly take a look. Thanks in advance!

Barry-Delaney self-assigned this on Sep 20, 2024
Bhuvanesh09 (Contributor, Author)

Any help needed here, @Barry-Delaney?

Barry-Delaney (Collaborator) left a comment

Thanks for the PR @Bhuvanesh09!
Overall, the change looks good to me. I left some minor comments.
We will integrate the calibration process into examples/quantization/quantize.py in the future; the calib_size argument and support for customized datasets are already implemented there.
For now, we will merge this first. Thanks for your contribution!

Review thread on tensorrt_llm/models/convert_utils.py (outdated, resolved)
Barry-Delaney (Collaborator) commented on Sep 28, 2024

The change looks good to me. We'll merge your changes into the internal code base.
Thanks for the contribution!
