
fix: add support for passing calib sequence length and num samples + fix use of custom calibration dataset for SmoothQuant in llama #2243

Open · wants to merge 2 commits into main

Conversation

Bhuvanesh09 (Contributor)

Issues in the previous code:

  • SmoothQuant could not be used with a custom calibration dataset in the current convert_checkpoint.py, because the load_calib_dataset function in tensorrt_llm/models/convert_utils.py defaults both split and key to None. Previously the function worked only for cnn_dailymail and lambada; with better defaults of the "train" split and the "text" key, it can load any correctly configured dataset (see the sketch after this list).
  • The number of calibration samples for SmoothQuant was hardcoded to 512, as was the maximum sequence length of 512, which led to errors and incorrect calibration with custom calibration datasets.
  • This PR fixes the above with small changes to:
    • tensorrt_llm/models/llama/convert.py
    • examples/llama/convert_checkpoint.py
    • tensorrt_llm/models/llama/model.py
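
To illustrate the first fix, here is a minimal sketch of load_calib_dataset with the proposed defaults. It assumes the Hugging Face datasets API; the exact signature and return handling in convert_utils.py may differ.

```python
from datasets import load_dataset


def load_calib_dataset(dataset_name_or_dir,
                       config_name=None,
                       split="train",  # was None; only worked for cnn_dailymail/lambada
                       key="text",     # was None; likewise
                       **kwargs):
    """Load a calibration dataset and return its raw text samples.

    With "train" and "text" as defaults, any correctly configured
    dataset that exposes a text column can be used for calibration.
    """
    dataset = load_dataset(dataset_name_or_dir, name=config_name,
                           split=split, **kwargs)
    return [sample[key] for sample in dataset]
```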

Results:

With the above changes applied to release v0.12.0, I verified that calibration works well and yields a noticeably higher-quality quantized model than the default calibration dataset. For SmoothQuant in particular, when per_token is set to true, it is important that the calibration sequence length matches the distribution of the samples used during deployment/production.
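
For reference, the second fix amounts to exposing the previously hardcoded constants as command-line arguments. The sketch below shows what the additions to examples/llama/convert_checkpoint.py could look like; the flag names --calib_size and --max_seq_length are assumptions and may differ from those in the PR.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--calib_dataset", type=str, default="ccdv/cnn_dailymail",
    help="Dataset name or local path used for SmoothQuant calibration.")
parser.add_argument(
    "--calib_size", type=int, default=512,
    help="Number of calibration samples (previously hardcoded to 512).")
parser.add_argument(
    "--max_seq_length", type=int, default=512,  # hypothetical flag name
    help="Maximum calibration sequence length; with per_token SmoothQuant "
         "this should match the sequence lengths seen in deployment.")
args = parser.parse_args()
```

These values can then be threaded through to the SmoothQuant calibration step in tensorrt_llm/models/llama/convert.py in place of the hardcoded constants.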

Bhuvanesh09 (Contributor, Author)

@Barry-Delaney: Kindly take a look. Thanks in advance!

Barry-Delaney self-assigned this on Sep 20, 2024
Bhuvanesh09 (Contributor, Author)

Any help needed here, @Barry-Delaney?

Barry-Delaney (Collaborator) left a comment

Thanks for the PR @Bhuvanesh09!
Overall, the change looks good to me. I left some minor comments.
We will integrate the calibration process into examples/quantization/quantize.py in the future; the calib_size argument and support for customized datasets are already implemented there.
For now, we will merge this first. Thanks for your contribution!

Review thread on tensorrt_llm/models/convert_utils.py (outdated, resolved)
Barry-Delaney (Collaborator) commented on Sep 28, 2024

The change looks good to me. We'll merge your changes into the internal code base.
Thanks for the contribution!
