
[Documentation] Difficulty using trt_int8_use_native_calibration_table option in ONNX Runtime #22059

noujaimc opened this issue Sep 11, 2024 · 0 comments
noujaimc commented Sep 11, 2024

Hello,

I'm trying to figure out how to use the trt_int8_use_native_calibration_table option, but I can't find any examples or documentation that explain how to do so.

I generated a TensorRT INT8 calibration table using the IInt8EntropyCalibrator2 class provided by the TensorRT library. The output file was written by the write_calibration_cache method of that class.
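
For context, the calibrator looked roughly like the sketch below. This is a simplified illustration, not my exact code: the input shape, the calibration data loading, and the pycuda buffer handling are placeholders.

import os
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds float32 batches to TensorRT and writes the calibration cache."""

    def __init__(self, batches, input_shape, cache_file='model.cache'):
        super().__init__()
        self.batches = iter(batches)  # iterable of np.float32 arrays shaped input_shape
        self.cache_file = cache_file
        self.batch_size = input_shape[0]
        nbytes = int(np.prod(input_shape)) * np.dtype(np.float32).itemsize
        self.device_input = cuda.mem_alloc(nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches), dtype=np.float32)
        except StopIteration:
            return None  # no more data: TensorRT finishes calibration
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, 'rb') as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        # This call is what produced the model.cache shown below.
        with open(self.cache_file, 'wb') as f:
            f.write(cache)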

Here is the content of the generated cache (model.cache):

TRT-100300-EntropyCalibration2
input: 4000890a
outputs.11: 3c010a14
inputs: 4000890a
inputs.4: 4000890a
onnx::Div_306_output: 4000890a
ONNXTRT_Broadcast_output: 4000890a
onnx::Shape_307: 3c010a14
input.3: 3d837e8e
onnx::MaxPool_316: 3d837e8e
input.7: 3d837e8e
input.15: 3db3d0d2
onnx::Conv_320: 3dabf04c
input.23: 3da4b590
onnx::Conv_323: 3da59c9b
out: 3da091b5
identity: 3db6158c
onnx::Relu_328: 3e13e55b
onnx::Conv_329: 3e1479ce
input.39: 3d970a00
onnx::Conv_332: 3d954d0d
input.47: 3dbef086
onnx::Conv_335: 3de8c6f1
out.3: 3d8c773a
onnx::Relu_338: 3dd20b88
onnx::Conv_339: 3e04f02d
input.59: 3dbafc15
onnx::Conv_342: 3db65224
input.67: 3d9ee4c2
onnx::Conv_345: 3db7d63d
out.7: 3d5825c6
onnx::Relu_348: 3dca0e2a
onnx::Conv_349: 3de1a1e4
input.79: 3dd03b67
onnx::Conv_352: 3dd66c7f
input.87: 3ddb6449
onnx::Conv_355: 3e05c218
out.11: 3e080095
identity.3: 3e268d27
onnx::Relu_360: 3e5f19be
onnx::Conv_361: 3e83c2bd
input.103: 3e1335cf
onnx::Conv_364: 3e1335cf
input.111: 3e03dd99
onnx::Conv_367: 3e105e44
out.15: 3e04f789
onnx::Relu_370: 3e6b5f6e
onnx::Conv_371: 3e83992f
input.123: 3e228955
onnx::Conv_374: 3e1e0353
input.131: 3e02d408
onnx::Conv_377: 3e09e66a
out.19: 3e0b554d
onnx::Relu_380: 3e6c0981
onnx::Conv_381: 3e80409a
input.143: 3de7f5e1
onnx::Conv_384: 3ddd4ea0
input.151: 3dc0cf8e
onnx::Conv_387: 3dafbd70
out.23: 3db63efc
onnx::Relu_390: 3e5bd7c1
onnx::Conv_391: 3e613116
input.163: 3de112a7
onnx::Conv_394: 3ddaf659
input.171: 3de02164
onnx::Conv_397: 3de02164
out.27: 3e1970ad
identity.7: 3e31e82f
onnx::Relu_402: 3e829e55
onnx::Conv_403: 3e8dfc46
input.187: 3e073934
onnx::Conv_406: 3e0cc8ce
input.195: 3dfe0bd3
onnx::Conv_409: 3dfe0bd3
out.31: 3e097656
onnx::Relu_412: 3e668708
onnx::Conv_413: 3e817602
input.207: 3db71f24
onnx::Conv_416: 3d9f9ed7
input.215: 3d8a9236
onnx::Conv_419: 3d6120f4
out.35: 3dd4130b
onnx::Relu_422: 3e612d41
onnx::Conv_423: 3e821f94
input.227: 3d9f4544
onnx::Conv_426: 3da23fdf
input.235: 3d86a04e
onnx::Conv_429: 3d4ecdbc
out.39: 3d99b9fc
onnx::Relu_432: 3e5ffff0
onnx::Conv_433: 3e7a290a
input.247: 3d9c422f
onnx::Conv_436: 3dad6852
input.255: 3d81d9a7
onnx::Conv_439: 3d4fb2bf
out.43: 3d8deaa5
onnx::Relu_442: 3e6a95b9
onnx::Conv_443: 3e78abb8
input.267: 3d9d65fa
onnx::Conv_446: 3dbe5bf3
input.275: 3d8c9bff
onnx::Conv_449: 3d772f5b
out.47: 3db49ddb
onnx::Relu_452: 3e82c087
onnx::Conv_453: 3e82c087
input.287: 3da7b4fb
onnx::Conv_456: 3dac09d3
input.295: 3dae4797
onnx::Conv_459: 3daf468f
out.51: 3d97f74a
identity.11: 3dfb27f7
onnx::Relu_464: 3e0d5192
onnx::Conv_465: 3e21155f
input.311: 3db027d9
onnx::Conv_468: 3db0d36d
input.319: 3d6bd33f
onnx::Conv_471: 3d5e93f3
out.55: 3d93cd0a
onnx::Relu_474: 3dde1e35
onnx::Conv_475: 3e0974a4
input.331: 3d9b10a3
onnx::Conv_478: 3d939f32
input.339: 3d870d8f
onnx::Conv_481: 3d552884
out.59: 3d8ad473
onnx::Relu_484: 3dd3e855
onnx::Conv_485: 3e0868a8
input.379: 3cb16664
input.387: 3db1330d
onnx::Shape_507: 3dae2052
input.367: 3dbd609c
onnx::Concat_491: 3da68ca3
input.391: 3da792c1
outputs.7: 3c010a14
outputs: 3e86947b
onnx::Conv_524: 3d9a8999
output: 3c010a14
onnx::Concat_494: 3da31a68
input.375: 3db6500f
input.407: 3e89b38d
outputs.3: 3acea51a
onnx::Conv_527: 3d3ab713
input.351: 3de23aec
onnx::Concat_497: 3da36c1d
onnx::Concat_488: 3dfb4c16
onnx::Concat_520: 3dae2052
input.403: 3d380b6f
input.359: 3db1b67e
input.395: 3d9895d2

I then try to use this cache in ONNX Runtime as follows:

import cv2
import onnxruntime
import time
import numpy as np
import os

onnx_model_path = 'model.onnx'

providers = [
    ('TensorrtExecutionProvider', {
        'trt_engine_cache_enable': True,                # cache the built TensorRT engine on disk
        'trt_engine_cache_path': 'trt_engine_cache',
        'trt_int8_enable': True,                        # enable INT8 precision
        'trt_fp16_enable': True,                        # also allow FP16 kernels
        'trt_int8_use_native_calibration_table': True,  # use the TensorRT-native table rather than an ORT-generated one
        'trt_int8_calibration_table_name': 'model.cache',
    })
]

session = onnxruntime.InferenceSession(onnx_model_path, providers=providers)

input_name = session.get_inputs()[0].name

input_folder = 'data'
output_folder = 'output_images'

if not os.path.exists(output_folder):
    os.makedirs(output_folder)

for filename in os.listdir(input_folder):
    if filename.endswith(('.png', '.jpg', '.jpeg')):
        image_path = os.path.join(input_folder, filename)
        image = cv2.imread(image_path)
        
        if image is None:
            print(f"Failed to load image: {image_path}")
            continue
        
        image = image.astype(np.float32)
        image = np.expand_dims(image, axis=0)

        start_time = time.time()
        result = session.run(None, {input_name: image})
        end_time = time.time()
        duration = end_time - start_time
        print(f"Inference for {filename} completed in {duration:.4f} seconds")

When I run this code, the inference speed and output are the same as with FP16 alone. It seems that the INT8 calibration table is not being applied, or it is not working correctly.

Could you provide guidance or examples on how to properly use the trt_int8_use_native_calibration_table option? Is there any additional configuration needed to ensure that INT8 optimization is applied in this scenario?
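
For reference, the only way I have found so far to see what the TensorRT EP is actually doing is to turn on verbose logging and print the provider options that were applied. A minimal sketch, assuming the verbose ONNX Runtime log is the right place to look for the calibration-table and precision messages:

import onnxruntime

# 0 = VERBOSE; the TensorRT EP reports engine build details at this level
onnxruntime.set_default_logger_severity(0)

providers = [
    ('TensorrtExecutionProvider', {
        'trt_int8_enable': True,
        'trt_fp16_enable': True,
        'trt_int8_use_native_calibration_table': True,
        'trt_int8_calibration_table_name': 'model.cache',
    })
]

session = onnxruntime.InferenceSession('model.onnx', providers=providers)

# Confirm which providers were actually assigned and with which options,
# to rule out a silent fallback to CUDA/CPU or dropped INT8 settings
print(session.get_providers())
print(session.get_provider_options().get('TensorrtExecutionProvider', {}))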

I'm currently using this approach because the ONNX Runtime quantization tools take many hours to calibrate an INT8 model, while TensorRT calibration only takes minutes (see issue and issue). I also noticed that TensorRT published a repository for model optimization, so I tried it out; however, under the hood it calls the ONNX Runtime quantization tools. I'm now trying to figure out the most efficient way to generate an INT8 model quickly and use it in ONNX Runtime. Thank you.
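
P.S. For completeness, this is roughly the ONNX Runtime quantization path I was comparing against: a minimal sketch of quantize_static with a toy data reader, where the input name 'input', the 1x224x224x3 shape, and the random samples are placeholders for my real preprocessing.

import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class ToyDataReader(CalibrationDataReader):
    """Feeds preprocessed float32 batches to the ORT calibrator."""

    def __init__(self, input_name, samples):
        self.data = iter({input_name: sample} for sample in samples)

    def get_next(self):
        return next(self.data, None)

# placeholder calibration batches; in practice these come from real images
samples = [np.random.rand(1, 224, 224, 3).astype(np.float32) for _ in range(8)]

quantize_static(
    'model.onnx',
    'model.int8.onnx',
    ToyDataReader('input', samples),  # 'input' is a placeholder input name
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)

With my real calibration set this is the step that takes hours, which is why I would prefer to reuse the TensorRT-generated table instead.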

Page / URL

No response
