Predictions differ from PyTorch after converting the model to TensorRT #3082

Open
2 tasks done
Bonheur96 opened this issue Jul 16, 2024 · 0 comments

Prerequisite

Environment

sys.platform: win32
Python: 3.8.19 (default, Mar 20 2024, 19:55:45) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3080
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
NVCC: Cuda compilation tools, release 11.6, V11.6.55
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.37.32705 for x64
GCC: n/a
PyTorch: 2.3.1+cu118
PyTorch compiling details: PyTorch built with:
  - C++ Version: 201703
  - MSVC 192930154
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.7
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF
TorchVision: 0.18.1+cu118
OpenCV: 4.10.0
MMEngine: 0.10.4
MMPose: 1.3.1+

mmcv 2.1.0
mmdeploy 1.3.1
mmdeploy-runtime-gpu 1.3.1
mmdet 3.3.0
mmengine 0.10.4
mmpose 1.3.1 d:\binocular_camera\mmpose-main
mmpretrain 1.2.0 d:\binocular_camera\mmpose-main\mmpretrain

Reproduces the problem - code sample

# Imports were not part of the original snippet; these are the usual
# MMDeploy 1.x locations and are assumed here.
import os
import time

import cv2
import numpy as np
from mmdeploy.apis import torch2onnx
from mmdeploy.apis.tensorrt import onnx2tensorrt
from mmdeploy.backend.sdk.export_info import export2SDK
from mmdeploy_runtime import PoseDetector

img = r'C:\Users\HYGEA\Desktop\needle-label\240513_label\240513\HD2K_SN28988284_16-19-01_left_5persecond/HD2K_SN28988284_16-19-01_left_5persecond_000000.jpg'
work_dir = 'work_dir/trt/hrnet'
save_file = 'end2end.onnx'
deploy_cfg = r'D:\binocular_camera\mmpose-main\mmdeploy\configs\mmpose\pose-detection_tensorrt-fp16_static-256x256.py'
model_cfg = r'D:\binocular_camera\mmpose-main\configs\needle_10_keypoint\td-hm_hrnetv2-w18_dark-8xb64-60e_wflw-256x256.py'
model_checkpoint = r'D:\binocular_camera\mmpose-main\work_dirs\071624\best_coco_AP_epoch_98.pth'
device = 'cuda'

# 1. convert model to IR (onnx)
torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg,
           model_checkpoint, device)

# 2. convert IR to tensorrt
onnx_model = os.path.join(work_dir, save_file)
save_file = 'end2end.engine'
model_id = 0
device = 'cuda'
onnx2tensorrt(work_dir, save_file, model_id, deploy_cfg, onnx_model, device)

# 3. extract pipeline info for sdk use (dump-info)
export2SDK(deploy_cfg, model_cfg, work_dir, pth=model_checkpoint, device=device)

# Run inference with the converted model through the SDK
image_path = r'D:\binocular_camera\mmpose-main\dataset\needle_10\images\val\HD2K_SN28988284_10-04-05_left_2persecond_000075.jpg'
img = cv2.imread(image_path)

model_path = r'D:/binocular_camera/mmpose-main/work_dir\trt\hrnet'
detector = PoseDetector(
    model_path=model_path, device_name='cuda', device_id=0)

# bbox is given as (x, y, w, h)
bbox = [939.4366197183099,
        408.33333333333337,
        906.2300469483569,
        213.37089201877927]
if bbox is None:
    result = detector(img)
else:
    # convert (x, y, w, h) -> (left, top, right, bottom)
    start_time = time.time()

    print(bbox)
    bbox = np.array(bbox, dtype=int)
    bbox[2:] += bbox[:2]
    result = detector(img, bbox)
    end_time = time.time()

# draw the predicted keypoints on the image
_, point_num, _ = result.shape
points = result[:, :, :2].reshape(point_num, 2)
for [x, y] in points.astype(int):
    cv2.circle(img, (x, y), 1, (0, 255, 0), 2)

cv2.imwrite('output_pose.png', img)
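
To put a number on how far the TensorRT keypoints drift from the PyTorch ones, a small helper along these lines might be useful. This is only a sketch: `keypoint_error` is a hypothetical name, and it assumes both sets of predictions have already been collected as sequences of (x, y) pairs in the same image coordinates. The coordinates in the demo are made up for illustration and are not results from this model.

```python
import math


def keypoint_error(ref_points, test_points):
    """Return (mean, max) Euclidean distance in pixels between two keypoint sets."""
    ref = [tuple(map(float, p)) for p in ref_points]
    test = [tuple(map(float, p)) for p in test_points]
    if len(ref) != len(test) or not ref:
        raise ValueError("expected two non-empty keypoint lists of equal length")
    # Per-keypoint distance between the reference and the tested prediction
    dists = [math.dist(a, b) for a, b in zip(ref, test)]
    return sum(dists) / len(dists), max(dists)


if __name__ == "__main__":
    # Made-up coordinates, not output from the model in this issue
    pytorch_pts = [[100.0, 200.0], [150.0, 250.0]]
    tensorrt_pts = [[103.0, 204.0], [150.0, 250.0]]
    mean_err, max_err = keypoint_error(pytorch_pts, tensorrt_pts)
    print(f"mean error: {mean_err:.2f} px, max error: {max_err:.2f} px")
```

Running the same comparison against an FP32 engine, if one is built, would help separate precision loss from preprocessing or bbox differences.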

Reproduces the problem - command or script

Reproduces the problem - error message

loading mmdeploy_trt_net.dll ...
loading mmdeploy_ort_net.dll ...
07/16 15:47:25 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
07/16 15:47:25 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: D:\binocular_camera\mmpose-main\work_dirs\071624\best_coco_AP_epoch_98.pth
07/16 15:47:26 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
07/16 15:47:26 - mmengine - INFO - Export PyTorch model to ONNX: work_dir/trt/hrnet\end2end.onnx.
07/16 15:47:26 - mmengine - WARNING - Can not find torch.nn.functional._scaled_dot_product_attention, function rewrite will not be applied
07/16 15:47:26 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
D:\binocular_camera\mmpose-main\mmpose\models\utils\ops.py:52: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size = tuple(int(x) for x in size)
07/16 15:47:41 - mmengine - INFO - Successfully loaded tensorrt plugins from C:\Users\HYGEA\anaconda3\envs\mmdeploy\lib\site-packages\mmdeploy\lib\mmdeploy_tensorrt_ops.dll
[07/16/2024-15:47:41] [TRT] [I] [MemUsageChange] Init CUDA: CPU +409, GPU +0, now: CPU 19242, GPU 1382 (MiB)
[07/16/2024-15:47:43] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +389, GPU +102, now: CPU 19799, GPU 1484 (MiB)
[07/16/2024-15:47:43] [TRT] [I] ----------------------------------------------------------------
[07/16/2024-15:47:43] [TRT] [I] Input filename: work_dir/trt/hrnet\end2end.onnx
[07/16/2024-15:47:43] [TRT] [I] ONNX IR version: 0.0.6
[07/16/2024-15:47:43] [TRT] [I] Opset version: 11
[07/16/2024-15:47:43] [TRT] [I] Producer name: pytorch
[07/16/2024-15:47:43] [TRT] [I] Producer version: 2.3.1
[07/16/2024-15:47:43] [TRT] [I] Domain:
[07/16/2024-15:47:43] [TRT] [I] Model version: 0
[07/16/2024-15:47:43] [TRT] [I] Doc string:
[07/16/2024-15:47:43] [TRT] [I] ----------------------------------------------------------------
[07/16/2024-15:47:43] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/16/2024-15:47:46] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +5, GPU +10, now: CPU 19731, GPU 1494 (MiB)
[07/16/2024-15:47:46] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +11, GPU +8, now: CPU 19742, GPU 1502 (MiB)
[07/16/2024-15:47:46] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:00] [TRT] [W] Weights [name=/backbone/conv2/Conv + /backbone/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:00] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:00] [TRT] [W] Weights [name=/backbone/conv2/Conv + /backbone/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:00] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:00] [TRT] [W] Weights [name=/backbone/conv2/Conv + /backbone/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:00] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:00] [TRT] [W] Weights [name=/backbone/conv2/Conv + /backbone/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:00] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:00] [TRT] [W] Weights [name=/backbone/conv2/Conv + /backbone/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:00] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:00] [TRT] [W] Weights [name=/backbone/conv2/Conv + /backbone/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:00] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:00] [TRT] [W] Weights [name=/backbone/conv2/Conv + /backbone/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:00] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:00] [TRT] [W] Weights [name=/backbone/conv2/Conv + /backbone/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:00] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:02] [TRT] [W] Weights [name=/backbone/layer1/layer1.0/conv2/Conv + /backbone/layer1/layer1.0/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:02] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:02] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:02] [TRT] [W] Weights [name=/backbone/layer1/layer1.0/conv2/Conv + /backbone/layer1/layer1.0/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:02] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:02] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/16/2024-15:48:02] [TRT] [W] Weights [name=/backbone/layer1/layer1.0/conv2/Conv + /backbone/layer1/layer1.0/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:02] [TRT] [W] - Subnormal FP16 values detected.
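
The "Subnormal FP16 values detected" warnings indicate that some weights are smaller in magnitude than the smallest normal FP16 value (2^-14, about 6.1e-05) and so lose precision when cast to half precision. A rough way to count how many values in a list are affected, using only the Python standard library, could look like the sketch below. `to_fp16` and `count_fp16_subnormals` are hypothetical helpers, not part of TensorRT or MMDeploy, and the sample values are made up rather than taken from the checkpoint in this issue.

```python
import struct

FP16_MIN_NORMAL = 2.0 ** -14  # smallest positive normal FP16 value, about 6.10e-05


def to_fp16(value):
    """Round a Python float to the nearest FP16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def count_fp16_subnormals(weights):
    """Count weights that become nonzero subnormal values after the FP16 cast."""
    count = 0
    for w in weights:
        half = to_fp16(w)
        # Nonzero but below the smallest normal value means subnormal
        if half != 0.0 and abs(half) < FP16_MIN_NORMAL:
            count += 1
    return count


if __name__ == "__main__":
    # Made-up values for illustration only
    sample = [0.5, -0.01, 3e-5, -1e-6, 1e-9, 0.0]
    print(count_fp16_subnormals(sample))
```

Values that underflow all the way to zero are not counted here, since they are no longer subnormal after the cast.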

Additional information

The model was trained and tested on Linux and then converted to TensorRT on Windows. Could this cross-platform conversion affect the results?
