Describe the issue
I have an ONNX model that is only 204.57 MB on disk, but when I create the session, GPU memory consumption reaches 1.16 GB, and during inference it rises to 2.25 GB. This results in a high inference cost. How can I reduce GPU memory consumption?
To reproduce
Simply create an onnxruntime session with the default options. The function used to measure GPU memory consumption:
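For comparison, a session created with non-default options can cap the CUDA memory arena. This is a minimal sketch, not the reporter's code: `model.onnx` is a placeholder path, and the 1 GiB `gpu_mem_limit` is an illustrative value. The provider options shown (`gpu_mem_limit`, `arena_extend_strategy`, `cudnn_conv_algo_search`) are standard CUDA execution provider settings, though their effect will vary by model.

```python
import os

# onnxruntime-gpu may not be installed everywhere this sketch is read.
try:
    import onnxruntime as ort
    HAVE_ORT = True
except ImportError:
    HAVE_ORT = False

# Options for the CUDA execution provider that trade speed for memory.
cuda_provider_options = {
    # Cap the CUDA memory arena (in bytes); requests beyond this limit
    # fail instead of growing the arena further.
    "gpu_mem_limit": 1 * 1024 ** 3,
    # Grow the arena only by the requested amount rather than doubling,
    # which reduces over-allocation.
    "arena_extend_strategy": "kSameAsRequested",
    # EXHAUSTIVE cuDNN convolution search can allocate large workspaces;
    # HEURISTIC usually needs far less memory.
    "cudnn_conv_algo_search": "HEURISTIC",
}

if HAVE_ORT and os.path.exists("model.onnx"):
    sess_options = ort.SessionOptions()
    session = ort.InferenceSession(
        "model.onnx",
        sess_options,
        providers=[("CUDAExecutionProvider", cuda_provider_options)],
    )
```

Whether this brings the 1.16 GB baseline down depends on how much of it is arena over-allocation versus weights and cuDNN workspaces.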
Urgency
No response
Platform
Linux
OS Version
ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.11.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
11.4
Model File
No response
Is this a quantized model?
No