What would you like to be added?
RDMA metrics are needed to confirm that training traffic actually goes over RDMA as expected and that communication performance is good.
Some RDMA metrics can be seen through node-export in both shared mode and exclusive mode. VF metrics on the host can be captured through node-export, but in exclusive mode the VF device inside the container is not visible from the host, so node-export is not applicable there. Investigate the sriov-operator features for large-cluster mode to see whether they can provide these RDMA metrics.
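
To make the exclusive-mode gap concrete, here is a minimal sketch of reading RDMA port counters from sysfs. It assumes the Linux layout `/sys/class/infiniband/<device>/ports/<port>/counters` and `hw_counters`; the function name `read_rdma_counters` is hypothetical and is not part of node-export or sriov-operator. In exclusive mode the VF would only show up under this path when the code runs inside the container's own namespaces, which is why a host-side collector cannot see it.

```python
from pathlib import Path


def read_rdma_counters(sysfs_root="/sys/class/infiniband"):
    """Collect RDMA port counters as {device: {port: {counter: value}}}.

    Hypothetical helper for illustration. It only sees the RDMA devices
    visible in the namespace where it runs.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No RDMA devices visible here (or the RDMA modules are not loaded).
        return result

    for dev in sorted(root.iterdir()):
        ports_dir = dev / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            counters = {}
            # "counters" holds standard port counters, "hw_counters" holds
            # vendor-specific ones; either directory may be absent.
            for sub in ("counters", "hw_counters"):
                counter_dir = port / sub
                if not counter_dir.is_dir():
                    continue
                for counter_file in counter_dir.iterdir():
                    try:
                        value = int(counter_file.read_text().strip())
                    except (OSError, ValueError):
                        # Skip counters that are unreadable or non-numeric.
                        continue
                    counters[counter_file.name] = value
            result.setdefault(dev.name, {})[port.name] = counters
    return result


if __name__ == "__main__":
    for device, ports in read_rdma_counters().items():
        for port, counters in ports.items():
            for name, value in sorted(counters.items()):
                print(f"{device} port {port} {name} {value}")
```

Running this on the host would be expected to list the PF and any shared-mode VFs, while running it inside an exclusive-mode pod would list only the VF assigned to that pod. Whether sriov-operator can expose the same data is the open question of this issue.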
Why is this needed?
No response
How to implement it (if possible)?
No response
Additional context
No response