Weird "Missing ranks" error in parallel training using horovod #3937

Answered by DingChangjie
DingChangjie asked this question in Q&A
I finally figured out that this problem was caused by the environment variable KMP_AFFINITY, which I had manually changed to scatter. This variable should be set automatically by deepmd-kit ......
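
For anyone who wants to check for the same cause, here is a minimal sketch of how one might detect a manual KMP_AFFINITY override and build an environment without it before launching training. The helper names `report_override` and `clean_env` are hypothetical and not part of deepmd-kit or horovod; the sketch only assumes that removing the variable leaves deepmd-kit free to choose its own value.

```python
import os


def report_override(env=None, name="KMP_AFFINITY"):
    """Describe whether `name` is present in the given environment mapping."""
    source = os.environ if env is None else env
    value = source.get(name)
    if value is None:
        return f"{name} is not set"
    return f"{name} is manually set to {value!r}"


def clean_env(env=None, names=("KMP_AFFINITY",)):
    """Return a copy of the environment with the listed variables removed."""
    source = os.environ if env is None else env
    return {key: val for key, val in source.items() if key not in names}


if __name__ == "__main__":
    # Inspect the current shell environment, then build a cleaned copy.
    print(report_override())
    env = clean_env()
    print("KMP_AFFINITY" in env)
```

The dictionary returned by `clean_env()` can be passed as the `env` argument of `subprocess.run` when starting the training job, so the override is dropped for that launch only and the parent shell stays untouched.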

Answer selected by DingChangjie