Check cuda - torch.cuda.get_device_capability #102
Conversation
1. Replace all occurrences of `restore_db` with `restart_db`. A note was added to the documentation to reflect this change.
2. Update the changelog for the breaking changes to `restart_db` and `make_trainvalidtest_split`.
WTH was I typing?
Only run triton check if cuda is available.
I initially added this in my PR as well, but I deleted it at the end. Actually, the cupy part also breaks the code on the Chicoma head node (I was just compiling the docs, but CPU training would hit the same problem on the head node). I removed my changes mainly because of the logic of the custom kernel part: "auto" basically means "true", which in turn implies a GPU must be available. If we follow this logic, the right thing to do is to set the option accordingly. IMO, "auto" should be equal to "false" for the CPU case, so that these problems will not show up at all. Not sure if Nick agrees with this.
@tautomer I don't understand what you mean about cupy breaking too. `USE_CUSTOM_KERNELS=True` is important for CPU too because there is a numba implementation on the CPU which strongly outperforms pure pytorch. It is true that the 'cupy' and 'triton' options should not be available unless there is a GPU.
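The policy discussed above can be sketched as a small resolver. This is a hypothetical illustration, not hippynn's actual API: the function name, parameters, and backend names are assumptions chosen to mirror the discussion ("auto" should still pick numba on CPU-only machines, while the GPU-only backends are rejected without CUDA).

```python
def resolve_custom_kernels(setting, cuda_available, numba_available, triton_available):
    """Hypothetical sketch: map a user-facing custom-kernel setting to a backend.

    GPU-only backends ('cupy', 'triton') are rejected when no CUDA device is
    present; "auto" falls back to the numba CPU implementation, which strongly
    outperforms pure pytorch, before giving up and using plain pytorch.
    """
    if setting in ("cupy", "triton") and not cuda_available:
        raise ValueError(f"Backend '{setting}' requires a CUDA-capable GPU.")
    if setting == "auto":
        if cuda_available and triton_available:
            return "triton"
        if numba_available:
            return "numba"
        return "pytorch"
    return setting
```

With this shape, a CPU-only machine asking for "auto" silently gets the numba path instead of an error, which is the behavior both commenters seem to want.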
I probably did too much to fix this, but it is fixed now. Also merged changes from #101. And documentation updates! And settings functionality updates! I tested this on a machine without cuda in pytorch; on a machine with cuda in pytorch but no GPU; on a machine with an old, non-triton-compatible GPU; and on a machine with a newer, triton-compatible GPU.
I cannot reproduce this error anymore on Chicoma, but I found the error message in my clipboard.
I do not know what setup can lead to this error, but it is very likely an edge case. I added an extra try/except for it:

```python
try:
    if not cupy.cuda.is_available():
        if torch.cuda.is_available():
            warnings.warn(
                "cupy.cuda.is_available() returned False: "
                "Custom kernels will fail on GPU tensors."
            )
except RuntimeError as e:
    warnings.warn(f"Cupy encountered a RuntimeError with the message: {e}")
```

Anyway, this error has gone away. Not sure if they changed something during the current DST that fixed the error.
This happens when trying to use cuda 12 for cudatoolkit while the nvidia driver is not up to date for it. The DST did indeed update the cuda driver. I think everything is OK in hippynn. If someone is trying to use cupy and they get cupy errors, there is probably a limit to the kinds of edge cases we can cover.
I see. This makes sense. I guess we can build a Q&A or Wiki section to cover this kind of issue. People may encounter similar problems but be unable to find a clue.
Only run the triton check if cuda is available. This change fixes a minor bug from recent changes that broke our cpu-only running of hippynn.
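A minimal sketch of the guard described above, assuming torch is installed. The function name is hypothetical, and the compute-capability threshold of 7.0 is an assumption about triton's requirements; the key point from the thread is that `torch.cuda.get_device_capability` must only be called after `torch.cuda.is_available()` confirms a GPU exists.

```python
import importlib.util


def triton_compatible():
    """Hypothetical sketch: return True only if a CUDA GPU is present and its
    compute capability is new enough for triton (>= 7.0 assumed here)."""
    # Guard the import so this is safe even where pytorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    # Querying device capability without a GPU is the bug this PR fixes,
    # so check availability first.
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 7
```

On a cpu-only machine this returns False without ever touching the CUDA APIs, which is exactly the behavior the fix restores.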