Auto3DSeg in Azure fails - new to Auto3DSeg #1554
Comments
Hi @rfrs, from the error message, it looks like your data wasn't prepared quite correctly. You could refer to the " Hope it helps, thanks! |
Dear @KumoLiu, thanks for the prompt answer. Do you mean, for example, the file (image and label) paths in the JSON file? Best |
|
Dear @KumoLiu
The What is wrong there? Best |
Hi @rfrs, the key in the JSON should be "image" instead of "imageTesting".
|
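The key mismatch above can be caught before training with a small check on the datalist. This is only a sketch: the thread confirms that the key must be "image" rather than "imageTesting", but the section names "training" and "testing" and the "label" key are assumptions based on the usual Auto3DSeg datalist layout, and `check_datalist` is a hypothetical helper, not part of MONAI.

```python
from typing import Any, Dict, List


def check_datalist(datalist: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in a datalist dict.

    Assumed layout (not confirmed in the thread):
      "training": entries with "image" and "label"
      "testing":  entries with "image"
    """
    required_keys = (
        ("training", ("image", "label")),
        ("testing", ("image",)),
    )
    problems: List[str] = []
    for section, keys in required_keys:
        for idx, entry in enumerate(datalist.get(section, [])):
            for key in keys:
                if key not in entry:
                    problems.append(
                        f"{section}[{idx}] is missing key '{key}' "
                        f"(found: {sorted(entry)})"
                    )
    return problems
```

Loading the file with `json.load` and printing the result of `check_datalist` would report an entry such as `{"imageTesting": "case_01.nii.gz"}` as missing the "image" key.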
Thanks @KumoLiu for your prompt answer. It was a rookie mistake :( With the change the data was found but another problem appeared... the error log is as follows:
Any further suggestions? Thanks a lot. |
Also, is there a way to specify the number of epochs, for example in the YAML file? Best |
Can you share the JSON and YAML files you're using to train Auto3D? How many folds did you create? |
Hi @diazandr3s, thanks for your message. I added the JSON and YAML files in the zipped folder. Also, I was trying to use fold 0 only, so a single fold, training on all the data in one go. I was doing the same with nnUNet. Could this be the problem as well? Also, I ask again: can I set the number of training epochs from the beginning? Thanks for all the help. Best Rui |
Hi @rfrs,
I strongly recommend that you use the fake data and follow this tutorial to run through the process first, understand auto3dseg, and then carry out your own tasks. |
Hi @rfrs, Thanks for sending the files. |
Dear @diazandr3s and @KumoLiu, thank you so much for your support so far. Following the tutorial and having a minimum of 2 folds sorted the issues, although I am still struggling with errors at the ensembling step: not enough vRAM despite having an A100 80GB GPU in Azure. Two questions:
Thank you for all. Best wishes. |
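Since using at least 2 folds resolved the earlier error, a datalist can be given fold indices programmatically. The following is a minimal sketch: the per-entry "fold" key is an assumption taken from the usual Auto3DSeg datalist convention, and `assign_folds` is a hypothetical helper that simply distributes cases round-robin.

```python
from typing import Any, Dict, List


def assign_folds(entries: List[Dict[str, Any]], num_folds: int = 2) -> List[Dict[str, Any]]:
    """Return copies of the training entries with a round-robin "fold" index.

    The thread reports that a single fold caused problems, so fewer than
    two folds is rejected here.
    """
    if num_folds < 2:
        raise ValueError("use at least 2 folds")
    return [{**entry, "fold": idx % num_folds} for idx, entry in enumerate(entries)]
```

Shuffling the entries before calling the helper would avoid any ordering bias in how cases are split across folds.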
Hi @rfrs, Thanks for the update.
Do you mean for inference or training?
The number of epochs and other hyperparameters are updated/changed after the data analysis. As it says here:
Yes, you can specify a different work_dir path. Here is an example of how you can use Auto3D with a single backbone network (segresnet) and a specific work dir:
Hope this helps, |
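For readers who want a starting point, here is a hedged sketch of running Auto3DSeg with only segresnet, a custom work_dir, and a fixed number of epochs. It is not the snippet originally posted in this comment. It assumes `AutoRunner` from `monai.apps.auto3dseg` accepts `work_dir`, `input`, and `algos`, and that `set_training_params` takes a `num_epochs` override for segresnet; all paths and the helper names `build_runner_args` and `run_segresnet` are placeholders.

```python
from typing import Any, Dict


def build_runner_args(
    work_dir: str, datalist: str, dataroot: str, modality: str = "CT"
) -> Dict[str, Any]:
    """Collect keyword arguments for AutoRunner (all paths are placeholders)."""
    return {
        "work_dir": work_dir,  # where analysis results and models are written
        "input": {"modality": modality, "datalist": datalist, "dataroot": dataroot},
        "algos": ["segresnet"],  # restrict the run to a single backbone
    }


def run_segresnet(args: Dict[str, Any], num_epochs: int) -> None:
    """Run Auto3DSeg with an explicit epoch count (requires MONAI)."""
    # Imported lazily so build_runner_args can be used without MONAI installed.
    from monai.apps.auto3dseg import AutoRunner

    runner = AutoRunner(**args)
    # Override the epoch count that would otherwise be derived after data analysis.
    runner.set_training_params({"num_epochs": num_epochs})
    runner.run()
```

A call such as `run_segresnet(build_runner_args("./work_dir_segresnet", "./datalist.json", "./data"), num_epochs=100)` would then start the pipeline; the epoch count here is arbitrary.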
Dear @diazandr3s, thank you so much. It worked for me and I could set the work_dir and also the number of epochs. Auto3DSeg is working for me in Azure and I could generate models. One more question. Thanks |
I have been following the tutorial here https://github.com/Project-MONAI/MONAILabel/tree/main/sample-apps#radiology and created the files, yet I get the error message stating |
Hi @rfrs, This is a very good question! Another way of consuming one model in MONAI Label is to modify the Radiology app - Segmentation model. For this, you should update the label names and indexes, network architecture and pre-transforms. Please give it a try. Otherwise, I'd suggest we move this conversation (consuming Bundle generated models in Auto3D in MONAI Label) to the MONAI Core repo. I hope this makes sense. |
Dear @diazandr3s, thanks for the reply. How can we move this discussion into the MONAI Core repo? |
Hi @rfrs, I'd suggest we start a discussion in the MONAI Core repo with a title like: Once you have created the discussion, please link this conversation so others can also comment there. |
Dear @diazandr3s and @KumoLiu, once more I have issues when running Auto3DSeg in an Azure/cloud setting.
So far, in all my MONAI/Auto3DSeg tests I have never had multiprocessing errors, so I am really not sure what is wrong... Any help is greatly appreciated. Thanks for all. Best wishes |
Hi @rfrs, This is strange.
Please try this and let us know, |
Dear @diazandr3s, I apologise for the delayed answer. In the meantime I have run into another issue. The error is as follows:
So it seems there is a problem with the train.py script, but I have never encountered this when training on a compute instance. Would you have any suggestions? Thank you for all. Best wishes |
Hi @rfrs, How many GPUs are you using here? Are they interconnected? What are the cluster specs? Please try running the AutoRunner instead. As it is explained here: https://github.com/Project-MONAI/tutorials/tree/main/auto3dseg#1-run-with-minimal-input-using-autorunner Let us know the output. I'd like to understand why you're getting this error. |
Hi @diazandr3s, I am having a similar problem related to this thread. I am using Auto3DSeg in a shared cluster environment with access to either V100 or A100 GPUs with 80 GB (I've tried both). I was able to run the hello_world example without an issue, which is great! I then tried to run similar code using the Task_04_Prostate data as a real-world example, but the code failed because the GPU ran out of memory. The image files in the hello_world example are quite small compared to the images in the prostate data (I suspect this is because the hello_world images are mostly empty). From the MONAI tutorial it looks like it should be possible to run on a single GPU. Any ideas why this might be happening? |
Hi @barrettfletcher, datasets usually come in the compressed .nii.gz format and are uncompressed when loaded onto the GPU. Always budget for at least 2-3x more vRAM than the dataset file size. Also, the smaller the voxel size, the more voxels each image has and the more vRAM is needed for computation. We can resample to a larger voxel size or load smaller patches. Cheers |
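The effect of voxel size and patch size on memory can be estimated with some simple arithmetic. The sketch below only counts the raw tensor for one volume; activations, gradients, and batch size during training add a large multiple on top, so treat the numbers as a lower bound. The shapes, spacings, and helper names are illustrative assumptions, not values from this thread.

```python
import math
from typing import Sequence, Tuple

# Bytes per voxel for a few common dtypes.
BYTES_PER_VOXEL = {"uint8": 1, "int16": 2, "float32": 4, "float64": 8}


def volume_bytes(shape: Sequence[int], dtype: str = "float32") -> int:
    """Uncompressed size in bytes of one volume with the given shape."""
    return math.prod(shape) * BYTES_PER_VOXEL[dtype]


def resampled_shape(
    shape: Sequence[int], spacing: Sequence[float], new_spacing: Sequence[float]
) -> Tuple[int, ...]:
    """Shape after resampling from `spacing` to `new_spacing` (same units)."""
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing, new_spacing)
    )


full = (512, 512, 300)
coarse = resampled_shape(full, (1.0, 1.0, 1.0), (2.0, 2.0, 2.0))
print(volume_bytes(full) / 2**20, "MiB at original spacing")
print(volume_bytes(coarse) / 2**20, "MiB after doubling the voxel size")
print(volume_bytes((128, 128, 128)) / 2**20, "MiB for one 128^3 patch")
```

Doubling the voxel size in all three dimensions cuts the voxel count by a factor of 8, which is why resampling and smaller patches are the two levers mentioned above.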
Hi @rfrs,
I would like to try loading smaller patches if you think that will fix this issue. I've looked around the documentation for Auto3DSeg but I can't find where the patch size is defined. Do you know where that might be? Cheers, |
Hi @barrettfletcher, Sorry for the late reply. Also, how many images are you using for training? Full log helps to better understand the issue. Can you share the full log? |
Dear all, I am starting to use Auto3DSeg to develop a segmentation model for vertebrae from CT images. I am using an A100 GPU in an Azure environment.
I installed PyTorch via conda and installed MONAI from the git repository (git clone https://github.com/Project-MONAI/MONAI.git). I created the YAML file and the files have been properly assigned. Yet I get the following (long) error when I try to run the pipeline. Your help in troubleshooting this is very much appreciated, especially since I am new to MONAI/Auto3DSeg.
Thanks for all.
Best
Rui