Training of an nnUNet model #27
To install:
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r nnunet/requirements.txt
pip install --upgrade git+https://github.com/FabianIsensee/hiddenlayer.git

The MSD dataset was converted to the nnUNet format using the following command:
python nnunet/convert_msd_to_nnunet.py --input ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json -o ~/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/ --tasknumber 101

The environment variables were set:
export nnUNet_raw="/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw"
export nnUNet_results="/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_results"
export nnUNet_preprocessed="/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_preprocessed"

The nnUNet raw data was preprocessed:
nnUNetv2_plan_and_preprocess -d 101 --verify_dataset_integrity -c 3d_fullres

I got the following message:
...
Warning: Direction mismatch between segmentation and corresponding images.
Direction images: (0.9998246020407097, -0.002916572623419852, 0.01850023567035481, 0.0008972689266514793, -0.9792064375432301, -0.2028643810866037, -0.018707219531766597, -0.2028453752936667, 0.9790320649327531).
Direction seg: (0.999824551454073, -0.002916848812073564, 0.01850292634995937, 0.0008975451891515451, -0.9792064339919263, -0.20286437942668814, -0.01870990973509375, -0.20284538846575453, 0.9790320144286973).
Image files: ['/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset101_msLesionAgnostic/imagesTr/msLesionAgnostic_640_0000.nii.gz'].
Seg file: /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset101_msLesionAgnostic/labelsTr/msLesionAgnostic_640.nii.gz
Traceback (most recent call last):
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/bin/nnUNetv2_plan_and_preprocess", line 8, in <module>
sys.exit(plan_and_preprocess_entry())
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/experiment_planning/plan_and_preprocess_entrypoints.py", line 184, in plan_and_preprocess_entry
extract_fingerprints(args.d, args.fpe, args.npfp, args.verify_dataset_integrity, args.clean, args.verbose)
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/experiment_planning/plan_and_preprocess_api.py", line 47, in extract_fingerprints
extract_fingerprint_dataset(d, fingerprint_extractor_class, num_processes, check_dataset_integrity, clean,
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/experiment_planning/plan_and_preprocess_api.py", line 30, in extract_fingerprint_dataset
verify_dataset_integrity(join(nnUNet_raw, dataset_name), num_processes)
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/experiment_planning/verify_dataset_integrity.py", line 220, in verify_dataset_integrity
raise RuntimeError(
RuntimeError: Some images have errors. Please check text output above to see which one(s) and what's going on.

I need to deal with this when doing the dataset conversion (using
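The direction tuples in the warning above differ only in the sixth or seventh decimal, yet the case is still rejected. The integrity check appears to compare direction cosines with a tight tolerance (in the spirit of `numpy.allclose`). A minimal sketch of such a comparison, using the values printed for case 640; the helper names and tolerance values are hypothetical, not nnUNet's own code:

```python
from typing import Sequence

# Direction cosines printed in the warning above for case 640 (row-major 3x3).
IMG_640 = (0.9998246020407097, -0.002916572623419852, 0.01850023567035481,
           0.0008972689266514793, -0.9792064375432301, -0.2028643810866037,
           -0.018707219531766597, -0.2028453752936667, 0.9790320649327531)
SEG_640 = (0.999824551454073, -0.002916848812073564, 0.01850292634995937,
           0.0008975451891515451, -0.9792064339919263, -0.20286437942668814,
           -0.01870990973509375, -0.20284538846575453, 0.9790320144286973)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two direction tuples."""
    if len(a) != len(b):
        raise ValueError("direction tuples must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def directions_match(a: Sequence[float], b: Sequence[float], atol: float) -> bool:
    """Hypothetical check: True if every direction cosine agrees within atol."""
    return max_abs_diff(a, b) <= atol


if __name__ == "__main__":
    print(f"max abs difference: {max_abs_diff(IMG_640, SEG_640):.2e}")
    print("strict tolerance (1e-8):", directions_match(IMG_640, SEG_640, atol=1e-8))
    print("loose tolerance (1e-4):", directions_match(IMG_640, SEG_640, atol=1e-4))
```

For case 640 the largest difference is on the order of 1e-6, so it fails a strict tolerance but passes a loose one, which is consistent with floating-point noise from the header rather than a real misalignment.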
I experienced something similar in the past. If you are sure that your images and labels are in the same space, which is what I would expect (you can check this, for example, by opening them in FSLeyes and checking whether a red warning appears in the bottom-left corner), then you can fix the ITK direction using this script: https://gist.github.com/valosekj/a03195d9060b0e164faff95102129feb

Alternatively, you could try changing "overwrite_image_reader_writer" to
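Before overwriting a label's direction with the image's, one way to quantify "same space" is to express the mismatch as a rotation angle. For orthonormal direction matrices A and B, the relative rotation R = AᵀB satisfies trace(R) = 1 + 2·cos(θ). A minimal diagnostic sketch, assuming both directions are near-orthonormal row-major 3x3 tuples with the same handedness; the helper name is hypothetical and this is not taken from the linked gist:

```python
import math
from typing import Sequence


def mismatch_angle_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees of the relative rotation A^T B between two 3x3 directions.

    Uses trace(A^T B) = sum_ij A_ij * B_ij and trace(R) = 1 + 2*cos(theta).
    """
    if len(a) != 9 or len(b) != 9:
        raise ValueError("expected flattened 3x3 direction matrices")
    trace = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1]: slight non-orthonormality can push the value out of range.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


IDENTITY = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
ROT_Z_90 = (0.0, -1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0)

# Case 1184, as printed in the direction-mismatch warnings in this thread.
IMG_1184 = (0.9998727930631436, -0.01594979726329166, 4.048400076738005e-05,
            -0.015482158666859506, -0.9699406718540176, 0.24284850674579794,
            0.003834117254632673, 0.24281823871084504, 0.9700642252615967)
SEG_1184 = (0.9998746193770184, -0.015715991708088715, 0.001937302249608592,
            -0.01571599139066678, -0.9699406460725575, 0.24283357955510604,
            0.001937302317948398, 0.24283358643519834, 0.9700660284230386)

if __name__ == "__main__":
    print(f"case 1184: {mismatch_angle_deg(IMG_1184, SEG_1184):.3f} degrees")
```

An angle well below one degree suggests the two headers describe the same orientation up to rounding, whereas a large angle (e.g. 90 degrees) would indicate a genuine reorientation that should not be papered over.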
Thanks @valosekj for your message. Surprisingly, I still had a direction mismatch for some images, as shown below, but it didn't cause any problem.
Warning messages
Direction images: (0.9995568285353207, -4.2305939833177105e-05, 0.029768181674008378, 0.0023210087268018707, 0.9970655078825915, -0.07651788804874493, -0.02967759166276047, 0.07655306134333943, 0.9966237344998277).
Direction seg: (0.999557527771378, 0.0011393518202445663, 0.029722897004697196, 0.001139351755611898, 0.9970662055766669, -0.07653550708132914, -0.029722896109700924, 0.0765355053186275, 0.9966237331859261).
Image files: ['/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTr/msLesionAgnostic_1371_0000.nii.gz'].
Seg file: /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/labelsTr/msLesionAgnostic_1371.nii.gz
Warning: Direction mismatch between segmentation and corresponding images.
Direction images: (0.9998727930631436, -0.01594979726329166, 4.048400076738005e-05, -0.015482158666859506, -0.9699406718540176, 0.24284850674579794, 0.003834117254632673, 0.24281823871084504, 0.9700642252615967).
Direction seg: (0.9998746193770184, -0.015715991708088715, 0.001937302249608592, -0.01571599139066678, -0.9699406460725575, 0.24283357955510604, 0.001937302317948398, 0.24283358643519834, 0.9700660284230386).
Image files: ['/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTr/msLesionAgnostic_1184_0000.nii.gz'].
Seg file: /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/labelsTr/msLesionAgnostic_1184.nii.gz
Warning: Direction mismatch between segmentation and corresponding images.
Direction images: (0.9999650507186807, -3.2498976463678855e-05, -0.008360399655654377, -0.0019455056336788246, 0.9716359354515495, -0.23647371604735118, 0.008130950068450082, 0.23648172843373924, 0.9716019171123297).
Direction seg: (0.9999655147443567, -0.0009890025352992376, -0.008245677081396018, -0.0009890025711708239, 0.9716363921293472, -0.2364777896768553, 0.008245677411101177, 0.2364777861953439, 0.9716019060289138).
Image files: ['/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTr/msLesionAgnostic_1341_0000.nii.gz'].
Seg file: /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/labelsTr/msLesionAgnostic_1341.nii.gz
Warning: Direction mismatch between segmentation and corresponding images.
Direction images: (0.9999461782630069, 1.4256992099983761e-05, 0.010374988239862817, 0.0021222265209242816, 0.9785737025935498, -0.20588591831855255, -0.010155625623308788, 0.2058968391915934, 0.9785210004170319).
Direction seg: (0.9999467397067836, 0.0010682421584383053, 0.01026531114786726, 0.0010682421705256642, 0.9785742550834928, -0.20589144819860944, -0.01026531092755063, 0.20589144262566617, 0.9785209936152197).
Image files: ['/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTr/msLesionAgnostic_1349_0000.nii.gz'].
Seg file: /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/labelsTr/msLesionAgnostic_1349.nii.gz
Warning: Direction mismatch between segmentation and corresponding images.
Direction images: (0.9998737875777984, -0.015846944828360025, 0.0011327979091779968, -0.01587049846636809, -0.9995553321726754, 0.025244112225836113, -0.0007322521810542541, 0.025258904663096143, 0.9996806747991213).
Direction seg: (0.9998742224498159, -0.01585872568560471, 0.00020027293067782567, -0.015858725394327734, -0.9995553321235096, 0.025251513487364926, 0.00020027292187368913, 0.025251511699089922, 0.9996811096331424).
Image files: ['/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTr/msLesionAgnostic_1199_0000.nii.gz'].
Seg file: /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/labelsTr/msLesionAgnostic_1199.nii.gz
Warning: Direction mismatch between segmentation and corresponding images.
Direction images: (0.999964081679984, 4.2173231885602473e-05, -0.00847546859402508, -0.001320598932424257, 0.9885497176643242, -0.15088973555037227, 0.008372058788976455, 0.15089550665014348, 0.9885142660263687).
Direction seg: (0.9999643151625024, -0.0006392129841412547, -0.008423765271167316, -0.000639212927685442, 0.9885499483048401, -0.15089264395156346, 0.008423764504104498, 0.1508926476448042, 0.9885142640245324).
Image files: ['/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTr/msLesionAgnostic_1353_0000.nii.gz'].
Seg file: /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/labelsTr/msLesionAgnostic_1353.nii.gz
Warning: Direction mismatch between segmentation and corresponding images.
Direction images: (0.999967560007628, 2.4250750962094284e-05, -0.008054708809685418, -0.003024104554356132, 0.9279712714617492, -0.3726394708944929, 0.007465502262755987, 0.37265173923324924, 0.9279412408107873).
Direction seg: (0.9999687649609699, -0.001499927893738001, -0.007760110282334256, -0.0014999279201193896, 0.9279724411610171, -0.37264582077690295, 0.007760110738075229, 0.3726458085930611, 0.9279412012330923).
Image files: ['/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTr/msLesionAgnostic_2180_0000.nii.gz'].
Seg file: /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/labelsTr/msLesionAgnostic_2180.nii.gz
Warning: Direction mismatch between segmentation and corresponding images.
Direction images: (0.9999675164049507, 2.735942794877295e-05, -0.008060110927302988, -0.0030159611672683187, 0.9286202535678794, -0.3710192836777322, 0.007474631305437654, 0.3710315672756014, 0.928590181808478).
Direction seg: (0.9999687169716581, -0.0014943018706426896, -0.007767376468637484, -0.0014943019824960953, 0.9286214214917144, -0.3710256655337067, 0.007767376625409159, 0.37102563605312655, 0.9285901266856461).
Image files: ['/home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTr/msLesionAgnostic_2183_0000.nii.gz'].
Seg file: /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/labelsTr/msLesionAgnostic_2183.nii.gz

The dataset is being preprocessed using:
nnUNetv2_plan_and_preprocess -d 201 --verify_dataset_integrity -c 3d_fullres 2d
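To keep track of which cases were flagged (for instance to run the direction-fixing script on exactly those cases), the "Seg file:" lines of the warnings can be collected from the saved preprocessing output. A minimal sketch, assuming the log text was saved to a string or file; the function name is hypothetical, and it picks up every warning that prints a "Seg file:" line, not only direction mismatches:

```python
import re
from typing import List

# Matches the "Seg file: <path>.nii.gz" line printed after each warning.
SEG_LINE = re.compile(r"Seg file:\s*(\S+?)\.nii\.gz")


def flagged_cases(log_text: str) -> List[str]:
    """Return the flagged case identifiers in log order, without duplicates."""
    cases: List[str] = []
    for match in SEG_LINE.finditer(log_text):
        # Keep only the file name, e.g. 'msLesionAgnostic_1371'.
        case_id = match.group(1).rsplit("/", 1)[-1]
        if case_id not in cases:
            cases.append(case_id)
    return cases
```

Applied to the warnings above, this would yield identifiers such as `msLesionAgnostic_1371` and `msLesionAgnostic_1184`.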
The model is now training on kronos and koios using:
CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 201 3d_fullres 0 && CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 201 3d_fullres 1
CUDA_VISIBLE_DEVICES=3 nnUNetv2_train 201 3d_fullres 2 && CUDA_VISIBLE_DEVICES=3 nnUNetv2_train 201 3d_fullres 3
CUDA_VISIBLE_DEVICES=1 nnUNetv2_train 201 2d 0 && CUDA_VISIBLE_DEVICES=1 nnUNetv2_train 201 3d_fullres 4
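Each line above chains folds on a single GPU with `&&`, so the next fold starts only if the previous one exits successfully. A small sketch that generates these lines from a GPU-to-jobs schedule; the helper name and the schedule structure are hypothetical conveniences, and the schedule simply mirrors the three commands above:

```python
from typing import Dict, Iterable, List, Tuple

# A job is a (configuration, fold) pair, e.g. ("3d_fullres", 0).
Job = Tuple[str, int]


def chain_training(gpu: int, dataset_id: int, jobs: Iterable[Job]) -> str:
    """Build one shell line that runs the given jobs sequentially on one GPU."""
    parts = [
        f"CUDA_VISIBLE_DEVICES={gpu} nnUNetv2_train {dataset_id} {config} {fold}"
        for config, fold in jobs
    ]
    if not parts:
        raise ValueError("at least one (configuration, fold) job is required")
    # '&&' starts the next job only if the previous one succeeded.
    return " && ".join(parts)


# Mirrors the commands above: five 3d_fullres folds plus 2d fold 0 on three GPUs.
SCHEDULE: Dict[int, List[Job]] = {
    2: [("3d_fullres", 0), ("3d_fullres", 1)],
    3: [("3d_fullres", 2), ("3d_fullres", 3)],
    1: [("2d", 0), ("3d_fullres", 4)],
}

if __name__ == "__main__":
    for gpu, jobs in SCHEDULE.items():
        print(chain_training(gpu, 201, jobs))
```

Keeping the schedule in one place makes it easy to rebalance folds across GPUs without retyping the commands.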
I ran into this issue:
Error message
2024-07-31 14:20:25.881668: unpacking dataset...
Error when checking /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_preprocessed/Dataset201_msLesionAgnostic/nnUNetPlans_3d_fullres/msLesionAgnostic_793.npy and /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_preprocessed/Dataset201_msLesionAgnostic/nnUNetPlans_3d_fullres/msLesionAgnostic_793_seg.npy, fixing...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/training/dataloading/utils.py", line 40, in _convert_to_npy
np.load(seg_npy, mmap_mode='r')
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/numpy/lib/_npyio_impl.py", line 464, in load
raise EOFError("No data left in file")
EOFError: No data left in file
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/bin/nnUNetv2_train", line 8, in <module>
sys.exit(run_training_entry())
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/run/run_training.py", line 274, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/run/run_training.py", line 210, in run_training
nnunet_trainer.run_training()
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1287, in run_training
self.on_train_start()
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 847, in on_train_start
unpack_dataset(self.preprocessed_dataset_folder, unpack_segmentation=True, overwrite_existing=False,
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/nnunetv2/training/dataloading/utils.py", line 66, in unpack_dataset
p.starmap(_convert_to_npy, zip(npz_files,
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/multiprocessing/pool.py", line 372, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
EOFError: No data left in file
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
raise e
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
raise e
File "/home/plbenveniste/miniconda3/envs/venv_nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Looking into this now. I found the solution in this issue: MIC-DKFZ/nnUNet#441. The truncated .npy files most likely come from several trainings unpacking the same preprocessed dataset at the same time. I just need to delete all the .npy files and, when launching the trainings, wait for one of them to reach the GPU stage before launching the others.
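The deletion step can be scripted so that only .npy files with a matching .npz source are removed; per the linked issue, these should then be unpacked again at the next training start. A minimal sketch, assuming the layout seen in the traceback, where `<case>.npz` is unpacked into `<case>.npy` and `<case>_seg.npy` inside the plans folder (e.g. `nnUNetPlans_3d_fullres`); the function names and the `dry_run` flag are hypothetical. It does not handle the second half of the fix (staggering the launches):

```python
from pathlib import Path
from typing import List


def unpacked_npy_files(plans_dir: Path) -> List[Path]:
    """List .npy files in plans_dir that have a matching <case>.npz next to them."""
    found = []
    for npy in sorted(plans_dir.glob("*.npy")):
        # '<case>_seg.npy' and '<case>.npy' both come from '<case>.npz'.
        case = npy.stem[: -len("_seg")] if npy.stem.endswith("_seg") else npy.stem
        if (plans_dir / f"{case}.npz").exists():
            found.append(npy)
    return found


def delete_npy_files(plans_dir: Path, dry_run: bool = True) -> List[Path]:
    """Delete (or, with dry_run=True, only list) the regenerable .npy files."""
    targets = unpacked_npy_files(plans_dir)
    if not dry_run:
        for npy in targets:
            npy.unlink()
    return targets
```

Running it once with `dry_run=True` first shows what would be removed; .npy files without an .npz source are left untouched.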
Inference was performed with the 2d model on koios using:
CUDA_VISIBLE_DEVICES=1 nnUNetv2_predict -i /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTs/ -o /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_results/Dataset201_msLesionAgnostic/nnUNetTrainer__nnUNetPlans__2d/fold_0/test_set -d 201 -c 2d -f 0 -chk checkpoint_best.pth

The results were computed using:
python nnunet/evaluate_predictions.py -pred-folder ~/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_results/Dataset201_msLesionAgnostic/nnUNetTrainer__nnUNetPlans__2d/fold_0/test_set/ -label-folder ~/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/labelsTs -image-folder ~/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/imagesTs/ -conversion-dict ~/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_raw/Dataset201_msLesionAgnostic/conversion_dict.json -output-folder ~/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_results/Dataset201_msLesionAgnostic/nnUNetTrainer__nnUNetPlans__2d/fold_0/test_set/

And the plots were made using:
python nnunet/plot_performance.py --pred-dir-path /home/plbenveniste/net/ms-lesion-agnostic/nnunet_experiments/nnUNet_results/Dataset201_msLesionAgnostic/nnUNetTrainer__nnUNetPlans__2d/fold_0/test_set/ --data-json-path /home/plbenveniste/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --split test
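The contents of `evaluate_predictions.py` are not shown in this thread. As a hedged illustration of the kind of overlap metric such an evaluation typically reports, here is a Dice coefficient on flattened binary masks, written in pure Python for clarity; the function name and the convention that two empty masks score 1.0 are assumptions, not necessarily what the script does:

```python
from typing import Sequence


def dice_score(pred: Sequence[int], label: Sequence[int]) -> float:
    """Dice = 2|P ∩ L| / (|P| + |L|) for flattened masks (any value > 0 is foreground).

    Assumed convention: if both masks are empty, the Dice is 1.0.
    """
    if len(pred) != len(label):
        raise ValueError("prediction and label must have the same number of voxels")
    pred_mask = [v > 0 for v in pred]
    label_mask = [v > 0 for v in label]
    intersection = sum(p and t for p, t in zip(pred_mask, label_mask))
    total = sum(pred_mask) + sum(label_mask)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total
```

The empty-mask convention matters for lesion segmentation, since subjects without lesions would otherwise produce a division by zero or an arbitrary score.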
Compared with the results of the MONAI UNet (#21 (comment)), the 2d nnUNet seems to underperform, but its performance is more consistent (lower variance). I still need to look into the results of the 3D model.
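To put numbers on the "more consistent" observation, the per-subject scores of each model can be summarized by their mean and standard deviation. A minimal sketch, assuming per-subject score lists are available for both models (e.g. from the evaluation output); the function names are hypothetical:

```python
import statistics
from typing import Dict, Sequence


def summarize(scores: Sequence[float]) -> Dict[str, float]:
    """Mean and sample standard deviation of per-subject scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate a spread")
    return {"mean": statistics.mean(scores), "std": statistics.stdev(scores)}


def more_consistent(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if model A's per-subject scores have a lower spread than model B's."""
    return summarize(a)["std"] < summarize(b)["std"]
```

Reporting both statistics side by side makes the trade-off explicit: a lower mean with a lower spread, as described above, is a different result from a uniformly worse model.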
In this issue, I detail the exploration of training an nnUNet model.
The code is in branch plb/new_nnunet.
The script nnunet/convert_msd_to_nnunet.py takes the JSON file of an MSD dataset and converts it to the nnUNet format.
I created the associated virtual environment on koios:
conda create -n venv_nnunet python=3.9
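The target layout can be read off the paths in the logs of this thread, e.g. `Dataset101_msLesionAgnostic/imagesTr/msLesionAgnostic_640_0000.nii.gz` for an image (4-digit channel suffix) and `Dataset101_msLesionAgnostic/labelsTr/msLesionAgnostic_640.nii.gz` for its label. A minimal sketch of that naming scheme, assuming MSD-style entries with `image` and `label` keys and sequential case numbering; the helper names and the mapping format are hypothetical and are not what `convert_msd_to_nnunet.py` or `conversion_dict.json` actually contain:

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def nnunet_paths(dataset_id: int, name: str, case: int, split: str = "Tr",
                 channel: int = 0) -> Tuple[PurePosixPath, PurePosixPath]:
    """Relative image and label paths for one case, following nnUNet v2 naming."""
    root = PurePosixPath(f"Dataset{dataset_id:03d}_{name}")
    case_id = f"{name}_{case}"
    # Images carry a 4-digit channel suffix; labels do not.
    image = root / f"images{split}" / f"{case_id}_{channel:04d}.nii.gz"
    label = root / f"labels{split}" / f"{case_id}.nii.gz"
    return image, label


def build_conversion_dict(entries: List[Dict[str, str]], dataset_id: int, name: str,
                          split: str = "Tr", start: int = 1) -> Dict[str, str]:
    """Map each original image/label path to its new nnUNet path (hypothetical format)."""
    mapping: Dict[str, str] = {}
    for offset, entry in enumerate(entries):
        image, label = nnunet_paths(dataset_id, name, start + offset, split)
        mapping[entry["image"]] = str(image)
        mapping[entry["label"]] = str(label)
    return mapping
```

Keeping such a mapping is what makes it possible to trace a prediction back to the original subject when evaluating on the test set.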