Dataset aggregation #1
Comments
For the training of a first nnUNet model (called Dataset101_singleClassNnunetMsLesion), I aggregated the lesion segmentations into a JSON file that lists the path to each lesion-segmentation file and its corresponding image; an image can therefore appear more than once if it has multiple segmentation files. These pairs form a training dictionary, which was split 80% for training and 20% for testing. If an image didn't have any segmentation file, it was added to an inference dictionary. I obtained the following:
This makes for a total of:
Note: We are currently manually labelling images in CanProCo, therefore the numbers above are about to change. Furthermore, when running
Correction of header was done using Output of correction:
This is the issue that deals with the modified data: neuropoly/data-management#301 |
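The aggregation described for Dataset101 (one entry per image/segmentation pair, an 80/20 split, and a separate inference dictionary for images without a segmentation) could be sketched roughly as follows. This is only an illustration: the function name `build_dictionaries`, the input mapping, the file names, and the fixed seed are assumptions, not the code used in the repository.

```python
import json
import random


def build_dictionaries(image_to_segs, train_fraction=0.8, seed=0):
    """Build training/testing/inference dictionaries.

    `image_to_segs` is a hypothetical input: it maps an image path to the
    list of lesion-segmentation paths found for that image.
    """
    pairs = []      # one entry per (image, segmentation) pair
    inference = []  # images that have no segmentation file
    for image, segs in sorted(image_to_segs.items()):
        if not segs:
            inference.append({"image": image})
        for seg in segs:
            # An image with several segmentation files appears several times.
            pairs.append({"image": image, "label": seg})

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pairs)
    n_train = round(train_fraction * len(pairs))
    return {
        "training": pairs[:n_train],
        "testing": pairs[n_train:],
        "inference": inference,
    }


# Hypothetical file names, for illustration only.
example = {
    "sub-01_T2w.nii.gz": ["sub-01_T2w_lesion-rater1.nii.gz",
                          "sub-01_T2w_lesion-rater2.nii.gz"],
    "sub-02_T2w.nii.gz": ["sub-02_T2w_lesion-rater1.nii.gz"],
    "sub-03_T2w.nii.gz": ["sub-03_T2w_lesion-rater1.nii.gz"],
    "sub-04_T2w.nii.gz": ["sub-04_T2w_lesion-rater1.nii.gz"],
    "sub-05_T2w.nii.gz": [],
}
dictionaries = build_dictionaries(example)
print(json.dumps(dictionaries, indent=2))
```

Note that splitting at the pair level, as sketched here, can place the same image in both the training and testing sets when it has several segmentation files; splitting by image or by subject would avoid that.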
I just remembered that we also have a lot of data from UMass (git-annex data: umass-ms-*, 3 datasets) |
I updated the new code to aggregate the following datasets, which are labelled:
The following command was run:
python ms-lesion-agnostic/monai/1_create_msd_data.py -pd ~/net/ms-lesion-agnostic/data/ -po ~/net/ms-lesion-agnostic/msd_data/ --lesion-only --canproco-exclude canproco/exclude.yml
The output is the following:
Total number of derivatives in the root directory: 4407
Number of images in train set: 1636
Number of images in validation set: 569
Number of images in test set: 544
Total number of images in the dataset: 2749
The total number of images in the dataset (2749) is different from the total number of derivatives (4407) because we decided to keep only those which have lesions.
The output is the following file: |
for now, but maybe in the future it would be desirable to develop a model that also has good specificity (i.e., a high true negative rate) |
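To make the trade-off concrete, here is a minimal sketch of a lesion-only selection together with the specificity measure mentioned above. The record layout (a flat list of voxel values under a `"mask"` key) and both function names are assumptions made for illustration; the actual script reads segmentation files from disk.

```python
def split_by_lesion_presence(records):
    """Separate records whose mask has at least one non-zero voxel from empty ones."""
    with_lesion, without_lesion = [], []
    for record in records:
        has_lesion = any(voxel != 0 for voxel in record["mask"])
        (with_lesion if has_lesion else without_lesion).append(record)
    return with_lesion, without_lesion


def specificity(true_negatives, false_positives):
    """True negative rate: TN / (TN + FP)."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("specificity is undefined without negative cases")
    return true_negatives / total


# Hypothetical records: masks are flattened voxel lists for brevity.
records = [
    {"image": "a.nii.gz", "mask": [0, 0, 1, 0]},
    {"image": "b.nii.gz", "mask": [0, 0, 0, 0]},
    {"image": "c.nii.gz", "mask": [1, 1, 0, 0]},
]
kept, dropped = split_by_lesion_presence(records)
print(len(kept), "with lesions;", len(dropped), "without lesions")
```

With a lesion-only dataset the second list is discarded, so there are no lesion-free images on which to count true negatives; keeping it aside would be one way to evaluate specificity later.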
There was an issue in the code when gathering segmentations from
python ms-lesion-agnostic/monai/1_create_msd_data.py -pd ~/net/ms-lesion-agnostic/data/ -po ~/net/ms-lesion-agnostic/msd_data/ --lesion-only --canproco-exclude canproco/exclude.yml
This is the output of the code:
Total number of derivatives in the root directory: 4407
Number of images in train set: 1712
Number of images in validation set: 590
Number of images in test set: 569
Total number of images in the dataset: 2871 |
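As a quick sanity check on the two reported runs, the split counts can be summed and compared against the totals. The snippet below only uses the numbers quoted in this thread; the variable names are illustrative.

```python
# Counts copied from the two script outputs reported in this issue.
total_derivatives = 4407
first_run = {"train": 1636, "validation": 569, "test": 544}
fixed_run = {"train": 1712, "validation": 590, "test": 569}

for name, run in (("first run", first_run), ("after the fix", fixed_run)):
    total = sum(run.values())
    shares = {split: round(count / total, 3) for split, count in run.items()}
    kept = round(total / total_derivatives, 3)
    print(f"{name}: total={total}, split shares={shares}, kept fraction={kept}")

# Number of additional images recovered by the fix.
gained = sum(fixed_run.values()) - sum(first_run.values())
print("images gained after the fix:", gained)
```

Both runs add up to their reported totals (2749 and 2871), so the fix recovered 122 additional images with lesions.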
Here is an issue to describe the aggregation of available datasets.
The datasets of interest for this project are:
Labeled datasets:
Unlabeled datasets: