YOLO pipeline for 2d MS lesion detection #10

Open · wants to merge 26 commits into main
Conversation

cspino (Collaborator)

@cspino cspino commented Mar 31, 2024

Code base for training and validating a YOLOv8 model for MS lesion detection.

File description

Complete pipeline (in the order they should be called)

1- train_test_val_from_BIDS.py

  • Takes all scans from the canproco database (path given as input) that have a *lesion-manual.nii.gz file
  • Splits them into 3 datasets (train, test, val) according to the specified proportions
  • Saves the 3 lists of filenames in a json file
    Example:
    {"train": ["sub-tor127_ses-M12_PSIR", "sub-tor077_ses-M0_PSIR"],
    "test": ["sub-van211_ses-M0_PSIR"],
    "val": ["sub-tor017_ses-M0_PSIR"]}

13- complete_pre_process.py

  • Includes all steps to get the yolo dataset ready for training

  • Calls scripts 2, 3 and 4:

    2- sc_seg_from_list.py

    • Takes the list generated by [1], and for each image:
      • Checks if the corresponding spinal cord segmentation file (*_seg-manual.nii.gz) is present
      • If not, creates it using the spinal cord toolbox
    • SCT fails with an error on a few of the images
      • A new json file is created without those images
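
The check-or-create logic of this step might look like the sketch below. `ensure_sc_segs` and the injected `run_seg` callable are hypothetical names; in practice `run_seg` would wrap an SCT command-line call (e.g. via `subprocess`), and the exact SCT command the script uses is not shown here:

```python
from pathlib import Path

def ensure_sc_segs(images, run_seg):
    """For each image path, keep it if its *_seg-manual.nii.gz already exists
    or can be created by run_seg(); collect the images where segmentation
    fails so they can be dropped from the new json list."""
    kept, failed = [], []
    for img in images:
        seg = Path(str(img).replace(".nii.gz", "_seg-manual.nii.gz"))
        if seg.exists():
            kept.append(img)
            continue
        try:
            run_seg(img, seg)   # e.g. a subprocess call into the spinal cord toolbox
            kept.append(img)
        except Exception:
            failed.append(img)  # these images are excluded from the new json file
    return kept, failed
```

Injecting `run_seg` as a callable keeps the failure-handling logic testable without SCT installed.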

    3- make_yolo_dataset.py

    • Takes the list generated by [2] (or [1] if there were no errors during [2]) and creates the YOLO formatted dataset (as described here: Data pre-processing for YOLO detection model #3 )
      • Extracts ground truth bounding box coordinates from segmentation file and saves them to txt files
      • Saves every slice that contains part of the spinal cord as a png (by checking *_seg-manual.nii.gz file)
      • Saves a yml file that points to the train, test, val folders (this is the file that has to be given to the ultralytics library to train or predict)
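
The bounding-box extraction for a single slice can be illustrated as below. This is a simplified sketch that treats all foreground pixels as one box; the real script separates individual lesions, and `mask_to_yolo_boxes` is a hypothetical name. The output lines follow the standard YOLO label format, `class cx cy w h` with coordinates normalised to [0, 1]:

```python
def mask_to_yolo_boxes(mask):
    """Convert a binary 2D lesion mask (list of rows) into YOLO label lines.
    Returns an empty list for an unlabeled slice (-> empty label file)."""
    h, w = len(mask), len(mask[0])
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return []
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    x0, x1 = min(xs), max(xs) + 1   # half-open pixel extents
    y0, y1 = min(ys), max(ys) + 1
    return [f"0 {(x0 + x1) / 2 / w:.6f} {(y0 + y1) / 2 / h:.6f} "
            f"{(x1 - x0) / w:.6f} {(y1 - y0) / h:.6f}"]
```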

    4- modify_unlabeled_proportion.py

    • Was added to test the effect of reducing the number of training images that don't contain any lesions (i.e. unlabeled slices)
    • Takes the path to the YOLO dataset generated by [3] as input
    • Creates a new dataset with the specified ratio of unlabeled slices
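
The re-balancing could work roughly like this sketch (`subsample_unlabeled` is a hypothetical helper; the actual script operates on image and label files on disk rather than in-memory lists):

```python
import random

def subsample_unlabeled(slices, labels, ratio, seed=0):
    """Keep all labeled slices and only enough unlabeled ones so that
    unlabeled / total ~= ratio. `labels[s]` is True if slice s has a lesion."""
    labeled = [s for s in slices if labels[s]]
    unlabeled = [s for s in slices if not labels[s]]
    # solve u / (len(labeled) + u) = ratio for the number u of unlabeled slices
    keep = int(ratio * len(labeled) / (1 - ratio)) if ratio < 1 else len(unlabeled)
    rng = random.Random(seed)
    return labeled + rng.sample(unlabeled, min(keep, len(unlabeled)))
```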

5- yolo_training.py

6- yolo_inference.py

  • Takes the path to a folder of png images (either 'val' or 'test' folder generated by [3] or [4]) and the model saved by [5] as input
  • Predicts the lesion bounding boxes for each image in the given folder
  • Predicted boxes are saved in a txt file for each slice. Format is the same as the ground truth txt files generated by [3]
  • Has the option of also saving confidence values for each predicted box in txt files
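
The generated txt files could be read back with something like this sketch. The `class cx cy w h [conf]` column layout follows the standard YOLO label convention described above; `parse_yolo_txt` itself is a hypothetical helper:

```python
def parse_yolo_txt(lines, with_conf=False):
    """Parse YOLO label/prediction lines into (class, cx, cy, w, h[, conf])
    tuples. Prediction files saved with confidence values carry one extra
    column per line."""
    boxes = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        cls, *vals = parts
        vals = [float(v) for v in vals]
        boxes.append((int(cls), *vals[:5 if with_conf else 4]))
    return boxes
```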

7- validation.py

  • Takes the folder of predictions (generated with [6]) and the folder of ground truth boxes (generated with [3] or [4]) as input
  • Both ground truth and prediction boxes are processed before comparison
  • Generates metrics (saved to a csv file):
    • Counts the number of true positives, false positives and false negatives at the specified IoU threshold
    • Calculates recall and precision
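
The TP/FP/FN counting can be sketched with a greedy IoU matcher. This is an illustration under the assumptions of axis-aligned `(x0, y0, x1, y1)` boxes and one-to-one matching; the actual script's matching strategy may differ:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def match_boxes(preds, gts, thr=0.5):
    """Greedy one-to-one matching: each prediction claims at most one
    ground-truth box with IoU >= thr. Returns (tp, fp, fn)."""
    unmatched = list(gts)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thr:
            unmatched.remove(best)
            tp += 1
    fp = len(preds) - tp   # predictions that matched nothing
    fn = len(unmatched)    # ground-truth boxes nobody matched
    return tp, fp, fn
```

Recall is then `tp / (tp + fn)` and precision `tp / (tp + fp)`.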

Other files

8- data_utils.py

  • Utilities used for pre-processing

9- test_data_utils.py

  • Unit tests for functions used in pre-processing

10- test_validation.py

  • Unit tests for functions used in validation

11- yolo_hyperparameter_tune.py

  • Takes the path to the yaml file created by [3] or [4] and performs a hyperparameter search
  • Doesn't currently take a config file as input; parameters need to be modified directly in the script
  • Uses ray tune
  • Progress (and results) can be tracked using wandb -- these steps can be followed to log into your account: https://docs.ultralytics.com/integrations/weights-biases/#configuring-weights-biases
  • I wasn't able to install ray tune in the same environment that I'm using for everything else (requirements.txt) because of conflicts, so I had to set up a separate environment

12- PR_curve.py

  • Generates a Precision-Recall curve and PR-AUC from inference results
  • The given predictions must be generated with [6] using the "-k" option (to save confidence values of predictions) and a low confidence threshold (0.01)
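
The curve construction from confidence-scored predictions can be sketched as follows. These are hypothetical helpers, shown with a step-function AUC, which may differ from the script's interpolation:

```python
def pr_points(preds, n_gt):
    """preds: list of (confidence, is_true_positive) per predicted box;
    n_gt: total number of ground-truth boxes. Sweeps a descending confidence
    threshold and returns (recall, precision) points for the PR curve."""
    pts = []
    tp = fp = 0
    for conf, is_tp in sorted(preds, reverse=True):
        tp += is_tp
        fp += not is_tp
        pts.append((tp / n_gt, tp / (tp + fp)))
    return pts

def pr_auc(pts):
    """Area under the PR curve by the rectangle (step) rule."""
    auc, prev_r = 0.0, 0.0
    for r, p in pts:
        auc += (r - prev_r) * p
        prev_r = r
    return auc
```

A low inference threshold (0.01) matters here: boxes pruned at inference time can never re-enter the sweep, which would truncate the curve.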

What needs to be improved

  • Move post-processing steps (merging to get a set of boxes per volume) from 'yolo_inference.py' to 'validation.py' as discussed above
  • 3D validation as discussed here Validation of YOLO detection results #11
    • Show boxes on relevant slices only
    • Calculate IoU in 3D
    • Only merge boxes on consecutive slices
  • Add a way to generate PR curve and PR-AUC
  • More unit tests need to be added
  • Reduce the number of manual steps: either merge scripts together or create larger scripts that call the current scripts
    • Pre-processing scripts 2, 3 and 4 could be merged into one
    • I decided to leave training, inference and validation as separate scripts
  • Have training and hyperparameter search scripts take a config file as input (I've just been directly modifying the scripts)
  • I had trouble managing the dependencies, so the packages needed for the hyperparameter search (ray[tune] and wandb) are currently not included in the requirements file; I wasn't able to fix this

@cspino cspino self-assigned this Mar 31, 2024
@plbenveniste plbenveniste marked this pull request as ready for review June 12, 2024 15:16
@plbenveniste (Collaborator)

@jcohenadad Should we merge this PR?

The code looks very clean, and it would be in a subfolder in the repo.
