-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Project] Medical semantic seg dataset: Pcam (#2684)
- Loading branch information
1 parent
942b054
commit 30e3b49
Showing
9 changed files
with
345 additions
and
1 deletion.
There are no files selected for viewing
153 changes: 153 additions & 0 deletions
153
projects/medical/2d_image/histopathology/pcam/README.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
# PCam (PatchCamelyon) | ||
|
||
## Description | ||
|
||
This project supports **`Patch Camelyon (PCam) `**, which can be downloaded from [here](https://opendatalab.com/PCam). | ||
|
||
### Dataset Overview | ||
|
||
PatchCamelyon is an image classification dataset. It consists of 327680 color images (96 x 96px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating presence of metastatic tissue. PCam provides a new benchmark for machine learning models: bigger than CIFAR10, smaller than ImageNet, trainable on a single GPU. | ||
|
||
### Statistic Information | ||
|
||
| Dataset Name | Anatomical Region | Task Type | Modality | Num. Classes | Train/Val/Test images | Train/Val/Test Labeled | Release Date | License | | ||
| ------------------------------------ | ----------------- | ------------ | -------------- | ------------ | --------------------- | ---------------------- | ------------ | ------------------------------------------------------------- | | ||
| [Pcam](https://opendatalab.com/PCam) | throax | segmentation | histopathology | 2 | 327680/-/- | yes/-/- | 2018 | [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/) | | ||
|
||
| Class Name | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test | | ||
| :---------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: | | ||
| background | 214849 | 63.77 | - | - | - | - | | ||
| metastatic tissue | 131832 | 36.22 | - | - | - | - | | ||
|
||
Note: | ||
|
||
- `Pct` means percentage of pixels in this category in all pixels. | ||
|
||
### Visualization | ||
|
||
![pcam](https://raw.githubusercontent.com/uni-medical/medical-datasets-visualization/main/2d/semantic_seg/histopathology/pcam/pcam_dataset.png?raw=true) | ||
|
||
### Dataset Citation | ||
|
||
``` | ||
@inproceedings{veeling2018rotation, | ||
title={Rotation equivariant CNNs for digital pathology}, | ||
author={Veeling, Bastiaan S and Linmans, Jasper and Winkens, Jim and Cohen, Taco and Welling, Max}, | ||
booktitle={International Conference on Medical image computing and computer-assisted intervention}, | ||
pages={210--218}, | ||
year={2018}, | ||
} | ||
``` | ||
|
||
### Prerequisites | ||
|
||
- Python v3.8 | ||
- PyTorch v1.10.0 | ||
- pillow(PIL) v9.3.0 9.3.0 | ||
- scikit-learn(sklearn) v1.2.0 1.2.0 | ||
- [MIM](https://github.com/open-mmlab/mim) v0.3.4 | ||
- [MMCV](https://github.com/open-mmlab/mmcv) v2.0.0rc4 | ||
- [MMEngine](https://github.com/open-mmlab/mmengine) v0.2.0 or higher | ||
- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) v1.0.0rc5 | ||
|
||
All the commands below rely on the correct configuration of `PYTHONPATH`, which should point to the project's directory so that Python can locate the module files. In `pcam/` root directory, run the following line to add the current directory to `PYTHONPATH`: | ||
|
||
```shell | ||
export PYTHONPATH=`pwd`:$PYTHONPATH | ||
``` | ||
|
||
### Dataset Preparing | ||
|
||
- download dataset from [here](https://opendatalab.com/PCam) and decompress data to path `'data/'`. | ||
- run script `"python tools/prepare_dataset.py"` to format data and change folder structure as below. | ||
- run script `"python ../../tools/split_seg_dataset.py"` to split dataset and generate `train.txt`, `val.txt` and `test.txt`. If the label of official validation set and test set cannot be obtained, we generate `train.txt` and `val.txt` from the training set randomly. | ||
|
||
```shell | ||
mkdir data & cd data | ||
pip install opendatalab | ||
odl get PCam | ||
mv ./PCam/raw/pcamv1 ./ | ||
rm -rf PCam | ||
cd .. | ||
python tools/prepare_dataset.py | ||
python ../../tools/split_seg_dataset.py | ||
``` | ||
|
||
```none | ||
mmsegmentation | ||
├── mmseg | ||
├── projects | ||
│ ├── medical | ||
│ │ ├── 2d_image | ||
│ │ │ ├── histopathology | ||
│ │ │ │ ├── pcam | ||
│ │ │ │ │ ├── configs | ||
│ │ │ │ │ ├── datasets | ||
│ │ │ │ │ ├── tools | ||
│ │ │ │ │ ├── data | ||
│ │ │ │ │ │ ├── train.txt | ||
│ │ │ │ │ │ ├── val.txt | ||
│ │ │ │ │ │ ├── images | ||
│ │ │ │ │ │ │ ├── train | ||
│ │ │ │ | │ │ │ ├── xxx.png | ||
│ │ │ │ | │ │ │ ├── ... | ||
│ │ │ │ | │ │ │ └── xxx.png | ||
│ │ │ │ │ │ ├── masks | ||
│ │ │ │ │ │ │ ├── train | ||
│ │ │ │ | │ │ │ ├── xxx.png | ||
│ │ │ │ | │ │ │ ├── ... | ||
│ │ │ │ | │ │ │ └── xxx.png | ||
``` | ||
|
||
### Divided Dataset Information | ||
|
||
***Note: The table information below is divided by ourselves.*** | ||
|
||
| Class Name | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test | | ||
| :---------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: | | ||
| background | 171948 | 63.82 | 42901 | 63.6 | - | - | | ||
| metastatic tissue | 105371 | 36.18 | 26461 | 36.4 | - | - | | ||
|
||
### Training commands | ||
|
||
To train models on a single server with one GPU. (default) | ||
|
||
```shell | ||
mim train mmseg ./configs/${CONFIG_FILE} | ||
``` | ||
|
||
### Testing commands | ||
|
||
To test models on a single server with one GPU. (default) | ||
|
||
```shell | ||
mim test mmseg ./configs/${CONFIG_FILE} --checkpoint ${CHECKPOINT_PATH} | ||
``` | ||
|
||
<!-- List the results as usually done in other model's README. [Example](https://github.com/open-mmlab/mmsegmentation/tree/dev-1.x/configs/fcn#results-and-models) | ||
You should claim whether this is based on the pre-trained weights, which are converted from the official release; or it's a reproduced result obtained from retraining the model in this project. --> | ||
|
||
## Checklist | ||
|
||
- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`. | ||
|
||
- [x] Finish the code | ||
- [x] Basic docstrings & proper citation | ||
- [ ] Test-time correctness | ||
- [x] A full README | ||
|
||
- [ ] Milestone 2: Indicates a successful model implementation. | ||
|
||
- [ ] Training-time correctness | ||
|
||
- [ ] Milestone 3: Good to be a part of our core package! | ||
|
||
- [ ] Type hints and docstrings | ||
- [ ] Unit tests | ||
- [ ] Code polishing | ||
- [ ] Metafile.yml | ||
|
||
- [ ] Move your modules into the core package following the codebase's file hierarchy structure. | ||
|
||
- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure. |
17 changes: 17 additions & 0 deletions
17
...d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.0001-20k_pcam-512x512.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
_base_ = [ | ||
'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py', | ||
'mmseg::_base_/default_runtime.py', | ||
'mmseg::_base_/schedules/schedule_20k.py' | ||
] | ||
custom_imports = dict(imports='datasets.pcam_dataset') | ||
img_scale = (512, 512) | ||
data_preprocessor = dict(size=img_scale) | ||
optimizer = dict(lr=0.0001) | ||
optim_wrapper = dict(optimizer=optimizer) | ||
model = dict( | ||
data_preprocessor=data_preprocessor, | ||
decode_head=dict(num_classes=2), | ||
auxiliary_head=None, | ||
test_cfg=dict(mode='whole', _delete_=True)) | ||
vis_backends = None | ||
visualizer = dict(vis_backends=vis_backends) |
17 changes: 17 additions & 0 deletions
17
...2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.001-20k_pcam-512x512.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
_base_ = [ | ||
'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py', | ||
'mmseg::_base_/default_runtime.py', | ||
'mmseg::_base_/schedules/schedule_20k.py' | ||
] | ||
custom_imports = dict(imports='datasets.pcam_dataset') | ||
img_scale = (512, 512) | ||
data_preprocessor = dict(size=img_scale) | ||
optimizer = dict(lr=0.001) | ||
optim_wrapper = dict(optimizer=optimizer) | ||
model = dict( | ||
data_preprocessor=data_preprocessor, | ||
decode_head=dict(num_classes=2), | ||
auxiliary_head=None, | ||
test_cfg=dict(mode='whole', _delete_=True)) | ||
vis_backends = None | ||
visualizer = dict(vis_backends=vis_backends) |
17 changes: 17 additions & 0 deletions
17
.../2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.01-20k_pcam-512x512.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
_base_ = [ | ||
'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py', | ||
'mmseg::_base_/default_runtime.py', | ||
'mmseg::_base_/schedules/schedule_20k.py' | ||
] | ||
custom_imports = dict(imports='datasets.pcam_dataset') | ||
img_scale = (512, 512) | ||
data_preprocessor = dict(size=img_scale) | ||
optimizer = dict(lr=0.01) | ||
optim_wrapper = dict(optimizer=optimizer) | ||
model = dict( | ||
data_preprocessor=data_preprocessor, | ||
decode_head=dict(num_classes=2), | ||
auxiliary_head=None, | ||
test_cfg=dict(mode='whole', _delete_=True)) | ||
vis_backends = None | ||
visualizer = dict(vis_backends=vis_backends) |
18 changes: 18 additions & 0 deletions
18
...histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.01lr-sigmoid-20k_pcam-512x512.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
_base_ = [ | ||
'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py', | ||
'mmseg::_base_/default_runtime.py', | ||
'mmseg::_base_/schedules/schedule_20k.py' | ||
] | ||
custom_imports = dict(imports='datasets.pcam_dataset') | ||
img_scale = (512, 512) | ||
data_preprocessor = dict(size=img_scale) | ||
optimizer = dict(lr=0.01) | ||
optim_wrapper = dict(optimizer=optimizer) | ||
model = dict( | ||
data_preprocessor=data_preprocessor, | ||
decode_head=dict( | ||
num_classes=2, loss_decode=dict(use_sigmoid=True), out_channels=1), | ||
auxiliary_head=None, | ||
test_cfg=dict(mode='whole', _delete_=True)) | ||
vis_backends = None | ||
visualizer = dict(vis_backends=vis_backends) |
42 changes: 42 additions & 0 deletions
42
projects/medical/2d_image/histopathology/pcam/configs/pcam_512x512.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
dataset_type = 'PCamDataset' | ||
data_root = 'data/' | ||
img_scale = (512, 512) | ||
train_pipeline = [ | ||
dict(type='LoadImageFromFile'), | ||
dict(type='LoadAnnotations'), | ||
dict(type='Resize', scale=img_scale, keep_ratio=False), | ||
dict(type='RandomFlip', prob=0.5), | ||
dict(type='PhotoMetricDistortion'), | ||
dict(type='PackSegInputs') | ||
] | ||
test_pipeline = [ | ||
dict(type='LoadImageFromFile'), | ||
dict(type='Resize', scale=img_scale, keep_ratio=False), | ||
dict(type='LoadAnnotations'), | ||
dict(type='PackSegInputs') | ||
] | ||
train_dataloader = dict( | ||
batch_size=16, | ||
num_workers=4, | ||
persistent_workers=True, | ||
sampler=dict(type='InfiniteSampler', shuffle=True), | ||
dataset=dict( | ||
type=dataset_type, | ||
data_root=data_root, | ||
ann_file='train.txt', | ||
data_prefix=dict(img_path='images/', seg_map_path='masks/'), | ||
pipeline=train_pipeline)) | ||
val_dataloader = dict( | ||
batch_size=1, | ||
num_workers=4, | ||
persistent_workers=True, | ||
sampler=dict(type='DefaultSampler', shuffle=False), | ||
dataset=dict( | ||
type=dataset_type, | ||
data_root=data_root, | ||
ann_file='val.txt', | ||
data_prefix=dict(img_path='images/', seg_map_path='masks/'), | ||
pipeline=test_pipeline)) | ||
test_dataloader = val_dataloader | ||
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice']) | ||
test_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice']) |
31 changes: 31 additions & 0 deletions
31
projects/medical/2d_image/histopathology/pcam/datasets/pcam_dataset.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
from mmseg.datasets import BaseSegDataset | ||
from mmseg.registry import DATASETS | ||
|
||
|
||
@DATASETS.register_module() | ||
class PCamDataset(BaseSegDataset): | ||
"""PCamDataset dataset. | ||
In segmentation map annotation for PCamDataset, | ||
0 stands for background, which is included in 2 categories. | ||
``reduce_zero_label`` is fixed to False. The ``img_suffix`` | ||
is fixed to '.png' and ``seg_map_suffix`` is fixed to '.png'. | ||
Args: | ||
img_suffix (str): Suffix of images. Default: '.png' | ||
seg_map_suffix (str): Suffix of segmentation maps. Default: '.png' | ||
reduce_zero_label (bool): Whether to mark label zero as ignored. | ||
Default to False. | ||
""" | ||
METAINFO = dict(classes=('background', 'metastatic tissue')) | ||
|
||
def __init__(self, | ||
img_suffix='.png', | ||
seg_map_suffix='.png', | ||
reduce_zero_label=False, | ||
**kwargs) -> None: | ||
super().__init__( | ||
img_suffix=img_suffix, | ||
seg_map_suffix=seg_map_suffix, | ||
reduce_zero_label=reduce_zero_label, | ||
**kwargs) |
49 changes: 49 additions & 0 deletions
49
projects/medical/2d_image/histopathology/pcam/tools/prepare_dataset.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
import os | ||
|
||
import h5py | ||
import numpy as np | ||
from PIL import Image | ||
|
||
root_path = 'data/' | ||
|
||
tgt_img_train_dir = os.path.join(root_path, 'images/train/') | ||
tgt_mask_train_dir = os.path.join(root_path, 'masks/train/') | ||
tgt_img_val_dir = os.path.join(root_path, 'images/val/') | ||
tgt_img_test_dir = os.path.join(root_path, 'images/test/') | ||
|
||
os.system('mkdir -p ' + tgt_img_train_dir) | ||
os.system('mkdir -p ' + tgt_mask_train_dir) | ||
os.system('mkdir -p ' + tgt_img_val_dir) | ||
os.system('mkdir -p ' + tgt_img_test_dir) | ||
|
||
|
||
def extract_pics_from_h5(h5_path, h5_key, save_dir): | ||
f = h5py.File(h5_path, 'r') | ||
for i, img in enumerate(f[h5_key]): | ||
img = img.astype(np.uint8).squeeze() | ||
img = Image.fromarray(img) | ||
save_image_path = os.path.join(save_dir, str(i).zfill(8) + '.png') | ||
img.save(save_image_path) | ||
|
||
|
||
if __name__ == '__main__': | ||
|
||
extract_pics_from_h5( | ||
'data/pcamv1/camelyonpatch_level_2_split_train_x.h5', | ||
h5_key='x', | ||
save_dir=tgt_img_train_dir) | ||
|
||
extract_pics_from_h5( | ||
'data/pcamv1/camelyonpatch_level_2_split_valid_x.h5', | ||
h5_key='x', | ||
save_dir=tgt_img_val_dir) | ||
|
||
extract_pics_from_h5( | ||
'data/pcamv1/camelyonpatch_level_2_split_test_x.h5', | ||
h5_key='x', | ||
save_dir=tgt_img_test_dir) | ||
|
||
extract_pics_from_h5( | ||
'data/pcamv1/camelyonpatch_level_2_split_train_mask.h5', | ||
h5_key='mask', | ||
save_dir=tgt_mask_train_dir) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters