COVIDx Dataset

Update 06/26/2020: Released a new dataset with over 14000 CXR images, including 473 COVID-19 train samples. The test dataset remains the same for consistency.
Update 05/13/2020: Released a new dataset with 258 COVID-19 train and 100 COVID-19 test samples. New X-ray images are constantly being added to covid-chestxray-dataset, Figure1, Actualmed and the COVID-19 Radiography database, so we included train_COVIDx3.txt and test_COVIDx3.txt, which list the X-ray images we used for training and testing the CovidNet-CXR3 models.

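The current COVIDx dataset is constructed from the following open source chest radiography datasets:

  • covid-chestxray-dataset (https://github.com/ieee8023/covid-chestxray-dataset)
  • Figure1-COVID-chestxray-dataset (https://github.com/agchung/Figure1-COVID-chestxray-dataset)
  • Actualmed-COVID-chestxray-dataset (https://github.com/agchung/Actualmed-COVID-chestxray-dataset)
  • COVID-19 Radiography database
  • RSNA pneumonia dataset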

Steps to generate the dataset

  1. Download the datasets listed above
  • git clone https://github.com/ieee8023/covid-chestxray-dataset.git
  • git clone https://github.com/agchung/Figure1-COVID-chestxray-dataset.git
  • git clone https://github.com/agchung/Actualmed-COVID-chestxray-dataset.git
  • go to this link to download the COVID-19 Radiography database. Only the COVID-19 image folder and the metadata file are required. Overlaps with covid-chestxray-dataset are handled
  • go to this link to download the RSNA pneumonia dataset
  2. Create a data directory and, within it, create a train and a test directory
  3. Use create_COVIDx_v3.ipynb to combine the datasets into COVIDx. Remember to change the file paths.
  4. We provide the train and test txt files with patientId, image path and label (normal, pneumonia or COVID-19). train_COVIDx3.txt lists the images used for training and test_COVIDx3.txt lists the images used for testing.
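
The directory setup and the txt files described in the steps above can be sketched in Python. This is a minimal illustration and not the code from create_COVIDx_v3.ipynb: the `data` root, the helper names `make_data_dirs`, `parse_split_line` and `read_split_file`, and the assumption that each line is whitespace-separated in the order patientId, image path, label are all hypothetical, so check them against the actual files before relying on this.

```python
from pathlib import Path
from typing import List, NamedTuple, Tuple

# Labels named in this document.
LABELS = {"normal", "pneumonia", "COVID-19"}


class Sample(NamedTuple):
    patient_id: str
    image_path: str
    label: str


def make_data_dirs(root: str = "data") -> Tuple[Path, Path]:
    """Create <root>/train and <root>/test if they do not exist yet."""
    train_dir = Path(root) / "train"
    test_dir = Path(root) / "test"
    train_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)
    return train_dir, test_dir


def parse_split_line(line: str) -> Sample:
    """Parse one line of a train/test txt file.

    ASSUMPTION: fields are whitespace-separated and the first three are
    patientId, image path and label. Any extra fields are ignored.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got: {line!r}")
    patient_id, image_path, label = fields[:3]
    if label not in LABELS:
        raise ValueError(f"unknown label {label!r} in line: {line!r}")
    return Sample(patient_id, image_path, label)


def read_split_file(path: str) -> List[Sample]:
    """Read every non-empty line of a train/test txt file into Sample records."""
    with open(path, encoding="utf-8") as f:
        return [parse_split_line(line) for line in f if line.strip()]
```

For example, `read_split_file("train_COVIDx3.txt")` would return one `Sample` per image, provided the file matches the assumed layout.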

COVIDx data distribution

Chest radiography images distribution

| Type  | Normal | Pneumonia | COVID-19 | Total |
|-------|--------|-----------|----------|-------|
| train | 7966   | 5459      | 473      | 13898 |
| test  | 100    | 100       | 100      | 300   |
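
As a quick check of the image counts above, the following sketch recomputes the totals and each class's share of a split. The counts are copied from the table; the dictionary names and the `class_shares` helper are illustrative and not part of the COVIDx tooling.

```python
# Image counts copied from the table above.
train_counts = {"normal": 7966, "pneumonia": 5459, "COVID-19": 473}
test_counts = {"normal": 100, "pneumonia": 100, "COVID-19": 100}


def class_shares(counts):
    """Return each class's fraction of the split total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


train_total = sum(train_counts.values())  # 13898, matches the Total column
test_total = sum(test_counts.values())    # 300, matches the Total column

for label, share in class_shares(train_counts).items():
    print(f"train {label}: {share:.1%}")
```

With these numbers, COVID-19 makes up about 3.4% of the training images, while the test split is balanced across the three classes.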

Patients distribution

| Type  | Normal | Pneumonia | COVID-19 | Total |
|-------|--------|-----------|----------|-------|
| train | 7966   | 5444      | 320      | 13730 |
| test  | 100    | 98        | 74       | 272   |
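
The image and patient counts differ because some patients contribute more than one image. A minimal sketch of that relationship, using the numbers from the two tables above (the variable and function names are illustrative only):

```python
# Counts copied from the image and patient distribution tables above.
images = {
    "train": {"normal": 7966, "pneumonia": 5459, "COVID-19": 473},
    "test": {"normal": 100, "pneumonia": 100, "COVID-19": 100},
}
patients = {
    "train": {"normal": 7966, "pneumonia": 5444, "COVID-19": 320},
    "test": {"normal": 100, "pneumonia": 98, "COVID-19": 74},
}


def images_per_patient(image_counts, patient_counts):
    """Average number of images per patient, per split and class."""
    return {
        split: {
            label: image_counts[split][label] / patient_counts[split][label]
            for label in image_counts[split]
        }
        for split in image_counts
    }


ratios = images_per_patient(images, patients)
for split, by_label in ratios.items():
    for label, ratio in by_label.items():
        print(f"{split} {label}: {ratio:.2f} images per patient")
```

A ratio above 1 means that class contains multiple images from the same patient, which matters if the data is ever re-split: keeping all images of a patient in the same split avoids leakage between train and test.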