Building a public COVID-19 dataset of X-ray and CT scans

In the context of the COVID-19 pandemic, it is crucial to streamline diagnosis. Last year, our team developed Chester, an artificial intelligence (AI) chest X-ray radiology assistant tool that can recognize features such as consolidation, opacity, and edema [Cohen, 2019].

We now wish to build a public database of pneumonia cases with chest X-ray or CT images, specifically COVID-19 cases as well as MERS, SARS, and ARDS. Data will be collected from public sources as well as through agreements with hospitals and physicians with the consent of their patients. 

Our team believes that this database can dramatically improve identification of COVID-19. Notably, it would provide essential data to train and test our system. By using the images to develop deep-learning-based models that can identify pneumonia characteristic of COVID-19, we could ultimately offer a free prototype tool on Chester’s existing platform for use by physicians worldwide.

Objectives: 

  • Build a public, open dataset of chest X-ray and CT images of patients who are positive or suspected positive for COVID-19 or other viral and bacterial pneumonias.
  • Develop methods to make supervised COVID-19 prognostic predictions from chest X-rays and CT scans.
  • Deploy a prototype of this system using the Chester platform.
  • Conduct continuous retrospective and prospective clinical validation of the AI platform using lab-validated COVID-19 cases.

The prediction tasks are as follows, each taking a chest X-ray or CT scan (preference for X-ray) as input:

  • Healthy vs Pneumonia (prototype already implemented in Chester with ~74% AUC, validation study here)
  • Bacterial vs Viral vs COVID-19 Pneumonia (no longer relevant)
  • Survival/severity of patient
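
For readers unfamiliar with the AUC figure quoted for the Healthy vs Pneumonia task, the following is a minimal sketch of how an AUC can be computed for a binary classifier. It is not Chester's evaluation code: the function name roc_auc and the labels and scores below are hypothetical, made up purely for illustration. The sketch uses the pairwise definition of AUC, the fraction of (pneumonia, healthy) pairs in which the pneumonia case receives the higher score, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data for illustration only: 1 = pneumonia, 0 = healthy.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, which is fine for a small validation set; for larger sets, a rank-based implementation from an established library would be the usual choice.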

An extended writeup is here

The dataset website is here: https://github.com/ieee8023/covid-chestxray-dataset