Breathing Problem Classification using CNN

Overview

In this project, we apply CNNs to X-ray and CT scan images of patients from across the world (Cohen, Joseph Paul, et al. "Covid-19 image data collection: Prospective predictions are the future." arXiv preprint arXiv:2006.11988 (2020)) to predict the breathing diseases they have. The task covers a total of 22 breathing issues. We treat it as a multilabel classification problem, since a single patient often has more than one disease.
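To illustrate what the multilabel framing means at prediction time, here is a minimal sketch. It assumes one independent sigmoid output per label and a 0.5 decision threshold; the three label names are taken from the label list in this README, while the logit values and the threshold are made up for illustration and are not taken from the project code.

```python
import torch

# Hypothetical subset of labels and made-up logits for one image.
label_names = ["Pneumonia", "Bacterial", "COVID-19"]
logits = torch.tensor([[2.0, -1.0, 0.5]])

# Multilabel: each label gets its own sigmoid, so several can be active at once
# (a softmax would instead force the labels to compete for a single winner).
probabilities = torch.sigmoid(logits)
active = (probabilities > 0.5)[0].tolist()  # assumed threshold of 0.5

predicted = [name for name, is_on in zip(label_names, active) if is_on]
print(predicted)
```

With these example logits, both Pneumonia and COVID-19 are predicted for the same image, which a single-label classifier could not express.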

Datasets

The following dataset was used for this project: Cohen, Joseph Paul, et al. "Covid-19 image data collection: Prospective predictions are the future." arXiv preprint arXiv:2006.11988 (2020), which can be found here.

Model Architecture

Three models were initially fine-tuned to work directly on the images to detect diseases: regnet_y_3_2gf, efficientnet_v2_s and swin_v2_t. The final classification layer of each model was changed from 1000 output neurons to 22 for our task. The final layer of each model before and after the change is as follows:

RegNet

Before: (fc): Linear(in_features=1512, out_features=1000, bias=True)

After: (fc): Linear(in_features=1512, out_features=22, bias=True)

EfficientNet

Before: (classifier): Sequential(
(0): Dropout(p=0.2, inplace=False)
(1): ReLU()
(2): Linear(in_features=1280, out_features=1000, bias=True)
)

After: (classifier): Sequential(
(0): Dropout(p=0.2, inplace=False)
(1): ReLU()
(2): Linear(in_features=1280, out_features=22, bias=True)
)

Swinv2

Before: (head): Linear(in_features=768, out_features=1000, bias=True)

After: (head): Linear(in_features=768, out_features=22, bias=True)

Preprocessing

The following columns were binary encoded so that they can be used as metadata later in development:

sex unique values: ['M' 'F' nan]
After conversion unique values: [ 0.  1. nan]

RT_PCR_positive unique values: ['Y' nan 'Unclear']
After conversion unique values: [ 1. nan  0.]

survival unique values: ['Y' nan 'N']
After conversion unique values: [ 1. nan  0.]

intubated unique values: ['N' 'Y' nan]
After conversion unique values: [ 0.  1. nan]

intubation_present unique values: ['N' 'Y' nan]
After conversion unique values: [ 0.  1. nan]

went_icu unique values: ['N' 'Y' nan]
After conversion unique values: [ 0.  1. nan]

in_icu unique values: ['N' 'Y' nan]
After conversion unique values: [ 0.  1. nan]

modality unique values: ['X-ray' 'CT']
After conversion unique values: [0 1]
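The binary encoding listed above could be reproduced with pandas roughly as follows. The mappings mirror the before and after values shown in this section; the constant BINARY_MAPS, the helper encode_binary and the small sample frame are hypothetical names and data introduced for illustration only.

```python
import numpy as np
import pandas as pd

# Value mappings as listed in this section (missing values stay NaN).
BINARY_MAPS = {
    "sex": {"M": 0, "F": 1},
    "RT_PCR_positive": {"Unclear": 0, "Y": 1},
    "survival": {"N": 0, "Y": 1},
    "intubated": {"N": 0, "Y": 1},
    "intubation_present": {"N": 0, "Y": 1},
    "went_icu": {"N": 0, "Y": 1},
    "in_icu": {"N": 0, "Y": 1},
    "modality": {"X-ray": 0, "CT": 1},
}


def encode_binary(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the known binary columns mapped to 0/1."""
    out = df.copy()
    for column, mapping in BINARY_MAPS.items():
        if column in out.columns:
            out[column] = out[column].map(mapping)
    return out


# Made-up sample rows, not taken from the dataset.
sample = pd.DataFrame({
    "sex": ["M", "F", np.nan],
    "RT_PCR_positive": ["Y", np.nan, "Unclear"],
    "modality": ["X-ray", "CT", "CT"],
})
encoded = encode_binary(sample)
print(encoded)
```

Columns that contain missing values end up as floats (0.0, 1.0, NaN), which matches the float output shown above, while modality has no missing values and stays integer.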

The values in the view column were one-hot encoded as well:

view unique values: ['PA' 'AP' 'L' 'Axial' 'AP Supine' 'Coronal' 'AP Erect']
After encoding columns: ['view_AP', 'view_AP Erect', 'view_AP Supine', 'view_Axial', 'view_Coronal', 'view_L', 'view_PA']
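A minimal sketch of the one-hot step, assuming pandas.get_dummies was used (the column naming above matches its default prefix scheme, but the project code may differ). The sample frame simply lists each view value once.

```python
import pandas as pd

# One row per view value listed above; real data would have one row per image.
views = pd.DataFrame(
    {"view": ["PA", "AP", "L", "Axial", "AP Supine", "Coronal", "AP Erect"]}
)

# Each distinct value becomes its own 0/1 column named "view_<value>".
one_hot = pd.get_dummies(views, columns=["view"], dtype=int)
print(list(one_hot.columns))
```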

The distribution of diseases was as follows initially:

The diseases were then broken down into separate labels as follows:

'Pneumonia', 'Viral', 'COVID-19', 'SARS', 'Fungal', 'Pneumocystis', 'Bacterial', 'Streptococcus', 'No Finding', 'Chlamydophila', 'E.Coli', 'Klebsiella', 'Legionella', 'Unknown', 'Lipoid', 'Varicella', 'Mycoplasma', 'Influenza', 'todo', 'Tuberculosis', 'H1N1', 'Aspergillosis', 'Herpes ', 'Aspiration', 'Nocardia', 'MERS-CoV', 'Staphylococcus', 'MRSA'

The distribution after doing this became as follows:

Afterwards, this multilabel data was used to train the models.
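The breakdown into separate labels can be sketched as below. It assumes each diagnosis is stored as a slash-separated string such as "Pneumonia/Viral/COVID-19"; that format, the column name finding and the three sample rows are assumptions made for illustration, not details confirmed by this README.

```python
import pandas as pd

# Made-up sample diagnoses in the assumed slash-separated format.
findings = pd.Series(
    [
        "Pneumonia/Viral/COVID-19",
        "Pneumonia/Bacterial/Streptococcus",
        "No Finding",
    ],
    name="finding",
)

# Split on "/" and build one 0/1 indicator column per distinct label.
labels = findings.str.get_dummies(sep="/")
print(labels)
```

Each row of the result is a multi-hot target vector, so the first sample is positive for Pneumonia, Viral and COVID-19 at the same time.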

Training

The scripts for fine-tuning the models are in the Scripts folder. The training loop and the custom dataset class are in the utils.py file.

The three initial models, which are trained only on images, use the AdamW optimizer with a learning rate of 0.005. Training runs for a maximum of 100 epochs, with early stopping if the validation loss does not improve for 15 epochs.
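The training setup described above can be sketched as follows. This is not the loop from utils.py: the function name train, the full-batch updates, the choice of BCEWithLogitsLoss as the multilabel loss and the toy linear model with random data are all assumptions made to keep the example small and runnable. Only the optimizer, learning rate, epoch limit and patience come from this README.

```python
import torch
from torch import nn


def train(model, train_data, val_data, lr=0.005, max_epochs=100, patience=15):
    """Hypothetical loop: AdamW plus early stopping on validation loss."""
    x_train, y_train = train_data
    x_val, y_val = val_data
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()  # assumed loss: one sigmoid per label
    best_loss, best_state, stale_epochs = float("inf"), None, 0
    history = []

    for _ in range(max_epochs):
        model.train()
        optimizer.zero_grad()
        loss = criterion(model(x_train), y_train)
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = criterion(model(x_val), y_val).item()
        history.append(val_loss)

        if val_loss < best_loss:
            # Improvement: remember the weights and reset the patience counter.
            best_loss, stale_epochs = val_loss, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # no improvement for `patience` epochs

    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best checkpoint
    return history


# Toy stand-in for an image model: 8 input features, 22 labels, random targets.
torch.manual_seed(0)
toy_model = nn.Linear(8, 22)
x, y = torch.randn(64, 8), (torch.rand(64, 22) > 0.5).float()
val_history = train(toy_model, (x[:48], y[:48]), (x[48:], y[48:]))
print(len(val_history), min(val_history))
```

Restoring the best weights after stopping is a design choice of this sketch; without it the model would keep the weights from the last, possibly worse, epoch.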

Evaluation

Training time metrics

RegNet_2gf

RegNet_8gf

EfficientNet

SwinV2

Test time evaluation

Usage

Dependencies

All the dependencies of the project are listed in the requirements.txt file. To install them, run the following command in your terminal:

pip install -r requirements.txt


Prashant-Tiwari26/Breathing-Problem-Classification
