Finetuning pretrained CNN for multilabel breathing disease classification

Prashant-Tiwari26/Breathing-Problem-Classification

Breathing Problem Classification using CNN

Overview

In this project, we apply CNNs to X-ray and CT scan images of patients from across the world (Cohen, Joseph Paul, et al. "Covid-19 image data collection: Prospective predictions are the future." arXiv preprint arXiv:2006.11988 (2020)) to predict the breathing diseases they have. The data covers a total of 22 breathing issues. We treat this as a multilabel classification problem, since a single patient often has more than one disease.

Datasets

The following dataset is used for this project: Cohen, Joseph Paul, et al. "Covid-19 image data collection: Prospective predictions are the future." arXiv preprint arXiv:2006.11988 (2020), which can be found here.

Model Architecture

Three models were initially fine-tuned to work directly on the images to detect diseases: regnet_y_3_2gf, efficientnet_v2_s and swin_v2_t. In each, the final classification layer was changed from 1000 outputs to 22 for our task. The final layer of each model before and after the change is as follows:

RegNet

Before: (fc): Linear(in_features=1512, out_features=1000, bias=True)

After: (fc): Linear(in_features=1512, out_features=22, bias=True)

EfficientNet

Before: (classifier): Sequential(
(0): Dropout(p=0.2, inplace=False)
(1): ReLU()
(2): Linear(in_features=1280, out_features=1000, bias=True)
)

After: (classifier): Sequential(
(0): Dropout(p=0.2, inplace=False)
(1): ReLU()
(2): Linear(in_features=1280, out_features=22, bias=True)
)

Swinv2

Before: (head): Linear(in_features=768, out_features=1000, bias=True)

After: (head): Linear(in_features=768, out_features=22, bias=True)

Preprocessing

The following columns were binary encoded to be used as metadata later on in the development.

sex unique values: ['M' 'F' nan]
After conversion unique values: [ 0.  1. nan]

RT_PCR_positive unique values: ['Y' nan 'Unclear']
After conversion unique values: [ 1. nan  0.]

survival unique values: ['Y' nan 'N']
After conversion unique values: [ 1. nan  0.]

intubated unique values: ['N' 'Y' nan]
After conversion unique values: [ 0.  1. nan]

intubation_present unique values: ['N' 'Y' nan]
After conversion unique values: [ 0.  1. nan]

went_icu unique values: ['N' 'Y' nan]
After conversion unique values: [ 0.  1. nan]

in_icu unique values: ['N' 'Y' nan]
After conversion unique values: [ 0.  1. nan]

modality unique values: ['X-ray' 'CT']
After conversion unique values: [0 1]
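One way to produce the conversions listed above is a per-column mapping with pandas. This is a sketch under assumptions: the toy rows and the `BINARY_MAPS` name are illustrative, and the value-to-code pairs are read off the before/after listings by position, so they may not match the repository's code exactly.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real metadata table.
df = pd.DataFrame({
    "sex": ["M", "F", np.nan],
    "RT_PCR_positive": ["Y", np.nan, "Unclear"],
    "survival": ["Y", np.nan, "N"],
    "modality": ["X-ray", "CT", "X-ray"],
})

# Value-to-code pairs inferred from the listings above (an assumption).
BINARY_MAPS = {
    "sex": {"M": 0, "F": 1},
    "RT_PCR_positive": {"Y": 1, "Unclear": 0},
    "survival": {"Y": 1, "N": 0},
    "modality": {"X-ray": 0, "CT": 1},
}

encoded = df.copy()
for column, mapping in BINARY_MAPS.items():
    # Series.map leaves missing values as NaN, so columns with gaps become float.
    encoded[column] = encoded[column].map(mapping)

print(encoded)
print(encoded.dtypes)
```

Columns that contain missing values end up as floats, which is consistent with the `[ 0.  1. nan]` outputs above, while `modality` has no gaps and stays integer.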

The values in the view column were one-hot encoded as well:

view unique values: ['PA' 'AP' 'L' 'Axial' 'AP Supine' 'Coronal' 'AP Erect']
After encoding columns: ['view_AP', 'view_AP Erect', 'view_AP Supine', 'view_Axial', 'view_Coronal', 'view_L', 'view_PA']
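The one-hot step can be reproduced with `pandas.get_dummies`, which is assumed here rather than confirmed as the function the project uses. The `views` series below is toy data built from the unique values listed above.

```python
import pandas as pd

# The seven unique values of the `view` column listed above.
views = pd.Series(
    ["PA", "AP", "L", "Axial", "AP Supine", "Coronal", "AP Erect"],
    name="view",
)

# One indicator column per distinct value, each named "view_<value>".
one_hot = pd.get_dummies(views, prefix="view")

print(list(one_hot.columns))
print(one_hot.astype(int))
```

Each row has exactly one active column, and the column names come out in sorted order, which matches the listing above.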

The distribution of diseases was as follows initially:

[Figure: initial distribution of diseases]

The different diseases were broken down into separate labels as follows:

'Pneumonia', 'Viral', 'COVID-19', 'SARS', 'Fungal', 'Pneumocystis', 'Bacterial', 'Streptococcus', 'No Finding', 'Chlamydophila', 'E.Coli', 'Klebsiella', 'Legionella', 'Unknown', 'Lipoid', 'Varicella', 'Mycoplasma', 'Influenza', 'todo', 'Tuberculosis', 'H1N1', 'Aspergillosis', 'Herpes ', 'Aspiration', 'Nocardia', 'MERS-CoV', 'Staphylococcus', 'MRSA'

The distribution after doing this became as follows:

[Figure: distribution of labels after the breakdown]

Afterwards, this multilabel data was used to train the models.
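The breakdown into labels can be illustrated with a small multi-hot encoder. This sketch assumes that each original finding is a "/"-separated string such as "Pneumonia/Viral/COVID-19". The separator, the `split_finding` and `multi_hot` helpers, and the shortened label list are all assumptions made for illustration.

```python
# Shortened label list for the example; the full list is given above.
LABELS = ["Pneumonia", "Viral", "COVID-19", "Bacterial", "Streptococcus", "No Finding"]


def split_finding(finding: str, sep: str = "/") -> list[str]:
    """Break a compound finding into its individual label names."""
    return [part.strip() for part in finding.split(sep) if part.strip()]


def multi_hot(finding: str, labels: list[str] = LABELS) -> list[int]:
    """Return one 0/1 entry per label, set to 1 when the label occurs in the finding."""
    present = set(split_finding(finding))
    return [1 if label in present else 0 for label in labels]


for finding in ["Pneumonia/Viral/COVID-19", "Pneumonia/Bacterial/Streptococcus", "No Finding"]:
    print(finding, multi_hot(finding))
```

A single image can therefore carry several active labels at once, which is what makes the task multilabel rather than multiclass.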

Training

The scripts for fine-tuning the models are in the Scripts folder. The training loop and the custom dataset class are in the utils.py file.

The models are trained using the AdamW optimizer. For the three initial models, which are trained only on images, the learning rate is set to 0.005 and training runs for a maximum of 100 epochs, with early stopping if the validation loss does not improve for 15 epochs.
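The training setup described above can be sketched as a short loop. This is not the code in utils.py: the `EarlyStopper` class, the linear stand-in model, the synthetic tensors and the choice of `BCEWithLogitsLoss` (a common loss for multilabel targets) are all assumptions. Only the optimizer, the learning rate, the epoch cap and the patience come from the text.

```python
import torch
from torch import nn


class EarlyStopper:
    """Signals a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 15) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


torch.manual_seed(0)
model = nn.Linear(16, 22)  # hypothetical stand-in for a fine-tuned network
x_train, y_train = torch.randn(64, 16), torch.randint(0, 2, (64, 22)).float()
x_val, y_val = torch.randn(32, 16), torch.randint(0, 2, (32, 22)).float()

optimizer = torch.optim.AdamW(model.parameters(), lr=0.005)
criterion = nn.BCEWithLogitsLoss()  # assumed loss, one sigmoid per label
stopper = EarlyStopper(patience=15)

epochs_run = 0
for epoch in range(100):  # at most 100 epochs
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()

    epochs_run = epoch + 1
    if stopper.step(val_loss):
        break  # validation loss stalled for 15 consecutive epochs

print(f"stopped after {epochs_run} epochs, best validation loss {stopper.best:.4f}")
```

In practice the best-scoring weights would usually be saved whenever the validation loss improves, so that the model from the best epoch can be restored after stopping.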

Evaluation

Training time metrics

RegNet_2gf

[Figure: training-time metrics for RegNet_2gf]

RegNet_8gf

[Figure: training-time metrics for RegNet_8gf]

EfficientNet

[Figure: training-time metrics for EfficientNet]

SwinV2

[Figure: training-time metrics for SwinV2]

Test time evaluation

Usage

Dependencies

All project dependencies are listed in the requirements.txt file. To install them, run the following command in your terminal:

pip install -r requirements.txt
