
CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models [Paper]

Contents: What is CLAP4CLIP? · Get going · What is in this repo? · Language-aware knowledge · Uncertainty-related ablations · Cite


What is CLAP4CLIP?


CLAP4CLIP is a general probabilistic finetuning framework for the pre-trained CLIP model on downstream class-incremental learning tasks.

The framework is general in that it supports a diverse range of prompt styles, including hand-crafted prompts like Continual-CLIP, task-conditioned prompts like CoOp, instance-conditioned prompts like AttriCLIP, and multi-modal prompts like MaPLe.


Get going

Clone this GitHub repository:

git clone https://github.com/srvCodes/clap4clip.git
cd clap4clip
mkdir ckpt/
  • Download models: Download the pretrained ViT-B-16.pt and ViT-L-14.pt CLIP checkpoints into the ckpt/ directory (see the sketch after this list).

  • Download datasets: We suggest following the mammoth library to download all the datasets into the repo's datasets/ directory. Instructions for ImageNet-R can be found here.
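
If you prefer scripting the checkpoint download, below is a minimal sketch assuming the official openai/CLIP package (any other source of the ViT-B-16.pt and ViT-L-14.pt weights works just as well); clip.load() caches the checkpoint file under the directory passed as download_root:

pip install git+https://github.com/openai/CLIP.git
# downloads ckpt/ViT-B-16.pt and ckpt/ViT-L-14.pt (and loads each model once to verify it)
python3 -c "import clip; clip.load('ViT-B/16', download_root='ckpt/'); clip.load('ViT-L/14', download_root='ckpt/')"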

What is in this repo?

This repo aims to benchmark various finetuning methods for class-incremental learning with the pre-trained CLIP model.

The instructions below show how to run the models provided with the initial release on CIFAR100 (see scripts/ in the repo; a ViT-L-14 variant is sketched after the list):

  • CLAP4CLIP with hand-crafted prompts (our base CLAP model):
python3 main_incremental_submit.py --lasp --beta 15 --db_name cifar100 --use-vga --expandable-adapter --finetuning --finetune-epochs 2 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --exemplar-selector random --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clclip_var --epochs 5 --forward-times 20 --arch ViT-B-16  --method er --variational
  • Continual-CLIP (zero-shot):
python3 main_incremental_submit.py --db_name cifar100 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clclip --arch ViT-B-16
  • CLIP-Adapter:
python3 main_incremental_submit.py --db_name cifar100 --finetuning --finetune-epochs 2 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --exemplar-selector random --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clip_adapter --epochs 5 --arch ViT-B-16 --method er
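
The commands above use the ViT-B-16 backbone. To run with the larger checkpoint downloaded earlier, swapping the --arch flag should be all that is needed (an assumption: the value has to match the ViT-L-14.pt file name in ckpt/), e.g. for zero-shot Continual-CLIP:

python3 main_incremental_submit.py --db_name cifar100 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clclip --arch ViT-L-14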

We plan to release the following models upon the acceptance of our paper:

  • CoOp
  • MaPLe
  • AttriCLIP
  • CLAP4CLIP with support for CoOp/MaPLe/AttriCLIP

Language-aware knowledge

  • Past-task distribution regularization (for reducing forgetting in general): Can be enabled by passing the arguments --lasp --beta $\gamma$, where $\gamma$ is the loss weight used in Eq. (12) of our paper (an ablation example follows this list).
  • Weight initialization (for reducing the stability gap): Currently controlled by commenting/uncommenting this line.
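
As a concrete ablation, the base CLAP command from above enables the regularizer with --lasp --beta 15; dropping those two flags (an assumption here: the regularizer is simply inactive when --lasp is not passed) trains the same model without past-task distribution regularization:

python3 main_incremental_submit.py --db_name cifar100 --use-vga --expandable-adapter --finetuning --finetune-epochs 2 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --exemplar-selector random --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clclip_var --epochs 5 --forward-times 20 --arch ViT-B-16 --method er --variational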

Uncertainty-related ablations

In our paper, we show the out-of-the-box perks of uncertainty-aware modelling for the following two tasks:

Post-hoc novel data detection (PhNDD)

  • PhNDD is a post-hoc setting proposed in our paper for evaluating the novel data detection capabilities of a finetuning algorithm within the continual learning setting. To enable it, simply pass the argument --eval-ood-score in the script, as shown in the example below.
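
For instance, appending the flag to the base CLAP command (everything else unchanged) evaluates PhNDD alongside the standard continual learning metrics:

python3 main_incremental_submit.py --lasp --beta 15 --db_name cifar100 --use-vga --expandable-adapter --finetuning --finetune-epochs 2 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --exemplar-selector random --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clclip_var --epochs 5 --forward-times 20 --arch ViT-B-16 --method er --variational --eval-ood-score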

Exemplar selection

  • For all but the zero-shot models, the repo implements the following exemplar selection criteria: Random, Herding, Entropy, Variance, Variance of entropy, and Energy scores. These can be selected by passing the value x to the argument --exemplar-selector, where x is one of {random, icarl, entropy, variance, distance, var_entropy, energy} (see the example below).
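
For example, switching the CLIP-Adapter run from random to entropy-based exemplar selection only changes that one argument:

python3 main_incremental_submit.py --db_name cifar100 --finetuning --finetune-epochs 2 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --exemplar-selector entropy --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clip_adapter --epochs 5 --arch ViT-B-16 --method er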

Cite

If you want to cite this framework, feel free to use the preprint citation below:

@article{jha_clap4clip,
  title={CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models},
  author={Jha, Saurav and Gong, Dong and Yao, Lina},
  journal={arXiv preprint arXiv:2403.19137},
  year={2024}
}