Skip to content
/ shine Public

[CVPR'24 Highlight] SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

License

Notifications You must be signed in to change notification settings

naver/shine

Repository files navigation

SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Mingxuan Liu · Tyler L. Hayes · Elisa Ricci · Gabriela Csurka · Riccardo Volpi

CVPR 2024 ✨Highlight✨

Installation

Requirements:

  • Linux or macOS with Python ≥ 3.8
  • PyTorch ≥ 1.8.2. Install them together at pytorch.org to make sure of this. Note, please check PyTorch version matches that is required by Detectron2.
  • Detectron2: follow Detectron2 installation instructions.
  • OpenAI API (optional, if you want to construct hierarchies using LMMs)

Setup environment

# Clone this project repository under your workspace folder
git clone https://github.com/naver/shine.git --recurse-submodules
cd shine
# Create conda environment and install the dependencies
conda env create -n shine -f shine.yml
# Activate the working environment
conda activate shine
# Install Detectron2 under your workspace folder
# (Please follow Detectron2 official instructions)
cd ..
git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .

Our project uses two submodules, CenterNet2 and Deformable-DETR. If you forget to add --recurse-submodules, do git submodule init and then git submodule update.

Set your OpenAI API Key to the environment variable (optional: if you want to generate hierarchies)

export OPENAI_API_KEY=YOUR_OpenAI_Key

OvOD Models Preparation

SHiNe is training-free. So we just need to download off-the-shelf OvOD models and apply SHiNe on top of them. You can download the models:

and put (or, softlink via ln -s command) under the models folder in this repository as:

SHiNe
    └── models
          ├── codet
            ├── CoDet_OVLVIS_R5021k_4x_ft4x.pth
            └── CoDet_OVLVIS_SwinB_4x_ft4x.pth
          ├── detic
            ├── coco_ovod
              ├── BoxSup_OVCOCO_CLIP_R50_1x.pth
              ├── Detic_OVCOCO_CLIP_R50_1x_caption.pth
              ├── Detic_OVCOCO_CLIP_R50_1x_max-size.pth
              └── Detic_OVCOCO_CLIP_R50_1x_max-size_caption.pth
            ├── cross_eval
              ├── BoxSup-C2_L_CLIP_SwinB_896b32_4x.pth
              ├── BoxSup-C2_LCOCO_CLIP_SwinB_896b32_4x.pth
              ├── Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
              ├── Detic_LI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
              └── Detic_LI_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
            ├── lvis_ovod
              ├── BoxSup-C2_Lbase_CLIP_R5021k_640b64_4x.pth
              ├── BoxSup-C2_Lbase_CLIP_SwinB_896b32_4x.pth
              ├── Detic_LbaseCCcapimg_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
              ├── Detic_LbaseCCimg_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
              ├── Detic_LbaseI_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
              └── Detic_LbaseI_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
            ├── lvis_std
              ├── BoxSup-C2_L_CLIP_R5021k_640b64_4x.pth
              ├── BoxSup-DeformDETR_L_R50_4x.pth
              ├── Detic_DeformDETR_LI_R50_4x_ft4x.pth
              └── Detic_LI_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
          ├── vldet
            ├── lvis_base.pth
            ├── lvis_base_swinB.pth
            ├── lvis_vldet.pth
            └── lvis_vldet_swinB.pth

Datasets Preparation

You can download the datasets:

and put (or, softlink via ln -s command) under the datasets folder in this repository as:

SHiNe
    └── datasets
          ├── inat
          ├── fsod
          ├── imagenet2012
          ├── coco
          └── lvis

Run SHiNe on OvOD

Example of applying SHiNe on Detic for OvOD task using iNat dataset:

# Vanilla OvOD (baseline)
bash scripts_local/Detic/inat/swin/baseline/inat_detic_SwinB_LVIS-IN-21K-COCO_baseline.sh
 
# SHiNe using dataset-provided hierarchy
bash scripts_local/Detic/inat/swin/shine_gt/inat_detic_SwinB_LVIS-IN-21K-COCO_shine_gt.sh

# SHiNe using LLM-generated synthetic hierarchy
bash scripts_local/Detic/inat/swin/shine_llm/inat_detic_SwinB_LVIS-IN-21K-COCO_shine_llm.sh

Run SHiNe on Zero-shot classification

Example of applying SHiNe on CLIP zero-shot transfer task using ImageNet-1k dataset:

# Vanilla CLIP Zero-shot transfer (baseline)
bash scripts_local/Classification/imagenet1k/baseline/imagenet1k_vitL14_baseline.sh

# SHiNe using WordNet hierarchy
bash scripts_local/Classification/imagenet1k/shine_wordnet/imagenet1k_vitL14_shine_wordnet.sh

# SHiNe using LLM-generated synthetic hierarchy
bash scripts_local/Classification/imagenet1k/shine_llm/imagenet1k_vitL14_shine_llm.sh

SHiNe Construction (optional)

Example of constructing SHiNe classifier for OvOD task using iNat dataset:

# SHiNe using dataset-provided hierarchy
bash scripts_build_nexus/inat/build_inat_nexus_gt.sh
# SHiNe using LLM-generated synthetic hierarchy
bash scripts_build_nexus/inat/build_inat_nexus_llm.sh

Hierarchy Tree Planting (optional)

Example of building hierarchy trees using either dataset-provided or llm-generated hierarchy entities.

Dataset-provided Hierarchy

# Build hierarchy tree for iNat using dataset-provided hierarchy
bash scripts_plant_hrchy/inat/plant_inat_tree_gt.sh

# Build hierarchy tree for ImageNet-1k using WordNet hierarchy
bash scripts_plant_hrchy/imagenet1k/plant_imagenet1k_tree_wordnet.sh

LLM-generated Hierarchy

# Build hierarchy tree for iNat using LLM-generated synthetic hierarchy
bash scripts_plant_hrchy/inat/plant_inat_tree_llm.sh

# Build hierarchy tree for ImageNet-1k using LLM-generated synthetic hierarchy
bash scripts_plant_hrchy/imagenet1k/plant_imagenet1k_tree_llm.sh

License

This project is licensed under the LICENSE file.

Citation

If you find our work useful for your research, please cite our paper using the following BibTeX entry:

@inproceedings{liu2024shine,
  title={{SH}i{N}e: Semantic Hierarchy Nexus for Open-vocabulary Object Detection},
  author={Liu, Mingxuan and Hayes, Tyler L. and Ricci, Elisa and Csurka, Gabriela and Volpi, Riccardo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024},
}

Acknowledgment

SHiNe is built upon the awesome works iNat, FSOD, BREEDS, Hierarchy-CLIP, Detic, VLDet, and CoDet. We sincerely thank them for their work and contributions.

About

[CVPR'24 Highlight] SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published