
Learning Visual Styles from Audio-Visual Associations




This repository contains the official codebase for Learning Visual Styles from Audio-Visual Associations. We manipulate the style of an image to match a sound. After training on an unlabeled dataset of egocentric hiking videos, our model learns visual styles for a variety of ambient sounds, such as light and heavy rain, as well as physical interactions, such as footsteps. We thank Taesung Park and Jun-Yan Zhu for sharing the code of CUT.

Learning Visual Styles from Audio-Visual Associations
Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao
Tsinghua University, University of Michigan and Shanghai Qi Zhi Institute
In ECCV 2022

Prerequisites

  • Linux or macOS
  • Python 3
  • NVIDIA GPU + CUDA cuDNN

Quick Start

  • Clone this repo:

    git clone https://github.com/Tinglok/avstyle avstyle
    cd avstyle
  • Install PyTorch 1.7.1 and other dependencies.

    For pip users, please run pip install -r requirements.txt.

    For Conda users, you can create a new Conda environment using conda env create -f environment.yaml.
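
After installing, you can sanity-check the environment with a few lines of PyTorch (a minimal sketch, not part of this repo):

    # Quick environment check: confirms the PyTorch version and CUDA visibility.
    import torch

    print("PyTorch:", torch.__version__)          # expected: 1.7.1
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))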

Datasets

Into the Wild

We provide the YouTube IDs in dataset/Into-the-Wild/metadata.xlsx. Please use youtube-dl to download the videos to dataset/Into-the-Wild/youtube first.
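
For reference, the download step amounts to something like the sketch below. It assumes the IDs sit in a column named youtube_id in metadata.xlsx; check the actual sheet layout before running.

    # Sketch: fetch each video listed in metadata.xlsx with youtube-dl.
    # The column name "youtube_id" is an assumption about the sheet layout.
    import subprocess
    import pandas as pd

    ids = pd.read_excel("dataset/Into-the-Wild/metadata.xlsx")["youtube_id"]
    for vid in ids:
        subprocess.run([
            "youtube-dl",
            "-o", f"dataset/Into-the-Wild/youtube/{vid}.%(ext)s",
            f"https://www.youtube.com/watch?v={vid}",
        ], check=False)  # continue past videos that are no longer available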

Then process them using:

python ./dataset/Into-the-Wild/split.py

so that the videos are split into 3-second clips.
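
split.py implements this step; conceptually it is close to running ffmpeg's segment muxer over each download, as in this illustrative sketch (output paths and naming are assumptions, not the script's actual convention):

    # Sketch: cut each downloaded video into 3-second clips with ffmpeg.
    import glob
    import os
    import subprocess

    os.makedirs("dataset/Into-the-Wild/clips", exist_ok=True)
    for path in glob.glob("dataset/Into-the-Wild/youtube/*.mp4"):
        stem = os.path.splitext(os.path.basename(path))[0]
        subprocess.run([
            "ffmpeg", "-i", path,
            "-f", "segment", "-segment_time", "3",   # 3-second chunks
            "-reset_timestamps", "1", "-c", "copy",  # no re-encoding
            f"dataset/Into-the-Wild/clips/{stem}_%04d.mp4",
        ], check=True)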

Then run the command:

python ./dataset/Into-the-Wild/video2jpg.py

to extract the corresponding images.
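
video2jpg.py handles the frame extraction; in essence it does something like the following OpenCV sketch (the clip directory, output naming, and saving every frame are assumptions):

    # Sketch: dump the frames of each clip as JPEG images with OpenCV.
    import glob
    import os
    import cv2

    os.makedirs("dataset/Into-the-Wild/frames", exist_ok=True)
    for path in glob.glob("dataset/Into-the-Wild/clips/*.mp4"):
        stem = os.path.splitext(os.path.basename(path))[0]
        cap = cv2.VideoCapture(path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imwrite(f"dataset/Into-the-Wild/frames/{stem}_{idx:04d}.jpg", frame)
            idx += 1
        cap.release()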

Finally, download trainA and trainB to dataset/Into-the-Wild.

The Greatest Hits

Please follow the instructions from Visually Indicated Sounds to download this dataset.

Training and Test

  • Train our model on the Into the Wild dataset:
python train.py --dataroot ./datasets/Into-the-Wild --name hiking

The checkpoints will be stored at ./checkpoints/hiking/.

  • Train our model on the Greatest Hits dataset:
python train.py --dataroot ./datasets/Greatest-Hits --name material

The checkpoints will be stored at ./checkpoints/material/.

  • Test our model on the Into the Wild dataset:
python test.py --dataroot ./datasets/Into-the-Wild --name hiking --eval

The test results will be saved to an HTML file at ./results/hiking/latest_train/index.html.

  • Test our model on the Greatest Hits dataset:
python test.py --dataroot ./datasets/Greatest-Hits --name material --eval

The test results will be saved to an HTML file at ./results/material/latest_train/index.html.

Pre-trained Model

Pre-trained models for the Into the Wild and Greatest Hits datasets are available at this URL.
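
A downloaded checkpoint can be inspected with plain PyTorch before use (the file name latest_net_G.pth is a guess based on the checkpoint layout above; adjust it to the downloaded file):

    # Sketch: load a checkpoint on CPU and list its parameter shapes.
    import torch

    state_dict = torch.load("checkpoints/hiking/latest_net_G.pth",
                            map_location="cpu")
    for name, tensor in state_dict.items():
        print(name, tuple(tensor.shape))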

Citation

If you use this code for your research, please consider citing our paper.

@inproceedings{li2021learning,
  author={Tingle Li and Yichen Liu and Andrew Owens and Hang Zhao},
  title={{Learning Visual Styles from Audio-Visual Associations}},
  year=2022,
  booktitle={European Conference on Computer Vision (ECCV)}
}
