Video-Harmonization-Dataset-HYouTube

Video composition means cutting the foregrounds from one video and pasting them on the backgrounds from another video, resulting in a composite video. However, the inserted foregrounds may be incompatible with the backgrounds in terms of color and illumination statistics. Video harmonization aims to adjust the foregrounds in the composite video to make them compatible with the backgrounds, resulting in a harmonious video. Here are three examples of video harmonization, in which the top row contains composite videos and the bottom row contains the corresponding ground-truth harmonious videos.

HYouTube Dataset

Dataset Construction: Our dataset HYouTube is the first public video harmonization dataset built upon large-scale video object segmentation dataset Youtube-VOS-2018. Given real videos with object masks, we adjust their foregrounds using Lookup Tables (LUTs) to produce synthetic composite videos. We employ in total 100 candidate LUTs, in which one LUT corresponds to one type of color transfer. Given a video sample, we first randomly select an LUT from 100 candidate LUTs to transfer the foreground of each frame. The transferred foregrounds and the original backgrounds form the composite frames, and the composite frames form composite video samples. We provide the script lut_transfer_sample.py to generate composite video based on real video, foreground mask, and LUT. Our dataset includes 3194 pairs of synthetic composite video samples and real video samples, which are split to 2558 training pairs and 636 test pairs. Each video sample contains 20 consecutive frames with the foreground mask for each frame. Our HYouTube dataset can be downloaded from Baidu Cloud (access code: dk07) or OneDrive.

HYoutube File Structure: Download the HYoutue dataset. We show the file structure below:

├── Composite: 
     ├── videoID: 
              ├── objectID
                        ├── imgID.jpg
              
              ├── ……
     ├── ……
├── Mask: 
     ├── videoID: 
              ├── objectID
                        ├── imgID.png
              
              ├── ……
     ├── ……
├── Ground-truth: 
     ├── videoID: 
              ├── imgID.jpg
     ├── ……
├── train.txt
└── test.txt
└── transfer.py

Real Composite Videos

For evaluation, we also create 100 real composite videos. Specifically, we collect video foregrounds with masks from a video matting dataset as well as video backgrounds from Vimeo-90k Dataset and Internet. Then, we create composite videos via copy-and-paste and finally select 100 composite videos which look reasonable w.r.t. foreground placement but inharmonious w.r.t. color/illumination. 100 real composite videos can be downloaded from Baidu Cloud (access code: nf9b) or OneDrive.

Color Transfer Method

Prerequisites

Python
os
numpy
cv2
PIL
pillow_lut

Demo

We provide the script lut_transfer_sample.py to generate composite video based on real video, foreground mask, and LUT.

python lut_transfer_sample.py

Before you run the code, you should change the path of real video directroy, the path of video mask directroy, the path of LUT and the storage path to generate your composite video.

Our CO2Net

Official implementation of Deep Video Harmonization with Color Mapping Consistency

Prepare

Linux
Python 3
NVIDIA GPU + CUDA CuDNN
Clone this repo:

git clone https://github.com/bcmi/Video-Harmonization-Dataset-HYouTube.git
cd Video-Harmonization-Dataset-HYouTube
cd CO2Net

Prepare Dataset

Download HYoutube from Baidu Cloud (access code: dk07) or OneDrive.

Install cuda package

We provide two CUDA operations here for LUT calculation. Please make sure that you have already installed CUDA.

cd CO2Net
cd trilinear
. ./setup.sh

cd CO2Net
cd tridistribute
. ./setup.sh

Install python package

pip install -r requirements.txt

Train

We adopt a two-stage training strategy. In the first stage, we train an image harmonization backbone on HYoutube. In the second stage, we fix the backbone and train refinement module.

For stage 1: Backbone training, we provide the links for two backbones: iSSAM [WACV2021] and RainNet[CVPR2021].You can follow the same paths of their repos to train your own backbone model (iSSAM and RainNet). We release our trained iSSAM backbone here (./final_models/issam_backbone.pth).

For stage 2: Refinement module training, you can directly train by

python3  scripts/my_train.py --gpu=1 --dataset_path <Your path to HYouTube> --train_list ./train_list.txt --val_list ./test_frames.txt --backbone <Your backbone model> --backbone_type <Your backbone type, we provide 'issam' and 'rain' here> --previous_num 8 --future_num 8 --use_feature --normalize_inside --exp_name <exp name>

But since we adopt two-stage training strategy, we highly recommend you to calculate and store the LUT results first by using

python3  scripts/evaluate_model.py --gpu=0 --dataset_path <Your path to HYouTube> --val_list ./all_frames.txt --backbone_type <Your backbone type> --backbone <Your backbone model> --previous_num 8 --future_num 8 --write_lut_output <directory to store lut output> --write_lut_map <directory to store lut map>

then you can use

python3  scripts/my_train.py --gpu=1 --dataset_path  <Your path to HYouTube> --train_list ./train_list.txt --val_list ./test_frames.txt --backbone_type <Your backbone type> --backbone  <Your backbone model> --previous_num 8 --future_num 8 --use_feature --normalize_inside --exp_name <exp_name> --lut_map_dir <directory to store lut map> --lut_output_dir <directory to store lut output>

It can directly read LUT result without the need to read all neigboring frames, which will speed up the training process.

Notice that you can also decide your own number of previous and future neigbors by changing Arguments previous_num/future_num. The Argument Use_feature decides whether to use final feature of backbone model. You can refer to Table 2 in the paper for more information.

Evaluate

We release our backbone of iSSAM(./final_models/issam_backbone.pth), our framework's result with iSSAM as backbone(./final_models/issam_final.pth). To compar with our method, we also use Huang et al.'s way to train iSSAM and release it in ./final_models/issam_huang.pth. Notice the architecture of obtained model by Huang et al.'s method is totally the same as iSSAM. So you can treat it as another checkpoint of backbone.

python3  scripts/evaluate_model.py --gpu=0 --dataset_path <Your path to HYouTube> --val_list ./test_frames.txt --backbone ./final_models/issam_backbone.pth --previous_num 8 --future_num 8  --use_feature --checkpoint ./final_models/issam_final.pth

Or evaluate without refinement module, it will test the result of LUT output.

python3  scripts/evaluate_model.py --gpu=0 --dataset_path <Your path to HYouTube> --val_list ./test_frames.txt --backbone ./final_models/issam_backbone.pth --previous_num 8 --future_num 8

To evaluate Huang's result, run

python3  scripts/evaluate_model.py --gpu=0 --dataset_path <Your path to HYouTube> --val_list ./test_frames.txt --backbone ./final_models/issam_huang.pth --previous_num 1 --future_num 0

and see the metrics of backbone.

The expected quantitative results are in the following table.

	MSE	FMSE	PSNR	fSSIM
Backbone	28.90	203.77	37.38	0.8817
Huang	27.89	199.89	37.44	0.8821
Ours	26.50	186.72	37.61	0.8827

Your can also use your own backbone or whole models. Please replace Arguments checkpoint/backbone by your own model.

Evaluate temporal consistency

You need to download TL test set, which is a sub test set for calculating temporal loss (TL). You also need to prepare Flownet2, which is used for flow calculation and image warping. TL test set is generated from FlowNet2 and the next unannotated frame of HYouTube. For more information, please see Section 3 in the supplementary.

Prepare FlowNetV2

Please follow command of FlowNet2 to install and download FlowNet2 weight. Please put FlowNet directory on ./ and its weight on ./flownet/FlowNet2_checkpoint.pth.tar

Prepare TL dataset

Please download TL test set from Baidu Cloud (access code: 3v1s) or OneDrive.

Prepare result numpy files and Evaluate

You need to store the both numpy result of candidate model on both HYoutube's test set and TL test set.

python3  scripts/evaluate_model.py --gpu=0 --dataset_path <Your path to HYouTube> --val_list ./test_frames.txt --backbone <Your backbone model> --previous_num 8 --future_num 8 --checkpoint <Your checkpoint> --write_npy_result --result_npy_dir <Directory to store numpy result>

python3  scripts/evaluate_model.py --gpu=0 --dataset_path <Your path to TL_TestSet> --val_list ./future_list.txt --backbone <Your backbone model> --previous_num 8 --future_num 8 --checkpoint <Your checkpoint> --write_npy_result --result_npy_dir <Directory to store numpy future result>

Also, to evaluate TL of backbone, you can store the results of backbone using

python3  scripts/evaluate_model.py --gpu=0 --dataset_path <Your path to HYouTube> --val_list ./test_frames.txt --backbone <Your backbone model> --previous_num 8 --future_num 8 --checkpoint <Your checkpoint> --write_npy_backbone --backbone_npy_dir <Directory to store numpy result>

python3  scripts/evaluate_model.py --gpu=0 --dataset_path <Your path to TL_TestSet> --val_list ./future_list.txt --backbone <Your backbone model> --previous_num 8 --future_num 8 --checkpoint <Your checkpoint> --write_npy_result --result_npy_dir <Directory to store numpy future result>

Then calculate TL loss using

python3  scripts/evaluate_flow.py --dataset_path <Your path to HYouTube> --dataset_path_next <Your path to HYouTube_Next> --cur_result <result of current numpy dir> --next_result <result of next numpy dir>

The expected quantitative results of released models are in the following table.

	Tl
Backbone	6.48
Huang	6.49
Ours	5.11

Baselines

Huang et al.

We implement a simple version of Temporally Coherent Video Harmonization Using Adversarial Networks using iSSAM as backbone. You can train it by:

python3  scripts/evaluate_model.py --gpu=0 --dataset_path <Your path to HYouTube> --val_list ./all_frames.txt --backbone_type <Your backbone type> --backbone <Your backbone model> --previous_num 8 --future_num 8 --write_lut_output <directory to store lut output> --write_lut_map <directory to store lut map> 
cd ..
cd issam_huang
python3 train.py models/fixed256/improved_ssam.py --gpu=0 --worker=1 --dataset_path <Your path to HYouTube> --train_list ./train_frames.txt --val_list ./test_frames.txt

Other Resources

Bibtex

If you find this work useful for your research, please cite our paper using the following BibTeX [arxiv]:

@article{hyoutube2021,
  title={Deep Video Harmonization with Color Mapping Consistency},
  author={Xinyuan Lu, Shengyuan Huang, Li Niu, Wenyan Cong, Liqing Zhang},
  journal={IJCAI},
  year={2022}
}

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
CO2Net		CO2Net
Example		Example
Huang et al.		Huang et al.
LUT color transfer		LUT color transfer
README.md		README.md
test_list.txt		test_list.txt
train_list.txt		train_list.txt

bcmi/Video-Harmonization-Dataset-HYouTube

Folders and files

Latest commit

History

Repository files navigation