
VIEScore


This repository hosts the code for our work VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation (ACL 2024).

VIEScore is a Visual Instruction-guided Explainable metric for evaluating any conditional image generation task.

🔥 🔥 🔥 Check out our [Project Page and Leaderboard] for more results and analysis!
Future metrics should report both a score and a rationale, making each judgment understandable. Which method (VIEScore or traditional metrics) is "closer" to the human perspective?
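
As a rough sketch of that idea (not the exact prompts or parsing logic in run.py), a VIEScore-style query asks the MLLM for a numeric score together with a rationale, and the caller parses both from the reply. The JSON reply format and the commented-out ask() call below are assumptions for illustration only.

import json
import re

def parse_score_and_rationale(reply: str):
    """Pull a numeric score and a free-text rationale out of an MLLM reply.

    Assumes the model was asked to answer with a JSON object containing
    "score" and "reasoning" fields; this format is illustrative, not the
    exact one used by run.py.
    """
    try:
        parsed = json.loads(reply)
        return float(parsed["score"]), parsed["reasoning"]
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Fallback: take the first number in the reply as the score.
        match = re.search(r"\d+(?:\.\d+)?", reply)
        return (float(match.group()) if match else None), reply.strip()

# Hypothetical usage with any of the plug-and-play MLLM backends:
# reply = mllm.ask("Rate the image's consistency with the prompt from 0 to 10 "
#                  "and explain why, as JSON.", image=generated_image)
# score, rationale = parse_score_and_rationale(reply)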

📰 News

  • 2024 May 23: We released all the results and a notebook to visualize them.
  • 2024 May 23: Added Gemini-1.5-pro results.
  • 2024 May 16: Added GPT-4o results; GPT-4o achieves on-par correlation with humans across all tasks!
  • 2024 May 15: VIEScore is accepted to ACL 2024 (main)!
  • 2024 Jan 11: Code is released!
  • 2023 Dec 24: Paper available on arXiv. Code coming soon!

Project Structure

  • imagen_museum: helpers to fetch image data from ImagenMuseum
  • mllm_tools: Plug-and-Play MLLMs.
  • _questions: prompt folder
  • _answers: results folder
  • run.py: script to run VIEScore.
  • clean_result.py: script to remove nonsense results according to banned_reasonings.txt (a sketch of this filtering step follows this list).
  • count_entries.py: script to count the number of entries.
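
The filtering that clean_result.py performs can be pictured roughly as below. The per-file JSON layout (a list of entries with a "reasoning" field) is an assumption for illustration; the actual logic lives in clean_result.py.

import json
from pathlib import Path

def drop_banned_entries(answers_dir, banned_file="banned_reasonings.txt"):
    """Drop result entries whose rationale contains a banned phrase.

    Assumes each *.json file under answers_dir holds a list of entries with
    a "reasoning" field; this layout is illustrative, not necessarily what
    clean_result.py expects.
    """
    banned = [line.strip() for line in open(banned_file) if line.strip()]
    for path in Path(answers_dir).glob("*.json"):
        entries = json.loads(path.read_text())
        kept = [e for e in entries
                if not any(phrase in e.get("reasoning", "") for phrase in banned)]
        path.write_text(json.dumps(kept, indent=2))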

Running VIEScore

$ python3 run.py --help
usage: run.py [-h] [--task {tie,mie,t2i,cig,sdig,msdig,sdie}] [--mllm {gpt4v,gpt4o,llava,blip2,fuyu,qwenvl,cogvlm,instructblip,openflamingo,gemini}] [--setting {0shot,1shot}] [--context_file CONTEXT_FILE]
              [--guess_if_cannot_parse]

Run different tasks on VIEScore.

optional arguments:
  -h, --help            show this help message and exit
  --task {tie,mie,t2i,cig,sdig,msdig,sdie}
                        Select the task to run
  --mllm {gpt4v,gpt4o,llava,blip2,fuyu,qwenvl,cogvlm,instructblip,openflamingo,gemini}
                        Select the MLLM model to use
  --setting {0shot,1shot}
                        Select the incontext learning setting
  --context_file CONTEXT_FILE
                        Which context file to use.
  --guess_if_cannot_parse
                        Guess a value if the output cannot be parsed.

For example, you can run:

python3 run.py --task t2i --mllm gpt4v --setting 0shot --context_file context.txt
  • Available context files are in the _questions folder.

After running the experiment, you can count the results or clean them up:

python3 count_entries.py <your_answers_dir>
python3 clean_result.py <your_answers_dir>

Compute Correlations

Refer to the analyze_json.ipynb notebook.
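
The correlation computation itself is standard rank correlation between metric scores and human ratings, paired per sample. A minimal sketch is below; how the result JSONs are loaded and aligned is handled in the notebook, so the plain list inputs here are an assumption.

from scipy.stats import kendalltau, spearmanr

def correlate(metric_scores, human_scores):
    """Spearman and Kendall correlations between two aligned score lists."""
    rho, _ = spearmanr(metric_scores, human_scores)
    tau, _ = kendalltau(metric_scores, human_scores)
    return rho, tau

# Example with made-up numbers:
# rho, tau = correlate([6.5, 8.0, 3.0], [7, 9, 2])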

Paper Results

Citation

Please cite our paper if you use our code, data, models, or results:

@misc{ku2023viescore,
      title={VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation},
      author={Max Ku and Dongfu Jiang and Cong Wei and Xiang Yue and Wenhu Chen},
      year={2023},
      eprint={2312.14867},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}