Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks

li-xirong/coco-cn

COCO-CN

COCO-CN is a bilingual image description dataset enriching MS-COCO with manually written Chinese sentences and tags. The new dataset can be used for multiple tasks including image tagging, captioning and retrieval, all in a cross-lingual setting.

| Chinese sentences | COCO-CN train | COCO-CN val | COCO-CN test |
| --- | --- | --- | --- |
| human written | | | |
| human translation | | | |
| machine translation (baidu) | | | |

coco-cn annotation examples

Progress

  • version 201805: 20,341 images (training / validation / test: 18,341 / 1,000 / 1,000), associated with 22,218 manually written Chinese sentences and 5,000 manually translated sentences. The data is freely available upon request; please submit your request via the Google Form.
  • Precomputed image features: ResNeXt-101
  • COCO-CN-Results-Viewer: A lightweight tool for inspecting the results of different image captioning systems on the COCO-CN test set, developed by Emiel van Miltenburg at Tilburg University.
  • NUS-WIDE100: An extra test set.

Citation

If you find COCO-CN useful, please consider citing the following paper: