From Captions to Visual Concepts and Back

Established: April 9, 2015

We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system achieves state-of-the-art results on the official Microsoft COCO benchmark, with a BLEU-4 score of 29.1%. Human judges rate its captions as good as or better than those written by humans 34% of the time.
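The 29.1% figure above is a BLEU-4 score. BLEU-4 rates a generated caption by how many of its 1- to 4-word sequences (n-grams) also appear in the human reference captions, with counts clipped per reference, and then applies a brevity penalty so that very short captions cannot score well. For illustration only, here is a minimal Python sketch of standard corpus-level BLEU-4 (uniform weights, no smoothing, closest-reference length). The function names are hypothetical. The official COCO evaluation toolkit may handle tokenization and other details differently, so this is not the code that produced the reported score.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references_list):
    """Corpus-level BLEU-4 with uniform weights and no smoothing (sketch).

    candidates: list of token lists, one generated caption per image
    references_list: list of lists of token lists, the human captions per image
    """
    clipped = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # candidate n-gram counts for n = 1..4
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references_list):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length to the candidate
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            cand_counts = ngrams(cand, n)
            # Clip each candidate n-gram count by its maximum count in any single reference
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if min(clipped) == 0:
        return 0.0  # some n-gram precision is zero, so unsmoothed BLEU is zero
    # Geometric mean of the four modified n-gram precisions
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / 4
    # Brevity penalty: 1 if candidates are longer than the references, else exp(1 - r/c)
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    cand = "the cat sat on the mat".split()
    refs = [["the cat sat on a mat".split()]]
    print(corpus_bleu4([cand], refs))
```

Here the candidate matches 5/6 unigrams, 3/5 bigrams, 2/4 trigrams, and 1/3 four-grams, and it has the same length as the reference, so the score is the fourth root of 1/12 (about 0.54). Multiplying precisions and lengths across the whole corpus before combining them, rather than averaging per-sentence scores, follows the usual corpus-level definition of BLEU.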

People

Jianfeng Gao

Distinguished Scientist & Vice President
