Visual Genome: Connecting language and vision using crowdsourced dense image annotations

Authors

Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, Michael S Bernstein, Li Fei-Fei

Publication date

2017/5

Journal

International Journal of Computer Vision

Volume

123

Pages

32-73

Publisher

Springer US

Description

Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked “What vehicle is the person riding?”, computers will need to identify the objects in an image as well as the relationships riding(man, carriage) and pulling(horse, carriage) to answer correctly that “the person is riding a horse-drawn carriage.” In this paper, we present the Visual Genome dataset to enable the modeling of such relationships. We collect dense …

Total citations

Cited by 5388

Citations per year: 2016: 57, 2017: 163, 2018: 284, 2019: 488, 2020: 677, 2021: 880, 2022: 1067, 2023: 1267, 2024: 471
