Authors
Cewu Lu, Ranjay Krishna, Michael Bernstein, Li Fei-Fei
Publication date
2016
Conference
Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14
Pages
852-869
Publisher
Springer International Publishing
Description
Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. “man” and “bicycle”) and predicates (e.g. “riding” and “pushing”) independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted …
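The idea of scoring objects and predicates separately and then combining them can be illustrated with a toy sketch. This is not the paper's model: the product-of-scores rule, the function name `score_relationships`, and every number below are assumptions made up for illustration, and the language-prior component mentioned in the description is omitted entirely.

```python
from itertools import permutations

# Hypothetical detector confidences for two objects (made-up values).
object_scores = {"man": 0.9, "bicycle": 0.8}

# Hypothetical predicate confidences for each ordered (subject, object) pair
# (made-up values; a real system would compute these from image features).
predicate_scores = {
    ("man", "bicycle"): {"riding": 0.7, "pushing": 0.2},
    ("bicycle", "man"): {"riding": 0.05, "pushing": 0.05},
}


def score_relationships(object_scores, predicate_scores):
    """Score each (subject, predicate, object) triple as a product of separate scores."""
    triples = {}
    for subj, obj in permutations(object_scores, 2):
        for pred, pred_score in predicate_scores[(subj, obj)].items():
            triples[(subj, pred, obj)] = (
                object_scores[subj] * pred_score * object_scores[obj]
            )
    return triples


triples = score_relationships(object_scores, predicate_scores)
# Rank all candidate relationships for the image, best first.
ranked = sorted(triples.items(), key=lambda item: item[1], reverse=True)
```

With two object classes and two predicates, the separate scores already cover four candidate triples, so no triple needs its own dedicated training examples. Under these made-up numbers, "man riding bicycle" ranks first.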
Total citations
[Chart of citations per year, 2017–2024]