Abstract
Understanding complex visual scenes is one of the fundamental problems in computer vision, but learning in this domain is challenging due to the inherent richness of the visual world and the vast number of possible scene configurations. Current state-of-the-art approaches to scene understanding often employ deep networks, which require large and densely annotated datasets. This goes against the seemingly intuitive learning abilities of humans and our ability to generalise from few examples to unseen situations. In this paper, we propose a unified framework for learning visual representations of words denoting attributes such as “blue” and relations such as “left of”, based on Gaussian models operating in a simple, unified feature space. The strength of our model is that it requires only a small number of weak annotations and generalizes easily to unseen situations, such as recognizing object relations in unusual configurations. We demonstrate the effectiveness of our model on the predicate detection task. Our model outperforms the state of the art on this task in both the normal and zero-shot scenarios, while training on a dataset an order of magnitude smaller.
Original language | English |
---|---|
Title of host publication | Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - (Volume 5) |
Editors | Giovanni Maria Farinella, Petia Radeva, Jose Braz, Kadi Bouatouch |
Publisher | SCITEPRESS - Science and Technology Publications |
Pages | 146-156 |
Volume | 5 VISAPP |
ISBN (Print) | 9789897584886 |
Publication status | Published - 8 Feb 2021 |
Event | 16th International Conference on Computer Vision Theory and Applications (VISAPP 2021), Online. Duration: 8 Feb 2021 → 10 Feb 2021. Conference number: 16. http://www.visapp.visigrapp.org/?y=2021 |
Publication series
Name | International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications |
---|---|
Publisher | SciTePress Digital Library (Science and Technology Publications, Lda) |
Volume | 5 |
ISSN (Print) | 2184-4321 |
Conference
Conference | 16th International Conference on Computer Vision Theory and Applications (VISAPP 2021) |
---|---|
Abbreviated title | VISAPP 2021 |
Period | 8/02/21 → 10/02/21 |
Keywords
- Few-shot learning
- Learning models
- Attribute learning
- Relation learning
- Scene understanding