Abstract
In this paper we introduce two novel methods for object recognition from video. Our major contributions are (i) the use of dense, overlapping local descriptors as a means of accurately capturing the appearance of generic, even untextured objects, (ii) a framework for employing sets of such descriptors for recognition from video, (iii) a detailed empirical examination of different aspects of the proposed model and (iv) a comparative performance evaluation on a large object database. We describe and compare two bag-of-visual-words (BoVW)-based representations of an object's appearance in a video sequence: one using a single per-sequence bag-of-words and one using a set of per-frame bags-of-words. Empirical results demonstrate the effectiveness of both representations, with the former performing somewhat better.
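The difference between the two representations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the descriptor type, the vocabulary construction or the matching scheme, so the hard nearest-word assignment, the L1 normalisation, the function names and the random arrays standing in for dense descriptors are all assumptions made for illustration.

```python
import numpy as np


def assign_words(descriptors, vocabulary):
    """Index of the nearest visual word for each descriptor (assumed: Euclidean, hard assignment)."""
    # Squared distances between every descriptor and every vocabulary word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def bow_histogram(descriptors, vocabulary):
    """L1-normalised histogram of visual-word counts (normalisation is an assumption)."""
    words = assign_words(descriptors, vocabulary)
    counts = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return counts / max(counts.sum(), 1.0)


def per_sequence_bow(frames, vocabulary):
    """One histogram pooled over the descriptors of all frames in the sequence."""
    return bow_histogram(np.vstack(frames), vocabulary)


def per_frame_bows(frames, vocabulary):
    """One histogram per frame, stacked into an (n_frames, n_words) array."""
    return np.stack([bow_histogram(f, vocabulary) for f in frames])


# Hypothetical data: random vectors stand in for dense local descriptors.
rng = np.random.default_rng(0)
vocabulary = rng.normal(size=(8, 16))                       # 8 visual words, 16-D
frames = [rng.normal(size=(n, 16)) for n in (30, 45, 25)]   # 3 frames

seq_hist = per_sequence_bow(frames, vocabulary)    # shape (8,)
frame_hists = per_frame_bows(frames, vocabulary)   # shape (3, 8)
```

Under these assumptions the per-sequence histogram is simply the average of the per-frame histograms weighted by each frame's descriptor count, so the per-frame set keeps frame-to-frame variation that the pooled histogram discards.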
Original language | English |
---|---|
Title of host publication | 2015 22nd International Conference on Systems, Signals and Image Processing - Proceedings of IWSSIP 2015 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 89-92 |
Number of pages | 4 |
ISBN (Print) | 9781467383530 |
Publication status | Published - 30 Oct 2015 |
Event | 22nd International Conference on Systems, Signals and Image Processing, IWSSIP 2015 - London, United Kingdom. Duration: 10 Sept 2015 → 12 Sept 2015 |
Conference
Conference | 22nd International Conference on Systems, Signals and Image Processing, IWSSIP 2015 |
---|---|
Country/Territory | United Kingdom |
City | London |
Period | 10/09/15 → 12/09/15 |
Keywords
- Dense
- Features
- Histogram
- Matching
- Overlap