Targeted Adaptable Sample for Accurate and Efficient Quantile Estimation in Non-Stationary Data Streams

Ognjen Arandjelović*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

The need to detect outliers or otherwise unusual data, which can be formalized as the estimation of a particular quantile of a distribution, is an important problem that frequently arises in a variety of applications of pattern recognition, computer vision and signal processing. For example, our work was most proximally motivated by the practical limitations and requirements of many semi-automatic surveillance analytics systems that detect abnormalities in closed-circuit television (CCTV) footage using statistical models of low-level motion features. In this paper, we specifically address the problem of estimating the running quantile of a data stream with non-stationary stochasticity when the absolute (rather than asymptotic) memory for storing observations is severely limited. We make several major contributions: (i) we derive an important theoretical result showing that the change in the quantile of a stream is constrained regardless of the stochastic properties of the data; (ii) we describe a set of high-level design goals for an effective estimation algorithm that emerge as a consequence of our theoretical findings; (iii) we introduce a novel algorithm that implements these design goals by retaining a sample of data values in a manner adaptive to changes in the distribution of the data, progressively narrowing down its focus during periods of quasi-stationary stochasticity; and (iv) we present a comprehensive evaluation of the proposed algorithm and compare it with existing methods in the literature on both synthetic datasets and three large "real-world" streams acquired in the course of operation of an existing commercial surveillance system. Our results and their detailed analysis demonstrate that the proposed method is highly successful and vastly outperforms the existing alternatives, especially when the target quantile is high-valued and the available buffer capacity is severely limited.
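The abstract does not spell out the algorithm itself. As a point of reference for the fixed-memory setting it describes, the following is a minimal Python sketch of a simple bounded-buffer running-quantile estimator. It is an illustration only and is not the Targeted Adaptable Sample algorithm from the paper; the names `BoundedQuantileSketch`, `target_rank`, `capacity` and `in_range` are hypothetical. The sketch keeps at most `capacity` consecutive order statistics of the stream plus counts of the values discarded below and above them. Its estimate is exact while the target rank lies inside the buffer, but a sustained shift in the distribution can push the rank outside, after which it can only clamp to the nearest retained value. That failure mode under non-stationarity is what motivates a sample that adapts to changes in the data.

```python
import bisect
import math


def target_rank(q: float, n: int) -> int:
    """1-based rank of the empirical q-quantile among n values (0 < q <= 1)."""
    return min(n, max(1, math.ceil(q * n)))


class BoundedQuantileSketch:
    """Hypothetical illustration, NOT the paper's TAS algorithm.

    Retains at most `capacity` consecutive order statistics of the stream in a
    sorted buffer, plus counts of discarded values below and above the buffer.
    """

    def __init__(self, q: float, capacity: int):
        if not 0.0 < q <= 1.0 or capacity < 1:
            raise ValueError("need 0 < q <= 1 and capacity >= 1")
        self.q = q
        self.capacity = capacity
        self.buf = []      # sorted retained values
        self.n_below = 0   # discarded values, all known to be <= buf[0]
        self.n_above = 0   # discarded values, all known to be >= buf[-1]

    @property
    def n(self) -> int:
        return self.n_below + len(self.buf) + self.n_above

    def _local_index(self) -> int:
        # Position of the target rank inside the buffer (may fall outside it).
        return target_rank(self.q, self.n) - self.n_below - 1

    def update(self, x: float) -> None:
        if self.n_below and x < self.buf[0]:
            self.n_below += 1          # only its count is needed
        elif self.n_above and x > self.buf[-1]:
            self.n_above += 1
        else:
            bisect.insort(self.buf, x)
            if len(self.buf) > self.capacity:
                idx = self._local_index()
                # Evict from whichever end is farther from the target rank.
                if idx > len(self.buf) - 1 - idx:
                    self.buf.pop(0)
                    self.n_below += 1
                else:
                    self.buf.pop()
                    self.n_above += 1

    def in_range(self) -> bool:
        """True when the target rank is retained, so estimate() is exact."""
        return 0 <= self._local_index() < len(self.buf)

    def estimate(self) -> float:
        if not self.buf:
            raise ValueError("no data yet")
        # Clamp to the nearest retained value if the target has drifted out.
        idx = min(max(self._local_index(), 0), len(self.buf) - 1)
        return self.buf[idx]
```

Values are fed with `update()` and the current estimate is read with `estimate()`; `in_range()` reports whether that estimate is guaranteed to equal the exact empirical quantile. Evicting from the end farther from the target works because one new observation moves the target rank `ceil(q * n)` by at most one position.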

Original language: English
Pages (from-to): 848-870
Number of pages: 23
Journal: Machine Learning and Knowledge Extraction
Volume: 1
Issue number: 3
Publication status: Published, September 2019

Keywords

  • abnormality
  • auxiliary
  • flexible
  • histogram
  • median
  • surveillance
