TY - JOUR
T1 - Perspective-assisted prototype-based learning for semi-supervised crowd counting
AU - Qian, Yifei
AU - Zhang, Liangfei
AU - Guo, Zhongliang
AU - Hong, Xiaopeng
AU - Arandjelović, Ognjen
AU - Donovan, Carl R.
N1 - Funding: This work is funded by the National Natural Science Foundation of China (62076195, 62206271), and the Fundamental Research Funds for the Central Universities (AUGA5710011522).
PY - 2024/10/21
Y1 - 2024/10/21
N2 - To alleviate the burden of labeling data to train crowd counting models, we propose a prototype-based learning approach for semi-supervised crowd counting with an embeded understanding of perspective. Our key idea is that image patches with the same density of people are likely to exhibit coherent appearance changes under similar perspective distortion, but differ significantly under varying distortions. Motivated by this observation, we construct multiple prototypes for each density level to capture variations in perspective. For labeled data, the prototype-based learning assists the regression task by regularizing the feature space and modeling the relationships within and across different density levels. For unlabeled data, the learnt perspective-embedded prototypes enhance differentiation between samples of the same density levels, allowing for a more nuanced assessment of the predictions. By incorporating regression results, we categorize unlabeled samples as reliable or unreliable, applying tailored consistency learning strategies to enhance model accuracy and generalization. Since the perspective information is often unavailable, we propose a novel pseudo-label assigner based on perspective self-organization which requires no additional annotations and assigns image regions to distinct spatial density groups, which mainly reflect the differences in average density among regions. Extensive experiments on four crowd counting benchmarks demonstrate the effectiveness of our approach.
AB - To alleviate the burden of labeling data to train crowd counting models, we propose a prototype-based learning approach for semi-supervised crowd counting with an embeded understanding of perspective. Our key idea is that image patches with the same density of people are likely to exhibit coherent appearance changes under similar perspective distortion, but differ significantly under varying distortions. Motivated by this observation, we construct multiple prototypes for each density level to capture variations in perspective. For labeled data, the prototype-based learning assists the regression task by regularizing the feature space and modeling the relationships within and across different density levels. For unlabeled data, the learnt perspective-embedded prototypes enhance differentiation between samples of the same density levels, allowing for a more nuanced assessment of the predictions. By incorporating regression results, we categorize unlabeled samples as reliable or unreliable, applying tailored consistency learning strategies to enhance model accuracy and generalization. Since the perspective information is often unavailable, we propose a novel pseudo-label assigner based on perspective self-organization which requires no additional annotations and assigns image regions to distinct spatial density groups, which mainly reflect the differences in average density among regions. Extensive experiments on four crowd counting benchmarks demonstrate the effectiveness of our approach.
KW - Perspective analysis
KW - Representation learning
KW - Task analysis
KW - Consistency regularization
U2 - 10.1016/j.patcog.2024.111073
DO - 10.1016/j.patcog.2024.111073
M3 - Article
SN - 0031-3203
VL - 158
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 111073
ER -