Semi-supervised crowd counting with contextual modeling: facilitating holistic understanding of crowd scenes

Research output: Contribution to journal › Article › peer-review


Training a reliable crowd counting model carries a heavy annotation burden. To alleviate it, and to make the model more practical and accurate by letting it benefit from more data, this paper presents a new semi-supervised method based on the mean teacher framework. When labeled data are scarce, the model is prone to overfitting local patches. In this setting, the conventional approach of using unlabeled data solely to improve the accuracy of local patch predictions proves inadequate. We therefore propose a more nuanced approach: fostering the model’s intrinsic ‘subitizing’ capability. This ability allows the model to estimate the count in a region accurately by leveraging its understanding of the crowd scene, mirroring the human cognitive process. To achieve this, we apply masking to unlabeled data and guide the model to make predictions for the masked patches from holistic cues. To further aid feature learning, we incorporate a fine-grained density classification task. Our method is general and applicable to most existing crowd counting methods, as it imposes no strict constraints on model structure or loss. In addition, we observe that a model trained with our framework shows strong contextual modeling capabilities, which allow it to make robust predictions even when some local details of patches are lost. Our method achieves state-of-the-art performance, surpassing previous approaches by a large margin on challenging benchmarks such as ShanghaiTech A and UCF-QNRF. The code is available at:
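The following is an illustrative sketch only, and is not the authors' released code. It shows, in plain Python, how the three ingredients named in the abstract could fit together on unlabeled data: an exponential-moving-average teacher update (the core of the mean teacher framework), random patch masking, and a consistency term on the counts of the masked patches. All function names, the patch size, the masking ratio, the decay value, and the choice to compare per-patch counts with a squared error are assumptions made for illustration; the abstract does not specify them. The sketch also assumes that the teacher sees the unmasked image while the student sees the masked one.

```python
import random


def ema_update(teacher, student, decay=0.99):
    """Mean-teacher step: teacher weights follow an exponential moving
    average of the student weights (decay value is an assumption)."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def mask_patches(image, patch, ratio, rng):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    Returns a masked copy of the image and the (row, col) origins of the
    masked blocks. The input image is left unchanged.
    """
    h, w = len(image), len(image[0])
    origins = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    n_mask = int(round(ratio * len(origins)))
    masked = rng.sample(origins, n_mask)
    out = [row[:] for row in image]
    for r0, c0 in masked:
        for r in range(r0, min(r0 + patch, h)):
            for c in range(c0, min(c0 + patch, w)):
                out[r][c] = 0.0
    return out, masked


def patch_count(density, r0, c0, patch):
    """Sum a density map over one block, which gives that block's count."""
    h, w = len(density), len(density[0])
    return sum(
        density[r][c]
        for r in range(r0, min(r0 + patch, h))
        for c in range(c0, min(c0 + patch, w))
    )


def masked_consistency_loss(student_density, teacher_density, masked, patch):
    """Mean squared difference between student and teacher counts,
    evaluated only on the masked blocks (hypothetical loss form)."""
    if not masked:
        return 0.0
    total = 0.0
    for r0, c0 in masked:
        diff = (patch_count(student_density, r0, c0, patch)
                - patch_count(teacher_density, r0, c0, patch))
        total += diff * diff
    return total / len(masked)
```

In a training loop built on this sketch, the student would predict a density map from the masked image, the teacher would predict one from the original image, the loss above would be added to the supervised loss on labeled data, and `ema_update` would be applied to the teacher after each student optimization step.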
Original language: English
Pages (from-to): 1-12
Number of pages: 12
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: Early Access
Publication status: Published - 22 Apr 2024


  • Context modeling
  • Crowd Analysis
  • Data models
  • Dense prediction
  • Head
  • Mask Regularization
  • Measurement uncertainty
  • Predictive models
  • Scene understanding
  • Task analysis
  • Uncertainty

