Abstract
Crowd counting is a computer vision task that involves estimating the number of objects in images and videos. Traditional methods that rely on handcrafted features often fail to produce satisfactory results because of their limited ability to capture the intricate details and variations of real-world scenes. This research studies how to employ deep learning techniques to build robust counting systems. Furthermore, recognizing the data-hungry nature of deep learning, this research also aims to reduce the need for annotated data, making deep counting models more practical.
The research begins with an application in the ecology domain. Manned aircraft have been used in ecological monitoring and wildlife research for decades. However, post-processing the large volumes of data they capture presents significant challenges. We propose a simple method based on density map estimation, which achieves automatic counting with significantly improved accuracy over detection-based methods.
The focus then shifts to human counting. Automated crowd counting has made remarkable progress due to the development of convolutional neural networks (CNNs). However, the capability of these systems is limited by the local receptive fields of CNNs. To address this limitation, we introduce Crowd U-Transformer, a 'U-shaped' multi-scale Transformer network designed to capture long-range dependencies between pixels. Additionally, we propose a novel loss function that improves counting performance in foreground areas.
Training a counting system typically requires a large amount of annotated data, which is labor-intensive to produce. This research therefore extends to developing semi-supervised algorithms for crowd counting. Overfitting to local details is a common issue in semi-supervised crowd counting. To address this, we propose MRC-Crowd, a framework that enhances contextual modeling by guiding the model to make predictions for masked patches based on holistic cues.
Finally, to tackle perspective distortion, which causes significant appearance variations and heterogeneity in density distribution, we introduce another semi-supervised learning algorithm based on perspective-assisted prototype-based learning. This approach improves feature learning for patches of varying and similar density levels, enhancing the performance of semi-supervised crowd counting.
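As an illustration of the density map estimation idea mentioned in the abstract, the following is a minimal sketch of how a target density map is commonly built from point annotations, so that its sum equals the object count. It is not the thesis's implementation: the function name `density_map`, the fixed `sigma`, and the per-point renormalization are assumptions made for this example.

```python
import numpy as np


def density_map(points, height, width, sigma=4.0):
    """Build a density map from (row, col) point annotations.

    Hypothetical helper for illustration only. Each annotated point
    contributes a 2D Gaussian that is renormalized over the image, so
    every object adds exactly 1 to the sum of the map.
    """
    ys = np.arange(height)[:, None]   # column vector of row indices
    xs = np.arange(width)[None, :]    # row vector of column indices
    dmap = np.zeros((height, width), dtype=np.float64)
    for py, px in points:
        # Unnormalized Gaussian centred on the annotated point.
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        # Renormalize so that this object contributes unit mass.
        dmap += g / g.sum()
    return dmap


# Three annotated objects in a 64x64 image (made-up coordinates).
annotations = [(10, 12), (30, 40), (5, 5)]
target = density_map(annotations, 64, 64)

# The count is recovered by summing the map.
estimated_count = target.sum()
```

In a learned counter, a network would be trained to regress a map like `target` from the image, and the predicted count would be the sum of the network's output.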
Date of Award | 3 Dec 2024 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Carl Robert Donovan (Supervisor) |
Keywords
- Crowd counting
- Pattern recognition
- Computer vision
- Neural networks
Access Status
- Full text open