TY - CONF
T1 - Segmentation Assisted U-shaped Multi-scale Transformer for Crowd Counting
AU - Qian, Yifei
AU - Zhang, Liangfei
AU - Hong, Xiaopeng
AU - Donovan, Carl R.
AU - Arandjelović, Ognjen
N1 - Funding Information:
This work is supported by the Fundamental Research Funds for the Central Universities (AUGA5710011522). The authors would like to thank the China Scholarship Council – University of St Andrews Scholarships (No. 201908060250) for funding L. Zhang's PhD.
Publisher Copyright:
© 2022. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.
PY - 2022
Y1 - 2022
N2 - Automated crowd counting has made remarkable progress recently in computer vision thanks to the development of CNNs. However, this application area has run into bottlenecks since CNNs, by their nature, are limited by locally attentive receptive fields and are incapable of modelling larger-scale dependencies. To address this problem, we introduce a multi-scale transformer-based crowd-counting network, termed Crowd U-Transformer (CUT), which extracts and aggregates semantic and spatial features from multiple levels. In this design, we use crowd segmentation as an attention module to obtain fine-grained features. We also propose a loss function that better focuses on counting performance in the foreground area. Experimental results on four widely used benchmarks are presented, and our method achieves state-of-the-art performance.
AB - Automated crowd counting has made remarkable progress recently in computer vision thanks to the development of CNNs. However, this application area has run into bottlenecks since CNNs, by their nature, are limited by locally attentive receptive fields and are incapable of modelling larger-scale dependencies. To address this problem, we introduce a multi-scale transformer-based crowd-counting network, termed Crowd U-Transformer (CUT), which extracts and aggregates semantic and spatial features from multiple levels. In this design, we use crowd segmentation as an attention module to obtain fine-grained features. We also propose a loss function that better focuses on counting performance in the foreground area. Experimental results on four widely used benchmarks are presented, and our method achieves state-of-the-art performance.
UR - http://www.scopus.com/inward/record.url?scp=85158972295&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85158972295
T2 - 33rd British Machine Vision Conference Proceedings, BMVC 2022
Y2 - 21 November 2022 through 24 November 2022
ER -