Short and long range relation based spatio-temporal transformer for micro-expression recognition

Liangfei Zhang, Xiaopeng Hong, Ognjen Arandjelovic, Guoying Zhao

Research output: Contribution to journal › Article › peer-review



Being spontaneous, micro-expressions are useful in the inference of a person's true emotions even if an attempt is made to conceal them. Due to their short duration and low intensity, the recognition of micro-expressions is a difficult task in affective computing. Early work based on handcrafted spatio-temporal features, which showed some promise, has recently been superseded by different deep learning approaches which now compete for state-of-the-art performance. Nevertheless, the problem of capturing both local and global spatio-temporal patterns remains challenging. To this end, herein we propose a novel spatio-temporal transformer architecture which is, to the best of our knowledge, the first purely transformer-based approach (i.e. void of any convolutional network use) for micro-expression recognition. The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal dimension analysis, and a classification head. A comprehensive evaluation on three widely used spontaneous micro-expression data sets, namely SMIC-HS, CASME II and SAMM, shows that the proposed approach consistently outperforms the state of the art, and is the first framework in the published literature on micro-expression recognition to achieve an unweighted F1-score greater than 0.9 on any of the aforementioned data sets.
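
The unweighted F1-score cited in the abstract is commonly computed as the plain mean of the per-class F1 scores, so that rare emotion classes count as much as frequent ones. The following is a minimal sketch under that assumption; it is not taken from the article, and the function name `unweighted_f1` and the class labels are hypothetical.

```python
def unweighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores over the classes present in y_true.

    Assumes the usual macro-averaged definition; the article itself
    may specify details (e.g. cross-validation pooling) differently.
    """
    classes = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for five clips across three emotion classes.
y_true = ["negative", "negative", "negative", "positive", "surprise"]
y_pred = ["negative", "negative", "positive", "positive", "negative"]
uf1 = unweighted_f1(y_true, y_pred)
```

In this toy example three of five clips are classified correctly (accuracy 0.6), yet the score is 4/9 (about 0.44) because the missed "surprise" class contributes an F1 of zero. This is why the measure is informative on imbalanced micro-expression data sets.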
Original language: English
Pages (from-to): 1973-1985
Number of pages: 13
Journal: IEEE Transactions on Affective Computing
Issue number: 4
Early online date: 10 Oct 2022
Publication status: Published - 10 Oct 2022


  • Emotion recognition
  • Long-term optical flow
  • Self-attention mechanism
  • Temporal aggregator

