Abstract
Human activity recognition (HAR) has long been an active research topic, as it enables us to infer human behaviors and daily routines from sensor data collected on wearables or from sensors embedded in a pervasive sensing environment. In recent years, deep learning has been widely used in HAR for feature extraction and multimodal fusion and has achieved promising recognition performance. However, these methods often require large amounts of labeled data for training. To tackle this challenge directly, this paper proposes SelfVis, a novel visualization-based self-supervised learning technique that extracts effective features without the need for labeled data. To achieve this, it encodes time-series IMU sensor readings into images and employs ResNet, a pre-trained, state-of-the-art convolutional neural network (CNN), as the backbone feature extractor. It further exploits the fact that multiple sensors are typically used at once, taking automatically generated sensor identifiers as the prediction target during self-supervised training. With these two components, SelfVis achieves high activity recognition accuracy even when only a small amount of labeled data is available; with only one labeled training example, it outperforms state-of-the-art techniques by up to 0.46 in macro F1-score.
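The abstract outlines the core idea: render an IMU window as an image, pass it through a pre-trained ResNet, and use the sensor's identifier as a free self-supervised label. The sketch below illustrates that pipeline under stated assumptions; the image encoding, the choice of ResNet-18, and all class and function names (`imu_window_to_image`, `SensorIdPretrainer`) are hypothetical and not taken from the authors' implementation.

```python
# Minimal sketch of the SelfVis idea described in the abstract (assumptions,
# not the published implementation):
# (1) encode an IMU window as an image, (2) feed it to a pre-trained ResNet,
# (3) train the head to predict which sensor the window came from.
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights


def imu_window_to_image(window: np.ndarray, size: int = 224) -> torch.Tensor:
    """Encode a (channels, time) IMU window as a 3 x size x size image.

    Assumption: a simple min-max-normalized heat-map encoding; the paper's
    actual visualization scheme may differ.
    """
    w = (window - window.min()) / (np.ptp(window) + 1e-8)        # scale to [0, 1]
    img = torch.from_numpy(w).float().unsqueeze(0).unsqueeze(0)  # 1 x 1 x C x T
    img = nn.functional.interpolate(img, size=(size, size),
                                    mode="bilinear", align_corners=False)
    return img.squeeze(0).repeat(3, 1, 1)                        # 3-channel ResNet input


class SensorIdPretrainer(nn.Module):
    """ResNet-18 backbone with a head that predicts the sensor identifier."""

    def __init__(self, num_sensors: int):
        super().__init__()
        self.backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_sensors)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)


if __name__ == "__main__":
    # Toy self-supervised step: two sensors, random 6-channel windows.
    model = SensorIdPretrainer(num_sensors=2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    windows = [np.random.randn(6, 128) for _ in range(4)]
    sensor_ids = torch.tensor([0, 1, 0, 1])                      # labels come for free
    batch = torch.stack([imu_window_to_image(w) for w in windows])
    loss = nn.CrossEntropyLoss()(model(batch), sensor_ids)
    loss.backward()
    optimizer.step()
```

After this pretext task, the backbone would presumably be reused for activity classification with only a few labeled examples, which is where the abstract's low-label results apply.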
Original language | English |
---|---|
Pages (from-to) | 1-12 |
Number of pages | 12 |
Journal | IEEE Transactions on Emerging Topics in Computing |
Volume | Early Access |
Early online date | 3 May 2024 |
DOIs | |
Publication status | E-pub ahead of print - 3 May 2024 |
Keywords
- Self-supervised learning
- Data visualization
- Feature extraction
- Visualization
- Human activity recognition
- Deep learning
- Data models