Research has shown the complementarity of camera- and inertial-based data for modeling human activities, yet datasets combining egocentric video and inertial sensor data remain scarce. In this paper, we introduce WEAR, an outdoor sports dataset for both vision- and inertial-based human activity recognition (HAR). Data from 22 participants performing a total of 18 different workout activities was collected with synchronized inertial (acceleration) and camera (egocentric video) data recorded at 11 different outdoor locations. WEAR provides a challenging prediction scenario in changing outdoor environments using a sensor placement in line with recent trends in real-world applications. Benchmark results show that, given our sensor placement, each modality interestingly offers complementary strengths and weaknesses in its prediction performance. Further, in light of the recent success of single-stage Temporal Action Localization (TAL) models, we demonstrate their versatility: they can be trained not only on visual data but also on raw inertial data, and they are capable of fusing both modalities by means of simple concatenation.
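To make the fusion idea concrete, here is a minimal sketch of "simple concatenation": per-clip features from both modalities, aligned on the same temporal grid, are stacked along the feature dimension before being fed to a single-stage TAL model. All shapes, dimensionalities, and variable names below are illustrative assumptions, not the dataset's or the paper's actual configuration.

```python
# Minimal sketch of two-stream fusion by concatenation (assumed shapes/names).
import numpy as np

T = 120            # number of temporal clips/windows in one session (assumed)
D_VIDEO = 2048     # per-clip video feature dimensionality (assumed)
D_INERTIAL = 128   # per-window inertial feature dimensionality (assumed)

video_feats = np.random.randn(T, D_VIDEO).astype(np.float32)
inertial_feats = np.random.randn(T, D_INERTIAL).astype(np.float32)

# Both modalities share the same temporal grid, so fusion is just stacking
# their feature dimensions per time step.
fused_feats = np.concatenate([video_feats, inertial_feats], axis=1)
print(fused_feats.shape)  # (120, 2176) -> input to a single-stage TAL model
```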
We provide subject-wise raw and processed acceleration and egocentric-video data. 3D accelerometer data (50 Hz, ±8g) was collected using four open-source Bangle.js smartwatches running a custom, open-source firmware. The watches were placed in a fixed orientation on the left and right wrists and ankles of each participant (see Figure below). Egocentric video data (1080p@60FPS) was captured using a GoPro Hero 8 action camera, mounted on each participant's head using a head strap. The camera was tilted downwards at a 45-degree angle during recording.
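As a rough illustration of working with the subject-wise acceleration data (four sensor positions, three axes each, sampled at 50 Hz), the sketch below loads one subject's recording. The file path, column names, and label column are hypothetical placeholders; please refer to the dataset's README for the actual file layout.

```python
# Hypothetical loading sketch for one subject's processed acceleration data.
import pandas as pd

# 4 sensor positions x 3 axes sampled at 50 Hz (column names are assumptions)
SENSOR_COLUMNS = [
    f"{pos}_acc_{axis}"
    for pos in ("right_arm", "left_arm", "right_leg", "left_leg")
    for axis in ("x", "y", "z")
]

df = pd.read_csv("wear/processed/inertial/sbj_0.csv")  # hypothetical path
acc = df[SENSOR_COLUMNS].to_numpy()   # shape: (num_samples, 12)
labels = df["label"].to_numpy()       # per-sample activity annotation (assumed column)
print(acc.shape, labels.shape)
```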
Each participant performed a set of 18 workout activities. These include running-, stretching- and strength-based exercises, with base activities such as push-ups being complemented by complex variations that alter and/or extend the movement performed during the exercise. Activities were divided across multiple recording sessions, with each session consisting of an uninterrupted data stream of all modalities. Each participant was asked to perform each exercise for at least 90 seconds, but was free to choose the order of activities and take breaks as desired.
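Since each session is an uninterrupted stream, a common preprocessing step for inertial HAR is to segment it into fixed-length sliding windows. The sketch below shows this on synthetic data; the window length and overlap are arbitrary choices for illustration, not the settings used in the paper.

```python
# Illustrative sliding-window segmentation of an uninterrupted 50 Hz stream.
import numpy as np

def sliding_windows(data: np.ndarray, window_size: int, stride: int) -> np.ndarray:
    """Split (num_samples, num_channels) data into (num_windows, window_size, num_channels)."""
    starts = range(0, data.shape[0] - window_size + 1, stride)
    return np.stack([data[s:s + window_size] for s in starts])

stream = np.random.randn(50 * 90, 12)                         # ~90 s of 12-channel data at 50 Hz (synthetic)
windows = sliding_windows(stream, window_size=50, stride=25)  # 1 s windows, 50% overlap
print(windows.shape)                                          # (179, 50, 12)
```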
Watch the recording of the talk I gave on WEAR at Prof. Dr. Schönlieb's group at the University of Cambridge (15.09.2023).
The full dataset can be downloaded via this link. The download folder is divided into three subdirectories.
To reproduce the experiments mentioned in the paper, please refer to the instructions provided in the GitHub repository.
If you want to contribute to the WEAR dataset, check out our How-To: Record your own data guide or get in touch with us via marius.bock@uni-siegen.de.
WEAR is offered under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You are free to use, copy, and redistribute the material for non-commercial purposes provided you give appropriate credit, provide a link to the license, and indicate if changes were made. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. You may not use the material for commercial purposes.
@article{bock2024wear,
  author = {Bock, Marius and Kuehne, Hilde and Van Laerhoven, Kristof and Moeller, Michael},
  title = {WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition},
  journal = {Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (IMWUT)},
  year = {2024},
  volume = {8},
  number = {4},
  articleno = {175},
  numpages = {21},
  doi = {10.1145/3699776},
  url = {https://dl.acm.org/doi/10.1145/3699776}
}