Recorder

Our recording framework combines several high-precision motion-tracking and visual streaming systems with a suite of ancillary devices, tools, and customized scripts. It supports high-quality synchronized streaming, saving, and visualization of human throw-and-catch activities from multiple sensors with differing sampling rates and data formats.
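
As a rough illustration of the saving side, the sketch below timestamps every incoming sample on arrival and writes each stream to its own file, so sensors with different rates and formats can be aligned afterwards by timestamp. The class name, file layout, and JSON-lines format are assumptions for illustration, not the released recorder.

```python
import json
import time
from pathlib import Path

# Minimal sketch (not the released recorder): each stream is saved independently
# with an arrival timestamp per sample, so streams with different rates and
# formats can be aligned later by timestamp.
class StreamWriter:
    def __init__(self, out_dir, stream_name):
        # Hypothetical layout: one JSON-lines file per stream inside a session folder.
        self.path = Path(out_dir) / f"{stream_name}.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._fh = self.path.open("w")

    def write(self, sample):
        # One JSON record per line: nanosecond arrival timestamp plus the raw sample.
        record = {"t_ns": time.time_ns(), "data": sample}
        self._fh.write(json.dumps(record) + "\n")

    def close(self):
        self._fh.close()

# Usage: one writer per device; each sensor callback passes its payload to write().
writer = StreamWriter("session_001", "zed_rgbd")
writer.write({"frame_index": 0})
writer.close()
```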

If you intend to record human demonstrations similar to ours, we provide step-by-step instructions and scripts in the dataset’s GitHub repository.

Device Details

Our recording framework uses multiple motion-tracking and visual streaming systems; their specifications are briefly outlined below. We encourage users to consult our technical paper for an overview of how they are deployed and to refer to the official product pages for detailed technical specifications.

| Device | Manufacturer | Recording Content | FPS | Resolution |
| --- | --- | --- | --- | --- |
| ① Gloves | StretchSense MoCap Pro | Primary's Hand Pose | 120 | - |
| ②⑤ Tracker | OptiTrack | 6D Head Pose | 240 | - |
| ③ Event Camera | Prophesee | Event | - | 1280x720 |
| ④ ZED Camera | Stereolabs | RGB-D | 60 | 1280x720 |
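
For reference, the table above can be captured in a small configuration structure, which is convenient when writing loaders or synchronization scripts. The field names below are illustrative and do not necessarily match the schema used by our released scripts.

```python
# Illustrative device table as a Python structure; keys and field names are
# hypothetical and may differ from the released scripts.
DEVICES = {
    "gloves":       {"manufacturer": "StretchSense MoCap Pro", "content": "hand pose",    "fps": 120,  "resolution": None},
    "tracker":      {"manufacturer": "OptiTrack",              "content": "6D head pose", "fps": 240,  "resolution": None},
    "event_camera": {"manufacturer": "Prophesee",              "content": "events",       "fps": None, "resolution": (1280, 720)},
    "zed_camera":   {"manufacturer": "Stereolabs",             "content": "RGB-D",        "fps": 60,   "resolution": (1280, 720)},
}
```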


We use the Precision Time Protocol (PTP) to synchronize the device clocks with sub-millisecond precision (approximately 0.3 ms). The maximum offset across data streams, as assessed during manual annotation, is at most one frame at 60 FPS. We encourage users to consult the technical document provided in the GitHub repository, which explains in detail how each data modality is processed and synchronized.
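
To make the synchronization criterion concrete, the sketch below matches each timestamp of a reference stream (e.g., the 60 FPS ZED stream) to the nearest timestamp of another stream and checks that the residual offset stays within one 60 FPS frame (about 16.7 ms). The stream names, the 2 ms skew in the toy data, and the tolerance handling are assumptions for illustration, not the released synchronization code.

```python
import numpy as np

FRAME_60FPS_S = 1.0 / 60.0  # one frame at 60 FPS, about 16.7 ms

def nearest_offsets(ref_ts, other_ts):
    """For each reference timestamp, return the signed offset (seconds)
    to the nearest timestamp in the other stream."""
    other_ts = np.sort(np.asarray(other_ts, dtype=float))
    ref_ts = np.asarray(ref_ts, dtype=float)
    # Locate the neighbours of each reference timestamp in the other stream.
    idx = np.searchsorted(other_ts, ref_ts)
    idx = np.clip(idx, 1, len(other_ts) - 1)
    left, right = other_ts[idx - 1], other_ts[idx]
    nearest = np.where(ref_ts - left < right - ref_ts, left, right)
    return ref_ts - nearest

# Toy check: a 60 FPS reference stream against a 120 FPS stream with a 2 ms skew.
zed_ts = np.arange(0, 2, FRAME_60FPS_S)
glove_ts = np.arange(0, 2, 1.0 / 120.0) + 0.002
offsets = nearest_offsets(zed_ts, glove_ts)
assert np.all(np.abs(offsets) <= FRAME_60FPS_S), "streams drift beyond one 60 FPS frame"
```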