ClusterSLAM Dataset

Overview of the synthetic dataset


ClusterSLAM is a practical backend for stereo visual SLAM that simultaneously discovers individual rigid bodies and computes their motions in dynamic environments. It has been shown to be effective for simultaneously tracking ego-motion and multiple objects. To facilitate research in the community, we release the 10 dynamic sequences rendered and simulated using the SUNCG and CARLA datasets.

Dataset Statistics

The statistics are listed below. In total, the dataset contains over 3,000 frames with more than 60 distinct dynamic instances.

| Sequence Name | # Frames | # Dyn. Obj. | # Landmarks | Total Dist. (m) | Download Link |
|---------------|----------|-------------|-------------|-----------------|---------------|
| SUNCG-1-1     | 190      | 2           | 748         | 1.94            | Google Drive  |
| SUNCG-1-2     | 250      | 2           | 2595        | 21.10           | Google Drive  |
| SUNCG-2-1     | 300      | 3           | 381         | 6.03            | Google Drive  |
| SUNCG-2-2     | 200      | 3           | 370         | 6.01            | Google Drive  |
| SUNCG-3-1     | 200      | 5           | 554         | 3.61            | Google Drive  |
| SUNCG-3-2     | 200      | 5           | 620         | 11.37           | Google Drive  |
| CARLA-S1      | 200      | 5           | 2402        | 120.92          | Google Drive  |
| CARLA-S2      | 200      | 8           | 4179        | 164.70          | Google Drive  |
| CARLA-L1      | 750      | 14          | 13600       | 480.87          | Google Drive  |
| CARLA-L2      | 600      | 17          | 10486       | 367.62          | Google Drive  |

Data Format

We have packed each sequence into an individual archive named <Sequence Name>.tar.gz. The unpacked folder structure is as follows:

.                       // unzipped base folder
├── images
│   ├── left
│   │   └── %04d.png    // {# Frames} images captured from left camera.
│   └── right
│       └── %04d.png    // {# Frames} images captured from right camera.
├── landmarks
│   ├── left
│   │   └── %04d.txt    // Detected features from left camera.
│   └── right
│       └── %04d.txt    // Detected features from right camera.
├── pose
│   └── %04d.txt        // Trajectory of camera and moving instances.
├── shapes
│   └── %d.pcd          // Point cloud of the static scene and dynamic shapes.
├── intrinsic.txt       // Stereo camera intrinsics.
└── landmark_mapping.txt	// Landmark to cluster id mapping.

The format of each line of the feature text file is:

<landmark id> <u> <v>
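
As a sketch, such a file might be parsed as follows (the whitespace-separated layout is assumed from the format above):

```python
def load_features(path):
    """Parse a landmarks/%04d.txt file into {landmark id: (u, v)}.

    Assumes one whitespace-separated '<landmark id> <u> <v>' record
    per line, as described above.
    """
    features = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            features[int(parts[0])] = (float(parts[1]), float(parts[2]))
    return features
```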

In each trajectory file under the pose/ directory, the first line gives the camera pose and each subsequent line gives the pose of one cluster. Note that for some frames a pose may still be present for a cluster that is not visible; we discard these poses during evaluation. The pose format is:

<x> <y> <z>  <qx> <qy> <qz> <qw>
Translation       Rotation

which can be read easily using the pyquaternion or Eigen libraries.
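
If neither library is at hand, a pose line can also be turned into a 4x4 homogeneous transform with plain NumPy; here the quaternion-to-matrix conversion is written out explicitly (note the scalar-last ordering of the file):

```python
import numpy as np

def parse_pose_line(line):
    """Convert one '<x> <y> <z> <qx> <qy> <qz> <qw>' pose line into a
    4x4 homogeneous transform. The file stores the quaternion scalar
    component last."""
    x, y, z, qx, qy, qz, qw = map(float, line.split())
    # Standard unit-quaternion to rotation-matrix formula.
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qw*qz),     2*(qx*qz + qw*qy)],
        [2*(qx*qy + qw*qz),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qw*qx)],
        [2*(qx*qz - qw*qy),     2*(qy*qz + qw*qx),     1 - 2*(qx*qx + qy*qy)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [x, y, z]
    return T
```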

The point cloud files in the shapes/ folder contain the ground-truth point clouds of the static scene and the dynamic clusters. For the moving instances, applying the transforms in pose/ to the point cloud yields their absolute world coordinates in each frame.
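
A minimal sketch of that step with NumPy (loading the .pcd itself is left to a point-cloud library such as Open3D; the points are assumed to already be an (N, 3) array):

```python
import numpy as np

def transform_points(points, T):
    """Apply a 4x4 rigid transform T to an (N, 3) array of object-frame
    points, returning their world coordinates."""
    points = np.asarray(points, dtype=float)
    homo = np.hstack([points, np.ones((len(points), 1))])  # to homogeneous
    return (T @ homo.T).T[:, :3]
```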

The file intrinsic.txt contains two $3 \times 4$ projection matrices, for the left and right camera respectively. There is no rotation between the cameras, so stereo rectification is not necessary.
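
As a sketch, assuming the file stores the 24 values of the two matrices in row-major order with the left camera first (verify against the actual file layout), the matrices can be read and the stereo baseline recovered with the usual rectified-stereo convention $b = -P_r[0,3] / P_r[0,0]$:

```python
import numpy as np

def load_intrinsics(path):
    """Read two 3x4 projection matrices from intrinsic.txt.

    Assumes 24 whitespace-separated values, left camera first,
    row-major order -- check this against the actual file.
    """
    vals = np.loadtxt(path).reshape(-1)
    P_left, P_right = vals[:12].reshape(3, 4), vals[12:].reshape(3, 4)
    return P_left, P_right

def stereo_baseline(P_right):
    """Baseline under the usual rectified convention P_r[0,3] = -fx * b."""
    return -P_right[0, 3] / P_right[0, 0]
```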

Lastly, each line in landmark_mapping.txt maps a landmark id to a cluster id. The cluster id shares the same indexing as the ground-truth point clouds and the line ordering of the trajectory files. Each line has the following format:

<landmark id> <cluster id>
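
A short sketch of reading this mapping and using it to group one frame's detected features by cluster (the field layout is assumed from the format above):

```python
def load_landmark_mapping(path):
    """Parse landmark_mapping.txt into {landmark id: cluster id}."""
    mapping = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                mapping[int(parts[0])] = int(parts[1])
    return mapping

def group_by_cluster(features, mapping):
    """Group {landmark id: (u, v)} observations by cluster id."""
    groups = {}
    for lid, uv in features.items():
        groups.setdefault(mapping[lid], {})[lid] = uv
    return groups
```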


  • Email:

  • Citation:

@inproceedings{huang2019clusterslam,
  title={ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation},
  author={Huang, Jiahui and Yang, Sheng and Zhao, Zishuo and Lai, Yu-Kun and Hu, Shi-Min},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2019}
}

Change Log

  • March, 2020: The 10 dynamic sequences used in our ICCV 2019 paper are released.

Terms of Use