Human10: A real-world human body dataset

Human10 is a real-world human body dataset containing ten 4D sequences of human actions, with 10,054 frames in total. Each frame provides depth maps and masks from 4 views, along with the corresponding triangle meshes. In addition, each sequence includes the calibration parameters of its 4 fixed cameras.
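The per-frame and per-sequence layout described above can be sketched as a small set of containers. This is a minimal sketch, not the dataset's actual file format: the class names, the field names, the 3x3/4x4 camera matrix layout, and the assumption of one mesh per frame are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np

# The description states 4 fixed views per frame.
NUM_VIEWS = 4


@dataclass
class Camera:
    # Hypothetical layout: 3x3 intrinsic matrix and 4x4 extrinsic matrix.
    intrinsics: np.ndarray
    extrinsics: np.ndarray


@dataclass
class Frame:
    depth_maps: List[np.ndarray]  # one H x W depth map per view
    masks: List[np.ndarray]       # one H x W boolean mask per view
    vertices: np.ndarray          # (V, 3) mesh vertex positions (assumed: one mesh per frame)
    faces: np.ndarray             # (F, 3) vertex indices of each triangle

    def validate(self) -> None:
        if not len(self.depth_maps) == len(self.masks) == NUM_VIEWS:
            raise ValueError("expected one depth map and one mask per view")
        for depth, mask in zip(self.depth_maps, self.masks):
            if depth.shape != mask.shape:
                raise ValueError("depth map and mask of a view must have the same shape")
        if self.faces.ndim != 2 or self.faces.shape[1] != 3:
            raise ValueError("faces must be an (F, 3) array of triangles")


@dataclass
class Sequence:
    cameras: List[Camera]  # shared by every frame because the cameras are fixed
    frames: List[Frame] = field(default_factory=list)

    def validate(self) -> None:
        if len(self.cameras) != NUM_VIEWS:
            raise ValueError("expected one calibrated camera per view")
        for frame in self.frames:
            frame.validate()


def count_depth_maps(sequences: List[Sequence]) -> int:
    # Each frame contributes one depth map (and one mask) per view.
    return sum(len(seq.frames) * NUM_VIEWS for seq in sequences)
```

Storing the cameras once per sequence rather than once per frame mirrors the statement that the cameras are fixed. With 10,054 frames and 4 views each, the dataset holds 40,216 depth maps and the same number of masks.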

    author = {Cao, Yan-Pei and Liu, Zheng-Ning and Kuang, Zheng-Fei and Kobbelt, Leif and Hu, Shi-Min},
    title = {Learning to Reconstruct High-quality 3D Shapes with Cascaded Fully Convolutional Networks},
    booktitle = {The European Conference on Computer Vision (ECCV)},
    month = {September},
    year = {2018}


CTW: Chinese Text in the Wild

We provide a newly created dataset of Chinese text containing about 1 million Chinese characters, annotated by experts, in over 30,000 street view images. It is a challenging dataset with good diversity: it contains planar text, raised text, text in cities, text in rural areas, text under poor illumination, distant text, partially occluded text, and more. For each character in the dataset, the annotation includes the underlying character, its bounding box, and 6 attributes. The attributes indicate whether the character has a complex background, whether it is raised, whether it is handwritten or printed, and so on.
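A per-character annotation record of the kind described above might be modeled as follows. This is a minimal sketch under stated assumptions: the record and function names, the (x, y, width, height) box convention, and the attribute tag strings such as "raised" or "handwritten" are hypothetical, and only three of the six attributes are named in the description, so no full attribute list is assumed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class CharAnnotation:
    text: str                                # the underlying character
    bbox: Tuple[float, float, float, float]  # hypothetical (x, y, width, height) in pixels
    attributes: FrozenSet[str]               # hypothetical tags for the attributes that hold


def with_attribute(chars: Iterable[CharAnnotation], tag: str) -> List[CharAnnotation]:
    """Keep only the characters whose annotation carries the given attribute tag."""
    return [c for c in chars if tag in c.attributes]
```

Representing the attributes as a set of tags makes per-attribute subsets, such as all raised characters, a one-line filter.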

Tsinghua-Tencent 100K: Traffic-Sign Detection and Classification in the Wild

Although promising results have been achieved in traffic-sign detection and classification, few works have provided simultaneous solutions to both tasks for real-world images. We make two contributions to this problem. First, we have created a large traffic-sign benchmark from 100,000 Tencent Street View panoramas, going beyond previous benchmarks. We call this benchmark Tsinghua-Tencent 100K. It provides 100,000 images containing 30,000 traffic-sign instances. These images cover large variations in illumination and weather conditions. Each traffic sign in the benchmark is annotated with a class label, its bounding box, and a pixel mask. Second, we demonstrate how a robust end-to-end convolutional neural network (CNN) can simultaneously detect and classify traffic signs. Most previous CNN image-processing solutions target objects that occupy a large proportion of an image; such networks do not work well for targets that occupy only a small fraction of an image, like the traffic signs here. Experimental results show the robustness of our network and its superiority to alternatives. The benchmark, the source code, and the CNN model introduced in this paper are publicly available.
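To make the small-target point concrete, the sketch below stores one sign annotation (class label, bounding box, pixel mask) and measures how much of the image the sign covers. It is a minimal sketch, not the benchmark's annotation format: the class and method names and the (xmin, ymin, xmax, ymax) box convention with exclusive maxima are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass
class SignAnnotation:
    label: str                       # traffic-sign class label
    bbox: Tuple[int, int, int, int]  # hypothetical (xmin, ymin, xmax, ymax), max exclusive
    mask: np.ndarray                 # boolean pixel mask with the image's H x W shape

    def bbox_area_fraction(self) -> float:
        """Fraction of the image covered by the bounding box."""
        xmin, ymin, xmax, ymax = self.bbox
        height, width = self.mask.shape
        return (xmax - xmin) * (ymax - ymin) / (height * width)

    def mask_area_fraction(self) -> float:
        """Fraction of the image covered by the sign's own pixels."""
        return float(self.mask.sum()) / self.mask.size
```

For instance, a hypothetical 32 x 32 pixel sign in a 2048 x 2048 image covers 1/4096 of it (about 0.02%), far less than the large objects that most earlier CNN pipelines were designed for.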

    author = {Zhu, Zhe and Liang, Dun and Zhang, Songhai and Huang, Xiaolei and Li, Baoli and Hu, Shimin},
    title = {Traffic-Sign Detection and Classification in the Wild},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2016}