Overview

Tencent-MVSE is a large-scale benchmark dataset for the multi-modal video similarity evaluation task. The features of Tencent-MVSE include:

  • A total of 1,135,705 videos, including 1 million videos for unsupervised pre-training, 63,613 videos for training, and 63,960 videos for evaluation.
  • Rich meta-data for each video, including title, ASR text, categories, and tags.
  • We annotate 67,854 video pairs for training and 67,887 video pairs for evaluation.
  • The 328 categories and 64,903 tags are all manually annotated by human annotators.


Download

You can download the Tencent-MVSE dataset from here.

The downloaded folder has the following structure:

tencent-mvse/
├── annotations
│   ├── pairwise.json
│   ├── pairwise.tsv
│   ├── pointwise.json
│   ├── test-dev.json
│   ├── test_dev.tsv
│   ├── test-std.json
│   └── test_std.tsv
└── features
    ├── clip
    │   ├── pairwise.zip
    │   ├── pointwise_0.zip
    │   ├── pointwise_1.zip
    │   ├── ...
    │   ├── pointwise_20.zip
    │   ├── test-dev.zip
    │   └── test-std.zip
    ├── efficientnetb3
    │   └── ...
    └── resnet50
        └── ...

where the *.json files store the meta-data, the *.tsv files contain the annotation scores, and the *.zip archives contain the pre-extracted video features.
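Below is a minimal loading sketch in Python. The exact schemas are assumptions rather than the official format: the meta-data JSON is assumed to map video ids to meta fields, the TSV is assumed to hold (video_id_1, video_id_2, score) rows, and each zip archive is assumed to contain one .npy feature array per video. Adjust the field names and parsing to the actual files after downloading.

import json
import csv
import io
import zipfile

import numpy as np

root = "tencent-mvse"

# Meta-data (assumed: a mapping from video id to fields such as title,
# ASR text, category, and tags).
with open(f"{root}/annotations/pairwise.json") as f:
    meta = json.load(f)

# Annotation scores (assumed: tab-separated rows of video_id_1,
# video_id_2, similarity score; add a header skip if the file has one).
pairs = []
with open(f"{root}/annotations/pairwise.tsv") as f:
    for row in csv.reader(f, delimiter="\t"):
        vid1, vid2, score = row[0], row[1], float(row[2])
        pairs.append((vid1, vid2, score))

# Pre-extracted frame features (assumed: one .npy array per video inside
# the archive, roughly of shape [num_frames, feature_dim]).
with zipfile.ZipFile(f"{root}/features/clip/pairwise.zip") as zf:
    name = zf.namelist()[0]
    feat = np.load(io.BytesIO(zf.read(name)))
    print(name, feat.shape)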

Statistics




Citation

If you intend to publish results based on the Tencent-MVSE dataset, please cite the following reference:

@inproceedings{zeng2022tencent,
  title={Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation},
  author={Zeng, Zhaoyang and Luo, Yongsheng and Liu, Zhenhua and Rao, Fengyun and Li, Dian and Guo, Weidong and Wen, Zhen},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}
Copyright © 2022. Tencent QQ Browser Lab. All rights reserved.