We are proud to announce EPIC-KITCHENS VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which brings a set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions: an onion, for example, is peeled, diced and cooked, and we aim to obtain accurate pixel-level annotations of the peel, the onion pieces, the chopping board, the knife, the pan, and the acting hands throughout. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality, and provides:
271K masks covering 36 hours of untrimmed video
14.9M high quality automatic interpolations
Goal: Track segments through video and occlusion
Goal: Identify contact with 67K in-hand object masks
Goal: Name and point to where things came from with 222 test cases
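To make the annotation format concrete, below is a minimal sketch of rasterising the polygon segments of one annotated frame into per-entity binary masks. The field names, file name and image resolution are assumptions for illustration only, not the exact schema shipped with VISOR.

import json
import numpy as np
import cv2

def entity_mask(entity, height, width):
    # Rasterise one annotated entity (e.g. "left hand", "onion") into a binary mask.
    mask = np.zeros((height, width), dtype=np.uint8)
    for polygon in entity["segments"]:  # assumed: each segment is a list of (x, y) points
        points = np.asarray(polygon, dtype=np.int32).reshape(-1, 1, 2)
        cv2.fillPoly(mask, [points], 1)
    return mask

# Hypothetical usage on one sparsely annotated frame (file name assumed):
with open("P01_101_frame_annotations.json") as f:
    frame = json.load(f)
masks = {e["name"]: entity_mask(e, 1080, 1920) for e in frame["annotations"]}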
Part of VISOR is a collection of 14.9M new masks that are interpolated between our sparse annotations.
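As a rough illustration of what interpolating between sparse masks means, here is a toy sketch that blends two keyframe masks using signed distance transforms. This is not the AI-powered interpolation used to produce the VISOR masks; it only shows the idea of filling in the frames between two sparse annotations.

import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    # Positive outside the mask, negative inside.
    return distance_transform_edt(1 - mask) - distance_transform_edt(mask)

def interpolate_mask(mask_a, mask_b, t):
    # Blend two binary masks at fraction t (0 = mask_a, 1 = mask_b).
    d = (1 - t) * signed_distance(mask_a) + t * signed_distance(mask_b)
    return (d <= 0).astype(np.uint8)

# Toy keyframes: a square that moves and grows between two annotated frames.
mask_a = np.zeros((64, 64), np.uint8); mask_a[10:30, 10:30] = 1
mask_b = np.zeros((64, 64), np.uint8); mask_b[30:55, 30:55] = 1
in_betweens = [interpolate_mask(mask_a, mask_b, t) for t in np.linspace(0, 1, 7)[1:-1]]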
VISOR is now available for download.
Annotations and sparse frames are available at the University of Bristol data repository, data.bris, at https://doi.org/10.5523/bris.2v6cgv1x04ol22qp9rm9x2j6a7
The above repository contains everything you need to replicate our paper's results and visualise the annotations. We are not releasing any further code or models.
Read our NeurIPS 2022 paper EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations on arXiv and OpenReview
When using these annotations, cite our EPIC-KITCHENS VISOR Benchmark paper:
@inproceedings{VISOR2022,
title = {EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations},
author = {Darkhalil, Ahmad and Shan, Dandan and Zhu, Bin and Ma, Jian and Kar, Amlan and Higgins, Richard and Fidler, Sanja and Fouhey, David and Damen, Dima},
booktitle = {Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
year = {2022}
}
Also cite the EPIC-KITCHENS-100 paper, from which the videos originate:
@ARTICLE{Damen2022RESCALING,
title = {Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100},
author = {Damen, Dima and Doughty, Hazel and Farinella, Giovanni Maria and Furnari, Antonino
and Ma, Jian and Kazakos, Evangelos and Moltisanti, Davide and Munro, Jonathan
and Perrett, Toby and Price, Will and Wray, Michael},
journal = {International Journal of Computer Vision (IJCV)},
year = {2022},
volume = {130},
pages = {33--55},
url = {https://doi.org/10.1007/s11263-021-01531-2}
}
The underlying data that power VISOR (EPIC-KITCHENS-55 and EPIC-KITCHENS-100) were collected as a tool for computer vision research. The dataset may have unintended biases (including those of a societal, gender or racial nature).
The VISOR dataset is copyrighted by us and published under the Creative Commons Attribution-NonCommercial 4.0 International License. This means that you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes.
For commercial licenses of EPIC-KITCHENS and VISOR annotations, email us at uob-epic-kitchens@bristol.ac.uk
The work on VISOR was supported by the following: