EPIC-KITCHENS Dataset

EPIC-KITCHENS VISOR

We are proud to announce EPIC-KITCHENS VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which pose challenges not encountered in current video segmentation datasets. Specifically, pixel-level annotations must stay consistent in both the short and the long term as objects undergo transformative interactions. For example, when an onion is peeled, diced and cooked, we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife and pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality, and provides:

Sparse Annotations

271K masks covering 36 hours of untrimmed video

Dense Annotations

14.9M high quality automatic interpolations

Video Object Segmentation

Goal: Track segments through video and occlusion

Hand Object Segmentation

Goal: Identify contact with 67K in-hand object masks

Where Did This Come From?

Goal: Name and point to where things came from with 222 test cases

Explore EPIC-KITCHENS VISOR

To get a sense of the data, feel free to explore some of the data in VISOR!

Interactive! Watch a Segment.

You can click through and see the annotations for a full sequence. We show the image on the left, the annotations on the right, and the legend for the annotation below.


Interactive! See Our Dense Annotations.

Part of VISOR is a collection of 14.9M new masks that are interpolated between our sparse annotations.
Click on any of the images below to see some clips of new dense annotations.

Interactive! What are Hands Doing?

Mouse over an image to see what hands are doing in EPIC-KITCHENS. We show you the hand at your mouse cursor.


Download Data

VISOR is now available for download.

Annotations and sparse frames are available at the University of Bristol data repository, data.bris, at https://doi.org/10.5523/bris.2v6cgv1x04ol22qp9rm9x2j6a7

Code

We have made the following code repositories public. They replicate the VISOR paper's baselines and provide visualisation support for the annotations:

VISOR-VIS: Code to visualise segmentations
VISOR-FrameExtraction: Code to extract frames for dense annotations from the original video
VISOR-VOS: Code to perform semi-supervised video object segmentation. Models and code replicate our first benchmark
VISOR-HOS: Code to perform in-frame hand and active object segmentations. Models and code replicate our second benchmark
VISOR-WDTCF: Code to replicate our taster benchmark: Where did this come from?

The above repos contain everything you need to replicate our paper's results and visualise annotations. We are not releasing any further code or models.
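
For readers who want an intuition for what turning an annotation into a pixel mask involves before opening VISOR-VIS, here is a minimal sketch in plain Python. It assumes, purely for illustration, that a segment is given as a polygon outline of (x, y) points; the released annotation format may differ, and VISOR-VIS remains the reference tool. The names point_in_polygon, rasterise and mask_area are hypothetical helpers, not part of any VISOR repository.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(polygon: Sequence[Point], width: int, height: int) -> List[List[int]]:
    """Return a height x width binary mask, testing each pixel at its centre."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


def mask_area(mask: List[List[int]]) -> int:
    """Number of pixels set in the mask."""
    return sum(sum(row) for row in mask)


# Hypothetical example: a square outline (assumed polygon format) on a 6 x 6 image.
square = [(1.0, 1.0), (4.0, 1.0), (4.0, 4.0), (1.0, 4.0)]
mask = rasterise(square, width=6, height=6)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

Sampling at pixel centres keeps the sketch free of dependencies. For real frames a library rasteriser would be much faster, and the official visualisation code should be preferred.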

Paper and Citation

Read our NeurIPS 2022 paper, EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations, on arXiv and OpenReview.

When using these annotations, cite our EPIC-KITCHENS VISOR Benchmark paper:

@inproceedings{VISOR2022,
           title={EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations},
           author={Darkhalil, Ahmad and Shan, Dandan and Zhu, Bin and Ma, Jian and Kar, Amlan and Higgins, Richard and Fidler, Sanja and Fouhey, David and Damen, Dima},
           booktitle   = {Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
           year      = {2022}
}

Also cite the EPIC-KITCHENS-100 paper where the videos originate:

@ARTICLE{Damen2022RESCALING,
           title={Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100},
           author={Damen, Dima and Doughty, Hazel and Farinella, Giovanni Maria and Furnari, Antonino
           and Ma, Jian and Kazakos, Evangelos and Moltisanti, Davide and Munro, Jonathan
           and Perrett, Toby and Price, Will and Wray, Michael},
           journal   = {International Journal of Computer Vision (IJCV)},
           year      = {2022},
           volume    = {130},
           pages     = {33--55},
           url       = {https://doi.org/10.1007/s11263-021-01531-2}
}

Disclaimer

The underlying data that power VISOR, EPIC-KITCHENS-55 and EPIC-KITCHENS-100, were collected as a tool for research in computer vision. The dataset may have unintended biases (including those of a societal, gender or racial nature).

Copyright

The VISOR dataset is copyrighted by us and published under the Creative Commons Attribution-NonCommercial 4.0 International License. This means that you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes.

For commercial licenses of EPIC-KITCHENS and VISOR annotations, email us at uob-epic-kitchens@bristol.ac.uk

The Team

VISOR is the result of a collaboration between the Universities of Bristol, Michigan, and Toronto.

Ahmad Dar Khalil*

University of Bristol

Dandan Shan*

University of Michigan

Bin Zhu*

University of Bristol

Jian Ma*

University of Bristol

Amlan Kar

University of Toronto

Richard Higgins

University of Michigan

Sanja Fidler

University of Toronto

David Fouhey

University of Michigan

Dima Damen

University of Bristol

Research Funding

The work on VISOR was supported by the following:

Segmentation annotations were funded by charitable unrestricted donations from Procter and Gamble and from DeepMind.
Research at the University of Bristol is supported by UKRI Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Program (DTP), EPSRC Fellowship UMPIRE (EP/T004991/1) and EPSRC Program Grant Visual AI (EP/T028572/1).
The project acknowledges the use of the EPSRC-funded Tier 2 facility, JADE, and the University of Bristol's Blue Crystal 4 facility.
Research at the University of Michigan is based upon work supported by the National Science Foundation under Grant No. 2006619.
Research at the University of Toronto is in part sponsored by NSERC. S.F. also acknowledges support through the Canada CIFAR AI Chair program.
