The YouTube-Objects dataset is composed of videos collected from YouTube by querying for the names of 10 object classes. It contains between 9 and 24 videos for each class. The duration of each video varies between 30 seconds and 3 minutes. The videos are weakly annotated, i.e., we only ensure that each video contains at least one object of the corresponding class.
In addition to the videos, this release also includes the following materials from our paper (cited below):
Bounding-box annotations. For evaluation purposes, we annotated the object location in a few hundred video frames for each class (see sec. 6.1).
Point tracks and motion segments. As produced by .
Tubes. Spatio-temporal bounding boxes, as described in sec. 3.2. We include all candidate tubes (yellow in the fig. above) as well as the tube automatically selected by our method (blue).
A. Prest, C. Leistner, J. Civera, C. Schmid and V. Ferrari.
Learning Object Class Detectors from Weakly Annotated Video.
Computer Vision and Pattern Recognition (CVPR), 2012.
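To make the notion of a tube concrete, here is a minimal sketch of one possible in-memory representation: a tube is treated as one bounding box per frame over a run of frames. The class name `Tube`, the `(x_min, y_min, x_max, y_max)` pixel box layout, and the helper methods are hypothetical illustrations chosen for this sketch; they are not the file format of this release and not the selection method from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical box layout (an assumption, not the release's format):
# (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Tube:
    """Hypothetical spatio-temporal bounding box: one box per covered frame."""

    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    def add(self, frame: int, box: Box) -> None:
        """Attach a box to a frame, rejecting degenerate boxes."""
        x0, y0, x1, y1 = box
        if x1 <= x0 or y1 <= y0:
            raise ValueError("box must have positive width and height")
        self.boxes[frame] = box

    def temporal_extent(self) -> Tuple[int, int]:
        """Return the first and last frame index covered by the tube."""
        if not self.boxes:
            raise ValueError("empty tube")
        frames = sorted(self.boxes)
        return frames[0], frames[-1]

    def box_at(self, frame: int) -> Optional[Box]:
        """Return the box in the given frame, or None if the tube skips it."""
        return self.boxes.get(frame)


if __name__ == "__main__":
    tube = Tube()
    tube.add(0, (10, 20, 110, 220))
    tube.add(5, (12, 21, 112, 221))
    print(tube.temporal_extent())  # (0, 5)
```

Under this sketch, the candidate tubes of a video would simply be a list of `Tube` objects, with the automatically selected tube being one element of that list.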