Yet Another Computer Vision Index To Datasets (YACVID)

This website provides a list of frequently used computer vision datasets. Wait, there is more!
Each dataset also has a description covering common problems, pitfalls and characteristics, and there is now a searchable tag cloud.
Plus, the index is open for crowd editing (if you pass the ultimate Turing test)! Questions? yacvid [at] hayko [dot] at

Content, Design and Idea © by Hayko Riemenschneider, 2011-2016. Texts and images are subject to copyright by their respective authors.

Hey! If you're reading this, why not help out and update the description of the dataset you're working on?


2d   3d   4d   aachen   abdomen   abrupt   accelerometer   action   activity   address   adhead   adjustment   aerial   aesthetic   aesthetics   age   aic   aircraft   airplane   airport   ambiguous   analysis   and   anger   animal   animation   annotation   anomaly   apartment   api   appearance   applelogo   architecture   articulated   aspect   attention   attribute   attributes   authentication   autonomous   avoid   axis   babyface   background   balance   baseline   behavior   belgium   benchmark   benchmarking   bike   bilateral   binary   biology   biometric   biometry   blender   body   bone   bottle   boundingbox   brand   bremen   buffy   building   bullseye   bundle   bunny   byu   calibration   caltech   camera   canada   captioning   capture   car   cardinal   categorization   category   celebrity   cell   centered   chair   challenges;   change   chest   chromaticity   church   circle   cities   city   classification   clustering   clutter   cnn   co-segmentation   coco   code   codebook   coffee   color   community   comparison   conditions   constancy   context   contour   copyright   cosegmentation   counting   cover   cow   cross-view   crowd   ct   cutting   database;   dataset   dataset;   day   decomposition   deep   deformation   dense   depth   description   descriptor   detail   detection   detection;   dichromatic   disgust   disparity   dogs   domain   driving   dubrovnik   duplicate   dynamic   ear   ecocentric   edge   egocentric   ellipses   emotion   estimation   evaluation   event   expression   eye   facade   face   facial   fear   feature   field   fine-grained   fingerprints   fingertip   first-person   fish   fisheye   fitting   flickr   flight   floorplan   flow   fly   flying   food   foot   foreground   foreground;   fov   frames   frontview   fundus   gait   game   genetic   genome   geography   geometry   geotag   germany   gesture   getry   gif   giraffe   gis   global   google   gps   grammar   graphics   graz   ground   
ground-truth;   groundtruth   group   hand   hands   handwritten   head   heart   heat   hierarchy   high-definition   highlight   highway   holes   horse   human   identification   illumination   image   imagenet   images   imdb   indoor   inertial   initialization   inserts   instance   intake   interaction   interactive   interest   internet   invariance   ir   isar   joy   keyframe   kimia   kinect   label   labeling   laboratory   landmark   lane   language   large   laser   lattice   layout   learning   letter   leuven   lidar   light   lightfield   lighting   limited   line   lisbon   liver   local   localization   location   logo   lowlevel   machine   manhattan   map   mask   match   matching   material   medial   medical   memorability   mesh   milling   mirror   mobile   model   modeling   modelling   monitoring   mono   montage   motion   motion-capture-data   motorbike   mouse   movement   movie   movies   moving   mpeg   mug   multi-camera;   multi-class   multi-mode   multi-sensor;   multi-view   multilabel   multiple   multitarget   multiview   naming   natural   nature   navigation   network   neutral   newyork   night   noise   nude   number   object   objects   occlusion   ocr   odometry   omnidirection   omnidirectional   open-view   optical   optimization   organ   osnabrueck   outdoor   overhead   overlap   oxford   pair   pairwise   panorama   panoramio   paris   parsing   part   partial   pasadena   pascal   patch   path   pedestrian   people   person   perspective   photogrammetry   physics   pittsburgh   place   plane   planning   point   pointcloud   popularity   pornography   pose   presentation   pressure   primitive   procedural   profile   proposal   ptz   quality   radar   randomnoise   rank   ranking   ransac   rate   ratio   re-identification   real   recognition   recognition;   reconstruction   rectification   rectified   reflection   registration   regular   remote   removal   repetition   retina   retinal   retrieval   rgb   
rgbd   road   robot   robust   rome   room   rotation   sad   saliency   sampling   sanfrancisco   scale   scan   scanner   scene   scenes   search   segmentation   semantic   sense   sensing   sequence   sfm   shadow   shadows   shape   shapes   sheffield   shoes   shots   shutter   sideview   sign   similarity   single   singletarget   singleview   skeleton   sketch   skin   sky   slam   soccer   social   software   source   spain   sphere   sport   stability   stabilization   static   stationary   stereo   stereovision   stochastic   street   streetside   streetview   structure   structure-from-motion   structures   study   stuff   stylization   subpixel   subtraction;   summarization   summary   superresolution   supervised   surface   surprise   surveillance   swan   switzerland   symmetry   synthetic   target   taxonomy   temporal   text   texture   therapy   thermal   things   time   time-series   tiny   tool   tools   tracking   traffic   trajectory   transfer   transportation   triangulation   truth   tuberculosis   type   uas   ultrasound   understanding   uneven   unmanned   unsupervised   urban   user   vanishing   variation   vehicle   vehicles   video   video2gif   videos   videosurveillance   view   viewpoint   vision   visual   volleyball   vt   water   weakly   wear   wearable   weather   webcam   white   wide   wikipedia   wild   world   xray   year   zoom   zurich  
«showing 524 of 524 total tags for 372 datasets (a ratio of 1.41)»


scene
DID | Name | Description | Tags | URL | Date | Views
363 | ETH/Yahoo Video2Gif dataset | The Video2GIF dataset contains over 100,000 pairs of GIFs and their source videos. The GIFs were collected from two popular GIF websites (makeagif.com, gifsoup.... | video2gif highlight video summarization gif summary scene understanding | link | 2017-02-22 | 79
291 | MIT Places205 | Places205 database contains 2.5 million images from 205 scene categories for the academic public. The image dataset contains 2,448,873 images from 205 scene c... | place recognition urban scene feature learning | link | 2016-02-24 | 608
263 | Crowd Dataset | The crowd datasets are collected from a variety of sources, such as UCF and data-driven crowd datasets. The sequences are diverse, representing dense crowd in t... | crowd video detection anomaly scene understanding human pedestrian | link | 2016-11-05 | 1117
239 | CUHK crowd dataset | CUHK crowd dataset introduces the largest publicly available crowd dataset of 474 videos from 215 crowded scenes. It has been used in the paper: Scene-Ind... | crowd analysis, group detection and analysis, scene understanding | link | 2016-09-14 | 1029
234 | UMD Dynamic Scene Recognition | The UMD Dynamic Scene Recognition dataset consists of 13 classes and 10 videos per class and is used to classify dynamic scenes. The dataset has been describ... | scene recognition classification dynamic video motion | link | 2017-01-05 | 775
228 | MPI VehicleScenes | Abstract: Scene understanding has (again) become a focus of computer vision research, leveraging advances in detection, context modeling, and tracking. In thi... | semantic segmentation scene understanding classification 3d car pedestrian | link | 2014-06-10 | 917
212 | Polo Instance Segmentation | The Polo instance segmentation dataset is a semantic segmentation task for Hough transform based segmentation masks. It consists of supervised segmentation for ... | semantic segmentation horse human outdoor mask scene understanding | n/a | 2016-01-21 | 658
181 | All I Have Seen (AIHS) | The All I Have Seen (AIHS) dataset is created to study the properties of total visual input in humans. For around two weeks, Nebojsa Jojic wore a camera capturin... | video summary user study clustering similarity outdoor indoor scene 3d | link | 2013-09-05 | 598
101 | CIFAR-10 / 100 | The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. ... | classification, tiny, color, patch, scene, object | link | 2013-08-08 | 628
20 | CALTECH 256 | The CALTECH 256 dataset by Li Fei-Fei contains 30607 images for 256 categories.... | classification centered object scene image | link | 2013-08-08 | 637
19 | CALTECH 101 | The CALTECH 101 dataset by Li Fei-Fei contains images for 101 categories with about 40 to 800 images per category. Most categories have about 50 images at rough... | classification centered object scene image | link | 2013-08-08 | 673


total views: 7719