Yet Another Computer Vision Index To Datasets (YACVID)

This website provides a list of frequently used computer vision datasets. Wait, there is more!
There is also a description containing common problems, pitfalls and characteristics and now a searchable TAG cloud.
Plus, this is open for crowd editing (if you pass the ultimate turing test)! - Questions? yacvid [at] hayko [dot] at

Content, Design and Idea © by Hayko Riemenschneider, 2011-2016. Texts and Images are subject of copyright by the respective authors.

Hey! If you're reading this, why not help and update the description of the dataset you're working on?

Add a new dataset



2d   3d   4d   aachen   abdomen   abrupt   accelerometer   action   activity   address   adhead   adjustment   aerial   aesthetic   aesthetics   age   aic   aircraft   airplane   airport   alignment   amazon   ambiguous   analysis   and   anger   animal   animation   annotation   anomaly   apartment   api   appearance   applelogo   architecture   articulated   aspect   attention   attribute   attributes   authentication   automatic   autonomous   avoid   axis   babyface   background   balance   baseline   behavior   belgium   benchmark   benchmarking   bike   bilateral   binary   biology   biometric   biometry   blender   blur   body   bone   bottle   boundingbox   brand   bremen   buffy   building   bullseye   bundle   bunny   byu   cad   calibration   caltech   camera   canada   captioning   capture   car   cardinal   categorization   category   celebrity   cell   centered   chair   challenge   change   chemistry   chest   chromaticity   church   circle   cities   city   classification   clothing   clustering   clutter   cnn   co-segmentation   coco   code   codebook   coffee   color   community   comparison   conditions   constancy   context   contour   cooking   copyright   cosegmentation   counting   cover   cow   crepe   cross-view   crowd   ct   cutting   dance   data   dataset   day   decomposition   deep   defocus   deformation   dense   depth   description   descriptor   detail   detection   dichromatic   disgust   disparity   dogs   domain   driving   dubrovnik   duplicate   dynamic   ear   ecocentric   edge   egocentric   ellipses   emotion   endtoend   estimation   evaluation   event   expression   eye   facade   face   facial   fear   feature   field   fine-grained   fingerprints   fingertip   first-person   fish   fisheye   fitting   flickr   flight   floorplan   flow   fly   flying   food   foot   foreground   fov   frames   frontview   fundus   gait   game   gaze   gender   genetic   genome   geography   geometry   geotag   geotagged   germany   gesture   getry   gif   giraffe   gis   global   google   gps   grammar   graphics   graz   ground   groundtruth   group   hand   hands   handwritten   hd   head   heart   heat   hierarchy   high-definition   highlight   highway   holes   horse   human   identification   illumination   image   imagenet   images   imdb   indoor   inertial   initialization   inserts   instance   intake   interaction   interactive   interest   internet   invariance   ir   isar   joy   kernels   keyframe   kimia   kinect   label   labeling   laboratory   landmark   lane   language   large   large-scale   laser   lattice   layout   learning   letter   leuven   lidar   light   lightfield   lighting   limited   line   lisbon   liver   local   localization   location   logo   lowlevel   machine   manhattan   map   mask   match   matching   material   medial   medical   medicine   memorability   mesh   metadata   milling   mirror   mobile   model   modeling   modelling   monitoring   mono   montage   motion   motion-capture-data   motorbike   mouse   movement   movie   mpeg   mug   multi-camera   multi-class   multi-mode   multi-sensor   multi-spectral   multi-view   multilabel   multiple   multitarget   multiview   naming   natural   nature   navigation   network   neutral   newyork   night   noise   normal   nude   number   object   objects   occlusion   ocr   odometry   omnidirection   omnidirectional   open-view   operation   optical   optimization   organ   original   osnabrueck   outdoor   overhead   overlap   oxford   pair   pairwise   panorama   panoramio   parallel   paris   parsing   part   partial   pasadena   pascal   patch   path   pattern   pedestrian   people   person   perspective   phase   photogrammetry   physics   pittsburgh   place   plane   planning   point   pointcloud   polygon   popularity   pornography   pose   presentation   pressure   primitive   privacy   procedural   profile   proposal   ptz   quality   question   radar   randomnoise   rank   ranking   ransac   rate   ratio   re-identification   real   realism   recipe   recognition   reconstruction   rectification   rectified   reflection   registration   regression   regular   reidentification   remote   removal   rendering   repetition   resolution   retina   retinal   retrieval   rgb   rgbd   road   robot   robust   rome   room   rotation   sad   saliency   sampling   sanfrancisco   satellite   scale   scan   scanner   scene   scenes   search   segmentation   semantic   sense   sensing   sequence   sfm   shadow   shadows   shape   shapes   sheffield   shoes   shots   shutter   sideview   sign   similarity   simultaneous   single   singletarget   singleview   skeleton   sketch   skin   sky   slam   soccer   social   software   source   space   spain   sphere   sport   stability   stabilization   static   stationary   stereo   stereovision   stochastic   street   streetside   streetview   structure   structure-from-motion   structured   structures   study   stuff   stylization   subpixel   subtraction   summarization   summary   superresolution   supervised   surface   surgery   surprise   surveillance   swan   switzerland   symmetry   synthetic   table   target   taxonomy   temporal   text   texture   texture-less   therapy   thermal   things   time   time-series   tiny   tool   tools   top-view   tracking   traffic   trajectory   transfer   transportation   triangulation   truth   tuberculosis   type   uas   ultrasound   understanding   uneven   unmanned   unsupervised   urban   user   vanishing   variation   vehicle   vehicles   video   videosurveillance   view   viewpoint   vision   visual   volleyball   vqa   vt   water   wavelength   weakly   wear   wearable   weather   webcam   white   wide   wikipedia   wild   workflow   world   xray   year   zoom   zurich  
«showing 562 tags of 562 total tags for 409 datasets (1.37) »


indoor
DID Name Description Tags URL Date Views
394 Matterport 2D-3D-Semantics Data The 2D-3D-S dataset provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. I... 3d panorama semantic segmentation depth normal indoor building reconstruction large-scale link 2017-07-27 52
378 TVPR (Top View Person Re-identification) The TVPR dataset includes 23 registration sessions. Each of the 23 folders contains the video of one registration session. Acquisitions have been performed duri... person reidentification identification recognition people gender clothing video depth top-view indoor link 2017-05-10 113
376 ScanNet ScanNet is an RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-le... scene indoor synthetic cad room layout rendering realism 3d segmentation object recognition link 2017-05-12 160
375 SUNCG: Indoor Scenes The SUNCG dataset is a Large 3D Model Repository for Indoor Scenes. SUNCG is an ongoing effort to establish a richly-annotated, large-scale dataset of 3D s... scene indoor synthetic room layout rendering realism 3d segmentation object recognition link 2017-05-02 133
374 SceneNet RGB-D Synthetic Indoor SceneNet RGB-D is dataset comprised of 5 million Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth. It expands the previous work of Scene... scene indoor synthetic robot navigation rendering 3d reconstruction trajectory lighting segmentation slam link 2017-05-02 119
366 Multi-Camera Action Dataset An indoor action recognition dataset which consists of 18 classes performed by 20 individuals. Each action is individually performed for 8 times (4 daytime and ... indoor video Multi-Camera Action Recognition Cross-View Recognition Open-View Recognition link 2017-09-12 173
355 IMPART multi-modal/multi-view The multi-modal/multi-view datasets are created in a cooperation between University of Surrey and Double Negative within the EU FP7 IMPART project. The sourc... multi-view multi-mode video rgbd lidar 3d model color indoor outdoor dynamic action face human emotion link 2017-01-01 300
331 EuRoC MAV Dataset This web page presents visual-inertial datasets collected on-board a Micro Aerial Vehicle (MAV). The datasets contain stereo images, synchronized IMU measuremen... aerial vehicles, indoor, global shutter, slam link 2016-07-18 617
327 PIROPO Database: People in Indoor ROoms with Perspective and Omnidirectional cameras The PIROPO database (People in Indoor ROoms with Perspective and Omnidirectional cameras) comprises multiple sequences recorded in two different indoor rooms, u... people surveillance perspective omnidirectional fisheye indoor room detection human link 2017-02-16 695
295 Rent3D The Rent3D dataset comprises floorplans and images. The goal of this work is to enable a 3D virtual-tour of an apartment given a small set of monocular images o... indoor building reconstruction layout floorplan apartment urban link 2015-07-13 550
286 HDA Person Dataset - ISR Lisbon The High Definition Analytics (HDA) dataset is a multi-camera High-Resolution image sequence dataset for research on High-Definition surveillance: Pedestrian De... Video Surveillance Pedestrian Detection Re-Identification Multiview Tracking Benchmark Indoor High-Definition Camera Network lisbon human link 2017-01-26 1587
271 Labeling in 3D Scenes This dataset package contains the software and data used for Detection-based Object Labeling on the RGB-D Scenes Dataset as implemented in the paper: Detecti... 3d kinect reconstruction indoor depth object recognition link 2015-03-16 707
270 B3DO: Berkeley 3D Object Dataset For the first few decades of the fields existence, computer vision has been focused on algorithmic, logical approaches to perception. But it was only with the a... 3d kinect reconstruction indoor depth object recognition link 2015-03-16 637
181 All I Have Seen (AIHS) The All I Have Seen (AIHS) dataset is created to study the properties of total visual input in humans, for around two weeks Nebojsa Jojic wore a camera capturin... video summary user study clustering similarity outdoor indoor scene 3d link 2013-09-05 723
168 Mall Dataset The Mall dataset was collected from a publicly accessible webcam for crowd counting and profiling research. Ground truth: Over 60,000 pedestrians were label... detection tracking crowd counting pedestrian indoor video webcam link 2016-12-06 1535
166 ICG Multi-Camera Datasets The ICG Multi-Camera datasets consist of Easy Data Set (just one person) Medium Data Set (3-5 persons, used for the experiments) Hard Data Set (crowded sc... multiview pedestrian tracking detection object camera calibration graz indoor video multitarget link 2015-06-19 1144
163 TUGRAZ ICG Longterm Pedestrian Dataset The Longterm Pedestrian dataset consists of images from a stationary camera running 24 hours for 7 days at about 1 fps. It used for adaptive detection and back... pedestrian change detection background illumination robust indoor coffee graz multitarget link 2015-06-19 964
104 Make3D Depth The Make3D Depth dataset s designed to learn features to estimate scene depth from a single image. This dataset contains aligned image and range data: Make3... depth, learning, single view, outdoor, indoor link 2013-03-12 1231
15 PETS 2006 The PETS 2006 dataset contains 7 parts showing multi-sensor sequences containing left-luggage scenarios with increasing scene complexity at a train station scen... frontview, indoor, pedestrian, detection, tracking, multitarget link 2015-08-12 1046


total views: 12486 5 queries in 0.00012707710266113s 0.00013303756713867s 0.00017905235290527s 0.00010490417480469s 0.0021100044250488s and total 0.0081460475921631s