This website provides a list of frequently used computer vision datasets. Wait, there is more!
There is also a description of each dataset covering common problems, pitfalls and characteristics, and now a searchable tag cloud.
Plus, this is open for crowd editing (if you pass the ultimate Turing test)! Questions? yacvid [at] hayko [dot] at
Content, Design and Idea © by Hayko Riemenschneider, 2011-2018. Texts and images are subject to copyright by their respective authors.
Hey! If you're reading this, why not help and update the description of the dataset you're working on? Add a new dataset! Yay!
«showing 697 tags of 697 total tags for 514 datasets (1.36) »
|484||Flickr30k Entities||We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which...||phrase grounding caption text analysis image description flickr association video link||link||2019-01-23||347|
|475||MAE Dataset||The Multimodal Attribute Extraction (MAE) dataset is the first benchmark dataset for the task of multimodal attribute extraction. It is composed of mixed media ...||multimedia multimodal images text attribute recognition pair product search asset retrieval||link||2018-11-20||283|
|415||Total Text Dataset||To facilitate new text detection research, we introduce the Total-Text dataset, which is more comprehensive than the existing text datasets. The Tota...||text detection, text recognition, scene text detection||link||2020-04-16||2292|
|344||Unimore - YACCLAB dataset||The YACCLAB dataset includes both synthetic and real binary images and is suitable for a wide range of applications, ranging from document processing to survail...||Labeling Binary Text Medical Fingerprint Video Surveillance Natural Random Noise||link||2019-01-03||955|
|253||Street View House Number (SVHN)||SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatti...||street number recognition classification urban detection text real world||link||2017-11-28||1472|
|167||Text and Vision (TVGraz) Dataset||The Text and Vision (TVGraz) dataset is an annotated multi-modal dataset which currently contains 10 visual object categories, 4030 images and associated text. ...||text appearance classification evaluation||link||2020-02-04||1778|
|144||MNIST handwritten digits||The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of ...||text, classification, letter||link||2017-06-03||2216|
|96||USPS Handwritten Digits||USPS: 10 classes, 7291 training examples, 2007 test examples, 256 features. 8-bit grayscale images of "0" through "9"; handwritten digits; ...||text, recognition, classification, handwritten||link||2019-11-21||4157|
|95||Stroke Width Transform Text||Stroke Width Transform Text dataset is by Boris Epstein and consists of 307 images and XXX text instances. Detecting Text in Natural Scenes with Stroke Wid...||text, detection, recognition, classification||link||2015-04-24||1689|
|94||Chars74K||The Chars74K dataset consists of 64 classes (0-9, A-Z, a-z), 7705 characters obtained from natural images, 3410 hand drawn characters using a tablet PC, 62992 s...||text, detection, recognition, classification||link||2018-08-28||2478|
|93||Street View Text||The Street View Text (SVT) dataset contains 647 words and 3796 letters in 249 images harvested from Google Street View. The dataset is more challenging becaus...||text, detection, recognition, classification, outdoor, urban||link||2014-01-13||1669|
|92||ICDAR 2011||This challenge is set up around three tasks: Text Localisation, Text Segmentation and Word Recognition. Participation in any or all tasks is welcome. Check the ...||text, detection, recognition, classification||link||2016-06-01||1249|
|91||ICDAR 2003||The ICDAR 2003 datasets available for download on this site: Robust Reading, Robust Word Recognition, Robust OCR, Text Locating and Cursive Script. Pleas...||text, detection, recognition, classification||link||2018-05-16||1534|
|27||Idiap/ETHZ Faces and Poses||The Idiap/ETHZ Faces and Poses dataset by L. Jie, B. Caputo and V. Ferrari contains 1703 image-caption pairs. Captions contain the names of some of...||face, pose, pedestrian, text||link||2013-03-11||1369|