Yet Another Computer Vision Index To Datasets (YACVID) - Details

As of: 2020-06-03 21:04:51

Attribute Current Content New
Name (Institute + Shorttitle): Text and Vision (TVGraz) Dataset
Description (include details on usage, files and paper references): The Text and Vision (TVGraz) dataset is an annotated multi-modal dataset that currently contains 10 visual object categories and 4030 images with associated text. The visual appearance of the objects in the dataset is challenging, and the dataset offers a less biased benchmark. The objective of the multi-modal dataset is to provide a common basis for evaluating object categorization research based on text and vision.

The archive "TVGraz_script.tar.gz" contains a Python script named "". When executed, the script downloads the TVGraz dataset images and text from their respective URLs, according to the "category_list.txt" file. After downloading, the textual data is in raw format, stored per category and per image.
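The download step described above can be pictured with a minimal Python sketch. It is not the script shipped in the archive. The layout of "category_list.txt" (one category name per line, with "#" comment lines), the output layout, and the helper names read_categories, target_path and download are all assumptions made for illustration. The URLs for each category are expected to be supplied by the caller.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_categories(list_path):
    """Return the non-empty, non-comment lines of a category list file.

    Assumption: one category name per line; lines starting with '#' are skipped.
    """
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]


def target_path(out_dir, category, index, url):
    """Build <out_dir>/<category>/<index><ext>, reusing the URL's file extension.

    Assumption: this per-category, per-item layout is hypothetical.
    """
    ext = Path(urlparse(url).path).suffix or ".bin"
    return Path(out_dir) / category / f"{index:05d}{ext}"


def download(url, dest, timeout=30):
    """Fetch one URL to dest; return False on an OSError (including URLError) or ValueError."""
    try:
        dest.parent.mkdir(parents=True, exist_ok=True)
        with urlopen(url, timeout=timeout) as resp:
            dest.write_bytes(resp.read())
        return True
    except (OSError, ValueError):
        return False


def fetch_category(category, urls, out_dir="TVGraz"):
    """Download every URL of one category; return the paths that were saved."""
    saved = []
    for index, url in enumerate(urls):
        dest = target_path(out_dir, category, index, url)
        if download(url, dest):
            saved.append(dest)
    return saved
```

A caller would loop over read_categories("category_list.txt") and pass each category together with its own list of URLs to fetch_category. Failed downloads are skipped rather than raised, since links to web-hosted images and pages may no longer resolve.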

Download: TVGraz dataset capturing tool

TVGraz: Multi-Modal Learning of Object Categories by Combining Textual and Visual Features (bib)
Inayatullah Khan, Amir Saffari, and Horst Bischof
In Proc. Workshop of the Austrian Association for Pattern Recognition, 2009 
URL Link 
Files (#): 4030
References (SKIPPED)
Category (SKIPPED)
Tags (single words, spaced): text appearance classification evaluation
Last Changed: 2020-06-03