|Description (include details on usage, files and paper references)||An Annotated Dataset For Near-Duplicate Detection In Personal Photo Collections
Managing photo collections involves a variety of image quality assessment tasks, e.g. selecting the best photos. Detecting near-duplicates is a prerequisite for automating these tasks. We created the California-ND dataset to help researchers test algorithms for detecting near-duplicate images.
Unlike other existing datasets in this domain, California-ND contains 701 photos taken directly from a real user's personal photo collection. As a result, it includes many challenging non-identical near-duplicate cases without the use of artificial image transformations. The original image sequence was maintained as much as possible.
More importantly, in order to deal with the inevitable subjectivity and ambiguity that near-duplicate cases exhibit, the dataset is annotated by 10 different subjects, including the photographer himself. These annotations can be combined into a non-binary ground truth, representing the probability that a pair of images is considered a near-duplicate.
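One way to read the non-binary ground truth described above is as the fraction of subjects who labelled a given pair as a near-duplicate. The sketch below illustrates that reading only. Plain averaging is an assumption rather than the procedure confirmed by the dataset authors, and the input layout (one list of image-index pairs per subject) and the names `combine_annotations` and `toy_labels` are hypothetical, not the dataset's actual file format.

```python
from collections import Counter


def combine_annotations(per_subject_pairs):
    """Return {(i, j): fraction of subjects who marked images i and j as near-duplicates}.

    per_subject_pairs is a hypothetical layout: one iterable of (i, j)
    image-index pairs per subject. Pairs that nobody marked are absent
    from the result and should be read as probability 0.
    """
    num_subjects = len(per_subject_pairs)
    if num_subjects == 0:
        raise ValueError("need annotations from at least one subject")

    votes = Counter()
    for pairs in per_subject_pairs:
        # Sort each pair so (1, 0) and (0, 1) count as the same pair, and use
        # a set so a subject votes at most once per pair.
        votes.update({tuple(sorted(pair)) for pair in pairs})

    return {pair: count / num_subjects for pair, count in votes.items()}


# Toy data (not taken from California-ND): three subjects, three images.
toy_labels = [
    [(0, 1)],          # subject A
    [(1, 0), (1, 2)],  # subject B
    [],                # subject C marked nothing
]
ground_truth = combine_annotations(toy_labels)
```

With the toy data, the pair (0, 1) was marked by two of three subjects and (1, 2) by one, so their values are 2/3 and 1/3. An unmarked pair can be looked up with `ground_truth.get((0, 2), 0.0)`. With the dataset's 10 subjects, each value would be a multiple of 0.1 under this assumption.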
A. Jinda-Apiraksa, V. Vonikakis, S. Winkler.
California-ND: An annotated dataset for near-duplicate detection in personal photo collections.
Proc. 5th International Workshop on Quality of Multimedia Experience (QoMEX), Klagenfurt, Austria, July 3-5, 2013.