Wednesday, June 3, 2009

Visual Diversification of Image Search Results

Visual Diversification of Image Search Results is a Yahoo paper about image clustering that I liked very much. The paper uses lightweight clustering techniques to capture the discriminative aspects of the set of images retrieved for a query.

For each image, a set of representative features is extracted: color histogram, color layout, scalable color, CEDD, edge histogram, and Tamura. Strangely enough, the authors do not consider SIFT or wavelet signatures, which are quite popular these days.

The authors evaluated three different algorithms:
  • The folding algorithm respects the original ranking of the search results as returned by the textual retrieval model. Images higher in the ranking have a larger probability of being selected as a cluster representative. The representatives are selected in one linear pass, and the clusters are then formed around them.
  • The maxmin approach also performs representative selection prior to cluster formation, but discards the original ranking and finds representatives that are visually different from each other.
  • Reciprocal election lets all the images cast votes for the other images that they are best represented by. Strong voters are then assigned to their corresponding representatives and taken off the list of candidates. This process is repeated as long as there are unclustered images.
As you can imagine, the number of clusters is never fixed, because it is impossible to predict a priori what a good value is. Evaluation is carried out by using 200 manually created clusters, used as ground truth.
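
To make the first two strategies concrete, here is a minimal Python sketch of how folding and maxmin could pick representatives and then form clusters. It is my own illustration, not the authors' code. It assumes each image is a plain feature vector compared with Euclidean distance, that a hypothetical threshold `epsilon` decides when an image is "different enough" to become a representative, and that maxmin starts from an arbitrary seed image (`first`). The paper's actual distance measure and selection criteria may differ.

```python
import math

def folding_representatives(features, epsilon):
    """One linear pass in ranking order (assumed rule): an image becomes a
    representative if it is farther than epsilon from every earlier one."""
    reps = []
    for i, f in enumerate(features):
        if all(math.dist(f, features[r]) > epsilon for r in reps):
            reps.append(i)
    return reps

def maxmin_representatives(features, epsilon, first=0):
    """Ignore the ranking: repeatedly add the image whose distance to its
    nearest representative is largest, until that distance is <= epsilon."""
    reps = [first]  # arbitrary seed image (assumption)
    while len(reps) < len(features):
        best, best_d = None, -1.0
        for i, f in enumerate(features):
            d = min(math.dist(f, features[r]) for r in reps)
            if d > best_d:
                best, best_d = i, d
        if best_d <= epsilon:
            break
        reps.append(best)
    return reps

def assign_clusters(features, reps):
    """Form clusters by attaching each image to its nearest representative."""
    clusters = {r: [] for r in reps}
    for i, f in enumerate(features):
        nearest = min(reps, key=lambda r: math.dist(f, features[r]))
        clusters[nearest].append(i)
    return clusters

# Toy 2-D "features", listed in the order of the textual ranking.
features = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]
folding_reps = folding_representatives(features, epsilon=1.0)
maxmin_reps = maxmin_representatives(features, epsilon=1.0)
clusters = assign_clusters(features, folding_reps)
```

In both cases the number of clusters falls out of the threshold rather than being chosen up front, which matches the remark above that the number of clusters is never fixed.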

The evaluation measures are the Fowlkes-Mallows index and the Variation of Information criterion. Folding is the best algorithm under the Fowlkes-Mallows index, and reciprocal election is the best one according to Variation of Information.
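
For readers who have not met these two measures, here is a small Python sketch using their standard textbook definitions. It is only an illustration and not the paper's evaluation code. It assumes each clustering is given as a list of cluster labels, one per image, and the function names are my own. A higher Fowlkes-Mallows index means better agreement with the ground truth (1 is a perfect match), while a lower Variation of Information is better (0 is a perfect match).

```python
import math
from collections import Counter

def _pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2

def fowlkes_mallows(labels_a, labels_b):
    """Pairs co-clustered in both clusterings, divided by the geometric mean
    of the co-clustered pair counts of each clustering."""
    joint = Counter(zip(labels_a, labels_b))
    both = sum(_pairs(c) for c in joint.values())
    pairs_a = sum(_pairs(c) for c in Counter(labels_a).values())
    pairs_b = sum(_pairs(c) for c in Counter(labels_b).values())
    if pairs_a == 0 or pairs_b == 0:
        return 0.0
    return both / math.sqrt(pairs_a * pairs_b)

def variation_of_information(labels_a, labels_b):
    """VI = H(A|B) + H(B|A), in nats."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    size_a = Counter(labels_a)
    size_b = Counter(labels_b)
    vi = 0.0
    for (a, b), c in joint.items():
        # c / size_a[a] equals p(a, b) / p(a); likewise for b.
        vi -= (c / n) * (math.log(c / size_a[a]) + math.log(c / size_b[b]))
    return vi

truth = [0, 0, 1, 1]
merged = [0, 0, 0, 0]   # a clustering that lumps everything together
fm_same = fowlkes_mallows(truth, truth)
vi_same = variation_of_information(truth, truth)
fm_merged = fowlkes_mallows(truth, merged)
vi_merged = variation_of_information(truth, merged)
```

The two measures penalize different kinds of mistakes, so it is not surprising that they can rank the algorithms differently.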

I recommend reading the paper if you are interested in image search and clustering. Just one observation for the authors: since the target of this work is production use, I would have appreciated a comparison of the time needed to extract the features and to cluster the results with the three different algorithms.
