I am structured: Cluster Me, Don't Just Rank me

Sihem Amer-Yahia 1
1 LIG Laboratoire d'Informatique de Grenoble - HADAS
LIG - Laboratoire d'Informatique de Grenoble
Abstract: A large number of online applications are built over high-dimensional data. That is the case for shopping, where products have several features (e.g., price and color); dating, where personal profiles are described using several dimensions (e.g., physical features and political views); and entertainment (e.g., movie genre and director, restaurant ambiance and location). In addition, in some applications, items may be accompanied by qualitative data such as movie and restaurant reviews. The typical way users find items in those applications is by entering a keyword query and receiving a ranked list of relevant results. Ideally, just like in Web search, users would want to spend little time before finding a satisfactory item. In practice, due to the query output size, the high dimensionality of items, and in some cases, the presence of qualitative data, users tend to spend a lot of time trying to understand correlations between item features and item quality. In this talk, I will argue that the 10-blue-links experience we are used to in Web search, with keywords as input and a ranked list as output, is inappropriate when querying and ranking high-dimensional data. I will describe two applications: exploring qualitative data and ranked querying of structured data. Exploring qualitative data is a common activity on collaborative rating sites such as IMDb, CNet and Yelp. The amount of information available on those sites is often daunting. For example, on Yelp, a not-so-popular restaurant, Joe's Shanghai, received nearly a thousand ratings, and more popular restaurants routinely exceed that number. Similarly, the movie "The Social Network" received more than 42,000 ratings on IMDb within just two months of its release! In practice, a user spends a lot of time examining items and reviews before making an informed decision. Ranked querying of structured data is typical in applications such as online dating or real estate search.
In online dating, a user who looks for a partner between 20 and 40 years old and sorts the matches by income from highest to lowest will see a large number of matches in their late 30s who hold an MBA degree and work in the financial industry before seeing any matches in different age groups and walks of life. Similarly, in online real estate, a user looking for 1- or 2-bedroom apartments sorted by price will see a large number of cheap 1-bedrooms in undesirable neighborhoods before seeing any apartment with different features. Top results in ranked lists tend to be homogeneous, thereby hindering data exploration. In both applications, an alternative to ranking is to cluster results on their attributes and describe the clusters (e.g., "Woody Allen Comedies liked by Males over 35", "cheap 2 bedrooms with 2 baths"). However, not all clusters will be of interest to users, given varying item quality and varying reviewer information. When exploring qualitative data, different users are interested in the opinions of different reviewer populations. When querying and ranking structured data, different item features correlate differently with item quality. I will discuss two approaches in this talk. Persona-driven search, for which we have preliminary ideas in restaurant search, aims to improve the exploration of qualitative data. Rank-aware clustering aims to unveil hidden correlations between item features and item quality. In that context, I will report the results of a large-scale user study and a performance evaluation over datasets from a leading dating site.
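The contrast between a ranked list and attribute-based clusters can be illustrated with a minimal sketch. It is not the rank-aware clustering method from the talk: it is a plain group-by on shared attribute values, and the listings, attribute names, and helper functions (`rank_by`, `cluster_by`, `describe`) are all hypothetical, invented only for illustration.

```python
from collections import defaultdict

# Hypothetical toy listings; every value here is invented for illustration.
LISTINGS = [
    {"id": 1, "bedrooms": 1, "baths": 1, "price": 900},
    {"id": 2, "bedrooms": 1, "baths": 1, "price": 950},
    {"id": 3, "bedrooms": 1, "baths": 1, "price": 1000},
    {"id": 4, "bedrooms": 2, "baths": 1, "price": 1400},
    {"id": 5, "bedrooms": 2, "baths": 2, "price": 1500},
    {"id": 6, "bedrooms": 2, "baths": 2, "price": 1700},
]


def rank_by(items, key):
    """Return items sorted ascending by one attribute (the ranked-list view)."""
    return sorted(items, key=lambda item: item[key])


def cluster_by(items, attributes):
    """Group items that share the same values on the given attributes."""
    clusters = defaultdict(list)
    for item in items:
        clusters[tuple(item[a] for a in attributes)].append(item)
    return dict(clusters)


def describe(cluster_key, members, attributes, rank_key):
    """Build a short label plus the range of the ranking attribute in the cluster."""
    label = ", ".join(f"{value} {name}" for name, value in zip(attributes, cluster_key))
    values = [m[rank_key] for m in members]
    return f"{label}: {len(members)} items, {rank_key} {min(values)}-{max(values)}"


ranked = rank_by(LISTINGS, "price")
top3 = ranked[:3]  # the head of the ranked list: all cheap 1-bedrooms
clusters = cluster_by(ranked, ["bedrooms", "baths"])
summaries = [describe(k, v, ["bedrooms", "baths"], "price") for k, v in clusters.items()]
for line in summaries:
    print(line)
```

In this toy data the top three ranked results are all 1-bedroom apartments, whereas the cluster summaries expose every attribute combination at a glance, each with its price range.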
Document type:
Conference paper
Invited paper at the 2nd International Workshop on Business intelligencE and the WEB (BEWEB), in conjunction with EDBT, 2011, Berlin, Germany.

https://hal.inria.fr/hal-01002705
Contributor: Fabrice Jouanot
Submitted on: Friday, June 6, 2014 - 15:44:04
Last modified on: Thursday, January 11, 2018 - 06:22:06

Identifiers

  • HAL Id: hal-01002705, version 1

Citation

Sihem Amer-Yahia. I am structured: Cluster Me, Don't Just Rank me. Invited paper at the 2nd International Workshop on Business intelligencE and the WEB (BEWEB), in conjunction with EDBT, 2011, Berlin, Germany. 〈hal-01002705〉
