Using a Social-Based Collaborative Filtering with Classification Techniques

In this paper, a social-based collaborative filtering model named SBCF is proposed to make personalized recommendations of friends in a social networking context. The social information is formalized and combined with the collaborative filtering algorithm. Furthermore, in order to optimize the performance of the recommendation process, two classification techniques are used: an unsupervised technique applied initially to all users using the Incremental K-means algorithm, and a supervised technique applied to newly added users using the K-Nearest Neighbors algorithm (K-NN). Based on the proposed approach, a prototype of a recommender system was developed and a set of experiments was conducted using the Yelp database.


Introduction
Due to their ability to cope with information overload, recommender systems have attracted a lot of attention in the last decade [1]. Recommender systems automatically predict the preferences of users for some given items, in order to provide them with useful recommendations. There are three main methods to generate recommendations [2]: content-based filtering, collaborative filtering (CF) and hybrid approaches that combine the two aforementioned algorithms in different ways [3]. Content-based filtering algorithms identify features that appear in the contents of items that users have appreciated before, and then suggest more items containing these relevant features. CF is a widely exploited technique in recommender systems to provide users with items that suit their preferences well [2]. The basic idea is that a prediction for a given item can be generated by aggregating the ratings of users with similar interests. However, this algorithm suffers from the sparsity and cold start problems, which are due to insufficient ratings [4].
Nowadays, interactions and sharing knowledge over the Social Web have become the main way of communication between people. The study of social-based recommender systems has emerged as users are unable to reach the relevant information due to the exponential growth of data.
In this paper, we focus on the recommendation of friends in social networks. Our approach first enhances the CF recommendations with social information. The social dimension is characterized by social-behavior metrics such as friendship, commitment and trust degrees between users, to cope with the problems of rating diversity and sparsity. Then, in order to optimize the performance of the recommendation process, we use two classification techniques along with collaborative and social similarity measures: (1) an unsupervised technique using the Incremental K-means algorithm, applied initially to all users of the social network; and (2) a supervised technique using the K-Nearest Neighbors algorithm, applied to new users.
The remainder of the paper is organized as follows: Section 2 gives an overview of related work on social recommendation. Section 3 presents our recommendation approach. In Section 4, we give an overview of the experiments we carried out. Finally, Section 5 concludes the paper and gives some perspectives.

Related work
The review of the literature about social recommendation shows that many studies propose approaches to provide personalized recommendations to users. For instance, Liu and Lee [5] developed a way to increase recommendation effectiveness by incorporating social network information into CF. Chang and Chu [6] proposed a recommendation approach which calculates similarity among users and users' trustability and information collected from the social networks. Banati et al. [7] explored the role of explicit social relationships by presenting two novel similarity metrics. The first metric is based on social behavior and measures the similarity between two users on the basis of "how similar they are in their social relationship". The second metric integrates the social similarity with the interest similarity between two users. Su et al. [8] proposed a music recommender system integrating social information (to cope with the problems of rating diversity and sparsity) and collaborative information (to cope with the problem of lack of rating information) to predict users' preferences.
In order to improve the performance of social recommendation, some studies have used classification techniques. For instance, Guo et al. [9] developed a multi-view clustering method through which users are iteratively clustered on the basis of rating patterns in one view and social trust relationships in the other. Najafabadi et al. [10] tried to improve the accuracy of CF recommendations using clustering and association rule mining on massive implicit data.

3.1 The user-user based collaborative filtering (CF)
We chose a memory-based CF approach and used the user-user based recommendation. In this approach, the system identifies the best neighbors for a given user, using the usage matrix. The matrix is constructed from the ratings of users on items. We used the Pearson Correlation function to calculate the similarity between users. The collaborative similarities between users allow the identification of neighborhoods, and therefore the building of communities of users who evaluate in the same way, according to a given threshold.
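The neighbor identification described above can be illustrated with a minimal sketch. It assumes the usage matrix is stored as a dictionary of per-user rating dictionaries and that the Pearson means are taken over co-rated items only; both choices, as well as the names `pearson_similarity` and `neighbors`, are illustrative assumptions rather than details given in this paper.

```python
from math import sqrt

def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users over their co-rated items.

    ratings_u, ratings_v: dicts mapping item id -> rating (assumed layout).
    Returns 0.0 when fewer than two items are co-rated or a variance is zero.
    """
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)

def neighbors(user, usage_matrix, sim_threshold):
    """Users whose similarity with `user` reaches the threshold, best first."""
    scored = [(other, pearson_similarity(usage_matrix[user], usage_matrix[other]))
              for other in usage_matrix if other != user]
    kept = [(other, sim) for other, sim in scored if sim >= sim_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Users with fewer than two co-rated items get a similarity of 0.0 here, which is one possible convention for sparse profiles.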

3.2 The social filtering (SocF)
The social information of the user's profile is based on two metrics: (1) the friendship; and (2) the credibility degree. Two parameters are considered to identify the credibility of an active user u: the commitment and trust degrees.

3.2.1 Friendship metric. This metric computes the similarity weight between two users $u_i$ and $u_j$ based on their social relationships. It is defined as the size of the intersection divided by the size of the union of their friend sets:
$$\text{Friendship}(u_i, u_j) = \frac{|F(u_i) \cap F(u_j)|}{|F(u_i) \cup F(u_j)|}$$
where $F(u_i)$ and $F(u_j)$ represent the sets of friends of $u_i$ and $u_j$, respectively.
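As a small illustration of the intersection-over-union definition, the following sketch computes the friendship weight from two friend sets. The function name and the convention of returning 0.0 when neither user has friends are assumptions made for this example.

```python
def friendship(friends_u, friends_v):
    """Size of the intersection over size of the union of two friend sets."""
    friends_u, friends_v = set(friends_u), set(friends_v)
    union = friends_u | friends_v
    if not union:
        # Assumed convention: two users without any friends share no weight.
        return 0.0
    return len(friends_u & friends_v) / len(union)

# Two shared friends out of four distinct friends give a weight of 0.5.
example = friendship({"ann", "bob", "eve"}, {"bob", "eve", "joe"})
```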

3.2.2 Commitment degree. Two parameters are considered: (1) the participation degree of an active user u, i.e. the rate of evaluations he has carried out; and (2) the sociability degree, which represents his friendship rate in the social network:
$$\text{Commitment}(u) = \alpha \cdot \text{Participation}(u) + \beta \cdot \text{Sociability}(u)$$
where $\alpha$ and $\beta$ are weights that express a priority level, with $\alpha + \beta = 1$.
• The participation degree of u concerns mainly the evaluations performed by u. It is calculated as the ratio between the number of evaluations performed by u, NbEval(u), and the number of all the evaluations carried out in the system, NbTotalEval:
$$\text{Participation}(u) = \frac{NbEval(u)}{NbTotalEval}$$
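A minimal sketch of the commitment degree as a weighted combination of participation and sociability is given here for illustration. The participation ratio follows the description above; the sociability formula (friends of u divided by the number of other users) is an assumption, since its exact definition is not given in this text, and the default weight of 0.5 is arbitrary.

```python
def participation_degree(nb_eval_u, nb_total_eval):
    """Share of all evaluations in the system that were performed by u."""
    return nb_eval_u / nb_total_eval if nb_total_eval else 0.0

def sociability_degree(nb_friends_u, nb_users):
    # ASSUMPTION: friendship rate = friends of u / other users in the network.
    return nb_friends_u / (nb_users - 1) if nb_users > 1 else 0.0

def commitment_degree(participation, sociability, alpha=0.5):
    """Weighted sum with beta = 1 - alpha, so that alpha + beta = 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * participation + beta * sociability
```

Deriving `beta` from `alpha` inside the function keeps the constraint on the two weights satisfied by construction.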

Trust degree.
This metric takes into account the seniority level of u and his degree of competence in the social network:
$$\text{Trust}(u) = \alpha \cdot \text{Seniority}(u) + \beta \cdot \text{Competency}(u)$$
where $\alpha$ and $\beta$ are weights that express a priority level, with $\alpha + \beta = 1$.
• The seniority level of u is calculated based on the date of his registration in the social network.
• The competency degree of u is calculated in two steps, based on the following assumption [11]: "A friend is competent if he has evaluated correctly the resources compared to their average evaluations in the social network":
- Step 1: Calculate the competency degree of a friend F regarding a given item $i_j$. We start by calculating the average of the ratings for each item. Then, we compare the rating given by F for the same item with this average value, where $\bar{v}(i_j)$ is the average evaluation of the item $i_j$ over all the users and $v_{F,j}$ is the evaluation given by the friend F to $i_j$.
- Step 2: Calculate the global trust degree of the friend over the n items he has evaluated, where n represents the number of items evaluated by the friend.
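The two-step competency computation and its combination with seniority can be sketched as follows. The per-item score used here (one minus the absolute deviation from the item average, scaled by the rating span) and the plain mean over the n evaluated items are assumptions chosen for illustration; the exact formulas are not recoverable from this text. Seniority is taken as an already normalized value in [0, 1], which is also an assumption.

```python
def item_competency(rating, item_average, rating_range=4.0):
    # ASSUMPTION: closeness to the item average, scaled by the rating span
    # (a 1..5 star scale has a span of 4). A perfect match scores 1.0.
    return max(0.0, 1.0 - abs(rating - item_average) / rating_range)

def competency_degree(friend_ratings, item_averages, rating_range=4.0):
    """Step 2 (assumed): mean per-item competency over the n rated items."""
    if not friend_ratings:
        return 0.0
    total = sum(item_competency(r, item_averages[item], rating_range)
                for item, r in friend_ratings.items())
    return total / len(friend_ratings)

def trust_degree(seniority, competency, alpha=0.5):
    """Weighted sum of seniority and competency, with beta = 1 - alpha."""
    return alpha * seniority + (1.0 - alpha) * competency
```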

Combining SBCF approach and classification techniques
The main objective of the classification in our recommendation process is to group similar users according to the collaborative / social dimensions. This will reduce the search space for the identification of neighbors and allow us to cluster the different groups. So, each user of the social network will have both a collaborative class and a social one. Moreover, if any user has friends but has not yet made enough evaluations, the system can recommend him other friends based only on the social dimension. Similarly, if a user has performed enough evaluations but has not yet added any friend, the system can recommend him new friends based on the collaborative dimension.

Unsupervised classification using Incremental K-means
We first used the classical K-means algorithm, which is considered to be the most popular clustering algorithm because of its simplicity and its ability to handle large data sets. However, the disadvantage of this algorithm is that each initialization leads to a different solution (local optimum), which can be far from the optimal solution (global optimum). A naive solution to this problem is to run the algorithm several times with different initializations and to retain the best grouping found. This solution is of limited use because of its cost and because the resulting partitioning of the clusters cannot always be explained, whereas an optimal partition can sometimes be found in a single execution [12].
Then, we applied the incremental K-means proposed in [12], which is a variant of the K-means algorithm. This algorithm eliminates the problem of centroid initialization. It is based on the principle of global K-means, which aims to achieve an optimal solution. However, instead of starting with a single center for the whole population (as global K-means does), the algorithm chooses two objects, each being the center of a cluster, such that the two are the most distant from each other. The next step is to choose the next center: a simple function calculates the distance between the center of each cluster and its neighbors, and the element farthest from its center is elected as the new centroid. After this operation, the clusters are reconstructed by assigning each object to the center at minimal distance. This action is repeated until K clusters are obtained.
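The seeding and growing procedure described above can be sketched as follows. This is a simplified reading of the description, not the reference implementation of [12]: centers are kept as actual objects rather than recomputed means, points are numeric tuples compared with the Euclidean distance, and the data is assumed to contain at least K distinct objects. For the collaborative or social variants, a hypothetical `distance` such as one minus the similarity could be passed instead.

```python
from itertools import combinations
from math import dist

def incremental_kmeans(points, k, distance=dist):
    """Grow from 2 to k clusters, seeding each new cluster with the object
    farthest from its current center. Returns (centers, labels)."""
    if not 2 <= k <= len(points):
        raise ValueError("k must be between 2 and the number of points")
    # Start with the two most distant objects as the first two centers.
    first, second = max(combinations(range(len(points)), 2),
                        key=lambda p: distance(points[p[0]], points[p[1]]))
    centers = [first, second]

    def assign():
        # Attach every object to the center at minimal distance.
        return [min(range(len(centers)),
                    key=lambda c: distance(p, points[centers[c]]))
                for p in points]

    labels = assign()
    while len(centers) < k:
        # The object farthest from its own center becomes the new center.
        farthest = max(range(len(points)),
                       key=lambda i: distance(points[i],
                                              points[centers[labels[i]]]))
        centers.append(farthest)
        labels = assign()
    return [points[c] for c in centers], labels
```

The exhaustive search for the two most distant objects is quadratic in the number of objects, so a real system would likely need a cheaper approximation for large networks.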

Collaborative Incremental K-means (Col-Inc-K-means)
The set of users to be classified includes all the social network users. Col-Inc-K-means consists in applying the incremental K-means algorithm with the Pearson similarity function for assigning objects to the corresponding clusters.

Social Incremental K-means (Soc-Inc-K-means)
The set of users to be classified includes all the social network users. Soc-Inc-K-means consists in applying the incremental K-means algorithm with the social similarity measure presented in the previous section.

Supervised classification using K-NN algorithm
We chose the K-Nearest Neighbors algorithm (K-NN), an instance-based classification method. The complexity of this algorithm is O(n), where n is the total number of users in the training set. For each newly added user, this algorithm determines the list of nearest neighbors among those already classified in the clustering step (unsupervised classification). Newly issued ratings cannot be used quickly to update the classification obtained by Incremental K-means, as this operation is costly in terms of computation time. To overcome this obstacle, we classify newer users using both collaborative and social K-NN algorithms, adapted respectively to the collaborative and social classifications. The collaborative-based K-NN algorithm (resp. social-based K-NN algorithm) uses the same distance measure as Col-Inc-K-means (resp. Soc-Inc-K-means).
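A minimal sketch of how a new user could be attached to an existing class with K-NN is shown here. The data layout, the function name `knn_class` and the majority-vote rule are assumptions for illustration; the `similarity` argument stands for either the collaborative or the social measure, and the default `k=6` mirrors the value used in the experiments reported later.

```python
from collections import Counter

def knn_class(new_profile, classified_users, similarity, k=6):
    """Majority class among the k most similar already-classified users.

    classified_users: dict user id -> (profile, class label), assumed layout.
    similarity: callable(profile_a, profile_b) -> float, higher means closer.
    """
    if not classified_users:
        raise ValueError("at least one classified user is required")
    # One similarity computation per classified user: linear in their number.
    ranked = sorted(classified_users.values(),
                    key=lambda entry: similarity(new_profile, entry[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because only similarities to already-classified users are computed, the existing clusters are left untouched until the next run of the clustering step.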

Recommendation algorithm
We present in this section our algorithms for friends' recommendation. Collaborative and social clusters have already been calculated, before running this algorithm.

SBCF algorithm
The following is the proposed recommendation algorithm combining the SocF and the CF without using the classification techniques. Leaders are, for example, users who are very active in the social network, i.e. who evaluate items correctly (based on the average item evaluations given by other users), and/or those with a significant number of friends, having the strongest social affinities in the social network.

Sort Rec-list (u): is the function that sorts in descending order the list of recommended friends for u (from the most similar friend to the least similar one).

Display (u, Rec-list): is the function that displays the list of recommended friends Rec-list for the user u.
The SBCF combination algorithm considers the recommendations made using the CF and then applies the SocF (i.e. the SocF is applied to the users suggested by the CF).
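The CF-then-SocF ordering can be illustrated with a short sketch. Everything specific in it is an assumption: the precomputed score dictionaries, the weighted sum of friendship, commitment and trust as the social score, the default thresholds, and the function name `sbcf_recommend`. The default weights reuse the best combination reported in the evaluation section.

```python
def sbcf_recommend(user, cf_sim, friendship, commitment, trust,
                   sim_threshold=0.3, rec_threshold=0.3,
                   weights=(0.1, 0.6, 0.3)):
    """Return [(candidate, social score)] sorted from most to least similar.

    cf_sim, friendship: dict (user, other) -> score (assumed, precomputed).
    commitment, trust: dict other -> score (assumed, precomputed).
    """
    a1, b1, g1 = weights
    # Step 1 (CF): keep candidates whose collaborative similarity passes.
    candidates = [v for (u, v), s in cf_sim.items()
                  if u == user and s >= sim_threshold]
    # Step 2 (SocF): score the CF candidates on the social dimension.
    rec_list = []
    for v in candidates:
        social = (a1 * friendship.get((user, v), 0.0)
                  + b1 * commitment.get(v, 0.0)
                  + g1 * trust.get(v, 0.0))
        if social >= rec_threshold:
            rec_list.append((v, social))
    # Sort Rec-list: descending order of social score.
    return sorted(rec_list, key=lambda pair: pair[1], reverse=True)
```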

Recommendation algorithm using SBCF and classification
C-SBCF is the SBCF recommendation of users using the social and collaborative classification techniques.

Evaluation
In this section, we conduct experiments on a real dataset to validate the effectiveness of our approach. We used the Yelp social network (http://www.yelp.com/), which aims to connect people with local businesses, and we chose the "Restaurant" category because it is frequently used in this social network. The experiments carried out have two main objectives: (1) show the contribution of combining social information with the user-based CF recommendations; and (2) compare the use of K-means and incremental K-means in the recommendation process and show the added value of the K-NN algorithm.

Dataset
Before using the Yelp database, we performed some preprocessing operations to include implicit data (i.e. interests, preferences, commitment and trust degrees, average rating and number of assessments per restaurant, etc.). The resulting database includes 4823 restaurants, 65 categories and 5436 users who have performed 118,709 assessments on these restaurants (only users who have evaluated more than 9 restaurants have been considered).

Evaluation metrics
We considered the following evaluation metrics:
• Precision (P): the proportion of the recommendations made that are correct: $P = \frac{TP}{TP + FP}$
• Recall (R): the proportion of the expected recommendations that are actually made: $R = \frac{TP}{TP + FN}$
• F-measure (F): combines the P and R metrics: $F = \frac{2 \cdot P \cdot R}{P + R}$
• Accuracy (A): measures the performance of the system: $A = \frac{TP + TN}{TP + TN + FP + FN}$
where:
• True Positives (TP): represents the number of recommendations made to users who were originally friends,
• True Negatives (TN): represents the number of recommendations that are not made to users who were not initially friends,
• False Positives (FP): represents the number of recommendations made to users who were not initially friends,
• False Negatives (FN): represents the number of recommendations not made to users who were initially friends.
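For illustration, these metrics can be computed from the four counts with the standard confusion-matrix definitions of precision, recall, F-measure and accuracy. The function name and the convention of returning 0.0 for an empty denominator are assumptions of this sketch.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Precision, recall, F-measure and accuracy from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"P": precision, "R": recall, "F": f_measure, "A": accuracy}

# Hypothetical counts, not results from this paper.
scores = evaluation_metrics(tp=6, tn=10, fp=2, fn=2)
```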

Evaluation results
We first evaluated the CF using the Pearson similarity function, varying the similarity threshold (Sim-Threshold) from 0.1 to 0.9.
Then we evaluated the SocF by varying the weights of the three parameters: the friendship, commitment and trust degrees. The tests carried out revealed that the combination α1=0.1; β1=0.6; γ1=0.3 gives better results in terms of precision and F-measure than the two other combinations of α1, β1 and γ1 (see Fig. 1), where α1, β1 and γ1 represent respectively the weights of the friendship, commitment and trust degrees.

Fig. 1. Identification of the social parameters' weights
Finally, we have evaluated the SBCF using the best parameters for each algorithm. Fig. 2 shows that the integration of social information enhanced the CF recommendation accuracy. We have obtained a better precision and F-measure using the SBCF compared to the CF.

Fig. 2. Contribution of social information on CF recommendations
In order to compare the evolution of the K-means and incremental K-means algorithms, we considered in this experiment only the social classification, and we simulated the evolution of the social network using a partition of the database including 200 users, 351 restaurants and 4852 ratings. We fixed the number of social classes (K=3) and the recommendation threshold value (0.3), and we varied the number of evaluations (NbE), the number of users (NbU) and the number of deleted friendships (NbDF) to check whether the system can recommend them again or not. The results obtained are shown in Table 1. The difference between the evolution of social K-means and social Incremental K-means is presented in Fig. 3.
To simulate the evolution of the social network, we sorted the evaluations and users (by registration dates) from the least recent to the most recent. Then, we gradually injected this data into our system. The results presented in Table 2 were obtained with the following parameters for the different iterations (number of added users / new evaluations): col-K-NN = 6, social K-NN = 6, collaborative threshold = 20000, social threshold = 900, recommendation threshold value = 0.3.
We applied Soc-K-means for every 900 new users and Col-K-means for every 20000 evaluations. The results presented in Fig. 4 were obtained with the same parameters. Between two applications of K-means, the K-NN algorithm is applied (Soc-K-NN is applied for every 300 newly added users and Col-K-NN for every 7000 evaluations). These results show the performance of the recommendation algorithm (accuracy between 0.76 and 0.86) and confirm the contribution of the K-NN algorithm, given that the system keeps recommending friends between two applications of Incremental K-means.
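The alternation between K-NN and re-clustering on the social side can be sketched as a simple schedule. The function name, the string labels and the idea of driving the decision from a running count of new users are assumptions for illustration; the step values 300 and 900 are the ones reported above, and the same pattern would apply to the collaborative side with 7000 and 20000 evaluations.

```python
def social_update_action(nb_new_users, knn_step=300, recluster_step=900):
    """Decide which update to run after `nb_new_users` users were added."""
    if nb_new_users <= 0:
        return "wait"
    if nb_new_users % recluster_step == 0:
        # Full re-clustering with the social K-means variant.
        return "recluster"
    if nb_new_users % knn_step == 0:
        # Cheap intermediate update: classify the new batch with Soc-K-NN.
        return "knn"
    return "wait"

schedule = [social_update_action(n) for n in (300, 600, 900, 1200, 1800)]
```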

Conclusion
We presented in this article an enhanced collaborative filtering approach for the recommendation of friends in social networks. Our approach combines the CF recommendation with social information. Furthermore, in order to optimize the performance of the recommendation process, two classification techniques have been used: the incremental K-means and the K-NN algorithms. The results of the evaluations we carried out using the Yelp dataset show the effectiveness of our system compared to the CF in terms of precision and F-measure. This combination alleviates the cold start problem, as the system may suggest to a given user u a list of other appropriate users by using the social aspect. Moreover, the evaluation shows the contribution of the incremental K-means and the added value of the K-NN algorithm (the obtained accuracy is approximately between 0.76 and 0.86). In our future work, we plan to conduct further experiments using other datasets and to enrich our approach with semantic information, in order to take into account the user's preferences and to benefit from the semantic representation of the user's profile.