Banner Personalization for e-Commerce

Real-time website personalization is a concept that has been discussed for more than a decade, but has only recently been applied in practice, following new marketing trends. These trends emphasize delivering user-specific content based on behavior and preferences. In this context, banner recommendation in the form of personalized ads is an approach that has attracted a lot of attention. Nevertheless, banner recommendation in terms of e-commerce main-page sliders and static banners is even today an underestimated problem, as traditionally only large e-commerce stores deal with it. In this paper we propose an integrated framework for banner personalization in e-commerce that can be applied by small and medium e-retailers. Our approach combines topic models and a neural network in order to recommend and optimally rank the available banners of an e-commerce store for each user separately. We evaluated our framework on a dataset from an active e-commerce store and show that it outperforms other popular approaches.


Introduction
Personalization allows e-retailers to understand consumers' concerns and propose the right products, at the right time, without spending exaggerated effort to understand each customer's wills and needs. Personalization techniques have become increasingly popular in recent years and are considered key elements in the modern e-commerce world. Many e-commerce stores use large banner or billboard graphics in their designs. Visitors make decisions about a new site in less than a second, so creating a positive first impression and making sure visitors know that they are in the right place is crucial for e-commerce. Banners allow directing customers to the most compelling offers and increase attention to a specific product or brand, resulting in users being showered with hundreds of advertisements on a daily basis. As a result, they often pay little attention to banners appearing on a web page, unless there is increased correspondence between their interests and the subject of the displayed advertisement. However, not all customers are interested in all offers, as gender, age, and past views and purchases are only some of the factors that affect users' interests. Thus, increasing the correlation between user interests and the subject of the displayed banners is necessary. This correspondence can be increased by personalizing the banners displayed to each customer, which in turn is expected to increase effectiveness [2]; the right person should receive the right message.
In recent years, personalized advertising, also known as interest-based advertising, has attracted a lot of attention. It is practically the main tool for improving advertising relevance for users and for increasing return on investment for advertisers. Corporations like Google, Amazon and Alibaba make inferences about a user's interests based on the sites and products they visit and the applications they use, and then target their campaigns according to these interests, providing an improved experience for both users and advertisers. At first glance, matching users with advertisements they like may sound like a simple problem; however, the task of developing an efficient recommender system is extremely challenging. Training, optimizing, evaluating and deploying personalization solutions require specialized expertise in analytics, applied machine learning, software engineering, and systems operations, to name a few.
In this paper we propose a compact banner recommendation framework that is appropriate for e-commerce SMEs. Our work focuses on small and medium sized e-commerce sites that, on the one hand, have enough traffic for data analysis (0.1-10 million pageviews/month), but on the other hand have limited processing and manpower resources, so complex or resource-inefficient solutions are not appropriate. Our focus lies especially on banners, sliders or fixed-position billboard graphics, rather than ads such as Google Ads. The main difference is that the available banners in an e-commerce store are limited and their positions are usually fixed, whereas the number of available ads is enormous, so near real-time decisions must be taken for matching thousands or millions of users with millions of advertisements. Banner recommendation, on the other hand, requires matching thousands or, in the case of a large e-commerce store, millions of users with a few dozen banners, where the main goal is to display the most interesting banners in the right order (in the case of a slider) to each user. Towards this end, we propose a banner recommendation framework that clusters products based on their textual representation and on users' session-based information such as clicks and purchases. The result of this process is used for training a neural network that optimally matches available banners with users. In addition, we propose four new evaluation metrics to act as key performance indicators (KPIs), as traditional classification and ranking metrics cannot fully describe the effectiveness of banner recommendation methods for SMEs.
The remainder of this paper is structured as follows: related work on personalization and banner recommendation is discussed in Section 2. Section 3 introduces four KPIs for evaluating banner recommendations, while Section 4 discusses the proposed framework for banner recommendations in e-commerce in detail, and the evaluation takes place in Section 5. Finally, Section 6 summarizes the work done, discusses future work and concludes the paper.

Related Work
A profusion of papers has been written laying down the fundamental principles of web personalization and surveying the most popular approaches, in an effort to define a taxonomy of existing approaches in the field of e-commerce. In [19], Salonen and Karjaluoto discuss areas of interest, research gaps and future directions for web personalization. Their work includes a literature review of the top 20 marketing and information systems journals published during the period 2005-2015 that shows active research output and the domination of information systems publications. A number of studies have shown that web personalization is well worth the investment in capital and work hours, as it has been observed that website personalization can lead to increased user satisfaction, increased revenue and customer loyalty [20]. Other surveys [1] have indicated that web personalization is a major priority for organizations with an online presence.
Personalization naturally starts with collecting information about the individuals one wishes to reach. This information can be explicit or implicit. Explicit information is what a user offers directly, and includes ratings, answers to questionnaires, reviews, personal information, etc. Implicit information refers to data that the website gathers based on a user's interactions with it. This process is a discipline of Web Mining [12] and includes monitoring, storing and processing a user's actions in order to reveal potentially useful underlying patterns (e.g. browsing and/or purchasing patterns). In this line of research there are many approaches in terms of the data that are collected, the ways by which that is done, and how the collected data may be used [17]. Nevertheless, this type of mining has become controversial in recent years, as it can be regarded as intrusive in terms of the amount of user data collected with limited or no user consent. The lack of transparency raises concerns about privacy and security, so recent regulations such as GDPR [7] try to curb the excessive use of personal data. Although various aspects of the legislation and their implementation remain uncertain [21], the fact remains that various practices mentioned in previous sources might have to be revised to allow for user consent, or even be rejected altogether as non-compliant with the new regulatory frameworks.
As far as the problem of personalized recommendations is concerned, the main objective is to use available data to create recommendations that promote different content to every individual, according to his or her distinct interests. Several papers have been written that analyze and survey recommender systems in the field of web personalization [5]. Furthermore, various approaches have been proposed in the context of e-commerce, such as statistical approaches [18] and machine learning frameworks. Algorithms such as Collaborative Filtering [24], Deep Learning Neural Networks [10] and SVM models [14] have been widely used in personalized content recommendation systems. Although most of these methods have shown promising results, other methods have focused on the cold-start problem, that is, the case where no prior knowledge exists (e.g. the first visit of a new customer). In this case association rules [15] have been used with satisfactory results. Dealing with infrequent items is another problem that requires special treatment; these items are usually treated as outliers and excluded in the preprocessing phase [13]. Another interesting problem in the field is that of Top-N recommenders, which aim to produce N ordered recommendations, rather than a single one. Top-N recommendation is sometimes considered a distinct problem due to the increased computational load required, as well as differences in the ways such systems are evaluated [11].
On the other hand, recommending advertisements has attracted increased attention during the past years. Yuan and Tsao [25] proposed a personalized, contextualized mobile advertising infrastructure for the advertisement of commercial and non-commercial activities that minimizes users' input by using implicit browsing behavior and recommends the Top-N scored advertisements using a two-level Neural Network. In a different approach, Wang et al. [23] propose an interactive service recommendation based on contextual search, building an ad domain-based concept hierarchy to make the most of the product details on e-commerce sites. In [3], Anastasakos et al. explore search engine logs to determine the relevance of an ad document for a search query by using a graph with several million edges. Last but not least, Papadopoulos et al. [16] study the problem of recommending a small set of ads to a user based solely on the currently viewed web page and propose the use of lexical graphs created from web corpora as a means of computing improved content similarity metrics between ads and web pages.
Obviously, a lot of progress has been made in personalization and recommendation systems; however, most of the existing solutions focus on product or ad recommendation. In our case, we deal with the problem of e-commerce banner recommendation, focusing especially on small and medium e-commerce sites. In this case the pool of available recommendations is limited and the available banner positions are fixed, while available data and resources are also limited. Our proposed solution is a Top-N recommendation framework for selecting the most appropriate banners in e-commerce. Advancing the state of the art, we propose an integrated solution that combines topic analysis and neural networks, a combination that not only achieves better results, but is also more adaptable to data changes.

Proposed Evaluation Metrics
To measure the effectiveness of recommender systems and compare different approaches, various metrics have been applied, such as Classification Accuracy, Logarithmic Loss, Confusion Matrix, Area under Curve, Recall, F1 Score, Mean Absolute Error and Mean Squared Error [22]. Nevertheless, our problem presents some differences from typical classification or ranking problems, as in our case there are difficulties in separating good and bad recommendations, due to the fact that the available classes and/or banners for recommendation may be limited and may not match the user's interests. Therefore, in our case we may have to recommend the best of a few bad choices. For this reason, we propose four new metrics to act as KPIs (Key Performance Indicators) in our problem: a) Recall-TopN, b) Recall-Purchases, c) Min-1 and d) Pos-Error. We argue that these new metrics are more suitable for evaluating the performance of banner recommendation systems and believe they can improve the evaluation procedure.

a) Recall-TopN. In the field of Information Retrieval, Recall (also known as sensitivity) is the fraction of relevant instances that have been retrieved over the total amount of relevant instances. We propose a variation of the Recall metric that we call Recall-TopN (Equation 1), where N is the number of recommendations, S the total number of sessions, P_i the number of categories that the system predicted correctly (true positives) for session i, and C_i the number of categories the user showed interest in (relevant elements).
This adjustment is necessary because, even if the user is interested in multiple categories, there will be only N recommendations regardless of the number of relevant items. Traditional recall would penalize this behavior, as in our case the number of true positives can never exceed N.

b) Recall-Purchases. Next, we propose Recall-Purchases (Equation 2), a metric that intends to evaluate the percentage of recommendations that can potentially lead to actual conversions. In Equation 2, M is the number of sessions that lead to a purchase, B_i is the number of categories from which products were purchased in session i, and P_i is the number of categories recommended. Recall-Purchases may be more important than Recall-TopN, as conversions are the definite target in e-commerce.
c) Min-1. Another KPI that is necessary in e-commerce is determining whether a recommendation set is useful or completely irrelevant. Towards this end we define Min-1, which is the percentage of sessions where at least one of the Top-N recommended categories was interesting for the user. While not a particularly strict criterion, it is important in determining how many recommendation sets are even minimally useful.
d) Pos-Error. Recommendation systems should not only provide relevant recommendations, but also rank them properly. Thus, we propose a metric we call Pos-Error that measures the degree to which recommendations are not only relevant, but also optimally ranked. Intuitively, Pos-Error decreases when the banners belonging to the user's top-N interests are optimally ranked. Pos-Error is defined in Algorithm 1, where S is the number of sessions, C is the number of categories, P is an N × C matrix with the predictions of the recommender regarding user interest per session per category, and G is an N × C matrix with the actual user interests displayed in the corresponding sessions. According to Algorithm 1, Pos-Error increases as the distance between the predicted and the ground-truth positions increases. Pos-Error captures the fact that prediction mismatches for categories the user is more interested in are more important than prediction mismatches for less preferred categories. Logarithmic values are used to smooth out the results.
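Since Equations 1 and 2 and Algorithm 1 are not reproduced in the text, the following sketch implements one plausible reading of Recall-TopN and Min-1 from the definitions above (the per-session denominator is capped at N, as the discussion of Recall-TopN suggests). Function names and the session layout are illustrative, not the authors' code; Recall-Purchases and Pos-Error would follow analogously once their exact formulas are fixed.

```python
# Each session is a pair (recommended, relevant): the ordered list of
# recommended categories and the set of categories the user showed
# interest in. Every session is assumed to have at least one relevant
# category (the text treats sessions without interest separately).

def recall_top_n(sessions, n):
    """Recall-TopN: true positives divided by min(N, |relevant|),
    averaged over sessions, so sessions with more than N relevant
    categories are not unfairly penalized."""
    total = 0.0
    for recommended, relevant in sessions:
        hits = len(set(recommended[:n]) & set(relevant))
        total += hits / min(n, len(relevant))
    return total / len(sessions)

def min_1(sessions, n):
    """Min-1: fraction of sessions where at least one Top-N
    recommendation matched a category the user was interested in."""
    useful = sum(
        1 for recommended, relevant in sessions
        if set(recommended[:n]) & set(relevant)
    )
    return useful / len(sessions)

sessions = [(["a", "b", "c"], {"a", "d"}), (["x", "y", "z"], {"q"})]
# First session: 1 hit out of min(3, 2) relevant; second session: no hits.
```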

Proposed Framework
In this Section we propose an integrated framework for banner recommendations, as depicted in Figure 1. Our framework processes product textual information, user history and the categories (classes) that users express interest in. Latent Dirichlet Allocation (LDA) [4] is employed for clustering products based on their textual description (name, description, etc.). Then, product clusters and user history data are processed in order to calculate each user's session-based interest in specific categories. Finally, we use a neural network that processes the session-based interests and produces banner recommendations.
In every e-commerce system there are data that consist of user ids (customer ids or cookie ids for registered or unregistered visitors, respectively), products, user pageviews, purchases and categories. Moreover, products are categorized by brand, attributes or general interest, so by categories (classification classes) we mean the general user interests (brand, category or class). First, we apply LDA for extracting semantic information from products and clustering them into topic themes, based on the similarities of their textual descriptions. Topic modeling is based on the assumption that each document d_i is described as a random mixture of topics and each topic as a focused multinomial distribution over terms. LDA builds a set of thematic topics from terms that tend to co-occur in a given set of documents. The result of the process is a set of N_Θ topics, each expressed with a set of N_W terms; the number of topics N_Θ and the number of terms N_W are parameters of the model. LDA discovers a mixture of topics P(θ|d) for each document d_i, where each topic is described as a mixture of terms P(w|θ) following another probability distribution, as given in Equation 3. The probability of term w_i describing a given document d is P(w_i|d), where θ_i is the latent topic, P(w_i|θ_i = j) is the probability of w_i within topic j, and P(θ_i = j|d) is the probability of picking a term from topic j in document d.
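Written out from the definitions above, the mixture referred to as Equation 3 is the standard LDA decomposition of the per-term probability:

```latex
P(w_i \mid d) \;=\; \sum_{j=1}^{N_{\Theta}} P(w_i \mid \theta_i = j)\, P(\theta_i = j \mid d)
```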
LDA estimates the topic-term distribution P(w|θ) and the document-topic distribution P(θ|d) from an unlabeled corpus of documents, using Dirichlet priors for the distributions and a fixed number of topics. The Gibbs sampler [6] iterates multiple times over each term w_i in document d_i and samples a new topic j.
The Gibbs sampler sets the complexity of topic modeling to O(N_Θ · N · I), where N_Θ is the number of topics, N is the number of documents and I is the number of the sampler's iterations. The most common way to evaluate a probabilistic model is to measure the log-likelihood of a held-out test set; we therefore use perplexity [4] for determining the optimal number of clusters. Perplexity is monotonically decreasing in the likelihood of the test data and is algebraically equivalent to the inverse of the geometric mean per-word likelihood; a lower perplexity score indicates better generalization performance. More formally, for a test set of N documents, the perplexity is given in Equation 5.
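As an illustration (not the authors' implementation), the product-clustering step can be sketched with scikit-learn, whose `LatentDirichletAllocation.perplexity` computes the standard held-out measure exp(−Σ_d log p(w_d) / Σ_d N_d); the candidate topic counts and toy texts below are placeholders:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy product descriptions; a real store would use name + description text.
docs = [
    "red running shoes lightweight mesh",
    "blue trail running shoes waterproof",
    "stainless steel kitchen knife set",
    "ceramic kitchen knife sharpener",
]

# Bag-of-words representation of the product texts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Fit LDA for a few candidate topic counts and keep the one with the
# lowest perplexity (the training set is reused here for brevity; in
# practice a held-out split should be scored instead).
best_k, best_perplexity = None, float("inf")
for k in (2, 3):
    lda = LatentDirichletAllocation(n_components=k, random_state=0)
    lda.fit(X)
    p = lda.perplexity(X)
    if p < best_perplexity:
        best_k, best_perplexity = k, p

# Each product is then assigned to its most probable topic cluster.
lda = LatentDirichletAllocation(n_components=best_k, random_state=0)
topic_assignments = lda.fit_transform(X).argmax(axis=1)
```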
After we assign each product to a topic cluster via LDA, we group user actions into sessions. We use an arbitrary time limit of T hours, which means that a session consists of actions of the same user, each of which happened within T hours of the previous one. For each session we count the number of times each category was viewed; in the case of product views, we consider that products are assigned to one or more categories. Thus, for each session a vector of length C is created, with each slot containing the number of times the respective category was viewed, and a matrix V_cat is formed with dimensions (N, C), where N is the number of sessions examined and C is the number of categories. Following the same process, purchase statistics are calculated and the matrix P_cat is created (Algorithm 2). Having created the matrices V_cat and P_cat for pageviews and purchases respectively, we repeat the process for views and purchases of products with regard to their assignments to LDA clusters. Each session is assigned two vectors of zeros, one for views and one for purchases, where each slot corresponds to an LDA group. For each view or purchase, the respective LDA membership vector is added to the session's view or purchase totals accordingly. Thus, two more matrices, V_lda and P_lda, are created according to Algorithm 3. Next, we normalize each matrix so that it contains values in the range [0, 1], reflecting the relative degree of interest in each category in every session. After that, V_cat and P_cat are aggregated according to Equation 6, where α and β are weight factors, as purchases are a stronger signal than product views. Then, the matrices S_cat and S_lda are concatenated to form the matrix S.
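Since Equation 6 is not reproduced in the text, the aggregation step is sketched below under the assumption that it is a weighted sum of the normalized view and purchase matrices; the sizes, weights and the stand-in S_lda are purely illustrative:

```python
import numpy as np

# Hypothetical sketch of the session-interest aggregation. Equation 6 is
# assumed to be a weighted sum of the row-normalized matrices.

N, C = 4, 3                      # sessions x categories (toy sizes)
rng = np.random.default_rng(0)
V_cat = rng.integers(0, 10, size=(N, C)).astype(float)  # view counts
P_cat = rng.integers(0, 3, size=(N, C)).astype(float)   # purchase counts

def normalize_rows(m):
    """Scale each session row into [0, 1] by its maximum (all-zero rows stay zero)."""
    maxima = m.max(axis=1, keepdims=True)
    return np.divide(m, maxima, out=np.zeros_like(m), where=maxima > 0)

alpha, beta = 1.0, 2.0           # purchases weighted more than views (assumed values)
S_cat = alpha * normalize_rows(V_cat) + beta * normalize_rows(P_cat)

# The LDA-based matrices V_lda and P_lda would be built and aggregated the
# same way; S is the column-wise concatenation [S_cat | S_lda]:
S_lda = S_cat.copy()             # stand-in for the LDA-based counterpart
S = np.hstack([S_cat, S_lda])
```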

Algorithm 3 Calculate Category interest based on LDA clusters
Having calculated the session-based category interest for each user, we estimate user interests taking into account the entire user history as the mean category interest across sessions, which results in a matrix H that we use as input to the neural network for predicting the output matrix S_cat. In other words, the proposed framework predicts user interests at session s using the available information about that user's actions before session s, as described by their historical data in the matrix H.
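The construction of H described above can be sketched as follows; the function name and data layout are hypothetical, and the mean-of-previous-sessions rule is the only assumption carried over from the text:

```python
import numpy as np

# Illustrative construction of the history matrix H: for each session,
# the user's profile is the mean of that user's earlier session-interest
# vectors (zeros when the user has no history yet).

def build_history(session_user_ids, S):
    """session_user_ids[i] is the user of session i (sessions in time
    order); S[i] is the session-interest vector for session i."""
    H = np.zeros_like(S)
    past = {}                     # user id -> previous session vectors
    for i, user in enumerate(session_user_ids):
        if past.get(user):
            H[i] = np.mean(past[user], axis=0)
        past.setdefault(user, []).append(S[i])
    return H

S = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
H = build_history(["u1", "u1", "u1"], S)
# H[0] is all zeros (no history); H[1] equals S[0]; H[2] is the mean of S[0] and S[1].
```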

Evaluation
In this Section we evaluate the proposed solution using a dataset coming from a live e-commerce store. Our evaluation dataset contains anonymous data of 4,292,495 unique clicks, 10,167 products, 83,467 unique customers and 146,803 product purchases. After some preprocessing, we grouped the available data into sessions, where in our case each session consisted of actions of the same user within a period of T = 2 hours. We also clustered products into 50 topic clusters using LDA, after determining the optimal number of clusters using the perplexity metric. In our case we found that 13 out of the 68 available categories (classes) were too generic, covering more than 80% of the products, so we treated these 13 categories as outliers and removed them; the number of products remained unchanged. Then, we performed a five-fold cross-validation test to determine the optimal hyperparameters of our neural network. Our neural network, built with Python and TensorFlow, used the ReLU activation function, with neuron weights initialized using the Glorot initialization function [9]. We searched for the optimal combination among the following hyperparameter values:
- Learning Rate = {0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001}
- Layers = {2, 3, 4, 5, 6, 10}
- Neurons per Layer = {236, 354, 472, 590, 708}
- Regularization Factor = {0.1, 0.01, 0.001}
- Loss Function = {Mean Squared Error, Categorical Cross Entropy}
- Optimizer = {Stochastic Gradient Descent, Adam}
- Batch Size = {64, 32, 16}
A sample of our experiment set is available in Table 1. The duration of each ANN training run was between 90 and 120 minutes, depending on the learning rate, number of layers and neurons per layer, on an Intel i7 6700K computer with a GeForce GTX 970. The following hyperparameters: Learning Rate = 0.0001, Layers = 3, Neurons = 590, Loss Function = Categorical Cross Entropy, Optimizer = Adam, Batch Size = 32 and Regularization Factor = 0.1 resulted in Recall-TopN = 0.333,
which was the best result in our tests, so we used these values in the next step. Next, we evaluated our proposed framework against four other approaches. First, we wanted to test more machine learning approaches, so we replaced the neural network in the processing step with a random forest (with 100 estimators and a maximum depth of 10), as well as with the k-nearest neighbors (KNN) [8] algorithm. Then, we evaluated our approach against two baseline solutions: recommending the N = 5 most popular categories, and randomly selecting N = 5 categories. According to the results of our tests, which are depicted in Table 2, our approach achieves higher scores in all four evaluation metrics: up to 17.45% higher in the case of random forests and up to 77.36% higher in the case of randomly showing banners, which is what the majority of existing e-commerce sites effectively do.
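The five-fold hyperparameter search described above can be sketched as follows. The paper's network was built with TensorFlow; scikit-learn's MLPClassifier is substituted here as a hypothetical stand-in, with toy data and a drastically reduced grid:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Toy session-history features and a toy target "category of interest".
rng = np.random.default_rng(0)
X = rng.random((120, 8))
y = (X[:, 0] > 0.5).astype(int)

# Reduced grid; the paper searched learning rates, 2-10 layers,
# 236-708 neurons per layer, loss functions, optimizers and batch sizes.
grid = {
    "learning_rate_init": [0.01, 0.001],
    "hidden_layer_sizes": [(32,), (32, 32)],
    "batch_size": [16, 32],
}
search = GridSearchCV(
    MLPClassifier(activation="relu", solver="adam",
                  max_iter=300, random_state=0),
    grid,
    cv=5,                          # five-fold cross-validation, as in the text
)
search.fit(X, y)
best = search.best_params_         # best-scoring hyperparameter combination
```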

Conclusion
In this paper, we introduced a compact, integrated system for personalized banner recommendations that is specifically targeted at e-commerce stores. We proposed a framework that employs Latent Dirichlet Allocation and neural networks for processing user history and product textual information in order to produce and rank recommendations from a limited pool of available banners that can be displayed in specific banner positions of an e-commerce store. We evaluated the proposed approach on a dataset coming from an active e-commerce store against other approaches and showed that our solution can provide up to 17.45% better results than random forest methods and up to 77.36% better results than baseline solutions, which represent the current practice for most small and medium e-commerce stores. Our future work includes further experimentation on discovering the optimal parameters for LDA and our neural network. We also plan to evaluate our approach against more recommendation algorithms, as well as to test the proposed solution live in an e-commerce store via A/B testing.

Algorithm 2: Calculate category interest from pageviews and purchases

For i = 1 to Number of Sessions do
    V = category pageviews related to session i
    P = category purchases related to session i
    For j = 1 to C do
        Vcat[i, j] = number of V entries related to category j
        Pcat[i, j] = number of P entries related to category j

For i = 1 to Number of Sessions do
    Pv = product pageviews related to session i
    Pb = product purchases related to session i
    For j = 1 to Sizeof(Pv) do
        Lda_var = LDA memberships for product Pv[j]
        Vlda[i] += Lda_var
    For j = 1 to Sizeof(Pb) do
        Lda_var = LDA memberships for product Pb[j]
        Plda[i] += Lda_var

Table 1. Evaluation of various hyperparameters for our neural network.