Performance Analysis of Classification and Ranking Techniques

Praful Koturwar, Sheetal Girase, Debajyoti Mukhopadhyay
Department of Information Technology
Maharashtra Institute of Technology, Pune, India
[email protected], [email protected], [email protected]

Abstract— Recommendation systems aim at recommending relevant items to the users of the system. They provide efficient recommendations based on the algorithms used for classification and ranking. Classification can be achieved in various ways, in a supervised or an unsupervised manner. Since the sample datasets used for experiments are large and contain many features, it is essential to understand the dataset beforehand. Also, when results are shown to the user, a big challenge is how well the data can be ranked so that user satisfaction is guaranteed. When datasets are large, some ranking algorithms perform poorly in terms of computation and storage, which makes them quite expensive. We aim at developing a classification and ranking algorithm which reduces the computational cost and the dimensionality of the data without affecting the diversity of the feature set. The dimensionality of the data can be handled by SVM (Support Vector Machine). AUC (Area Under the Curve) and WARP (Weighted Approximately Ranked Pairwise) algorithms are efficient for ranking the items that are of user interest.

Keywords- Recommendations; Classification; Ranking; SVM; AUC; WARP.

I. INTRODUCTION

Recommender Systems (RS) are now pervasive in users' lives. They aim to help users find items that they would like to buy or consider, based on the huge amounts of data collected. These items can be of any type, such as websites, movies, books, music, or news articles [11]. RS are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item. The user's interest in an item is expressed through the rating/ranking the user gives the item. A recommendation system has to predict the ranking for items that the user has not yet seen, and it can use these estimated ratings to recommend the items with the highest estimated rating. RS also predict a user's preference on the basis of his/her demographic features, such as age, gender, and location.

Classification is used to organize unstructured data according to the format of the data to be processed, the type of analysis to be applied, and the data sources from which the target system is required to acquire, load, process, analyze, and store data. Many classification techniques are used, depending on the application. Before actual classification begins, the required information is extracted from the dataset, and then classification is performed. There are two main kinds of classification techniques: supervised and unsupervised. Supervised classification techniques are also known as predictive or directed classification; in this method the set of possible outcomes is already known. Unsupervised classification techniques are also known as descriptive or undirected; in this method the set of possible outcomes is unknown, and after classification a name can be assigned to each class [14-15].

Ranking is associated with reordering the clustered data obtained from classification to find the most relevant results among the data inside each cluster. There are various challenges associated with ranking datasets: some datasets need explicit inputs from the user, whereas others are ranked implicitly, without an explicit user query. This paper studies the different classification and ranking recommendation techniques being experimented with, analyzing them in terms of the data that supports the recommendations [5]. We propose an optimization approach for learning ranking functions that is robust to outliers and applicable to any method that learns a linear ranking function. The meta strategy is computationally economical; it is developed for item ranking and can be generalized across many ranking tasks. This approach uses the existing MovieLens dataset, on which the user can fire a query. As per the user query, classification groups the data into clusters of movie, director, popularity, and genre. Except for popularity, all clusters are sorted in ascending order due to the k-order. The outcome of this classification is given to a sigmoid global loss function, which decides which loss function is suitable to order the items of user interest [12].

The rest of this paper is organized as follows. In Section II we present related work done by various researchers in this field. In Section III the proposed methodology is discussed. In Section IV experimental results are shown. The conclusion is given in the last section.

II. RELATED WORK

A lot of research work has been done on Recommendation Systems by many researchers.
There are various classification and ranking techniques available to recommend items as per the user's interest. According to the survey, the techniques used by recommender systems can be classified based on the information dataset they use. The available datasets are the user features (demographics, e.g. age, gender, profession, income, location), the item features (e.g. keywords, genres), and the user-item ratings (gathered through questionnaires, explicit ratings, or transaction data).

978-1-4673-6540-6/15/$31.00 ©2015 IEEE

Dingxian Wang et al. [1] presented a stock futures prediction strategy that uses a hybrid method to forecast the price trends of futures, which is essential for investment decisions. In order to deal with huge amounts of futures data, the authors studied a strategy consisting of two parts: Raw Data Treatment and Features Extraction, and DT-SVM Hybrid Model Training. G. Kesavaraj et al. [2] presented the basic classification techniques and different kinds of classification methods, namely decision tree induction, Bayesian networks, and the k-nearest neighbor classifier. Hwanjo Yu et al. [3] presented a new method called Clustering-Based SVM (CB-SVM), which is specifically designed for handling very large datasets. Their experiments on synthetic and real datasets show that CB-SVM is highly scalable for very large datasets while also achieving high classification accuracy. Yongjun Piao et al. [4] proposed an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by a partition of the redundant features. Their method considers the redundancy of features to divide the original feature space; each generated feature subset is then trained by a support vector machine, and the results of the classifiers are combined by the majority voting method.

Fig. 1 shows the block diagram of the proposed system.

Nicolas Usunier et al. [6] proposed to optimize a larger class of loss functions for ranking, based on an Ordered Weighted Average (OWA) (Yager, 1988) of the classification losses. When aggregating hinge losses, the optimization problem is similar to the SVM for interdependent output spaces. Moreover, they showed that OWA aggregates of margin-based classification losses have good generalization properties, and experiments on the LETOR 3.0 benchmark dataset for information retrieval validated their approach. In [7], the authors presented a family of loss functions, the k-order statistic loss, which can be trained by stochastic gradient descent and therefore scales well to large collaborative filtering data. The tutorial by Alexandros Karatzoglou et al. [8] focuses on the cutting-edge algorithms developed in the area of recommender systems; it provides an in-depth picture of the progress of ranking models in the field, summarizes the strengths and weaknesses of existing methods, and discusses open issues that could be promising for future research in the community. Krisztian Balog et al. [9] performed an experimental comparison of the classification and ranking strategies using supervised learning with a rich feature set. Their main finding is that ranking outperforms classification on all evaluation settings and metrics, and their analysis reveals that a ranking-based approach has more potential for future improvements.

Fig. 1. Block Diagram of the Proposed System

III. PROPOSED METHODOLOGY

A user query applied to the existing dataset provides relevant items. The relevant items are classified by the classifiers as per the user's interest. K-order ratings are used to sort the relevant items in ascending or descending order. The outcomes obtained from classification are provided to the ranking loss functions to get more relevant results. Each module involved in the proposed system is described briefly in this section.

A. K-Order Rating:
We consider the general recommendation task of ranking a set of items D for a given user; the returned list should have the most relevant items at the top. To solve this task, we are given a training set of users U, each with a set of known ratings [6-7].

We consider the case where each user has purchased, watched, or liked a set of items, which are treated as high (positive) ratings. No low (negative) ratings are given; all low- or negative-rated items are thus considered as having an unknown rating. Here, we define the set Du to be the positive items for user u. Hence, we consider factorized models of the form:

f(d, u) = (1 / |Du|) Σ_{i ∈ Du} V_i^T V_d

where V is an m × |D| matrix, with one vector for each item, containing the parameters to be learnt. We use a probability distribution P(k) of drawing the k-th position in a list of size K; this defines the choice of loss function.

- Select a user at random from the training dataset
- Select K positive items
- Compute the score f(d, u) for each
- Sort the scores in descending order
- Let d(i) be the item that is in position i of the sorted list
- Select a position k using the distribution P(k)
- Perform a learning step using the selected higher (positive) item
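As a concrete illustration, the factorized scoring model above can be sketched in NumPy. The latent dimensionality m, the number of items, the random item factors V, and the example positive set Du are all hypothetical values chosen for the sketch, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: m latent dimensions, |D| items; V holds one column per item.
m, num_items = 8, 100
V = rng.normal(scale=0.1, size=(m, num_items))

def score(d, Du):
    """f(d, u) = (1/|Du|) * sum_{i in Du} V_i^T V_d:
    mean dot-product similarity of item d to the user's positive items."""
    user_vec = V[:, Du].mean(axis=1)   # (1/|Du|) * sum of positive item vectors
    return float(user_vec @ V[:, d])

Du = [3, 17, 42]                        # positive items of a hypothetical user
scores = np.array([score(d, Du) for d in range(num_items)])
ranked = np.argsort(-scores)            # item indices, best score first
```

Sorting the scores in descending order, as in the procedure above, then lets a position k drawn from P(k) pick the positive item used for the learning step.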


B. Meta Strategy:
The Meta Strategy is the heart of the system. It consists of classification, ranking, and learning modules.

1) Learning: This function predicts the outcome of the classification and ranking modules. To predict the outcome, it first learns the results coming from the classification and ranking modules for a particular dataset of size 's'; whenever a dataset of similar size appears, it predicts the result accordingly [10].

2) Classification: In this module classification is done using SVM and K-NN. Initially, classification of items is done using SVM; the result obtained is further classified using K-NN to achieve more accurate classification. These classified items are given as input to the sigmoid Meta decider after sorting them using the k-order.

a) K-order ratings: Clusters are obtained from classification. The data inside each cluster is arranged in ascending or descending order by applying the k-order to it. For the dataset considered, the clusters obtained are movies, directors, popularity, and genres. After applying the k-order, the data inside the movie, director, and genre clusters is sorted in ascending order, and the data in the popularity cluster is arranged in descending order.
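A minimal sketch of the two-stage SVM-then-K-NN classification described above, using scikit-learn. The paper does not name an implementation, so the synthetic data, the RBF kernel, the two-candidate refinement, and the neighbor count are all assumptions of this sketch:

```python
# Hypothetical two-stage classifier: an SVM narrows a sample to its two most
# plausible classes, then a K-NN restricted to those classes refines the label.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
svm = SVC(kernel="rbf", random_state=0).fit(X, y)

def classify(x):
    """Stage 1: the SVM scores every class. Stage 2: a K-NN over the training
    points of the two highest-scoring classes makes the final call."""
    margins = svm.decision_function(x.reshape(1, -1))[0]   # one score per class
    top2 = svm.classes_[np.argsort(margins)[-2:]]          # two best candidates
    mask = np.isin(y, top2)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X[mask], y[mask])
    return int(knn.predict(x.reshape(1, -1))[0])
```

Restricting the second stage to the SVM's top candidates is one plausible reading of "the result obtained is further classified using K-NN"; the paper leaves the exact hand-off unspecified.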

b) Sigmoid Meta Decider: The Sigmoid Meta Decider normalizes the items and then provides a threshold value for the size of the classified dataset; based on this threshold value, the loss function used for ranking is decided. In this approach two loss functions, WARP and AUC, are used.

The threshold value for the size of the dataset is calculated as:

Threshold = [(Number of elements of a cluster) / (Range of dataset)] * (Number of clusters - 1)

Algorithm to choose the loss function: if sigmoid(x) = 1 / (1 + e^-x) > 0.5 then WARP, else AUC.

3) Ranking Loss Function: WARP and AUC are the two loss functions used in this approach. WARP considers only positive items, while AUC considers both positive and negative items. Both loss functions are described briefly in this section.

a) WARP Algorithm: The WARP loss function attempts to focus on the top of the list by comparing the positive items from the dataset. In the pairwise approach, the position of a given relevant element in the sorted list can be computed by counting the number of irrelevant elements that have a higher score. This count can be carried out by building pairs of relevant and irrelevant elements and checking the sign of the score differences.

Algorithm 1:
- Select a user at random from the training dataset
- Select positive items
- Compute the score for each
- Sort the scores in descending order
- Let the sorted index give the item in each position of the list
- Select a position using the distribution P(k)
- Perform a learning step using the positive item
- Select a random item
- Make a gradient step to minimize the loss
- Project the weights to enforce the constraints
- Repeat until the validation error does not improve

b) AUC Algorithm: Area Under the Curve is a well-known loss function, sometimes known as the margin ranking loss. In contrast to the WARP algorithm above, which focuses only on the top-ranked elements and ignores the low-ranked ones, the AUC loss function also considers the low-ranked elements as positive items and thereby obtains a well-ranked list.

Algorithm 2:
- Select a user at random from the training dataset
- Select items
- Compute the score for each
- Sort the scores in descending order
- Select a position using the distribution P(k)
- Perform a learning step using the positive item
- Select a random item
- Make a gradient step to minimize the loss
- Project the weights to enforce the constraints
- Repeat until the validation error does not improve
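The threshold formula and the sigmoid-based choice between WARP and AUC described above can be sketched directly; the function names here are illustrative, not from the paper:

```python
import math

def threshold(cluster_size, dataset_range, num_clusters):
    """Threshold = [(number of elements of a cluster) / (range of dataset)]
    * (number of clusters - 1), as defined in the text."""
    return (cluster_size / dataset_range) * (num_clusters - 1)

def choose_loss(x):
    """Sigmoid meta decider: sigmoid(x) > 0.5 selects WARP, else AUC."""
    sigmoid = 1.0 / (1.0 + math.exp(-x))
    return "WARP" if sigmoid > 0.5 else "AUC"
```

For example, a cluster of 50 elements in a dataset of range 100 with 5 clusters gives a threshold of 2.0, and `choose_loss(2.0)` returns `"WARP"`. Since the sigmoid is monotonic, the rule sigmoid(x) > 0.5 is equivalent to checking whether x is positive.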


C. Proposed Algorithm:
- Select a user at random from the training dataset
- Select items from the classified data
- Compute the score for each and initialize the Swap model
- Calculate and prepare the data for each classified cluster
- Algorithm 1: Warp(); Swap() = Warp()
- Algorithm 2: Auc(); Swap() = Auc()
- Sort Swap() as a dataset attribute
- Repeat until the validation error does not improve

D. System Architecture:
Figure 2 shows the architecture of the proposed system, which is divided into two components so as to separate the internal representations of information from the ways that information is presented to or accepted from the user. The first component is the user query, through which the user can make a request to the system; the other is the Meta Strategy (i.e., Meta Learning), the heart of this architecture, which is divided into classification and ranking.

Fig. 2. Proposed System Architecture

Classification filters the unstructured data into structured form by taking input from the user. For this purpose, classification uses the SVM and K-NN supervised techniques, which are more effective than other classification techniques [13]. Learning is the component that takes the filtered data from SVM and K-NN to learn how to perform in the future. The proposed system uses the SVM kernel function called sigmoid, which acts as the decider: based on a threshold value, it decides whether the data it receives from the classifiers is sent to AUC or WARP. If the value of the sigmoid function is greater than 0.5 then the WARP algorithm is used, else the AUC algorithm. Finally, the learning module compares the optimized loss-function items with the filtered data, and the item obtained is the top-ranked item.

IV. EXPERIMENTAL RESULTS

In this section, we include the results of the classification techniques (SVM, K-NN and NB) and ranking algorithms (AUC, WARP and SWAP) with respect to different dataset sizes. For experimental purposes, the MovieLens dataset is used, which is available in the public domain.

Fig. 3. Accuracy for different classifiers

SVM and K-NN classifiers give more accurate results than the NB classifier as the dataset size increases.

Fig. 4. Precision for different classifiers


SVM gives more precise results than the K-NN and NB classifiers as the dataset size increases.

Fig. 5. Recall for different classifiers

SVM and K-NN have lower recall values compared to NB; hence, the SVM and K-NN classifiers performed well as the dataset size increased.

Fig. 6. Accuracy for different loss functions

According to Fig. 6, the SWAP loss function gives more accurate results than the AUC and WARP loss functions as the dataset size increases.

Fig. 7. Precision for different loss functions

According to Fig. 7, SWAP gives more precise results than the AUC and WARP loss functions as the dataset size increases.

Fig. 8. Recall for different loss functions

According to Fig. 8, SWAP gives better results than the AUC and WARP loss functions as the dataset size increases, because the recall values of AUC and WARP are higher than the SWAP recall values.

From the experiments it is observed that SVM provides better results than K-NN in terms of accuracy, precision, and recall when the size of the movie dataset is increased. The ranking algorithms used (AUC and WARP) have utilized the results generated by SVM to provide better-ranked results. Thus, the proposed solution not only increases the efficiency but also increases the recall of the results without affecting the diversity of the feature set.

V. CONCLUSION

We have studied various classification and ranking techniques to address the challenges of data mining for recommendation systems. From the study, we identified the SVM classifier as giving better outcomes, though there is still a need to rank the classified movie dataset clusters, which can be done using AUC and WARP. In particular, by focusing the training on more highly ranked items, one can obtain better precision and recall metrics compared to existing approaches. We introduced a sigmoid function for deciding between AUC and WARP to generate more precise results as per the user's interest. Based upon the threshold value obtained by the sigmoid function, the ranking algorithm assigns a definite rank to the resulting data items.

REFERENCES

[1] Dingxian Wang, Xiao Liu, Mengdi Wang, "A DT-SVM Strategy for Stock Futures Prediction with Big Data," 16th International Conference on Computational Science and Engineering, IEEE, 2013.
[2] G. Kesavaraj, S. Sukumaran, "A Study on Classification Techniques in Data Mining," 4th ICCCNT, Tiruchengode, India, July 4-6, 2013, IEEE.
[3] Hwanjo Yu, Jiong Yang, Jiawei Han, "Classifying Large Data Sets Using SVMs with Hierarchical Clusters," SIGKDD '03, Washington, DC, USA, ACM, 2003.
[4] Yongjun Piao, Hyun Woo Park, Cheng Hao Jin, Keun Ho Ryu, "Ensemble Method for Classification of High-Dimensional Data," IEEE, 2014.
[5] Mohammed GH. AL Zamil, "The Application of Semantic-based Classification on Big Data," International Conference on Information and Communication Systems (ICICS), IEEE, 2014.
[6] Nicolas Usunier, David Buffoni, Patrick Gallinari, "Ranking with Ordered Weighted Pairwise Classification," ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 1057-1064, New York, NY, USA, ACM, 2009.
[7] Jason Weston, Hector Yee, Ron J. Weiss, "Learning to Rank Recommendations with the k-Order Statistic Loss," RecSys '13, Hong Kong, China, October 12-16, 2013, ACM.
[8] Alexandros Karatzoglou, Linas Baltrunas, Yue Shi, "Learning to Rank for Recommender Systems," RecSys '13, Hong Kong, China, October 12-16, 2013, ACM.
[9] Krisztian Balog, Heri Ramampiaro, "Cumulative Citation Recommendation: Classification vs. Ranking," SIGIR '13, Dublin, Ireland, July 28-August 1, 2013, ACM.
[10] Vitor R. Carvalho, Jonathan L. Elsas, William W. Cohen, Jaime G. Carbonell, "A Meta-Learning Approach for Robust Rank Learning," SIGIR 2008 Workshop on Learning to Rank for Information Retrieval (LR4IR), Singapore, ACM, 2008.
[11] Khalid Ibnal Asad, Tanvir Ahmed, Md. Saiedur Rahman, "Movie Popularity Classification based on Inherent Movie Attributes using C4.5, PART and Correlation Coefficient," IEEE/OSI/IAPR International Conference on Informatics, Electronics & Vision, Bangladesh, IEEE, 2012.
[12] Chi Zhang, Feifei Li, Jeffrey Jestes, "Efficient Parallel k-NN Joins for Large Data in MapReduce," EDBT 2012, Berlin, Germany, March 26-30, 2012, ACM.
[13] Mohammed J. Islam, Q. M. Jonathan Wu, Majid Ahmadi, Maher A. Sid-Ahmed, "Investigating the Performance of Naive-Bayes Classifiers and k-Nearest Neighbor Classifiers," 2007 International Conference on Convergence Information Technology, IEEE, 2007.
[14] Praful Koturwar, Sheetal Girase, Debajyoti Mukhopadhyay, "A Survey of Classification Techniques in the Area of Big Data," International Journal of Advance Foundation and Research in Computer (IJAFRC), ISSN 2348-4853, Vol. 1, Issue 11, pp. 100-106, November 2014.
[15] Praful Koturwar, Sheetal Girase, Debajyoti Mukhopadhyay, "Usage-Based Classification and Ranking with Machine Learning Techniques for Recommendations," iPGCON 2015: Fourth Post Graduate Conference for Information Technology, March 2015.
