Revealing Hidden Connections in Recommendation Networks - arXiv

Mar 27, 2012
Revealing Hidden Connections in Recommendation Networks

Rogério Minhano Universidade Federal do ABC - UFABC, Rua Santa Adélia, 166, Santo André, Brazil [email protected]

Stenio Fernandes Universidade Federal de Pernambuco - UFPE, Av. Prof. Moraes Rêgo 1235, Cidade Universitária, Recife, Brazil [email protected]

Carlos Kamienski Universidade Federal do ABC - UFABC, Rua Santa Adélia, 166, Santo André, Brazil [email protected]

Abstract: Companies have been increasingly seeking new mechanisms to make their electronic marketing campaigns go viral, thus obtaining a cascading recommendation effect similar to word-of-mouth. We analysed a dataset from a magazine publisher that uses email as its main marketing strategy and found that the networks emerging from those campaigns form a very sparse graph. We show that online social networks can be effectively used as a means to expand recommendation networks. Starting from a set of users, called seeders, we crawled Google’s Orkut and collected about 20 million users and 80 million relationships. Next, we extended the original recommendation network by adding new edges based on Orkut relationships, which built a much denser network. Therefore, we advocate that online social networks are much more effective than email-based marketing campaigns.

Keywords: recommendation networks, viral marketing
Categories: J.4, I.6

1 Introduction

Viral marketing campaigns typically rely on word-of-mouth strategies, where existing users recommend products and services to their social networks. Explored by marketing professionals for decades, word of mouth builds on a well-known feature of human buying behaviour: people are more interested in what a friend or acquaintance buys than in a randomly selected product. Those potential customers may buy the advertised product and/or send recommendations to a list of contacts over whom they believe they have some influence. This behaviour is usually promoted by rewarding customers with bonus products when a recommendation results in a purchase. Such strategies are called viral marketing, meaning that the advertising spreads epidemically (i.e. its occurrence grows like that of a disease). Viral marketing, in turn, is considered an instance of the more general idea of network-based marketing (Hill et al. 2006). In the Internet age, companies have been increasingly using new media, such as email and text messages, to obtain a cascading recommendation effect similar to direct human contact. The first step in this direction is to build a large user database, with thousands or millions of records, where each record may describe either an existing customer or a potential one. Depending on the objectives of each marketing campaign, the database is segmented in order to send targeted emails to the most promising subset of users. The following step is to provide incentives for users to propagate the advertising message, thus creating a recommendation network (RN). Rough estimates put typical return rates of such campaigns (in terms of purchases) at about 0.5%.
In this paper we analyse a database with subscription recommendations from a major Latin American magazine publisher using the theory of complex networks and their structural features, such as degree distributions, correlations among vertices, clustering coefficients, diameter, and average path lengths, in order to take a closer look at viral marketing behaviour through network analysis. This database contains 28,562 people (acting as source and/or destination of recommendations) and 40,933 recommendations among them. We found that, if we model the recommendations as a network, those people yield 9,562 sub-networks. In other words, those campaigns form a very sparse graph, where most sub-graphs have fewer than 3 vertices. This result shows that this type of campaign has limited appeal among consumers, because most people do not follow the “recommendation chain”. Most of the return of these campaigns comes directly from the massive number of emails sent to the user database, rather than from the emails that are propagated through users’ social network connections. Further analysis revealed that it is possible to classify users’ behaviour into four well-defined types, namely “highly recommended people”, “usual behaviour”, “good recommenders”, and “disseminators”. Users are mapped into these classes according to their connections in the social network.
Our goal in this paper is to reveal the hidden relationships behind recommendation networks and to point out that Online Social Networks (OSN) can be effectively used as a means to promote viral marketing campaigns, by stimulating users to send recommendations of products or services to their friends. Starting from our existing dataset containing a sparse recommendation network, we crawled Google’s Orkut and collected about 20 million users and 80 million connections among them. A set of subscribers was used as seeders (i.e., those who actively contribute to the social network community) for crawling the social network, and as a result we built a large network of Orkut users forming a single large connected component. Although theoretically any other OSN might suffice for our purposes, Orkut was chosen because it is the preferred one in the location where this research was conducted. We manually searched for thousands of users of the recommendation network in Orkut and found exactly 1,625 seeders, which in turn were used as the entry points into the Orkut network. This is itself a contribution of this paper, because we are not aware of another research work in which approximately 20% of the current Orkut network has been collected and analysed using common metrics of complex networks. In the field of recommendation systems, there are some research studies on how to use social graphs to boost the performance of recommendation systems (De Choudhury 2010) or to leverage viral marketing (Chen et al., 2010). However, our research work is different. Here the recommendation network is itself a social network and, to the best of our knowledge, the idea of combining recommendation networks and social networks is new. Moreover, we dedicated great effort to crawling and analyzing two social network datasets.
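The sub-network count reported above corresponds to the number of connected components when each recommendation is treated as an undirected link. The following is a minimal sketch of how such a count could be obtained with a union-find structure; the function name `component_sizes` and the toy edge list are assumptions made for illustration and are not taken from the publisher's dataset or from the authors' tooling.

```python
from collections import Counter


def component_sizes(edges):
    """Return a Counter mapping component size -> number of components.

    `edges` is an iterable of (source, destination) pairs; direction is
    ignored, which is an assumption of this sketch.
    """
    parent = {}

    def find(x):
        # Register unseen vertices as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in edges:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            parent[root_a] = root_b

    # Count how many vertices share each root, then histogram those counts.
    members = Counter(find(v) for v in list(parent))
    return Counter(members.values())


# Hypothetical toy recommendations (sender, receiver).
toy_recommendations = [
    ("ana", "bia"),
    ("bia", "caio"),
    ("davi", "eva"),
    ("fabio", "gil"),
]
sizes = component_sizes(toy_recommendations)
print(sum(sizes.values()), "sub-networks; size histogram:", dict(sizes))
```

On the toy data this yields three sub-networks, two with two vertices and one with three. A histogram dominated by very small components is what the text describes as a very sparse graph.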
A full analysis of the extended network is provided, including average path length, clustering coefficient, and similar metrics. Based on the relationships found in Orkut, we extended the original recommendation network by adding new edges that built a much denser network. Those edges were given a weight in this extended recommendation network (ERN) according to the number of hops in the social network, in order to make it possible to distinguish between them. That is, if two users of the RN are directly connected in Orkut, an edge is created in the ERN and assigned weight 1. If they are indirectly connected in Orkut through a single user, an edge is created in the ERN and assigned weight 2, and so on. Results also indicate that there is a direct relationship between user behavior in the RN and in Orkut. In both networks the seeders are among the most connected users (i.e. they have a higher degree) and have highly connected friends too. In other words, a user who is active in sending recommendations in the RN is also active in keeping a large list of friends in Orkut. This may have a significant impact on marketing strategies, since professionals may use this information to make better use of the effective recommenders in their campaigns. Therefore, we advocate that online social networks should be explored by marketing campaigns. This paper is organized as follows. Section 2 presents background information and discusses related work, and section 3 explains the methodology used in this research. Sections 4 and 5 present our main results for the recommendation network and the extended recommendation network, respectively. We discuss the lessons learned and the possible outcomes from these results in section 6 and draw some conclusions in section 7.
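The hop-based weighting of ERN edges described above can be sketched with a breadth-first search over the social graph. This is only an illustration under stated assumptions: the social graph is an undirected adjacency dictionary, the search is cut off at a hypothetical `max_hops` limit (the text does not state one here), and the names `hop_distances` and `extend_recommendation_network` are invented for this example.

```python
from collections import deque


def hop_distances(social_graph, source, max_hops):
    """Breadth-first search returning {user: hops from source}, up to max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if dist[user] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in social_graph.get(user, ()):
            if friend not in dist:
                dist[friend] = dist[user] + 1
                queue.append(friend)
    return dist


def extend_recommendation_network(social_graph, rn_users, max_hops=2):
    """Return {frozenset({u, v}): weight} for RN users within max_hops of each other.

    The weight is the number of hops between u and v in the social graph:
    1 for direct friends, 2 when linked through a single intermediate user, etc.
    """
    rn_users = set(rn_users)
    ern_edges = {}
    for user in rn_users:
        for other, hops in hop_distances(social_graph, user, max_hops).items():
            if other in rn_users and other != user:
                ern_edges[frozenset((user, other))] = hops
    return ern_edges


# Hypothetical toy social graph (symmetric adjacency lists).
toy_social = {
    "a": ["b", "x"],
    "b": ["a"],
    "x": ["a", "c"],
    "c": ["x"],
}
toy_rn_users = {"a", "b", "c"}
ern = extend_recommendation_network(toy_social, toy_rn_users, max_hops=2)
```

In the toy graph, users a and b are direct friends and receive an edge of weight 1, while a and c are linked through x and receive an edge of weight 2. Users b and c are three hops apart, so they are connected in the ERN only if `max_hops` is raised to 3.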

2 Background and Related Work

In this section, we present the technical background necessary for an in-depth understanding of the paper. We also review the literature and show that our approach and results are unique and set the ground for further analysis of recommendation networks.

2.1 Viral Marketing

The study of epidemic behaviors in network science, such as the spread of viruses and the transmission of diseases, is highly relevant for understanding various areas that may be modeled as networks and their growth patterns (Kempe et al, 2003) (Iribarren et al, 2011) (Barash et al., 2012). The main feature of the marketing technique called viral marketing is the exploitation of this potential, which is inherent to every social network. However, the path that information travels to reach this epidemic stage is not straightforward. In some networks, the types of relationships are extremely important for a positive result. With the advent of the Internet, such advertising campaigns have been directed towards sending emails to potential customers, who may buy the product and/or send recommendations to a list of contacts over whom they believe they have some influence. The incentive companies use to promote this behavior is rewarding customers with bonus products.

The term viral marketing was first used in 1996 to describe the marketing strategy used by the free e-mail service Hotmail (Kaikati et al. 2004). Although this phenomenon has grown tremendously, there are still several interpretations of the term. Kiss and Bichler (2008), for example, define viral marketing as “marketing techniques that use social networks to produce increases in brand awareness by ‘viral’ diffusion processes, analogous to the spread of pathological and computer viruses.” In other words, a company uses the social network of consumers as a way of popularizing a brand or product through the dissemination of messages. Bampo et al. (2008) define it as "a form of peer-to-peer communication in which individuals are encouraged to pass on promotional messages within their social networks". Viral marketing, in turn, is considered an instance of the more general idea of network-based marketing (Hill et al. 2006). It is also considered a type of word-of-mouth marketing, which aims at giving people reasons to exchange information about products/services and providing support for those conversations to take place (WOMMA 2005). This strategy often works through electronic messaging (email) containing information about products and services. Phelps et al. (2004) suggest that "the forces driving the growth of email marketing are low costs to the marketer, the ability to target messages selectively, and high response rates relative to other forms of direct consumer contact." However, as the use of email marketing by businesses becomes more widespread, consumers increasingly treat such messages as spam, which diminishes the return rates of marketing campaigns. This factor is extremely important for understanding the success of viral marketing: since emails from viral marketing strategies come from people one knows, consumers are much more reluctant to delete the message.
From the consumer’s point of view, it is convenient to receive recommendations for products and services of interest. When searching for product information, people usually consult online blogs, communities, or the websites of vendors. According to Jupiter Research Institute studies, the majority of online shoppers use online tools to find interesting products (Loechner 2009). Although the study shows that the most popular search methods are search engines and e-commerce sites, 61% of consumers use recommendation messages as the basis for their purchases. However, measuring the results of a viral marketing campaign is not trivial. According to Cruz and Fill (2008), only a limited number of research studies on the subject are available, which makes it impossible to determine which technique professionals most often use to measure such campaigns. They argue that it is very difficult to find a criterion to measure viral marketing because there are many ways in which users can be involved in a campaign. This interpretation is also supported by De Bruyn and Lilien (2008), who argue that it is difficult to explain why and how viral marketing works. Viral marketing campaigns result in peer-to-peer recommendations, thus increasing the credibility of the message. In addition, according to Rosen (2002), the acquisition of a product is part of a social process. This involves not only the interaction between company and customer, but also the exchange of information between people and the influences that surround the customer. The value of a customer to the company is not only related to the size of the purchase that he/she makes; it should also be measured by how many people he/she can influence, positively or negatively. According to Domingos (2005), well-connected consumers can help, but it is important that they like the product.
Also, even though there is evidence that recommendations help people to make informed choices and are therefore considered a positive influence, sometimes the opposite happens (Fitzsimons and Lehmann 2004), e.g. when unsolicited advice contradicts someone’s initial impression. A very important issue is how a member of a network is motivated to pass on information. A person attending a cooking course will probably remember his/her classmates when he/she is involved in events related to cooking. In a bookstore, for example, if this person finds some books on issues raised during classes, he/she may consider that information to be of great value to the group. At first, the relationships with classmates may be based simply on sitting next to them while attending the cooking class. However, a new network is generated from the moment a book is shared within this group, one that models attendees with a common interest in a particular subject. This type of network is similar to most common social networks, where communities sharing specific interests exchange information about products and events. When it comes to viral advertising campaigns, however, the motivation for having relationships is not necessarily of this type. In fact, the main purpose of a campaign is to capture the reason why someone passes on information that might be important to someone else. In the above example, a customer in a bookstore was the path through which the book became known to, and was possibly purchased by, another person who may or may not be a bookworm knowledgeable about good publications. The main difference is that the impetus for passing on the information was not captured. The publishers did not aim to create situations in which this behaviour occurs, or the customer was stimulated not by advertisements but by particular content.

However, in an advertising campaign via email, regardless of how good the product really is (and in fact product quality tends to be in the background in this type of disclosure), the campaign tends to stimulate a momentary impulse and, as in an exchange of favors, the stimulus tends to be better accepted if the participants benefit. When recommendations are rewarded with a bonus, some people send dozens of recommendations and become really good recommenders of advertising via email. On the other hand, others simply ignore the message and do not recommend it to anyone. These campaigns are also often targeted at a particular public, segmented by characteristics such as age, gender, income, and home address. This process generates an advertisement that is much more effective for that segmented public. People who develop such campaigns work with well-defined goals and a database containing details of potential customers. Therefore, advertising campaigns via email might no longer be seen by potential customers as a series of unwanted messages, but might become highly productive mechanisms for both companies and customers. Their application must be carefully designed, since in a related scenario, i.e., in the Twitter OSN, Harrigan et al. (2012) found that popular individuals have a significantly lower likelihood of retweeting, particularly when they are following a large number of individuals. In this paper we aim at shedding some light on the advantages of integrating viral marketing and Online Social Networks. By knowing their customers’ social relationships, we expect marketing professionals will be able to create better and more effective campaigns.

2.2 Google’s Orkut

Orkut is an OSN currently dominated by Brazilian users, since 50.6% of its users come from Brazil. Google does not publish the number of Orkut users, but unofficial sources estimate it to be over 100 million worldwide. Also, Orkut’s privacy policies are not too strict compared with those of other OSNs, and user profiles are public by default, which makes it appropriate for our purposes.

2.3 Complex Networks Theory

The analyses performed in this work are supported by the theory of complex networks, commonly used in social network analysis and based on graph theory. This subsection presents an overview of this theory. However, for an in-depth understanding of the subject we refer the reader to the work of Newman (2003). Research on complex networks is multidisciplinary per se. Indeed, it is closely related to disciplines such as physics, biology, mathematics, statistics, and computing. Most social networks have non-trivial characteristics, with connection patterns between their elements that are neither regular nor random (Barabási and Albert 1999). These characteristics include the degree distribution of vertices, the clustering coefficient, and the communities and hierarchies in such networks. The theory of complex networks has been widely used in the study of human interactions (Barabási 2005). In the past decade, several research papers have been published in a number of areas, for example on the topological structure of the Internet (Pastor-Satorras et al. 2001), the World Wide Web (Broder 2000), online blogs (Leskovec et al. 2007), online social networks (Ahn et al. 2007), instant messaging networks (Leskovec and Horvitz 2008), scientific collaboration networks (Newman 2001), a network of sexual relations (Jones et al. 2003), prostitution networks (Rocha et al. 2010), and networks formed by geographical positioning (Liben-Nowell et al. 2005). Traditionally, networks with complex topologies were described by the random graph model developed by Erdős and Rényi (1959). The ER model is simple because it assumes a fixed probability for a vertex to connect to another one, so that the resulting degree distribution of its vertices is Poisson. However, random graphs differ from reality as far as the clustering of vertices and degree distributions are concerned, which led to the development of the small world and scale free network models.
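To make the ER model's Poisson degree distribution concrete, the sketch below samples a random graph in which every pair of vertices is linked with the same fixed probability and compares the empirical degree frequencies with a Poisson distribution of mean (n - 1)p. The parameter values, the seed, and the function names `er_degrees` and `poisson_pmf` are arbitrary choices made for illustration; they are not taken from the paper.

```python
import random
from collections import Counter
from math import exp, factorial


def er_degrees(n, p, seed=0):
    """Sample an ER random graph on n vertices and return the list of degrees.

    Each of the n*(n-1)/2 vertex pairs is linked independently with probability p.
    """
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree


def poisson_pmf(k, lam):
    """Probability that a Poisson variable with mean lam equals k."""
    return exp(-lam) * lam ** k / factorial(k)


n, p = 1000, 0.004           # illustrative values only
degrees = er_degrees(n, p)
mean_degree = sum(degrees) / n
expected_mean = (n - 1) * p  # mean of the approximating Poisson distribution

histogram = Counter(degrees)
for k in range(max(histogram) + 1):
    empirical = histogram.get(k, 0) / n
    print(f"degree {k:2d}: empirical {empirical:.3f}  Poisson {poisson_pmf(k, expected_mean):.3f}")
```

The printed columns should track each other closely for a graph of this size, and very high degrees are essentially absent. That absence is the property that real networks with highly connected vertices fail to reproduce.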
Unlike the ER model, vertices in real networks tend to be highly clustered. Also, the average distance between vertices is short even for large networks. Watts and Strogatz (1998) used the name small world networks to characterize networks that simultaneously present short distances and high clustering among their vertices. Barabási and Albert (1999) created the concept of scale free networks, as they observed that in some real networks the probability of finding a highly connected vertex does not decrease exponentially as the vertex degree increases, as assumed by the ER model. Rather, they follow a power law distribution where their probability density function (PDF) has the form p(x)~kx^(-α), where p(x) is the probability of finding the value x, k is a constant and α is known as the scaling parameter. In general, for most networks found in nature, the scale parameter lies between the limits two and three, i.e. 2