This is an accepted version of a paper published in IEEE Transactions on Information Forensics and Security. If you wish to cite this paper, please use the following reference: T. Murakami, H. Watanabe, Localization Attacks Using Matrix and Tensor Factorization, IEEE Transactions on Information Forensics and Security, Vol.11, No.8, pp.1647-1660, 2016. http://dx.doi.org/10.1109/TIFS.2016.2547865 © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 8, AUGUST 2016


Localization Attacks Using Matrix and Tensor Factorization Takao Murakami*, Member, IEEE, and Hajime Watanabe

Abstract—It is known that various types of location privacy attacks can be carried out using a personalized transition matrix that is learned for each target user, or a population transition matrix that is common to all target users. However, since many users disclose only a small amount of location information in their daily lives, the training data can be extremely sparse. The aim of this paper is to clarify the risk of location privacy attacks in this realistic situation. To achieve this aim, we propose a learning method that uses tensor factorization (or matrix factorization) to accurately estimate personalized transition matrices (or a population transition matrix) from a small amount of training data. To avoid the difficulty in directly factorizing the personalized transition matrices (or population transition matrix), our learning method first factorizes a transition count tensor (or matrix), whose elements are the numbers of transitions that the user has made, and then normalizes counts to probabilities. We focus on a localization attack, which derives an actual location of a user at a given time instant from an obfuscated trace, and compare our learning method with the ML (Maximum Likelihood) estimation method in both the personalized matrix mode and the population matrix mode. The experimental results using four real datasets show that the ML estimation method performs only as well as a random guess in many cases, while our learning method significantly outperforms the ML estimation method in all of the four datasets.

Index Terms—location privacy, localization attack, Markov Chain, tensor factorization, matrix factorization.

I. INTRODUCTION

With the widespread use of GPS-equipped devices such as smartphones and in-car navigation systems, people are increasingly using various types of LBS (Location-based Services) such as mapping, route finding, and POI (Point-of-Interest) search. Many people also disclose their current location using geosocial services such as location check-in, tagging, and nearby friends [2], [3]. A recent survey in the U.S. reports that 74% of smartphone owners use LBS to obtain real-time location-based information, and 18% of smartphone owners use geosocial services such as location check-in [4]. Nowadays, a great deal of information about users' locations or traces (time-series location trails) is stored in a data center, which is called Spatial Big Data [5]. This data is expected to help identify routes that avoid congestion or reduce fuel consumption.

A preliminary version of this paper was presented at the First IEEE International Workshop on Big Data Security and Privacy (BDSP 2014), Washington DC, USA, October, 2014 [1]. T. Murakami is with Information Technology Research Institute (ITRI), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, 135-0064, Japan (e-mail: [email protected]). H. Watanabe is with Information Technology Research Institute (ITRI), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, 305-8568, Japan (e-mail: [email protected]).

It is frequently argued that while these services can provide significant benefits to users and society, the disclosure of location information raises serious privacy concerns. For example, the disclosure of a user's location can lead to inference of sensitive locations such as homes and hospitals that he/she regularly visits. It can also expose other private information such as a user's social relationships [6], [7], activities (e.g. working, shopping, sleeping) [8], [9], and long-term properties (e.g. age, gender, work role) [9], [10]. There is even a danger that a stalker will exploit the location information of a target user [11]. This kind of privacy is referred to as location privacy, and numerous studies have addressed this issue, mainly from the perspectives of attacks, defenses, and privacy metrics [12]–[14] (as described in detail in Section VII).

One of the most popular approaches to location privacy attacks is based on a Markov chain model [15]–[21]. Some studies showed that this model works very well through comparison experiments [15], [18]. This approach first converts locations and time into discrete values by dividing an area, in which users can move, into M distinct regions x1, · · · , xM (or extracting M frequently visited POIs) (see the left side of Fig. 1) and partitioning time at a fixed interval (e.g. 20 minutes, 1 hour). It then assumes a Markov model for users' behavior and computes an M × M transition matrix, which comprises the probabilities that a user moves from region xi to xj (1 ≤ i, j ≤ M). Based on this matrix, the adversary can carry out a location prediction attack [16], [17], which predicts a future location (i.e. region ID) of a target user from a past location that he/she disclosed.¹ The adversary can also carry out a more general attack called a localization attack [21], which derives an actual location of a target user from his/her trace that is obfuscated (e.g. by adding noise, merging regions, deleting some locations).
The adversary can derive the actual location using either a personalized transition matrix that is learned for each target user or a population transition matrix that is common to all target users (see the middle/right side of Fig. 1). In this paper, we refer to the former case as a personalized matrix mode, and the latter as a population matrix mode. The personalized matrix mode has the advantage that the adversary can exploit unique features of each user's behavior (e.g. favorite places or routes, walking speed) for deriving the actual location. It also enables a de-anonymization attack [18]–[21], which identifies

¹It should be noted that location prediction can be used to provide useful information services (e.g. recommendation of POIs, targeted advertisements). Indeed, many papers studied location prediction to this end [15], [16]. However, location prediction can cause privacy issues when it is used by a malicious adversary, as discussed in [16], [17].


Fig. 1. Regions and transition matrices (M = 16). p1ij (resp. p∗ij ) is the probability that user u1 (resp. a user) moves from region xi to xj in the personalized matrix mode (resp. population matrix mode).


Fig. 2. Sparse data problem. “?” is an unobserved element whose transition probabilities cannot be estimated using the ML estimation method.
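The sparse-data behavior illustrated in Fig. 2 can be reproduced with a short sketch. ML estimation simply normalizes observed transition counts row by row; the traces below are hypothetical (not the exact traces of Fig. 2), but they exhibit the same two failure modes:

```python
import numpy as np

M = 5  # regions x1..x5 (0-based indices below)
# Hypothetical training traces of user u1, each a sequence of region IDs.
traces = [[1, 2, 2, 3], [2, 1, 3, 2], [3, 2, 1, 2]]

counts = np.zeros((M, M))
for trace in traces:
    for i, j in zip(trace, trace[1:]):  # consecutive pairs = transitions
        counts[i, j] += 1

row_sums = counts.sum(axis=1, keepdims=True)
with np.errstate(invalid="ignore"):
    P_ml = counts / row_sums  # ML estimate: p_ij = c_ij / sum_j' c_ij'

# Rows with no observed transitions (here x1 and x5) are all NaN: these
# are the "unobserved elements" of Fig. 2. The remaining rows take only
# a few values such as 0, 1/3, or 2/3 -- overfitted to the few traces.
print(P_ml)
```

The NaN rows show why an adversary relying on ML estimation cannot score transitions out of regions that never appear as a source in the training data.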

users from anonymized traces (i.e. traces whose user names are replaced by pseudonyms). Many users, however, disclose only a small amount of location information in their daily lives (e.g. a few times per day). Although the studies in [18]–[20] computed each personalized transition matrix using the ML (Maximum Likelihood) estimation method, the available training data can be extremely sparse in such cases.² We explain the problem arising from sparse training data using the example shown in Fig. 2. In this example, the training data of user u1 consists of only 3 traces, each of which is composed of 4 locations (M = 5). Since no transition from region x1 or x5 is observed in these traces, the transition probabilities from region x1 or x5 cannot be estimated using the ML estimation method. We refer to the corresponding elements, which are marked with "?" in Fig. 2, as unobserved elements. Also, the remaining elements are overfitted to the training traces (e.g. the transition probabilities from region x4 are either 0 or 1), which will generally result in poor predictive performance (i.e. the overfitting problem [22]). We refer to such elements as overfitted elements. If there are many unobserved elements or overfitted elements, the adversary cannot accurately perform location privacy attacks.

The population matrix mode has an advantage over the personalized matrix mode in terms of the sparse data problem, since the adversary can use the training data of all target users to learn one matrix (i.e. the population transition matrix). However, even the population matrix mode might suffer from the sparse data problem when the number of regions M is large, since the number of elements in the population matrix is

²Although Shokri et al. [21] proposed a learning method that considers the case where some locations in training traces are missing, it provides almost the same performance as the ML estimation method when there are no missing locations, as described in detail in Section II-C.


M² (i.e. it grows quadratically with M). Despite numerous studies in the field of location privacy, no studies have attempted to solve this sparse data problem, to the best of our knowledge.

The aim of this paper is to clarify the risk of location privacy attacks in a realistic situation where the amount of training data is limited. To achieve this aim, we propose a learning method that solves the sparse data problem for both the personalized matrix mode and the population matrix mode. We apply our learning method to the localization attack [21], and show its effectiveness using four real datasets: the Geolife dataset [23], the Gowalla dataset [24], the CRAWDAD (epfl/mobility) dataset [25], and the CRAWDAD (roma/taxi) dataset [26].

A. Our Contributions

The main contributions of our work are as follows:
1) We propose a learning method that regards a set of personalized transition matrices as a 3rd-order tensor and uses tensor factorization [27]–[31] (we denote this "factorization-based" learning method by "FC"). Here, the summation of transition probabilities over destinations is always 1, and it is intractable to directly factorize the personalized transition matrices under this constraint. To avoid this problem, our learning method first factorizes a transition count tensor, whose elements are the numbers of transitions that the user has made, and then normalizes counts to probabilities. By doing so, the transition matrices can be accurately estimated from a small amount of training data (Section IV-A).
2) We apply our learning method to the localization attack [21], and provide a detailed comparison with the ML (Maximum Likelihood) estimation method [18]–[20] as follows. We first extend our learning method explained above to the population matrix mode by using matrix factorization [30]–[32] (we denote this method by "FC*") (Section IV-B).
We then compare our learning method with the ML estimation method in both the personalized matrix mode and the population matrix mode using four real datasets. The results show that the ML estimation method performs only as well as a random guess in many cases, while our learning method significantly outperforms the ML estimation method in all of the four datasets. This means that an attack based on the ML estimation method does not pose a threat in a "realistic" situation where the amount of training data is limited, while an attack based on our learning method is a threat even in this situation (Section V). We also show that FC can capture unique features of each user's behavior by visualizing parameters in FC and the corresponding traces (Section VI).

This paper is an extension of a previously published conference paper [1]. Note that while the first contribution above is covered in [1], the second contribution is new. More specifically, in [1] we applied FC to the location prediction attack [16], [17], and showed that it outperformed the ML estimation method in the personalized matrix mode. In [33], we also compared these methods when applied to the de-anonymization attack [18]–[21]. However, since both [1] and



[33] evaluated these methods only in the personalized matrix mode (i.e. they did not evaluate them in the population matrix mode), it remained unclear whether our learning method is really effective compared to the ML estimation method. Also, both [1] and [33] conducted experiments using only a single dataset. In this paper, we provide a detailed comparison between our learning method and the ML estimation method in both the personalized matrix mode and the population matrix mode using four real datasets.



B. Paper Organization

The rest of this paper is organized as follows. In Section I-C, we define the basic notation used throughout this paper. In Section II, we describe the localization attack and the sparse data problem. In Section III, we explain matrix/tensor factorization. In Section IV, we propose a learning method using tensor factorization to solve the sparse data problem. We also extend this learning method to the population matrix mode by using matrix factorization. In Section V, we show the experimental results. In Section VI, we visualize parameters in tensor factorization and the corresponding traces. In Section VII, we survey the previous work related to ours. Finally, we conclude this paper in Section VIII.

C. Notation

Let U = {u1, · · · , uN} be a set of N target users, and X = {x1, · · · , xM} be a set of M regions (or POIs). We assume that time is discrete, and express time instants as integers (i.e. the set of time instants is Z = {· · · , −2, −1, 0, 1, 2, · · · }). Let Pn be an M × M transition matrix of user un ∈ U, and pn,i,j be its (i, j)-th element (i.e. the probability that user un ∈ U in region xi ∈ X will move to region xj ∈ X at the next time instant). Let further P∗ be an M × M population transition matrix, and p∗,i,j be its (i, j)-th element (i.e. the probability that an average user in region xi ∈ X will move to region xj ∈ X). We also denote the set of natural numbers less than or equal to N by [N] (= {1, 2, · · · , N}). In Sections II-A, II-B, II-C, and IV-A, we focus on the personalized matrix mode, in which the adversary uses the personalized transition matrices {Pn | n ∈ [N]}. In Sections II-D and IV-B, we describe the population matrix mode, in which he/she uses the population transition matrix P∗.

II. LOCALIZATION ATTACKS AND SPARSE DATA PROBLEMS

In this section, we begin by describing a framework for localization attacks based on the location-privacy framework introduced by Shokri et al. [21] (Section II-A). We then describe an algorithm for localization attacks (Section II-B), and the sparse data problem (Section II-C). Since we focus on the personalized matrix mode from Sections II-A to II-C, we finally explain the case of the population matrix mode (Section II-D).


Fig. 3. Framework for localization attacks. In the training phase, the adversary learns personalized transition matrices {Pn |n ∈ [N ]} using training traces. In the localization phase, the adversary derives an actual location of user un at a given time instant from his/her obfuscated trace (gray rectangles) using Pn .

A. Framework for Localization Attacks

Fig. 3 shows a framework for localization attacks. This framework consists of the following two phases: a training phase and a localization phase. In the training phase, the adversary learns personalized transition matrices {Pn | n ∈ [N]} using training traces. For training traces, we make a general assumption: the target user can disclose his/her own locations to the public using geosocial services (e.g. location check-in, tagging, nearby friends); the adversary may also obtain the training traces of the target user by observing the user in person. However, the number of training traces can still be very small, since many users disclose only a small amount of location information in their daily lives. It is also difficult for the adversary to observe the user in person, unless they are in a close relationship with each other. In the localization phase, user un ∈ U produces a new trace that contains some sensitive locations (e.g. home, hospital). He/she (or a trusted third party [34]–[36]) obfuscates the trace and discloses it (either in real time or after producing the actual trace) to use LBS while protecting his/her location privacy. Examples of obfuscation include perturbation [37]–[39], which adds noise to a location, location generalization (or region merging) [21], [40], [41], which combines multiple regions into one group, and location hiding [16], [21], [42], which deletes some locations from a trace. The adversary obtains the obfuscated trace, and derives the actual location of user un at a given time instant using the personalized transition matrix Pn. It should be noted that this attack is more general than the location prediction attack [16], [17], which predicts a future location of user un from a past location that he/she disclosed. Suppose that user un discloses his/her location at time instant 1.
Then, the location prediction attack can be regarded as a special case of the localization attack where the actual trace is obfuscated by deleting all locations except for the location at time instant 1.
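The three obfuscation mechanisms above can be sketched in a few lines of code. The 4×4 grid, the trace, and the parameter names below are hypothetical illustrations, not part of the framework in [21]:

```python
import random
random.seed(1)

M = 16  # regions x1..x16 arranged on a 4x4 grid (0-based IDs below)
trace = [5, 6, 10, 11, 11, 15]  # hypothetical actual trace of region IDs

def perturb(r):
    # Perturbation: replace the region with a uniformly chosen grid neighbor.
    row, col = divmod(r, 4)
    neighbors = [(row + dr, col + dc)
                 for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                 if 0 <= row + dr < 4 and 0 <= col + dc < 4]
    nr, nc = random.choice(neighbors)
    return nr * 4 + nc

def generalize(r):
    # Location generalization: merge each 2x2 block of regions into one group,
    # and report the whole group instead of the exact region.
    row, col = divmod(r, 4)
    return frozenset((row // 2 * 2 + dr) * 4 + (col // 2 * 2 + dc)
                     for dr in (0, 1) for dc in (0, 1))

def hide(trace, keep_every=3):
    # Location hiding: delete all locations except every keep_every-th one.
    return [r if t % keep_every == 0 else None for t, r in enumerate(trace)]
```

Each mechanism maps an actual region to an element of the obfuscated alphabet X′: a (possibly wrong) single region, a set of merged regions, or a missing observation.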


B. Algorithm for Localization Attacks

We briefly review the localization attack using the Forward-Backward algorithm [43], which was formalized in [21]. Let X′ be a set of possible obfuscated regions. For example, we can use the power set of X, which includes an obfuscated region produced by perturbation, location generalization (or region merging), and location hiding, as X′ (i.e. X′ = P(X)). Let further on(t) ∈ X′ be an obfuscated region of user un ∈ U at time instant t ∈ [T], and on = (on(1), · · · , on(T)) be an obfuscated trace of user un that the adversary obtains. Let Xn,t be a random variable that represents the region user un ∈ U is in at time instant t ∈ [T]. Then, the posterior probability that user un is in region xi ∈ X at time instant t, given the obfuscated trace on = (on(1), · · · , on(T)), is written as follows:

    Pr(Xn,t = xi | on).    (1)
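For illustration, the posterior probabilities in (1) can be computed with the standard Forward-Backward recursions. The sketch below assumes a deterministic obfuscation model in which each observation is the set of regions consistent with the obfuscated output (a merged group for location generalization, all regions for location hiding); the transition matrix and trace are hypothetical toy values:

```python
import numpy as np

def forward_backward(P, pi, emissions):
    """Posterior Pr(X_t = x_i | o) for a Markov chain with transition matrix
    P (M x M), initial distribution pi (M,), and per-time-instant observation
    likelihoods emissions (T x M)."""
    T, M = emissions.shape
    alpha = np.zeros((T, M))  # forward messages (normalized each step)
    beta = np.ones((T, M))    # backward messages (normalized each step)
    alpha[0] = pi * emissions[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = emissions[t] * (alpha[t - 1] @ P)
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = P @ (emissions[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Toy example (M = 3): a hypothetical personalized transition matrix P_n.
P = np.array([[0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4],
              [0.5, 0.2, 0.3]])
pi = np.full(3, 1 / 3)
# Obfuscated trace: {x1, x2} (regions merged), hidden, {x3} (disclosed).
obs = [{0, 1}, {0, 1, 2}, {2}]
emissions = np.array([[1.0 if i in o else 0.0 for i in range(3)] for o in obs])
posterior = forward_backward(P, pi, emissions)
# Bayes decision rule: pick the region with the highest posterior per instant.
estimates = posterior.argmax(axis=1)
```

Sorting each row of `posterior` and keeping the top L indices gives the top-L candidate variant of the attack described below.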

This probability can be estimated using the personalized transition matrix Pn by the Forward-Backward algorithm [43]. In this paper, we do not explain the computation of (1) using the Forward-Backward algorithm (for details, see [21]), but explain how to derive the actual location of user un at time instant t using the estimated posterior probabilities {Pr(Xn,t = xi | on) | i ∈ [M]}. For example, the adversary can choose the region xi whose posterior probability Pr(Xn,t = xi | on) is the highest as the estimated region. This approach is known as the Bayes decision rule [44], and maximizes the attack success probability (i.e. the probability that the estimation is correct) if Pr(Xn,t = xi | on) is accurately estimated. Another approach is to sort the M regions in descending order of posterior probability and choose the top L (≤ M) regions as candidates. This approach includes the Bayes decision rule as a special case where L = 1, and maximizes the probability that the correct answer is included in the L candidates.

It should be noted that if the size of each region (or the number of candidates L) is large, it is difficult to estimate an exact location such as a hospital or home using only this attack. However, even in this case, the adversary may estimate the exact location by combining this attack with other auxiliary information. For example, suppose that a user disclosed the fact that he/she went to some hospital (but did not clarify which hospital) at a certain time by posting a message on a social networking site. By combining this fact with the localization attack, the adversary may narrow down the area and identify the hospital, which might specialize in a particular disease (e.g. cancer, mental illness), and therefore reveal the disease of the user.

C. Sparse Data Problems

The studies in [18]–[20] computed each transition matrix using the ML (Maximum Likelihood) estimation method. However, the ML estimation method suffers from the sparse data problem (i.e.
there are many unobserved elements or overfitted elements in the personalized transition matrix Pn ) when only a small number of training traces are available to the adversary, as described in Section I. If there are many unobserved elements or overfitted elements, the adversary cannot


accurately estimate the posterior probability Pr(Xn,t = xi | on) in (1), and therefore cannot carry out a localization attack with sufficient accuracy.

In [21], Shokri et al. proposed a learning method that considers the case where some locations in training traces are missing. Specifically, their method involves learning a distribution of a transition matrix Pn by assuming a Dirichlet prior distribution for a vector pn,i = (pn,i,1, · · · , pn,i,M) and estimating missing locations using the Gibbs sampling method. However, since this method computes each transition matrix independently, it also suffers from the sparse data problem. More specifically, the mode of the estimated distribution of Pn is equal to the ML estimate when there are no missing locations in the training traces. Thus, in this case, the learning method in [21] provides almost the same performance as the ML estimation method (we also confirmed this in our experiments in Section V).

D. Population Matrix Mode

Both the ML estimation method [18]–[20] and the Gibbs sampling method [21] can also be applied to the population matrix mode, in which the adversary uses the population transition matrix P∗. In this case, the adversary assumes that the transition probability from region xi ∈ X to xj ∈ X is the same for all target users, and is given by p∗,i,j (as defined in Section I-C). By substituting P∗ for Pn in Section II-B, the adversary can carry out the localization attack in the population matrix mode. Since the adversary can use the training traces of all target users to train P∗, he/she can use a larger amount of training data to estimate each element of the matrix (in other words, the number of elements that need to be estimated is reduced from N M² to M²). However, as described in Section I, the population matrix mode has the following two disadvantages: (i) the adversary cannot exploit unique features of each user's behavior (e.g.
favorite places or routes, walking speed) for deriving a location; (ii) he/she cannot carry out the de-anonymization attack [18]–[21]. Furthermore, the adversary might still suffer from the sparse data problem in the case where the number of regions M is large (i.e. he/she wants to know an exact location of the user), since the number of elements in the population matrix is M² (i.e. it grows quadratically with M).

III. MATRIX/TENSOR FACTORIZATION

A tensor is a mathematical object that can be regarded as a multidimensional array [27]. It includes a vector as a 1st-order tensor, and a matrix as a 2nd-order tensor. In this paper, we regard a set of personalized transition matrices as a 3rd-order tensor that is composed of the "User" mode, "From Region" mode, and "To Region" mode, and use tensor factorization [27]–[31] (resp. matrix factorization) to overcome the sparse data problem in the personalized matrix mode (resp. population matrix mode). We also refer to this 3rd-order tensor as a transition probability tensor (see Fig. 4). In this section, we briefly explain matrix factorization (Section III-A) and tensor factorization (Section III-B).




Fig. 5. Matrix factorization and PITF applied to a tensor composed of “User” mode, “From Region” mode, and “To Region” mode.

Fig. 4. Transition probability tensor.

A. Matrix Factorization

Matrix factorization is one of the most popular approaches to item recommendation [32]. It factorizes a large matrix into two low-rank matrices to approximate its elements (including unobserved/overfitted elements) from a small amount of training data, which is called low-rank approximation. Here we consider a matrix A ∈ R^(M×M) as an example (we consider a square matrix for ease of explanation). Matrix factorization approximates A as follows:

    Â = U Vᵀ,    (2)

where Â is the approximation of A, and U ∈ R^(M×K) and V ∈ R^(M×K) are low-rank matrices called feature matrices (K is generally much smaller than M). That is, it computes the approximation Â as the product of U and Vᵀ (see the left side of Fig. 5). Let a_{n,i} ∈ R be the (n, i)-th element of A, and â_{n,i} be the approximation of a_{n,i}. Let further u_n (resp. v_i) ∈ R^K be the n-th row of U (resp. the i-th row of V). The approximation â_{n,i} is given by u_n and v_i as follows:

    â_{n,i} = ⟨u_n, v_i⟩ = Σ_{k=1}^{K} u_{n,k} v_{i,k}.    (3)

u_n and v_i are called feature vectors, and u_{n,k} ∈ R and v_{i,k} ∈ R are called model parameters. By low-rank approximation, each element is influenced by similar feature vectors, and consequently elements (including unobserved ones) can be effectively estimated from a small amount of training data.

B. Tensor Factorization

Tensor factorization is a generalization of matrix factorization to a 3rd-order tensor. There are various methods to factorize a 3rd-order tensor: TD (Tucker Decomposition) [27], CP (CANDECOMP/PARAFAC) [27], PITF (Pairwise Interaction Tensor Factorization) [28], [29], etc. In this paper, we focus on PITF because of its high performance; it largely outperforms TD in runtime, and achieves much higher accuracy than CP [28]. We now explain PITF in detail. Here we consider a tensor A ∈ R^(N×M×M) that is composed of the "User" mode, "From Region" mode, and "To Region" mode as an example (note that in this example, an element does not take a probability but a real number). Let a_{n,i,j} ∈ R be the (n, i, j)-th element of the tensor A. PITF approximates a_{n,i,j} by modeling the pairwise interaction between all three modes as follows:

    â_{n,i,j} = ⟨u_n^(a), v_i^(a)⟩ + ⟨u_n^(b), v_j^(b)⟩ + ⟨u_i^(c), v_j^(c)⟩    (4)
              = Σ_{k=1}^{K} u_{n,k}^(a) v_{i,k}^(a) + Σ_{k=1}^{K} u_{n,k}^(b) v_{j,k}^(b) + Σ_{k=1}^{K} u_{i,k}^(c) v_{j,k}^(c),    (5)

where â_{n,i,j} is the approximation of a_{n,i,j} by PITF, and u_n^(a), v_i^(a), u_n^(b), v_j^(b), u_i^(c), v_j^(c) ∈ R^K are K-dimensional feature vectors (K is generally much smaller than N and M). Let U^(a) ∈ R^(N×K), V^(a) ∈ R^(M×K), U^(b) ∈ R^(N×K), V^(b) ∈ R^(M×K), U^(c) ∈ R^(M×K), and V^(c) ∈ R^(M×K) be the corresponding feature matrices (e.g. u_n^(a) is the n-th row of U^(a)). Then, we can see from (3) and (4) that PITF is a generalization of matrix factorization in that it models the interaction between "User" and "From Region" as U^(a) V^(a)ᵀ, the interaction between "User" and "To Region" as U^(b) V^(b)ᵀ, and the interaction between "From Region" and "To Region" as U^(c) V^(c)ᵀ (see the right side of Fig. 5). By factorizing a tensor this way, each element is influenced by similar feature vectors (i.e. similar users and similar regions), and can be effectively estimated from a small amount of training data.

IV. LEARNING TRANSITION MATRICES USING MATRIX/TENSOR FACTORIZATION

We now propose a learning method using tensor factorization to overcome the sparse data problem in the personalized matrix mode (Section IV-A). We also extend our learning method to the population matrix mode (Section IV-B).

A. Learning Personalized Transition Matrices

In Section III-B, we explained an example of factorizing a tensor composed of real numbers. However, very few studies have factorized a tensor composed of transition probabilities such as Fig. 4. The study of Rendle et al. [29], which is one such example, factorized a tensor composed of transition probabilities to recommend items (e.g. books, DVDs). However, they aimed at creating a personal ranking of items, and did not estimate transition probabilities themselves. On the other hand, the localization attack, which is the focus of this paper, uses the personalized transition matrix Pn (i.e. the transition probabilities themselves) to compute the posterior probability Pr(Xn,t = xi | on) in (1). To the best of our knowledge, no studies have attempted to estimate the transition probabilities themselves using tensor factorization.

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 8, AUGUST 2016

[Fig. 6 illustration: a transition count tensor ("From Region" × "To Region" × user) is factorized, and the reconstructed counts are normalized into a transition probability tensor.]

Fig. 6. Proposed learning method of a transition probability tensor (i.e. personalized transition matrices). It first computes a transition count tensor from training data, and factorizes this tensor. It then computes a transition probability tensor by normalizing counts to probabilities. "?" is an element whose corresponding element in a transition probability tensor is unobserved.

Thus we propose a learning method that estimates each element of a transition probability tensor using tensor factorization. Here it must be noted that the summation of transition probabilities over "To Region" is always 1 (i.e. $\sum_{j=1}^{M} p_{n,i,j} = 1$), and that it is intractable to directly factorize a transition probability tensor under this constraint. To avoid this problem, our learning method first factorizes a transition count tensor, whose (n, i, j)-th element is the number of transitions from region $x_i \in \mathcal{X}$ to $x_j \in \mathcal{X}$ that user $u_n \in \mathcal{U}$ has made, and then computes a transition probability tensor by normalizing counts to probabilities (see Fig. 6).

We begin by providing an overview of our learning method. Our learning method applies a transition count tensor to the tensor A in Section III-B (i.e. $a_{n,i,j}$ is the number of transitions from region $x_i$ to $x_j$ that user $u_n$ has made). Let $\Theta$ be the set of model parameters in PITF:

$$\Theta = \{\mathbf{U}^{(a)}, \mathbf{V}^{(a)}, \mathbf{U}^{(b)}, \mathbf{V}^{(b)}, \mathbf{U}^{(c)}, \mathbf{V}^{(c)}\} \qquad (6)$$
$$\phantom{\Theta} = \{u_{n,k}^{(a)}, v_{i,k}^{(a)}, u_{n,k}^{(b)}, v_{j,k}^{(b)}, u_{i,k}^{(c)}, v_{j,k}^{(c)} \mid n \in [N], i, j \in [M], k \in [K]\}. \qquad (7)$$

Our learning method computes a transition probability tensor as follows:
1) Compute a transition count tensor A by counting each transition in the training data (in Fig. 6, an element of A whose corresponding element in the transition probability tensor is unobserved is marked with "?").
2) Compute the set of model parameters Θ from A, using the algorithm described below.
3) Compute $\{\hat{a}_{n,i,j} \mid n \in [N], i, j \in [M]\}$ from Θ using (5) and (7).
4) Compute a transition probability tensor from $\{\hat{a}_{n,i,j} \mid n \in [N], i, j \in [M]\}$ by normalizing counts to probabilities.

It should be noted that although we use PITF as the factorization method in this paper, other factorization methods such as TD and CP can also be used in our learning method.

We now explain in detail how to compute the set of model parameters Θ from A (i.e. step 2). Here we note that the number of transitions $a_{n,i,j}$ is not a probability but a non-negative value. Under this constraint, we compute the set of model parameters $\hat{\Theta}$ that minimizes the following regularized squared error [30]–[32]:

$$\hat{\Theta} = \mathop{\arg\min}_{\Theta \ge 0} \sum_{(n,i,j) \in D} (a_{n,i,j} - \hat{a}_{n,i,j})^2 + \lambda \|\Theta\|_F^2, \qquad (8)$$

where $D = \{(n,i,j) \mid n \in [N], i, j \in [M], \sum_{j'=1}^{M} a_{n,i,j'} \ge 1\}$. The first term in (8) is the sum of squared errors over the set D of elements whose corresponding elements in the transition probability tensor are observed (i.e. not marked with "?"). That is, we compute $\hat{\Theta}$ based on observed elements only, and then estimate all elements, including unobserved ones, based on $\hat{\Theta}$ (using (5) and (7)). The second term, called a regularization term, is introduced to avoid overfitting the observed elements. $\|\cdot\|_F^2$ is the squared Frobenius norm (i.e. the sum of squares of all elements), and λ (> 0) is called a regularization parameter. λ controls the strength of regularization, and is generally determined by cross-validation [32]. The constraint Θ ≥ 0 means that all model parameters are non-negative, which guarantees the non-negativity of the estimated transition count $\hat{a}_{n,i,j}$ (see (5) and (7)).

Since (8) is not convex in all model parameters in Θ, it is intractable to find a closed-form solution to (8). Thus we use ANLS (Alternating Nonnegative Least Squares) [30], [31], which is a popular approach to finding an approximate solution. ANLS solves the optimization problem for one model parameter θ ∈ Θ while fixing all other model parameters Θ \ {θ}, and iterates this process in a cyclic manner until convergence.
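Step 1 and the definition of D can be sketched as follows (a minimal NumPy sketch with hypothetical toy traces; region indices are 0-based here for simplicity, and the variable names are ours, not the paper's):

```python
import numpy as np

N, M = 2, 4   # toy sizes: 2 users, 4 regions (0-based indices)

# Hypothetical training traces: one list of visited regions per user
traces = {0: [0, 1, 1, 2], 1: [3, 3, 0]}

# Step 1: transition count tensor A[n, i, j] = #(i -> j transitions by user n)
A = np.zeros((N, M, M))
for n, trace in traces.items():
    for i, j in zip(trace, trace[1:]):
        A[n, i, j] += 1

# D contains every (n, i, j) whose row (n, i) has at least one outgoing
# transition; the remaining rows are the unobserved ("?") entries of Fig. 6
observed_rows = A.sum(axis=2) >= 1
D = [(n, i, j) for n in range(N) for i in range(M) for j in range(M)
     if observed_rows[n, i]]
```

In this toy example, user 0 has observed rows i = 0 and i = 1, and user 1 only i = 3, so D covers 3 rows × M = 12 entries; all other rows must be filled in by the factorization.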
By substituting (5) into (8) and solving (8) for one model parameter while fixing all other model parameters, we obtain the following update formulas:

$$u_{n,k}^{(a)} \leftarrow \left[ \frac{\sum_{D_{1,n}} (a_{n,i,j} - \hat{a}_{n,i,j} + u_{n,k}^{(a)} v_{i,k}^{(a)}) v_{i,k}^{(a)}}{\sum_{D_{1,n}} (v_{i,k}^{(a)})^2 + \lambda} \right]_+ \qquad (9)$$

$$v_{i,k}^{(a)} \leftarrow \left[ \frac{\sum_{D_{2,i}} (a_{n,i,j} - \hat{a}_{n,i,j} + u_{n,k}^{(a)} v_{i,k}^{(a)}) u_{n,k}^{(a)}}{\sum_{D_{2,i}} (u_{n,k}^{(a)})^2 + \lambda} \right]_+ \qquad (10)$$

$$u_{n,k}^{(b)} \leftarrow \left[ \frac{\sum_{D_{1,n}} (a_{n,i,j} - \hat{a}_{n,i,j} + u_{n,k}^{(b)} v_{j,k}^{(b)}) v_{j,k}^{(b)}}{\sum_{D_{1,n}} (v_{j,k}^{(b)})^2 + \lambda} \right]_+ \qquad (11)$$

$$v_{j,k}^{(b)} \leftarrow \left[ \frac{\sum_{D_{3,j}} (a_{n,i,j} - \hat{a}_{n,i,j} + u_{n,k}^{(b)} v_{j,k}^{(b)}) u_{n,k}^{(b)}}{\sum_{D_{3,j}} (u_{n,k}^{(b)})^2 + \lambda} \right]_+ \qquad (12)$$

$$u_{i,k}^{(c)} \leftarrow \left[ \frac{\sum_{D_{2,i}} (a_{n,i,j} - \hat{a}_{n,i,j} + u_{i,k}^{(c)} v_{j,k}^{(c)}) v_{j,k}^{(c)}}{\sum_{D_{2,i}} (v_{j,k}^{(c)})^2 + \lambda} \right]_+ \qquad (13)$$

$$v_{j,k}^{(c)} \leftarrow \left[ \frac{\sum_{D_{3,j}} (a_{n,i,j} - \hat{a}_{n,i,j} + u_{i,k}^{(c)} v_{j,k}^{(c)}) u_{i,k}^{(c)}}{\sum_{D_{3,j}} (u_{i,k}^{(c)})^2 + \lambda} \right]_+, \qquad (14)$$

where $D_{1,n} = \{(i,j) \mid i, j \in [M], \sum_{j'=1}^{M} a_{n,i,j'} \ge 1\}$, $D_{2,i} = \{(n,j) \mid n \in [N], j \in [M], \sum_{j'=1}^{M} a_{n,i,j'} \ge 1\}$, $D_{3,j} = \{(n,i) \mid n \in [N], i \in [M], \sum_{j'=1}^{M} a_{n,i,j'} \ge 1\}$, and $[x]_+ = \max(x, \epsilon)$ ($\epsilon$ is a small positive value, typically $\epsilon = 10^{-16}$ [30], [31]).

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 8, AUGUST 2016

Algorithm 1 (Learning a Transition Probability Tensor):
1) Compute a transition count tensor A from the training data.
2) Compute model parameters Θ from A by iteratively updating them using (9)-(14) until convergence.
3) Compute $\{\hat{a}_{n,i,j} \mid n \in [N], i, j \in [M]\}$ from Θ using (5) and (7).
4) Compute a transition probability tensor from $\{\hat{a}_{n,i,j} \mid n \in [N], i, j \in [M]\}$ by normalizing counts to probabilities.

Let Z be the total number of transitions in the training data, and η the number of iterations in step 2. The time complexities of steps 1 to 4 are O(Z), O(ηK|D|), O(NM²), and O(NM²), respectively, so the total time complexity is O(Z + ηK|D| + NM²). In comparison, the ML estimation method [18]–[20], which performs only steps 1 and 4, has time complexity O(Z + NM²).
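The coordinate updates (9) and (10) can be sketched as follows (a minimal NumPy sketch on toy random counts; the remaining updates (11)-(14) follow the same pattern, and the sizes, seed, and variable names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 3, 4, 2            # toy sizes: users, regions, latent dimension
lam, eps = 1e-3, 1e-16       # regularization parameter and the [x]_+ floor

A = rng.poisson(1.0, size=(N, M, M)).astype(float)     # step 1 (toy counts)
# Mask of observed entries: all (i, j) of a row (n, i) with >= 1 transition
W = np.repeat((A.sum(axis=2) >= 1)[:, :, None], M, axis=2).astype(float)

# PITF factors, randomly initialized in (0, 1)
Ua, Va = rng.random((N, K)), rng.random((M, K))
Ub, Vb = rng.random((N, K)), rng.random((M, K))
Uc, Vc = rng.random((M, K)), rng.random((M, K))

def predict():
    # (5): a_hat[n,i,j] = <Ua[n], Va[i]> + <Ub[n], Vb[j]> + <Uc[i], Vc[j]>
    return ((Ua @ Va.T)[:, :, None] + (Ub @ Vb.T)[:, None, :]
            + (Uc @ Vc.T)[None, :, :])

def objective():
    # Regularized squared error (8), restricted to observed entries
    return (W * (A - predict()) ** 2).sum() + lam * sum(
        (F ** 2).sum() for F in (Ua, Va, Ub, Vb, Uc, Vc))

def update_pair(k):
    # (9): update u^(a)_{n,k} with all other parameters fixed
    Vk = Va[:, k][None, :, None]
    R = A - predict() + (Ua[:, [k]] * Va[:, k])[:, :, None]
    Ua[:, k] = np.maximum(
        (W * R * Vk).sum(axis=(1, 2))
        / ((W * Vk ** 2).sum(axis=(1, 2)) + lam), eps)
    # (10): update v^(a)_{i,k}
    Uk = Ua[:, k][:, None, None]
    R = A - predict() + (Ua[:, [k]] * Va[:, k])[:, :, None]
    Va[:, k] = np.maximum(
        (W * R * Uk).sum(axis=(0, 2))
        / ((W * Uk ** 2).sum(axis=(0, 2)) + lam), eps)

err_before = objective()
for _ in range(5):           # a few ANLS sweeps over the latent dimensions
    for k in range(K):
        update_pair(k)
err_after = objective()
```

Each update is the exact minimizer of the regularized squared error in that coordinate (clamped at ε), so the objective never increases across sweeps.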

B. Learning a Population Transition Matrix

Since a matrix is a second-order tensor, our learning method in Section IV-A can easily be extended to the population matrix mode. In this case, we factorize a transition count matrix, whose (i, j)-th element is the number of transitions from region $x_i \in \mathcal{X}$ to $x_j \in \mathcal{X}$ in the training data of all target users, and then compute a transition probability matrix (i.e. a population transition matrix) by normalizing counts to probabilities (see Fig. 7). We apply the transition count matrix to the matrix A in Section III-A (i.e. $a_{i,j}$ is the number of transitions from region $x_i$ to $x_j$ in the training data of all target users), and compute a set of model parameters in matrix factorization. Let $\Theta^*$ be the set of model parameters:

$$\Theta^* = \{\mathbf{U}, \mathbf{V}\} \qquad (15)$$
$$\phantom{\Theta^*} = \{u_{i,k}, v_{j,k} \mid i, j \in [M], k \in [K]\}. \qquad (16)$$
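As a quick check of the parameter-count comparison made at the end of this subsection (using the sizes from the experiments in Section V; N = 80 corresponds to the Geolife dataset, and N varies per dataset):

```python
M, N, K = 256, 80, 16      # regions, users (e.g. Geolife), latent dimension

mf_params = 2 * M * K               # matrix factorization: U and V
pitf_params = (2 * N + 4 * M) * K   # PITF (personalized matrix mode)
full_matrix = M * M                 # elements of the population matrix
```

With these sizes, matrix factorization needs 8,192 parameters, PITF 18,944, and the full population transition matrix 65,536 elements, consistent with the claim that the population matrix mode has the fewest parameters.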

We first update U^(a) (= $\{u_{n,k}^{(a)} \mid n \in [N], k \in [K]\}$) using (9), and then update V^(a) (= $\{v_{i,k}^{(a)} \mid i \in [M], k \in [K]\}$) using (10). We update the other parameters (i.e. U^(b), V^(b), U^(c), V^(c)) in the same way using (11)-(14), and iterate this update procedure until the values $u_{n,k}^{(a)}$, $v_{i,k}^{(a)}$, $u_{n,k}^{(b)}$, $v_{j,k}^{(b)}$, $u_{i,k}^{(c)}$, and $v_{j,k}^{(c)}$ converge. As for the initialization of the model parameters, we should avoid setting all parameters to 0, because the denominators in (9)-(14) would then become very small, and the parameters would oscillate and converge slowly. In our experiments in Section V, we adopted the widely used random initialization method [45], which initializes each parameter to a random number between 0 and 1. To sum up, our learning algorithm in the personalized matrix mode is given as Algorithm 1 above.

Fig. 7. Proposed learning method of a transition probability matrix (i.e. population transition matrix). It factorizes a transition count matrix, which is computed from training data of all target users, and normalizes counts to probabilities.
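The factorize-then-normalize pipeline of Fig. 7 can be sketched end-to-end as follows (a toy NumPy sketch; the column updates follow the pattern of the ANLS formulas (17) and (18) given later in this subsection, and the sizes, seed, and names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, lam, eps = 4, 2, 1e-3, 1e-16     # toy sizes (the paper uses M = 256)

# Step 1: toy transition count matrix and its observed-row mask
A = rng.poisson(1.0, size=(M, M)).astype(float)
W = np.repeat((A.sum(axis=1) >= 1)[:, None], M, axis=1).astype(float)

# Random initialization in (0, 1)
U, V = rng.random((M, K)), rng.random((M, K))

# Step 2: ANLS sweeps, one latent dimension at a time
for _ in range(200):
    for k in range(K):
        R = A - U @ V.T + U[:, [k]] * V[:, k]          # residual w/o term k
        num = (W * R * V[:, k]).sum(axis=1)            # update u_{i,k}
        den = (W * V[:, k] ** 2).sum(axis=1) + lam
        U[:, k] = np.maximum(num / den, eps)
        R = A - U @ V.T + U[:, [k]] * V[:, k]
        num = (W * R * U[:, [k]]).sum(axis=0)          # update v_{j,k}
        den = (W * U[:, [k]] ** 2).sum(axis=0) + lam
        V[:, k] = np.maximum(num / den, eps)

# Steps 3-4: reconstruct counts and normalize rows to probabilities
A_hat = U @ V.T                                        # strictly positive here
P = A_hat / A_hat.sum(axis=1, keepdims=True)
```

Note that rows masked out by W (unobserved rows) still receive an estimated distribution through the shared factors U and V, which is exactly how the factorization fills in the "?" entries.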

Using ANLS (Alternating Nonnegative Least Square) [30], [31], we can compute model parameters that approximate those minimizing the regularized squared error, in the same way as in Section IV-A. The update formulas in this case can be written as follows:

$$u_{i,k} \leftarrow \left[ \frac{\sum_{D^*_{1,i}} (a_{i,j} - \hat{a}_{i,j} + u_{i,k} v_{j,k}) v_{j,k}}{\sum_{D^*_{1,i}} (v_{j,k})^2 + \lambda} \right]_+ \qquad (17)$$

$$v_{j,k} \leftarrow \left[ \frac{\sum_{D^*_{2,j}} (a_{i,j} - \hat{a}_{i,j} + u_{i,k} v_{j,k}) u_{i,k}}{\sum_{D^*_{2,j}} (u_{i,k})^2 + \lambda} \right]_+, \qquad (18)$$
where $D^*_{1,i} = \{j \mid j \in [M], \sum_{j'=1}^{M} a_{i,j'} \ge 1\}$ and $D^*_{2,j} = \{i \mid i \in [M], \sum_{j'=1}^{M} a_{i,j'} \ge 1\}$ ($\lambda$ is a regularization parameter). We update U (= $\{u_{i,k} \mid i \in [M], k \in [K]\}$) using (17), and then update V (= $\{v_{j,k} \mid j \in [M], k \in [K]\}$) using (18), iterating this update procedure until the values $u_{i,k}$ and $v_{j,k}$ converge. Our learning algorithm in the population matrix mode can be written as follows:

Algorithm 2 (Learning a Transition Probability Matrix):
1) Compute a transition count matrix A from the training data.
2) Compute model parameters Θ* from A by iteratively updating them using (17) and (18) until convergence.
3) Compute $\{\hat{a}_{i,j} \mid i, j \in [M]\}$ from Θ* using (3) and (16).
4) Compute a transition probability matrix from $\{\hat{a}_{i,j} \mid i, j \in [M]\}$ by normalizing counts to probabilities.

If K is much smaller than M, the number of model parameters in matrix factorization (= 2MK) is much smaller than the number of elements in the population transition matrix (= M²). It is also smaller than the number of model parameters in PITF (= (2N + 4M)K). Thus, our learning method in the population matrix mode has the fewest parameters, which is desirable when the training data is extremely sparse. However, it can neither exploit unique features of each user's behavior (e.g. favorite places or routes) nor be applied to the de-anonymization attack [18]–[21], as described in Section II-D.

V. EXPERIMENTAL EVALUATION

A. Experimental Set-up


We carried out experiments to show the effectiveness of our learning method using four real datasets: the Geolife dataset [23], the Gowalla dataset [24], the CRAWDAD (epfl/mobility) dataset [25], and the CRAWDAD (roma/taxi) dataset [26].







We divided each of the four areas (Beijing, New York, the San Francisco Bay Area, and Rome) into 16 × 16 regions (M = 256), where the boundaries were determined so that the traces were uniformly distributed along each axis (i.e. 1/16 = 6.25% per bin). For each user (or taxi), we used Y ∈ {1, 3, 5, 7, 9} trace(s) as training data, and the remaining 10 − Y trace(s) as testing data. We attempted 10 ways of randomly choosing the Y training trace(s) from the 10 traces, and carried out the following experiments for each case. Using the training data, we computed the personalized transition matrices {P_n | n ∈ [N]} (or a population transition matrix P*). For comparison, we evaluated our learning method and the ML (Maximum Likelihood) estimation method [18]–[20] in both the personalized matrix mode and the population matrix mode³:

• ML: The ML estimation method [18]–[20] in the personalized matrix mode (see Section II-C).
• ML*: The ML estimation method [18]–[20] in the population matrix mode (see Section II-D).

3 We do not report the performance of the Gibbs sampling method in [21] because this method provided almost the same performance as the ML estimation method in our experiments, as described in Section II-C.


• Geolife dataset: The Geolife dataset [23] was collected by Microsoft Research Asia. It contains GPS traces of 182 users over a period of more than five years (from April 2007 to August 2012), covering a variety of private movements such as going home, going to work, and shopping, mostly in Beijing. In our experiments, we used the traces in Beijing. We chose 80 users who had long traces (N = 80), and extracted, for each of them, 10 traces, each of which comprises 10 locations with a time interval of more than 30 minutes (we eliminated the remaining 102 users, since we had insufficient data to extract 10 such traces for them).
• Gowalla dataset: The Gowalla dataset [24] contains 6,442,890 check-ins of 196,591 users over a period of more than two years (from February 2009 to October 2010). Although there are a large number of check-ins, they are scattered all over the world. In our experiments, we used the traces in New York. We chose 173 users who had long traces (N = 173), and extracted, for each of them, 10 traces, each of which comprises 10 locations (with a time interval of more than 30 minutes), in the same way as in the Geolife dataset.
• CRAWDAD (epfl/mobility) dataset: The CRAWDAD (epfl/mobility) dataset [25] contains GPS traces of 536 taxis collected over 30 days in the San Francisco Bay Area. We chose 529 taxis that had long traces (N = 529), and extracted, for each of them, 10 traces, each of which comprises 10 locations (with a time interval of more than 30 minutes), in the same way as in the above datasets.
• CRAWDAD (roma/taxi) dataset: The CRAWDAD (roma/taxi) dataset [26] contains GPS traces of 306 taxis collected over 30 days in Rome. We chose 306 taxis that had long traces (N = 306), and extracted, for each of them, 10 traces, each of which comprises 10 locations (with a time interval of more than 30 minutes), in the same way as in the above datasets.


Fig. 8. Generalized region (M = 256). For example, LGLH(1, 0.1), on input region x34 , outputs regions {x33 , x34 , x49 , x50 } (gray area) with probability 0.9, and outputs ⊥ that represents the region is hidden with probability 0.1.
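The LGLH(b, φ) obfuscation described above can be sketched as follows (a minimal sketch; the function name, the `rng` argument, and the 0-to-1-based index handling are ours, following the region numbering of Fig. 8):

```python
import random

def lglh(region, b, phi, grid=16, rng=random.Random(0)):
    """Obfuscate a 1-based region id on a grid x grid map.

    Hiding: with probability phi, output None (the symbol "⊥").
    Generalization: drop the lower b bits of the row and column
    coordinates, i.e. output the 2^b x 2^b block containing the region."""
    if rng.random() < phi:
        return None
    row, col = divmod(region - 1, grid)
    r0, c0 = (row >> b) << b, (col >> b) << b
    return sorted(r * grid + c + 1
                  for r in range(r0, r0 + 2 ** b)
                  for c in range(c0, c0 + 2 ** b))
```

For example, `lglh(34, 1, 0.0)` returns `[33, 34, 49, 50]`, matching the gray area in Fig. 8.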

• FC: Our learning method using tensor factorization in the personalized matrix mode (see Section IV-A).
• FC*: Our learning method using matrix factorization in the population matrix mode (see Section IV-B).

In ML and ML*, we set the transition probability $p_{n,i,j}$ (or $p_{*,i,j}$) of an unobserved element to 1/256 (i.e. we assumed a uniform distribution over these elements). In FC and FC*, we set the dimension K of the feature vectors to K = 16 (since this provided high performance in [1]). We initialized the model parameters Θ (or Θ*) using the random initialization method [45], which initializes each parameter to a random number between 0 and 1, and then computed Θ (or Θ*) using our learning method. We set the regularization parameter λ to λ = 10⁻³ (we measured performance for λ ∈ {10⁻¹, 10⁻², 10⁻³, 10⁻⁴, 10⁻⁵} and confirmed that the performance was almost the same in each case). After computing the transition matrix P_n (or P*), we computed the stationary (steady-state) probabilities for user u_n (or all users), which are needed to compute the posterior probability Pr(X_{n,t} = x_i | o_n) in (1) (for details, see [21]), by repeatedly multiplying a uniform distribution by the transition matrix P_n (or P*) until convergence, in the same way as [19].

Using the testing data, we evaluated the performance of the localization attack described in Sections II-A and II-B. We first obfuscated each region in the testing traces using the location generalization and location hiding method [21] with parameters b and ϕ (denoted by LGLH(b, ϕ)). This method generalizes (merges) a region by dropping the lower b bit(s) of each of the x- and y-coordinates (expressed as binary sequences), and hides (deletes) a region with probability ϕ. For example, LGLH(1, 0.1), on input region x₃₄, outputs the regions {x₃₃, x₃₄, x₄₉, x₅₀} with probability 0.9, and outputs ⊥ (which represents that the region is hidden) with probability 0.1 (see Fig. 8). For the parameters b and ϕ, we tested b ∈ {0, 2} and ϕ ∈ {0.5, 0.8}. We then evaluated the performance when the adversary estimates each region in an obfuscated trace by choosing L (1 ≤ L ≤ 256) candidates in descending order of posterior probability (see Section II-B). As a performance measure, we used the attack success rate: the number of successful attacks (i.e. attacks in which the chosen L candidates include the correct answer) divided by the total number of attacks (i.e. N × (10 − Y) × 10). We averaged

[Fig. 9 panels: for each dataset ((i) Geolife, (ii) Gowalla, (iii) CRAWDAD (epfl/mobility), (iv) CRAWDAD (roma/taxi)) and each obfuscation setting (LGLH(0,0.5), LGLH(0,0.8), LGLH(2,0.5), LGLH(2,0.8)), the attack success rate [%] is plotted against the number of candidates L (0 to 256) for ML, ML*, FC, FC*, and a random guess.]
Fig. 9. Relationship between the number of candidates L and the attack success rate (Y = 1).

the attack success rate over all 10 ways of randomly choosing the training traces, to obtain stable performance.

B. Experimental Results

We first evaluated the performance in the case when only one trace per user (or taxi) is used as training data (Y = 1). Fig. 9 shows the relationship between the number of candidates L and the attack success rate. The dashed line represents the performance of a random guess, which randomly chooses L candidates from a generalized region (or from all regions when the region is hidden). For example, when we set L = 2, the attack success rate of a random guess is 50.4%

(= 0.5×1+0.5×2/256) and 3.1% (= 0.2×2/16+0.8×2/256) for LGLH(0, 0.5) and LGLH(2, 0.8), respectively4 . We also show in Table I the attack success rate when we used LGLH(2, 0.5) and fixed L = 1, 8, or 64. It can be seen from Fig. 9 that ML performs only as

4 Note that the attack success rate is always more than 50% for LGLH(0, 0.5), since LGLH(0, 0.5) does not obfuscate half of the regions. We included such unobfuscated regions in the testing data when computing the attack success rate, because we would like to compare the four obfuscation methods using the same testing data (i.e. N × (10−Y) × 10 regions). For example, we can easily verify from Fig. 9 that the attack success rate is the highest (resp. lowest) in LGLH(0, 0.5) (resp. LGLH(2, 0.8)).
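The random-guess baseline above generalizes as follows (a sketch; with probability 1 − ϕ the adversary picks L candidates from the 4^b regions of the generalized area, and with probability ϕ from all 256 regions; the function name is ours):

```python
def random_guess_rate(b, phi, L, M=256):
    """Expected success rate of a random guess under LGLH(b, phi)."""
    g = 4 ** b                      # size of a generalized region (2^b x 2^b)
    return (1 - phi) * min(L, g) / g + phi * min(L, M) / M

# The two examples in the text, for L = 2:
r1 = random_guess_rate(0, 0.5, 2)   # LGLH(0, 0.5): 0.5*1 + 0.5*2/256 ≈ 50.4%
r2 = random_guess_rate(2, 0.8, 2)   # LGLH(2, 0.8): 0.2*2/16 + 0.8*2/256 ≈ 3.1%
```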


5 In the Geolife dataset and the Gowalla dataset, there are rare exceptions where ML provides the best performance (e.g. when we use LGLH(2, 0.8) and set L = 1). This is because many people stay in the same region for a long time in these datasets. However, since ML performs only as well as a random guess in most cases, we can say that it almost never works when Y = 1.
6 We also confirmed that FC* provided the best performance in the CRAWDAD (epfl/mobility) dataset even when the number of taxis was set to N = 80 rather than N = 529 (i.e. the same as the number of users in the Geolife dataset).


well as a random guess in most cases⁵. This is because the training data is extremely sparse: for each user (or taxi), the number of transitions in the training data is 9, while the number of elements in the personalized transition matrix is M² = 65536. It can also be seen that ML* outperforms ML in most cases, since much more training data (e.g. 529 times more in the CRAWDAD (epfl/mobility) dataset) is used to estimate each element. However, ML* still performs similarly to a random guess in many cases (e.g. LGLH(0, 0.5), LGLH(2, 0.5)), since the number of elements is very large (M² = 65536). On the other hand, FC and FC* significantly outperform the ML estimation method and a random guess in most cases, since they overcome the sparse data problem. From this result, we can say that the localization attack using our learning method is a threat even in a realistic situation where the amount of training data is limited.

It can be seen from Fig. 9 and Table I that FC provides the best performance in the Geolife dataset and the Gowalla dataset, while FC* provides the best performance in the CRAWDAD (epfl/mobility) dataset⁶. We believe this is because FC captures unique features of each user's behavior in the Geolife dataset and the Gowalla dataset, while this is not the case in the CRAWDAD (epfl/mobility) dataset. In the CRAWDAD (epfl/mobility) dataset, each taxi does not have very distinct features (we confirmed this by visualizing the traces). This seems natural, since the destination of a taxi is often determined not by the driver but by the customer. FC* outperforms FC in such cases, since the number of

TABLE I
THE ATTACK SUCCESS RATE [%] (Y = 1, LGLH(2, 0.5)). THE BEST PERFORMANCE IS HIGHLIGHTED IN BOLDFACE.

(i) Geolife dataset
        ML     ML*    FC     FC*
L = 1   11.93  10.64  14.54  10.97
L = 8   36.00  40.14  50.47  49.07
L = 64  61.31  60.98  81.05  79.46

(ii) Gowalla dataset
        ML     ML*    FC     FC*
L = 1   20.41  13.91  21.39  14.15
L = 8   42.24  49.73  58.94  53.35
L = 64  62.38  74.82  87.09  85.52

(iii) CRAWDAD (epfl/mobility) dataset
        ML     ML*    FC     FC*
L = 1    4.54   9.79   9.30  10.97
L = 8   27.32  40.88  41.76  45.50
L = 64  60.71  69.24  74.92  77.72

(iv) CRAWDAD (roma/taxi) dataset
        ML     ML*    FC     FC*
L = 1    6.56   9.47  13.93  13.25
L = 8   28.37  34.08  48.99  47.96
L = 64  58.56  58.15  78.78  79.61

[Fig. 10 panels: for each dataset, the attack success rate [%] is plotted against the number of training traces Y (1 to 9) for L = 1, 8, and 64, with curves for ML, ML*, FC, and FC*.]

Fig. 10. Relationship between the number of training traces Y and the attack success rate (LGLH(2, 0.5)).

parameters is smaller (as described in Section IV-B). On the other hand, since the Geolife dataset and the Gowalla dataset contain a variety of private movements (e.g. going home, going to work), each user's trace has distinctive features. FC effectively captures such features from a small amount of training data, and exploits them for the localization attack. In Section VI, we show that this is indeed the case by visualizing model parameters in FC and the corresponding traces.

We next evaluated the performance when the number of training traces per user (or taxi) Y is varied from 1 to 9, using LGLH(2, 0.5) as the obfuscation mechanism. Fig. 10 shows the relationship between the number of training traces Y and the attack success rate (for the number of candidates L, we tested three cases: L = 1, 8, and 64). We also show in Table II the percentage of unobserved elements in ML and ML*. It can be seen from Fig. 10 that in many cases, the performance of ML or ML* increases rapidly as Y increases from 1 to 9. In particular, when Y is large (e.g. Y = 7 or 9), ML outperforms FC and FC* in many cases in the Geolife dataset and the Gowalla dataset,

TABLE II
THE PERCENTAGE OF UNOBSERVED ELEMENTS [%].

(i) Geolife dataset
      Y = 1  Y = 3  Y = 5  Y = 7  Y = 9
ML    97.82  94.86  92.79  91.05  89.48
ML*   42.81  20.35  12.15   8.20   5.55

(ii) Gowalla dataset
      Y = 1  Y = 3  Y = 5  Y = 7  Y = 9
ML    97.88  95.39  93.54  92.03  90.71
ML*   25.08  15.90  13.48  11.64  10.63

(iii) CRAWDAD (epfl/mobility) dataset
      Y = 1  Y = 3  Y = 5  Y = 7  Y = 9
ML    96.81  91.06  86.13  81.54  77.59
ML*    9.26   5.86   4.92   4.57   4.02

(iv) CRAWDAD (roma/taxi) dataset
      Y = 1  Y = 3  Y = 5  Y = 7  Y = 9
ML    97.10  92.48  88.73  85.49  82.47
ML*   30.27  14.06   8.32   6.68   4.92

despite the fact that there are still many unobserved elements (as shown in Table II (i) and (ii)). This is because many people stay in the same region for a long time in these datasets. ML provides good performance on such datasets when the amount of training data is large (on the other hand, ML provides the worst performance on the taxi datasets even when Y = 9, since each taxi moves very fast). It should be noted again, however, that the available training data can be extremely sparse in reality, since many users disclose only a small amount of location information in their daily lives. For example, when Y = 1 (i.e. when more than 96% of the elements in the personalized transition matrices are unobserved, as shown in Table II), ML performs only as well as a random guess in most cases, as shown in Fig. 9. ML* also performs only as well as a random guess in many cases. On the other hand, our learning method derives the location of a target user with much higher accuracy than a random guess in all of the four datasets, even when Y = 1.

VI. VISUALIZATION OF MODEL PARAMETERS AND TRACES

In Section V, we showed that our learning method in the personalized matrix mode (i.e. FC) outperformed the other learning methods (i.e. FC*, ML, ML*) in the Geolife dataset and the Gowalla dataset. To explain why, we show how well FC captures unique features of each user's behavior from a small amount of training data in these datasets. Specifically, we show that each element of the personalized transition matrices in FC is indeed influenced by similar users and similar regions (as described in Section III-B) by visualizing model parameters and the corresponding traces.

In our analysis, we used all of the 80 users in the Geolife dataset, and chose 80 of the 173 users in the Gowalla dataset (to equalize the number of users). For each dataset, we chose 2 traces per user (i.e. 80 × 2 × 10 locations) from the traces extracted in Section V-A, and used them as training data. For the purpose of simple visualization, we divided each of the two areas (Beijing and New York) into 4 × 4 regions at regular intervals, and set the dimension K of the feature vectors to K = 4. We initialized the model parameters Θ in FC using the random initialization method [45], determined the regularization parameter λ by 10-fold cross-validation, and trained Θ using our learning method. We then examined the values of the following model parameters:

$$\{\mathbf{U}^{(a)}, \mathbf{V}^{(a)}, \mathbf{U}^{(b)}, \mathbf{V}^{(b)}\} = \{u_n^{(a)}, v_i^{(a)}, u_n^{(b)}, v_i^{(b)} \mid n \in [N], i \in [M]\} \qquad (19)$$
$$= \{u_{n,k}^{(a)}, v_{i,k}^{(a)}, u_{n,k}^{(b)}, v_{i,k}^{(b)} \mid n \in [N], i \in [M], k \in [K]\} \qquad (20)$$

(i.e. the first and second terms in (5)). For the purpose of visualization, we normalized these model parameters so that $\sum_{n=1}^{N} u_{n,k}^{(a)} = 1$, $\sum_{i=1}^{M} v_{i,k}^{(a)} = 1$, $\sum_{n=1}^{N} u_{n,k}^{(b)} = 1$, and $\sum_{i=1}^{M} v_{i,k}^{(b)} = 1$.

Figs. 11 and 12 show visualizations of {U^(a), U^(b)} and {V^(a), V^(b)}, respectively. In Fig. 11, the elements whose corresponding model parameters exceed 0.025 are filled with gray; in Fig. 12, the elements whose corresponding model parameters exceed 0 are filled with gray (e.g. $u_{8,1}^{(a)} > 0.025$ and $v_{5,1}^{(a)} = 0.12 > 0$ in the Geolife dataset). We now explain what these figures illustrate. Since our learning method approximates a transition count $a_{n,i,j}$ as $\hat{a}_{n,i,j} = \langle u_n^{(a)}, v_i^{(a)} \rangle + \langle u_n^{(b)}, v_j^{(b)} \rangle + \langle u_i^{(c)}, v_j^{(c)} \rangle$ (see (4)), a user $u_n \in \mathcal{U}$ who has a high model parameter $u_{n,k}^{(a)}$ (or $u_{n,k}^{(b)}$) tends to go to the region $x_i \in \mathcal{X}$ whose corresponding model parameter $v_{i,k}^{(a)}$ (or $v_{i,k}^{(b)}$) is high. For example, it can be seen from Figs. 11(i)(H) and 12(i)(H) that users $u_4, u_5, u_{10}, \dots, u_{80}$ tend to go to regions $x_4$ and $x_7$ in the Geolife dataset. In this sense, users $u_4, u_5, u_{10}, \dots, u_{80}$ are similar users, and regions $x_4$ and $x_7$ are similar regions. In other words, we can regard {U^(a), U^(b)} (Fig. 11) as user profiles, and {V^(a), V^(b)} (Fig. 12) as region profiles.

Fig. 13 shows a visualization of the traces. The leftmost figure shows all of the traces (80 × 2 traces) used in our analysis. Each location is represented as a dot, and each trace as a line (through 10 dots). The other 8 figures ((A)-(H)) show the traces of the users whose model parameters ($u_{n,1}^{(a)}, u_{n,2}^{(a)}, \dots, u_{n,3}^{(b)}$, or $u_{n,4}^{(b)}$) are more than 0.025. In the Geolife dataset, many traces are concentrated around the office of Microsoft Research Asia in the northwestern part of region $x_{11}$. In the Gowalla dataset, many traces are around Manhattan, in the western part of region $x_7$.

It can be seen that the model parameters indeed capture similar users and similar regions. For example, Figs. 11(i)(F), 12(i)(F), and 13(i)(F) show that the users who have high $u_{n,2}^{(b)}$ (i.e. $u_3, u_7, \dots, u_{76}$) indeed stay in region $x_{11}$ (i.e. around the office) in the Geolife dataset. Similarly, Figs. 11(i)(H), 12(i)(H), and 13(i)(H) show that the users who have high $u_{n,4}^{(b)}$ (i.e. $u_4, u_5, \dots, u_{80}$) indeed go to $x_7$ (i.e. northeast of the office). Since one of them goes to $x_4$, the others may also be interested in $x_4$. A similar tendency can be observed in the Gowalla dataset (e.g. Figs. 11(ii)(F), 12(ii)(F), and 13(ii)(F) show that users $u_{21}$ and $u_{32}$ go to region $x_8$, and one of them also goes to $x_{12}$). FC propagates such unique features among similar users and similar regions in this way, and exploits them to derive a location. We believe this is the reason FC provided the best performance in the Geolife dataset and the Gowalla dataset.
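The column normalization used for the visualization ($\sum_{n} u_{n,k}^{(a)} = 1$, etc.) is a simple per-factor rescaling, sketched below with hypothetical values (the toy array stands in for U^(a)):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 80, 4
Ua = rng.random((N, K))                      # toy stand-in for U^(a)

# Rescale each factor column k so that sum_n u_{n,k} = 1
Ua_norm = Ua / Ua.sum(axis=0, keepdims=True)
```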


[Fig. 11 panels: user indices n = 1 to 80 for (i) the Geolife dataset and (ii) the Gowalla dataset; columns (A)-(D) show $u_{n,1}^{(a)}, \dots, u_{n,4}^{(a)}$ and columns (E)-(H) show $u_{n,1}^{(b)}, \dots, u_{n,4}^{(b)}$.]

Fig. 11. Visualization of $\{\mathbf{U}^{(a)}, \mathbf{U}^{(b)}\} = \{u_{n,k}^{(a)}, u_{n,k}^{(b)} \mid n \in [N], k \in [K]\}$. The elements whose corresponding model parameters are more than 0.025 are filled with gray (e.g. $u_{8,1}^{(a)} > 0.025$ in the Geolife dataset).

[Fig. 12 panels: regions $x_1, \dots, x_{16}$ for (i) the Geolife dataset and (ii) the Gowalla dataset; columns (A)-(D) show $v_{i,1}^{(a)}, \dots, v_{i,4}^{(a)}$ and columns (E)-(H) show $v_{i,1}^{(b)}, \dots, v_{i,4}^{(b)}$.]

Fig. 12. Visualization of $\{\mathbf{V}^{(a)}, \mathbf{V}^{(b)}\} = \{v_{i,k}^{(a)}, v_{i,k}^{(b)} \mid i \in [M], k \in [K]\}$. The value of each model parameter is written in the corresponding element. The elements whose corresponding model parameters are more than 0 are filled with gray (e.g. $v_{5,1}^{(a)} = 0.12 > 0$ in the Geolife dataset).

[Figure omitted in this version; only the caption is recoverable.]

Fig. 13. Visualization of traces. The leftmost figure contains all of the traces (80 × 2 traces). The other 8 figures ((A)–(H)) contain the traces of the users whose model parameters (u_{n,1}^{(a)}, u_{n,2}^{(a)}, …, u_{n,3}^{(b)}, or u_{n,4}^{(b)}) are more than 0.025. In the Geolife dataset, the office is located at the northwestern part of region x11. In the Gowalla dataset, Manhattan is at the western part of region x7.


VII. RELATED WORK

There are a number of studies in the field of location privacy; surveys can be found in [12]–[14]. The topics of these studies can be roughly classified into three categories: attacks, defenses, and privacy metrics.

Location privacy attacks based on a Markov chain model have been widely studied in the literature [15]–[21]. Based on this model, the adversary can carry out various kinds of attacks: location prediction attacks [16], [17], localization attacks [21], de-anonymization attacks [18]–[21], etc. Some studies showed that this model works very well. For example, Song et al. [15] compared several location predictors, and showed that the low-order Markov-based predictor outperformed the others. Shokri et al. [18] showed that an adversary who knows the personalized transition matrices of the users de-anonymizes and de-obfuscates traces with higher accuracy than an adversary who only knows each user's prior probabilities. Mulder et al. [20] de-anonymized 100 users with a success rate of 77% to 88% using traces over a period of one month as training data. Gambs et al. [19] de-anonymized 59 users in the Geolife dataset [23] with a success rate of 45% using half of the dataset (over two years) as training data. However, the training data can be extremely sparse in reality, as we have noted repeatedly. Thus we proposed a learning method using tensor factorization (or matrix factorization) to address this problem.

A number of defenses have also been proposed to protect location privacy. They can be broadly divided into two categories: anonymization and obfuscation. Well-known anonymization methods include k-anonymity based methods [34]–[36] and the mix-zone method [46], [47].
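The Markov-based localization attack discussed above can be sketched schematically (our illustration, not code from [21] or this paper; the matrices and numbers are toy assumptions): the adversary models the obfuscation mechanism as an HMM emission matrix (which can encode, e.g., generalized or hidden locations) and runs the forward-backward algorithm over the known transition matrix to recover a posterior over the true location at each time instant.

```python
import numpy as np

# Toy Markov model over three regions (illustrative values only).
P = np.array([[0.8, 0.2, 0.0],     # transition matrix: P[i, j] = Pr(j | i)
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
E = np.array([[0.9, 0.1],          # emission (obfuscation) model:
              [0.5, 0.5],          # E[x, o] = Pr(observed symbol o | region x);
              [0.1, 0.9]])         # region 1 is ambiguous, e.g. generalized
pi = np.array([1/3, 1/3, 1/3])     # prior over the initial region

def localize(obs):
    """Posterior Pr(true region at each time | obfuscated trace),
    computed with the standard forward-backward algorithm."""
    T, S = len(obs), len(pi)
    fwd = np.zeros((T, S))
    fwd[0] = pi * E[:, obs[0]]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ P) * E[:, obs[t]]
    bwd = np.ones((T, S))
    for t in range(T - 2, -1, -1):
        bwd[t] = P @ (E[:, obs[t + 1]] * bwd[t + 1])
    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)

post = localize([0, 0, 1])
# The adversary's guess at each time instant is the argmax of the posterior.
guess = post.argmax(axis=1)
```

The attack's accuracy therefore hinges on how well the transition matrix P is estimated, which is exactly where the sparse-data problem addressed in this paper enters.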
Obfuscation can be further classified into four categories: perturbation (adding noise) [37]–[39], location generalization (merging regions) [21], [40], [41], location hiding (deleting locations) [16], [21], [42], and adding dummy locations [40], [48]–[50]. In our experiments, we used not an anonymization method but an obfuscation method (i.e. the location generalization and location hiding method of [21]), since the localization attack targets traces that are obfuscated rather than anonymized.

To evaluate the effectiveness of location privacy attacks or defenses, several location privacy metrics have been proposed in the literature. Well-known metrics include k-anonymity based metrics [34], [35], entropy-based metrics [46], error-based metrics [21], [51], and differential privacy-based metrics [38], [39], [52]–[54]. Shokri et al. [55] pointed out that k-anonymity and entropy are not adequate to measure location privacy. They also showed in [21] that these metrics are not correlated with the attack success rate. In our experiments, we used the attack success rate as a metric of location privacy.

Finally, apart from location privacy, we note that tensor (or matrix) factorization (surveys can be found in [27], [30]) is widely studied in the field of machine learning. However, to the best of our knowledge, no studies have attempted to estimate transition probabilities using tensor (or matrix) factorization. Thus we proposed a learning method that factorizes a transition count tensor (or matrix), and then normalizes counts


to probabilities.

VIII. CONCLUSION

In this paper, we proposed a learning method using tensor factorization to overcome the sparse data problem in the personalized matrix mode. We also extended it to the population matrix mode by using matrix factorization. We focused on the localization attack [21], and compared our learning method with the ML estimation method in both the personalized matrix mode and the population matrix mode (i.e. FC, FC*, ML, ML*). The experimental results using four real datasets showed that while the ML estimation method performs only as well as a random guess in many cases, our learning method significantly outperforms the ML estimation method in all of the four datasets, which strongly supports the effectiveness of our learning method. By visualizing model parameters and traces, we also showed that our learning method in the personalized matrix mode captures unique features of each user's behavior in the Geolife dataset and the Gowalla dataset.

Although we focused on the fact that the amount of training data can be very small, it can also happen in reality that the training traces contain missing locations [21]. As future work, we would like to extend our learning method so that it can compute the model parameters in tensor factorization (or matrix factorization) while estimating the missing locations in the training traces, to further improve accuracy in such cases.

ACKNOWLEDGEMENT

The authors would like to thank Jacob Schuldt (AIST), Atsunori Kanemura (AIST), Shotaro Akaho (AIST), and Hideitsu Hino (University of Tsukuba) for technical comments on this paper. This study was also supported in part by the Okawa Foundation Research Grant.

REFERENCES

[1] T. Murakami and H. Watanabe, "Location prediction attacks using tensor factorization and optimal defenses," in Proceedings of the 1st IEEE International Workshop on Big Data Security and Privacy (BDSP'14), 2014, pp. 13–21.
[2] H. P. Li, H. Hu, and J. Xu, "Nearby friend alert: Location anonymity in mobile geosocial networks," IEEE Pervasive Computing, vol. 12, no. 4, pp. 62–70, 2013.
[3] A. Vaccari, "Introducing a new optional feature called nearby friends," http://newsroom.fb.com/news/2014/04/introducing-a-new-optional-feature-called-nearby-friends, April 2014.
[4] K. Zickuhr, "Three-quarters of smartphone owners use location-based services," http://www.pewinternet.org/2012/05/11/three-quarters-of-smartphone-owners-use-location-based-services, May 2012.
[5] S. Shekhar, M. R. Evans, V. Gunturi, and K. Yang, "Spatial big-data challenges intersecting mobility and cloud computing," in Proceedings of the 11th ACM International Workshop on Data Engineering for Wireless and Mobile Access (MobiDE'12), 2012, pp. 1–12.
[6] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," in Proceedings of the National Academy of Sciences (PNAS), vol. 106, no. 36, 2009, pp. 15274–15278.
[7] I. Bilogrevic, K. Huguenin, M. Jadliwala, F. Lopez, J.-P. Hubaux, P. Ginzboorg, and V. Niemi, "Inferring social ties in academic networks using short-range wireless communications," in Proceedings of the 12th ACM Workshop on Privacy in the Electronic Society (WPES'13), 2013, pp. 179–188.
[8] L. Liao, D. Fox, and H. Kautz, "Extracting places and activities from GPS traces using hierarchical conditional random fields," International Journal of Robotics Research, vol. 26, no. 1, pp. 119–134, 2007.


[9] V. W. Zheng, Y. Zheng, and Q. Yang, "Joint learning user's activities and profiles from GPS data," in Proceedings of the 2009 International Workshop on Location Based Social Networks (LBSN'09), 2009, pp. 17–20.
[10] Y. Matsuo, N. Okazaki, K. Izumi, Y. Nakamura, T. Nishimura, and K. Hasida, "Inferring long-term user properties based on users' location history," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI'07), 2007, pp. 2159–2165.
[11] J. Voelcker, "Stalked by satellite: An alarming rise in GPS-enabled harassment," IEEE Spectrum, vol. 47, no. 7, pp. 15–16, 2006.
[12] C. Bettini, S. Jajodia, P. Samarati, and S. X. Wang, Privacy in Location-Based Applications: Research Issues and Emerging Trends. Springer, 2009.
[13] J. Krumm, "A survey of computational location privacy," Personal and Ubiquitous Computing, vol. 13, no. 6, pp. 391–399, 2009.
[14] C.-Y. Chow and M. F. Mokbel, "Trajectory privacy in location-based services and data publication," ACM SIGKDD Explorations Newsletter, vol. 13, no. 1, pp. 19–29, 2011.
[15] L. Song, D. Kotz, R. Jain, and X. He, "Evaluating next-cell predictors with extensive Wi-Fi mobility data," IEEE Transactions on Mobile Computing, vol. 5, no. 12, pp. 1633–1649, 2006.
[16] A. Y. Xue, R. Zhang, Y. Zheng, X. Xie, J. Huang, and Z. Xu, "Destination prediction by sub-trajectory synthesis and privacy protection against such prediction," in Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE'13), 2013, pp. 254–265.
[17] K. Minami and N. Borisov, "Protecting location privacy against inference attacks," in Proceedings of the 9th Annual ACM Workshop on Privacy in the Electronic Society (WPES'10), 2010, pp. 123–126.
[18] R. Shokri, G. Theodorakopoulos, G. Danezis, J.-P. Hubaux, and J.-Y. L. Boudec, "Quantifying location privacy: The case of sporadic location exposure," in Proceedings of the 11th International Conference on Privacy Enhancing Technologies (PETS'11), 2011, pp. 57–76.
[19] S. Gambs, M.-O. Killijian, and M. Núñez del Prado Cortez, "De-anonymization attack on geolocated data," in Proceedings of the 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom'13), 2013, pp. 789–797.
[20] Y. D. Mulder, G. Danezis, L. Batina, and B. Preneel, "Identification via location-profiling in GSM networks," in Proceedings of the 7th ACM Workshop on Privacy in the Electronic Society (WPES'08), 2008, pp. 23–32.
[21] R. Shokri, G. Theodorakopoulos, J.-Y. L. Boudec, and J.-P. Hubaux, "Quantifying location privacy," in Proceedings of the 2011 IEEE Symposium on Security and Privacy (S&P'11), 2011, pp. 247–262.
[22] C. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[23] Y. Zheng, X. Xie, and W.-Y. Ma, "GeoLife: A collaborative social networking service among user, location and trajectory," IEEE Data Engineering Bulletin, vol. 32, no. 2, pp. 32–40, 2010.
[24] E. Cho, S. A. Myers, and J. Leskovec, "Friendship and mobility: User movement in location-based social networks," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'11), 2011, pp. 1082–1090.
[25] M. Piorkowski, N. Sarafijanovic-Djukic, and M. Grossglauser, "CRAWDAD dataset epfl/mobility (v. 2009-02-24)," http://crawdad.org/epfl/mobility/20090224, 2009.
[26] L. Bracciale, M. Bonola, P. Loreti, G. Bianchi, R. Amici, and A. Rabuffi, "CRAWDAD dataset roma/taxi (v. 2014-07-17)," http://crawdad.org/roma/taxi/20140717, 2014.
[27] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[28] S. Rendle and L. Schmidt-Thieme, "Pairwise interaction tensor factorization for personalized tag recommendation," in Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM'10), 2010, pp. 81–90.
[29] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme, "Factorizing personalized Markov chains for next-basket recommendation," in Proceedings of the 19th International Conference on World Wide Web (WWW'10), 2010, pp. 811–820.
[30] A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley, 2009.
[31] J. Kim, "Nonnegative matrix and tensor factorization, least squares problems, and applications," Ph.D. dissertation, Georgia Institute of Technology, 2011.
[32] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," IEEE Computer, vol. 42, no. 8, pp. 30–37, 2009.


[33] T. Murakami, A. Kanemura, and H. Hino, "Group sparsity tensor factorization for de-anonymization of mobility traces," in Proceedings of the 14th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom'15), 2015 (to appear).
[34] M. Gruteser and D. Grunwald, "Anonymous usage of location-based services through spatial and temporal cloaking," in Proceedings of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys'03), 2003, pp. 31–42.
[35] C. Bettini, X. S. Wang, and S. Jajodia, "Protecting privacy against location-based personal identification," in Proceedings of the 2nd VLDB Workshop on Secure Data Management (SDM'05), 2005, pp. 185–199.
[36] B. Gedik and L. Liu, "Protecting location privacy with personalized k-anonymity: Architecture and algorithms," IEEE Transactions on Mobile Computing, vol. 7, no. 1, pp. 1–18, 2008.
[37] R. Shokri, G. Theodorakopoulos, C. Troncoso, J.-P. Hubaux, and J.-Y. L. Boudec, "Protecting location privacy: Optimal strategy against localization attacks," in Proceedings of the 2012 ACM Conference on Computer and Communications Security (CCS'12), 2012, pp. 617–627.
[38] N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi, "Optimal geo-indistinguishable mechanisms for location privacy," in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS'14), 2014.
[39] M. E. Andrés, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi, "Geo-indistinguishability: Differential privacy for location-based systems," in Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS'13), 2013, pp. 901–914.
[40] M. Herrmann, C. Troncoso, C. Diaz, and B. Preneel, "Optimal sporadic location privacy preserving systems in presence of bandwidth constraints," in Proceedings of the 12th ACM Workshop on Privacy in the Electronic Society (WPES'13), 2013, pp. 167–178.
[41] M. Xue, P. Kalnis, and H. K. Pung, "Location diversity: Enhanced privacy protection in location based services," in Proceedings of the 4th International Symposium on Location and Context Awareness (LoCA'09), vol. 5561, 2009, pp. 70–87.
[42] B. Hoh, M. Gruteser, H. Xiong, and A. Alrabady, "Achieving guaranteed anonymity in GPS traces via uncertainty-aware path cloaking," IEEE Transactions on Mobile Computing, vol. 9, no. 8, pp. 1089–1107, 2010.
[43] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Proceedings of the IEEE, vol. 77, no. 2, 1989, pp. 257–286.
[44] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. Wiley-Interscience, 2000.
[45] R. Albright, J. Cox, D. Duling, A. N. Langville, and C. D. Meyer, "Algorithms, initializations, and convergence for the nonnegative matrix factorization," SAS Technical Report, pp. 1–18, 2014.
[46] A. R. Beresford and F. Stajano, "Location privacy in pervasive computing," IEEE Pervasive Computing, vol. 2, no. 1, pp. 46–55, 2003.
[47] J. Freudiger, R. Shokri, and J.-P. Hubaux, "On the optimal placement of mix zones," in Proceedings of the 9th Privacy Enhancing Technologies Symposium (PETS'09), 2009, pp. 216–234.
[48] H. Kido, Y. Yanagisawa, and T. Satoh, "An anonymous communication technique using dummies for location-based services," in Proceedings of the 2005 IEEE International Conference on Pervasive Services (ICPS'05), 2005, pp. 88–97.
[49] T.-H. You, W.-C. Peng, and W.-C. Lee, "Protecting moving trajectories with dummies," in Proceedings of the 2007 International Conference on Mobile Data Management (MDM'07), 2007, pp. 278–282.
[50] R. Chow and P. Golle, "Faking contextual data for fun, profit, and privacy," in Proceedings of the 8th ACM Workshop on Privacy in the Electronic Society (WPES'09), 2009, pp. 105–108.
[51] B. Hoh and M. Gruteser, "Protecting location privacy through path confusion," in Proceedings of the 1st International Conference on Security and Privacy for Emerging Areas in Communications Networks (SECURECOMM'05), 2005, pp. 194–205.
[52] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber, "Privacy: Theory meets practice on the map," in Proceedings of the IEEE 24th International Conference on Data Engineering (ICDE'08), 2008, pp. 277–286.
[53] S.-S. Ho and S. Ruan, "Differential privacy for location pattern mining," in Proceedings of the 4th ACM SIGSPATIAL International Workshop on Security and Privacy in GIS and LBS (SPRINGL'11), 2011, pp. 17–24.
[54] R. Dewri, "Local differential perturbations: Location privacy under approximate knowledge attackers," IEEE Transactions on Mobile Computing, vol. 12, no. 12, pp. 2360–2372, 2013.
[55] R. Shokri, J. Freudiger, M. Jadliwala, and J.-P. Hubaux, "A distortion-based metric for location privacy," in Proceedings of the 8th ACM Workshop on Privacy in the Electronic Society (WPES'09), 2009, pp. 21–30.

Takao Murakami is a researcher at the Information Technology Research Institute (ITRI), National Institute of Advanced Industrial Science and Technology (AIST). He received his B.S., M.S., and Ph.D. degrees from the University of Tokyo in 2004, 2006, and 2014, respectively. He joined Hitachi, Ltd. in 2006, and AIST in 2014. His research interests include location privacy, biometrics, and information security. He received the Yamashita SIG Research Award from the Information Processing Society of Japan (IPSJ) in 2011, and the IEEE TrustCom'15 Best Paper Award in 2015. He also received the Dean's Award of the Graduate School of Information Science and Technology, the University of Tokyo, for his Ph.D. thesis in 2014. He is a member of IEICE, IPSJ and IEEE.

Hajime Watanabe is a chief senior researcher at the Information Technology Research Institute (ITRI), National Institute of Advanced Industrial Science and Technology (AIST). He received his B.E., M.E., and Ph.D. degrees from Osaka University in 1991, 1993, and 1997, respectively. From 1994 to 1999, he was a research associate at NAIST, Japan. In 1999, he joined ETL, Japan (which was reorganized into AIST in 2001). His research interests include cryptography and information security. He is a member of IEICE and IPSJ.
