Differentially Private Matrix Completion, Revisited

Prateek Jain∗    Om Thakkar†    Abhradeep Thakurta‡

arXiv:1712.09765v1 [cs.LG] 28 Dec 2017

December 29, 2017

Abstract

We study the problem of privacy-preserving collaborative filtering, where the objective is to reconstruct the entire users-items preference matrix using a few observed preferences of users for some of the items. Furthermore, the collaborative filtering algorithm should reconstruct the preference matrix while preserving the privacy of each user. We study this problem in the setting of joint differential privacy, where each user computes her own preferences for all the items without violating the privacy of other users' preferences. We provide the first provably differentially private algorithm with formal utility guarantees for this problem. Our algorithm is based on the Frank-Wolfe (FW) method, and consistently estimates the underlying preference matrix as long as the number of users $m$ is $\omega(n^{5/4})$, where $n$ is the number of items, and each user provides her preference for at least $\sqrt{n}$ randomly selected items. We also empirically evaluate our FW-based algorithm on a suite of datasets, and show that our method provides nearly the same accuracy as the state-of-the-art non-private algorithm, and outperforms the state-of-the-art private algorithm by as much as 30%.
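To make the sampling model in the abstract concrete, the following is a small, self-contained Python sketch (not from the paper): it builds a synthetic low-rank users-items preference matrix and reveals roughly $\sqrt{n}$ uniformly random entries per user. All sizes and variable names are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setting: m users, n items, rank-r ground-truth preference matrix Y*.
m, n, r = 2000, 100, 5
U = rng.normal(size=(m, r)) / np.sqrt(r)
V = rng.normal(size=(n, r)) / np.sqrt(r)
Y_star = U @ V.T                      # ground-truth users-items preference matrix

# Each user reveals her preference for about sqrt(n) uniformly random items.
k_obs = int(np.sqrt(n))
mask = np.zeros((m, n), dtype=bool)
for i in range(m):
    mask[i, rng.choice(n, size=k_obs, replace=False)] = True

Y_obs = np.where(mask, Y_star, 0.0)   # P_Omega(Y*): observed entries, zeros elsewhere
print(f"observed fraction: {mask.mean():.3f}")
```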

1 Introduction

Collaborative filtering (or matrix completion) is a popular approach for modeling the recommendation system problem, where the goal is to provide personalized recommendations about certain items to a user [32]. In other words, the objective of a personalized recommendation system is to learn the entire users-items preference matrix $Y^* \in \mathbb{R}^{m \times n}$, where $m$ is the number of users and $n$ is the number of items.

Recall that $W^{(t)} = A^{(t)\top} A^{(t)} + E$, and that $\hat{v}$ is the top eigenvector of $W^{(t)}$ with corresponding eigenvalue $\hat{\lambda}^2$. We have:
$$\left\|A^{(t)} \hat{v}\right\|_2^2 = \hat{v}^\top \left(W^{(t)} - E\right) \hat{v} \le \hat{\lambda}^2 + O\!\left(\sigma \log(n/\beta)\sqrt{n}\right) \quad \text{w.p.} \ge 1-\beta, \tag{14}$$
$$\Rightarrow \left\|A^{(t)} \hat{v}\right\|_2 \le \hat{\lambda} + O\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right) \quad \text{w.p.} \ge 1-\beta. \tag{15}$$

The inequality in (14) follows from the spectral norm bound on the Gaussian matrix $E$ drawn i.i.d. from $\mathcal{N}(0, \sigma^2)$ (see Corollary 2.3.5 in [43] for a proof). Inequality (15) completes the proof.

Lemma C.3 (Follows from Theorem 3 of [15]). Let $W = A^\top A + E$, where $E \sim \mathcal{N}(0, I_{p \times p}\sigma^2)$. Let $v$ be the top right singular vector of $A$, and let $\hat{v}$ be the top right singular vector of $W$. The following is true with probability at least $1-\beta$:
$$\left\|A\hat{v}\right\|_2^2 \ge \left\|Av\right\|_2^2 - O\!\left(\sigma \log(n/\beta)\sqrt{n}\right).$$
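The following is a minimal numerical illustration (not from the paper) of the spectral facts behind Lemmas C.2 and C.3: the top eigenvector of the noise-perturbed Gram matrix $W = A^\top A + E$ nearly preserves $\|Av\|_2$, and the top eigenvalues of $W$ and $A^\top A$ differ by at most $\|E\|_2$ (Weyl's inequality). The symmetrization of $E$ and all dimensions are assumptions made only so the check runs.

```python
import numpy as np

rng = np.random.default_rng(1)

# A plays the role of the (scaled) gradient A^(t); W is its Gram matrix plus
# Gaussian noise, as in the globally computed quantity of the algorithm.
n, p, sigma = 200, 50, 0.5
A = rng.normal(size=(n, p))

E = rng.normal(scale=sigma, size=(p, p))
E = (E + E.T) / 2                          # symmetrized noise (assumption) so W stays symmetric
W = A.T @ A + E

# Top right singular vector of A vs. top eigenvector of the noisy Gram matrix W.
_, s, Vt = np.linalg.svd(A)
v = Vt[0]                                  # top right singular vector of A
eigvals, eigvecs = np.linalg.eigh(W)
v_hat = eigvecs[:, -1]                     # top eigenvector of W
lam, lam_hat = s[0], np.sqrt(max(eigvals[-1], 0.0))

# Lemma C.3 flavor: ||A v_hat|| stays close to ||A v|| = lambda.
print("||A v||     =", np.linalg.norm(A @ v))
print("||A v_hat|| =", np.linalg.norm(A @ v_hat))
# Weyl-type bound: |lambda^2 - lambda_hat^2| <= ||E||_2.
print("|lam^2 - lam_hat^2| =", abs(lam**2 - lam_hat**2), " ||E||_2 =", np.linalg.norm(E, 2))
```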

Now, one can compactly write the update equation of $Y^{(t)}$ in function $\mathcal{A}_{\mathrm{local}}$ of Algorithm 1 for all the users as:
$$Y^{(t)} \leftarrow \Pi_{L,\Omega}\left(\left(1 - \frac{1}{T}\right) Y^{(t-1)} - \frac{k}{T}\, \hat{u}\hat{v}^\top\right), \tag{16}$$
where $\hat{u}$ corresponds to the set of entries $\hat{u}_i$ in function $\mathcal{A}_{\mathrm{local}}$ represented as a vector. Also, by Lemma C.2, we can conclude that $\|\hat{u}\|_2 \le 1$. Hence, $Y^{(t)}$ is in the set $\{Y : \|Y\|_{\mathrm{nuc}} \le k\}$ for all $t \in [T]$.

In the following, we incorporate the noisy estimation in the analysis of the original Frank-Wolfe algorithm (stated in Section A). In order to do so, we need to ensure a couple of properties: i) we need to obtain an appropriate bound on the slack parameter $\gamma$ in Algorithm 2, and ii) we need to ensure that the projection operator $\Pi_{L,\Omega}$ in function $\mathcal{A}_{\mathrm{local}}$ does not introduce additional error. We do this via Lemmas C.4 and C.5, respectively.
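To make the update in (16) concrete, here is a minimal Python sketch (not the paper's implementation). It assumes $\hat{u}$ and $\hat{v}$ have already been obtained from the noisy global computation in $\mathcal{A}_{\mathrm{global}}$, and it implements $\Pi_{L,\Omega}$ as "restrict to the observed entries $\Omega$ and clip each user's row to $\ell_2$ norm at most $L$", which is an assumed reading of the operator based on the description of $\Pi_L$ near Lemma C.5.

```python
import numpy as np

def project_L_Omega(Y, mask, L):
    """Assumed form of Pi_{L,Omega}: keep only observed entries and clip each
    user's (row's) observed part to l2 norm at most L."""
    Y = np.where(mask, Y, 0.0)
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    scale = np.minimum(1.0, L / np.maximum(norms, 1e-12))
    return Y * scale

def local_fw_update(Y, u_hat, v_hat, mask, k, T, L):
    """One noisy Frank-Wolfe step in the spirit of update (16):
    Y <- Pi_{L,Omega}((1 - 1/T) * Y - (k/T) * u_hat v_hat^T)."""
    step = (1.0 - 1.0 / T) * Y - (k / T) * np.outer(u_hat, v_hat)
    return project_L_Omega(step, mask, L)
```

In the protocol itself, each user $i$ would apply only her own row of this update using her entry $\hat{u}_i$ in $\mathcal{A}_{\mathrm{local}}$, which is what keeps the computation local to each user.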


Lemma C.4. For the noise variance $\sigma$ used in function $\mathcal{A}_{\mathrm{global}}$ of Algorithm 1, w.p. at least $1-\beta$, the slack parameter $\gamma$ in the linear optimization step of the Frank-Wolfe algorithm is at most $O\!\left(\frac{k}{|\Omega|}\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)$.

Proof. Recall that $\hat{\lambda}^2$ corresponds to the maximum eigenvalue of $W^{(t)}$, and notice that $A^{(t)}$ is the scaled gradient of the loss function $\frac{1}{2|\Omega|}\|P_\Omega(\Theta - Y^*)\|_F^2$ at $\Theta = \Pi_{L,\Omega}\left(Y^{(t)}\right)$. Essentially, we need to compute the difference between $\frac{1}{|\Omega|}\left\langle A^{(t)}, k u v^\top\right\rangle$ and $\frac{1}{|\Omega|}\left\langle A^{(t)}, k \hat{u} \hat{v}^\top\right\rangle$. Let $\alpha = \frac{1}{|\Omega|}\left\langle A^{(t)}, k u v^\top\right\rangle$ and $\hat{\alpha} = \frac{1}{|\Omega|}\left\langle A^{(t)}, k \hat{u} \hat{v}^\top\right\rangle$. Now, we have the following w.p. at least $1-\beta$:
$$\hat{\alpha} = \frac{k\,\hat{v}^\top A^{(t)\top}\hat{u}}{|\Omega|} = \frac{k\,\hat{v}^\top A^{(t)\top} A^{(t)}\hat{v}}{|\Omega|\left(\hat{\lambda} + \Theta\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)\right)} = \frac{k\left\|A^{(t)}\hat{v}\right\|_2^2}{|\Omega|\left(\hat{\lambda} + \Theta\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)\right)}
\ge \frac{k\left(\left\|A^{(t)} v\right\|_2^2 - O\!\left(\sigma \log(n/\beta)\sqrt{n}\right)\right)}{|\Omega|\left(\hat{\lambda} + \Theta\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)\right)}
= \frac{k\left(\frac{|\Omega|\lambda\alpha}{k} - O\!\left(\sigma \log(n/\beta)\sqrt{n}\right)\right)}{|\Omega|\left(\hat{\lambda} + \Theta\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)\right)}, \tag{17}$$

where $\lambda^2$ is the maximum eigenvalue of $A^{(t)\top} A^{(t)}$, the second equality follows from the definition of $\hat{u}$, and the inequality follows from Lemma C.3. One can rewrite (17) as:
$$\alpha - \hat{\alpha} \le \underbrace{\left(1 - \frac{\lambda}{\hat{\lambda} + \Theta\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)}\right)\alpha}_{E_1} + \underbrace{O\!\left(\frac{k\,\sigma \log(n/\beta)\sqrt{n}}{|\Omega|\left(\hat{\lambda} + \Theta\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)\right)}\right)}_{E_2}. \tag{18}$$

We will analyze $E_1$ and $E_2$ in (18) separately. One can write $E_1$ in (18) as follows:
$$E_1 = \frac{\hat{\lambda} + O\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right) - \lambda}{\hat{\lambda} + \Theta\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)}\,\alpha = \frac{\hat{\lambda} + O\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right) - \lambda}{\hat{\lambda} + \Theta\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)} \cdot \frac{k}{|\Omega|}\lambda. \tag{19}$$

By Weyl's inequality for eigenvalues, and the fact that w.p. at least $1-\beta$ we have $\left\|W^{(t)} - A^{(t)\top} A^{(t)}\right\|_2 = O\!\left(\sigma \log(n/\beta)\sqrt{n}\right)$ because of spectral properties of random Gaussian matrices (Corollary 2.3.5 in [43]), it follows that $\lambda - \hat{\lambda} = O\!\left(\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)$. Therefore, one can conclude from (19) that $E_1 = O\!\left(\frac{k}{|\Omega|}\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)$. Now, we will bound the term $E_2$ in (18). Since $\hat{\lambda} \ge 0$, it follows that $E_2 = O\!\left(\frac{k}{|\Omega|}\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)$. Therefore, the slack parameter $\alpha - \hat{\alpha} = E_1 + E_2 = O\!\left(\frac{k}{|\Omega|}\sqrt{\sigma \log(n/\beta)\sqrt{n}}\right)$.

Lemma C.5. Define the operators $P_\Omega$ and $\Pi_{L,\Omega}$ as described in function $\mathcal{A}_{\mathrm{local}}$ in Section 3. Let $f(Y) = \frac{1}{2|\Omega|}\left\|P_\Omega(Y - Y^*)\right\|_F^2$ for any matrix $Y \in \mathbb{R}^{m \times n}$. For a matrix $M = \left[m_1, \cdots, m_m\right]^\top$ (where $m_i$ corresponds to the $i$-th row of $M$), $\|M\|_F^2 = \sum_i \|m_i\|_2^2$. Let $\Pi_L$ be the $\ell_2$ projector onto a ball of radius $L$, and $B^n_L$ be a ball of radius $L$ in $n$ dimensions, centered at the origin. Then, for any pair of vectors, $v_1 \in$