
ANONYMOUS SUBJECT IDENTIFICATION IN PRIVACY-AWARE VIDEO SURVEILLANCE

Ying Luo, Shuiming Ye, and Sen-ching S. Cheung
Center for Visualization and Virtual Environments
University of Kentucky, Lexington KY 40507

ABSTRACT

The widespread deployment of surveillance cameras has raised serious privacy concerns. Many privacy-enhancing schemes have recently been proposed to identify selected individuals and redact their images in the surveillance video. To identify individuals, the best known approach is to use biometric signals, as they are immutable and highly discriminative. If misused, these characteristics of biometrics can seriously defeat the goal of privacy protection. In this paper, we propose an anonymous subject identification system based on homomorphic encryption (HE). It matches the biometric signals in the encrypted domain to provide anonymity to users. To make the HE-based protocols computationally scalable, we propose a complexity-privacy tradeoff called k-Anonymous Quantization (kAQ), which narrows the plaintext search to a small cell before running the intensive encrypted-domain processing within the cell. We validate a key assumption in kAQ that privacy is better preserved by grouping biometric patterns that are far apart into the same cell. We also improve the matching success rate by replacing the original bounding boxes with ε-balls as basic units for grouping. Experimental results on a public iris biometric database demonstrate the validity of our framework.

Keywords: Anonymous Subject Identification, Privacy Protection, Video Surveillance, k-Anonymous Quantization

1. INTRODUCTION

In recent years, surveillance cameras have been widely used for preventing theft, collecting population data, and combating terrorism. The ubiquity of surveillance cameras raises significant concerns about video surveillance systems' invasion of people's privacy. The American Civil Liberties Union (ACLU) has published a report condemning the surveillance systems' assault on the public's privacy [1]. According to ACLU's survey of surveillance cameras in Manhattan, almost every street corner in Manhattan is monitored by surveillance cameras. To mitigate the public's concern, many recent works have been proposed for privacy protection in surveillance systems. Some of the systems protect individuals' privacy by replacing selected objects with black boxes, large pixels or other generic figures [2, 3, 4], whereas others completely remove the objects and then inpaint the removed objects with background and other foreground objects [5, 6].

All of these existing works require a mechanism to identify the subjects for privacy protection. There are two approaches to subject identification in the existing systems: one is to use special markers such as yellow hard-hats [4], visual tags [6], or RFID [5]; the other relies on biometric signals such as faces [3] or skin tones [2]. Both approaches have their shortcomings. The first approach requires the protected subjects to carry a special marker. If the marker is accidentally dropped, the subjects will lose privacy protection from the system. If the marker is stolen by unauthorized individuals, the system will protect the potential intruders and the security of the environment will be severely compromised. On the other hand, the second approach uses biometric signals, which excel in authenticating the subject's identity since they are based on who the subjects are. However, the use of biometric signals poses a direct threat to privacy: it is difficult to keep the subjects anonymous. Once the system itself is compromised, the biometric signals can be used by hackers to invade other systems that use the same biometric signals for subject identification. To take advantage of the superior performance of biometric technologies, it is imperative to strengthen the security of the surveillance systems to protect the anonymity of the subjects.

In this paper, we propose an anonymous subject identification scheme that allows the surveillance system to determine the privacy protection status of an individual based on his/her iris patterns. At the same time, the surveillance system has no access to the iris pattern probe but only has access to the aggregate result of the matching between the probe and each entry from the database. As a result, the true identity of the individual will be protected from the surveillance system. Our proposed scheme is based on the Anonymous Biometric Access Control (ABAC) system described in [7]. The ABAC system ensures anonymity by encrypting the probe pattern with a Homomorphic Encryption (HE) system, and carries out the matching process entirely within the encrypted domain. To make encrypted-domain processing scalable to large databases, a k-anonymous grouping scheme called k-Anonymous Quantization (kAQ) was proposed in [7] to group the database into smaller cells within which the encrypted processing is performed. In this paper, we improve upon the original kAQ scheme in two aspects:

1. We validate a key assumption used in the kAQ scheme that privacy is better preserved by grouping iris patterns that are far apart from each other into the same cell.

2. We improve the matching success rate by replacing a key structure used in kAQ, the bounding box grouping all training patterns of an individual, with an ε-ball whose radius ε is determined based on global statistics of the training patterns.

This paper is organized as follows. Section 2 describes the framework of anonymous subject identification. It also reviews the key innovations in ABAC and discusses its shortcomings that are addressed in this paper. Section 3 provides experimental evidence to validate the key assumption between privacy and similarity used in the original kAQ framework. Section 4 provides experimental results to show the improvement of using ε-balls over bounding boxes in the kAQ framework. Finally, we conclude this paper with prospects for future work in Section 5.

2. ANONYMOUS SUBJECT IDENTIFICATION

Figure 1 describes our proposed system, which consists of two main components: the Biometric Recognition Terminal (BRT) and the Video Surveillance System (VSS). A BRT is installed outside each entrance of the surveillance area. It is used to capture the biometric signal, the iris pattern in our implementation, of every individual entering the area. The VSS has a database that contains the biometric signals of all authorized subjects whose identities require protection. Once the signal is captured, the BRT engages the VSS in a specially designed Secure Multi-party Computation (SMC) protocol to determine if the incoming subject is an authorized individual. If so, the VSS will activate the privacy protection mechanism and obfuscate the appearance of this subject after he/she enters the area. If multiple protection schemes are offered, each scheme will have a separate database. An example of our prototype system is shown in Figure 2. It is assumed that the privacy protection status of this subject will be maintained throughout the entire duration the subject is in the area. This can be accomplished, for example, by applying visual object tracking and identification.

Fig. 1. The Framework of Anonymous Subject Identification

Fig. 2. Different obfuscation for different privacy levels when the subjects enter into the surveillance environment

The focus of this paper is on the design of the SMC protocol for matching biometric signals. From a functional point of view, this protocol should provide the following guarantees:

1. The protocol returns a decision bit to the VSS on whether the biometric signal of the incoming subject matches any entries in the VSS's database;
2. No identity information of the subject is provided to the VSS;
3. No database information is provided to the BRT;
4. The communication between the VSS and each BRT is over an open network.

The first and second guarantees define the anonymous subject identification process: the VSS can reliably authenticate the privacy protection status of an incoming individual using biometric signals without knowing the actual identity. As the BRT is installed outside the surveillance area, it is prone to outsiders' attacks and thus should not possess any sensitive biometric signals, as indicated in the third guarantee. To allow many BRTs to be used at all the entrances, the fourth guarantee implies that sensitive information is encrypted and can be transmitted via an open network without worrying about eavesdroppers.

As mentioned in Section 1, our implementation of this protocol is based on the ABAC system in [7]. The security model used in the ABAC system assumes that the biometric server follows a semi-honest model such that it faithfully follows the protocol but attempts to recover the identity of the subject based on the communication. The probe terminal will only engage in malicious activities that can increase the probability of a positive match. This model is also appropriate for anonymous subject identification in surveillance systems and will be used throughout this paper. The ABAC system in [7] consists of two main contributions: (1) an additive homomorphic encryption based ABAC to support iris-pattern matching in the encrypted domain and (2)

a k-Anonymous Quantization (kAQ) scheme to enhance computational scalability of encrypted-domain processing by relaxing privacy requirements. We provide a brief review of these two components but refer readers to [7] for details.

In the first contribution, the ABAC system can identify any iris pattern x_i in the database close to the probe pattern q based on a modified Hamming distance d_H(q, x_i) and return a single decision bit to the biometric server. It is assumed that there exists a similarity threshold ε such that d_H(q, x_i) < ε happens if and only if q and x_i are from the same individual. To ensure anonymity, the probe terminal encrypts q with Paillier encryption, an HE system that preserves addition over a large plaintext field, using a pair of public and secret keys chosen by the terminal. The public key will be shared with the biometric server. Based on this public key, the server and the terminal enter an interactive protocol to compute the Hamming distance between the probe and every pattern in the database, accumulate all the matching results in a decision bit and reveal this bit to both parties.

Functionally, the first component of the ABAC system provides all four guarantees required by our anonymous identification. The problem is that the HE-based implementation is too computationally expensive to be deployed in any realistic application. In [7], it was reported that matching against a database of 10,000 iris patterns takes more than 11 hours on a power PC. To improve the scalability, the kAQ algorithm is proposed to group the entire database into cells where each cell contains roughly k entries. A constant-time SMC protocol is used to map the binary probe pattern into a Euclidean space via Fastmap and PCA, to quantize the resulting feature vector and to identify the cell to which the feature vector belongs. The selection of the specific cell is communicated to the biometric server. The HE-based biometric matching process is then applied only to the k entries within the selected cell, and significant complexity savings can be achieved.

Similar to other k-anonymous schemes, kAQ attempts to put patterns from the same individual into the same cell. The approach used in [7] is to collect multiple training patterns from an individual and use the smallest bounding box containing these training patterns to represent that particular individual. The entire bounding box will then be included in the cell. Our experiments in this paper show that this bounding box approach produces a relatively low recognition rate in the sense that a testing probe from the same individual has a high probability of falling outside the bounding box. Instead of bounding boxes, we propose using ε-balls whose radii are determined by the training patterns. In Section 4, we will provide experimental results demonstrating the superiority of ε-balls over bounding boxes.

Unlike other k-anonymous schemes, kAQ utilizes a greedy algorithm to group patterns that are maximally apart from one another into the same cell. Specifically, given a collection of cells Γ, the greedy algorithm is designed to maximize the following utility function:

    \min_{C \in \Gamma} \sum_{x, y \in C} d(x, y)^2    (1)

where x and y are patterns in the feature space that belong to the same cell C, and d(·, ·) is the distance used in the feature space. The reasoning behind this strategy, as argued in [7], is that grouping patterns close to each other in the same cell may reveal important privacy information to the biometric server. This argument is certainly conceivable for some biometric signals such as face images, as family members may share similar facial features [8]. Nevertheless, this argument has never been demonstrated for highly discriminative biometric signals such as iris patterns. In Section 3, we show that iris patterns between twins are indeed closer to each other than those from unrelated individuals and thus provide evidence to support the usefulness of maximizing the utility function in a k-anonymous scheme.

3. PRIVACY AND BIOMETRIC SIMILARITY

To demonstrate the usefulness of maximizing the utility function as defined in (1), one needs to show that individuals who are blood-related tend to have biometric patterns that are closer to each other than unrelated individuals. To the best of our knowledge, there is no family-based collection of biometric test data except for those among twins. Experimental results show that fingerprints and palmprints from twins have some inherent correlation and are more similar to each other than those from random individuals [9, 10]. In this section, we test a similar hypothesis based on the twin iris dataset provided by CASIA [11]: that is, the modified Hamming distances among twins are smaller than those of non-twins. There are 3183 iris images from 100 pairs of twins in CASIA's twins' iris database. We extract all twins' left iris images for comparison. Based on the Matlab feature extraction code from [12], we obtain 1118 accurate IrisCodes and then calculate 3351 Hamming Distances (HDs) between twins and 617631 HDs between non-twins. Figure 3 shows the distribution of these two types of HDs.

Fig. 3. Distribution of IrisCode Hamming Distances (x-axis: Hamming Distance; y-axis: Count percentage (%); curves: non-twins, twins)
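As a rough illustration of the distances being compared here, the sketch below computes a masked, normalized Hamming distance between two binary IrisCodes. It assumes the common form in which only bits marked usable in both masks are compared; the exact modified Hamming distance is the one defined in [7], and the function name, the list-of-bits layout, and the toy data are hypothetical.

```python
def masked_hamming_distance(code_a, mask_a, code_b, mask_b):
    """Fraction of disagreeing bits among positions valid in both masks.

    Assumption (not taken from the paper): a mask bit of 1 marks a usable
    code bit, and all four sequences have the same length.
    """
    if not (len(code_a) == len(mask_a) == len(code_b) == len(mask_b)):
        raise ValueError("codes and masks must have equal length")
    valid = 0
    disagree = 0
    for a, ma, b, mb in zip(code_a, mask_a, code_b, mask_b):
        if ma and mb:            # compare only bits usable in both codes
            valid += 1
            disagree += a != b   # a bool adds as 0 or 1
    if valid == 0:
        raise ValueError("no commonly valid bits to compare")
    return disagree / valid


# Toy 8-bit codes; real IrisCodes are far longer.
code_a = [1, 0, 1, 1, 0, 0, 1, 0]
mask_a = [1, 1, 1, 1, 1, 1, 0, 1]
code_b = [1, 1, 1, 0, 0, 0, 0, 0]
mask_b = [1, 1, 1, 1, 0, 1, 1, 1]

hd = masked_hamming_distance(code_a, mask_a, code_b, mask_b)
print(f"masked Hamming distance: {hd:.4f}")
```

In the toy data, six positions are usable in both codes and two of them disagree, so the distance is 2/6. The value always lies between 0 and 1, which is the range of the HDs discussed in this section.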
While the two distributions look quite similar, testing our hypothesis requires a more rigorous procedure. As HDs are between 0 and 1 and are clearly non-Gaussian, we utilize the distribution-free Wilcoxon Rank-Sum Test between these two samples [13, Ch.15]. We label the sample from twins' HDs as X and the sample from non-twins' HDs as Y. Let u_1 and u_2 be the averages of X and Y respectively, and m and n be the total number of samples from X and from Y. To make the sizes of the two samples comparable, 3351 samples are randomly selected from Y so that m = n = 3351. The null hypothesis is H_0: u_1 − u_2 = 0 and the alternative hypothesis is H_a: u_1 − u_2 < 0. When we pool the samples from X and from Y into a combined sample of size m + n, these observations are sorted from smallest (rank 1) to largest (rank m + n). We then consider the sum of the ranks of all samples from X as our test statistic W, i.e.

    W = \sum_{i=1}^{m} R_i

where R_i is the rank of the i-th sample of X. The test procedure is one-tailed since, for a small W value, H_0 would be rejected in favor of H_a. Due to the large sample size, the distribution of W can be approximated by Gaussian(μ_W, σ_W^2) if H_0 is true, where

    \mu_W = \frac{m(m + n + 1)}{2} = 11.2 \times 10^6    (2)

and

    \sigma_W^2 = \frac{mn(m + n + 1)}{12} = 6.27 \times 10^9    (3)

Our data shows that the measured W = 9.78 × 10^6. Thus, the P-value of the null hypothesis in our one-sided test can be calculated as follows:

    \text{P-value} = \mathrm{Prob}(W \le 9.78 \times 10^6) \approx \Phi\left(\frac{W - \mu_W}{\sigma_W}\right) = 6.17 \times 10^{-75}

where Φ(·) is the cumulative distribution function of a standard normal random variable. The small P-value strongly supports rejecting the null hypothesis in favor of the alternative hypothesis. In other words, the HDs between twins are indeed smaller than the HDs between non-twins. This demonstrates the validity of the assumption used in kAQ that grouping iris patterns closer to each other in the same cell may leak important identity information as, at the very least, twins are more likely to be grouped together.

4. ε-BALL VERSUS BOUNDING BOX

Due to the noise in the capturing process, iris patterns of the same person captured at different times by different machines will have small variations among them. In [7], the binary iris pattern is first projected to a lower-dimensional Euclidean feature space via Fastmap and PCA. The resulting feature vector is then quantized using a uniform lattice quantizer into a quantization bin index. The Fastmap and PCA are used to approximate the modified Hamming distance used in the original space. The quantization can group patterns that are very close together into the same bin. This procedure alone, however, is usually inadequate to cope with all the variations among patterns from the same individual. A second-level structure, which we call a neighborhood, is needed to group together all the bins that can possibly contain patterns from the same individual. Assuming that multiple training patterns are available for each individual, it is natural to estimate the neighborhood structure using the training data. There are three fundamental requirements of the design of the neighborhood structure pertinent to the overall kAQ scheme:

Recognition: The probability that an unseen test pattern from an individual falls inside the neighborhood of the same individual must be very high. This ensures that the probability of failing to provide the appropriate privacy protection to an authenticated individual is negligible.

Overlap among neighborhoods: As mentioned in Section 2, the motivation of using kAQ is to restrict the encrypted-domain processing to a small group of patterns called cells. Cells are essentially a third-level structure over bins: they group neighborhoods of diverse individuals together. In the high-dimensional feature space, neighborhoods of different individuals usually overlap. If an unseen test pattern falls in an overlapping region and the corresponding neighborhoods are in different cells, multiple cells will be selected for the subsequent step of encrypted-domain processing, thereby increasing the search complexity. As such, it is imperative to minimize the amount of overlap among neighborhoods.

Ease of Computation: There are two aspects to this requirement. First, it is beneficial to have a simple computational procedure, such as a bounding box or an ε-ball, to characterize a neighborhood. Second, kAQ relies on a public scrambled table that maps the bin indices in all the neighborhoods to the corresponding cell indices. This table is essential so that the BRT can look up the appropriate cell index and inform the VSS without divulging any details about the bin index that contains the probe. This table needs to be of manageable size, and its size depends exponentially on the dimension of the feature space.
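The bin and table mechanics described above can be sketched as follows. This is a minimal illustration, assuming a uniform quantizer over a hypothetical fixed range [lo, hi) in every dimension and a plain dictionary standing in for the public scrambled table; the function names, the range, and the toy table are not taken from [7]. The last lines show why table size is a concern: the number of possible bins is the number of bins per dimension raised to the power m.

```python
def quantize(vector, bins_per_dim, lo=-1.0, hi=1.0):
    """Map a feature vector to a tuple of per-dimension bin indices.

    Assumption: every dimension is quantized uniformly over [lo, hi);
    values outside that range are clamped to the nearest edge bin.
    """
    width = (hi - lo) / bins_per_dim
    index = []
    for v in vector:
        i = int((v - lo) // width)
        i = min(max(i, 0), bins_per_dim - 1)
        index.append(i)
    return tuple(index)


def lookup_cells(bin_index, public_table):
    """Return the cell indices listed for this bin (empty set if none)."""
    return public_table.get(bin_index, set())


# Hypothetical table for a 2-D feature space with 2 bins per dimension.
public_table = {
    (0, 0): {0},
    (0, 1): {0, 1},  # bin covered by neighborhoods that sit in two cells
    (1, 1): {1},
}

probe = [-0.3, 0.4]
probe_bin = quantize(probe, bins_per_dim=2)
cells = lookup_cells(probe_bin, public_table)
print(probe_bin, cells)

# Number of possible bins for m = 20 dimensions.
bins_at_two = 2 ** 20
bins_at_four = 4 ** 20
print(bins_at_two, bins_at_four // bins_at_two)
```

A probe that lands in a bin listed under several cells, as in the toy table, would trigger encrypted-domain matching in each of those cells, which is the overlap cost named in the second requirement.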
In [7], the neighborhood is chosen based on the bounding box of the bins containing the training patterns of an individual. In this section, we experimentally demonstrate that ε-balls provide better performance than bounding boxes in terms of the above criteria. An ε-ball of an individual contains all the bins whose centroids are within ε from the centroid of the training patterns. A ball is simply defined to be the smallest ε-ball that contains all the training patterns. Here we evaluate the effectiveness of four neighborhood structures: (1) ε-ball with a constant ε equal to the maximum radius of all the balls; (2) ε-ball with a constant ε equal to the average radius of all balls plus one standard deviation; (3) ε-ball with the actual radius of each ball; and (4) bounding box. For each structure, we test under two different feature space dimensions, m = 10 and m = 20, based on the same procedure as in [7], and three different quantization levels in each dimension: 2 bins, 4 bins and 8 bins. A subset of the CASIA iris database is used in our experiment. This subset consists of 160 individuals with 1948 patterns. For each configuration, we withhold one random sample from each individual for testing and use the remaining ones to build the neighborhood structure. For each test pattern, we measure the number of neighborhoods that contain this pattern (Overlap number) and whether the correct neighborhood is included (Recognition rate). The average Overlap numbers and Recognition rates over all the test patterns are reported in Table 1.

Fig. 4. Complexity versus utility (x-axis: Utility, ×10^5; y-axis: Complexity; curves: ε-ball with a statistical radius, ε-ball with the actual radius, bounding box)

From the results in Table 1, we first notice that for dimension m = 10, the overlap numbers are much larger than those of dimension m = 20, with similar recognition rates. This can be explained by the fact that a lower-dimensional feature space introduces much distortion and cannot approximate the original Hamming distance well. The lower-dimensional feature space will induce a higher search complexity and, as a consequence, we will focus only on dimension m = 20. Second, the ε-ball with maximum radius produces an almost perfect recognition rate but at a cost of significantly higher overlap numbers than the rest of the schemes. The remaining three schemes produce similar overlap numbers, with the ε-ball with statistical radius leading the recognition rate in the range from 91.25% to 96.88%. The worst is the bounding box, with recognition rates from 48.75% to 75.63%.

While these recognition rates are certainly respectable in a recognition task, they might not be acceptable when used in our application of protecting individuals' privacy. If the noises introduced into different test patterns are independent, the recognition rate can be improved by using multiple test patterns. On the other hand, this may adversely affect the overlap number as each test pattern introduces a different set of neighborhoods. To test this idea, we withhold an additional sample from each individual and redo the experiments for m = 20. The results are shown in Table 2.
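The evaluation described above can be sketched as follows for structure (2), the ε-ball with a statistical radius. This is a simplified illustration under stated assumptions: it works directly on feature vectors instead of bin centroids, it uses the population standard deviation for "one standard deviation", and the identifiers and toy 2-D data are hypothetical and unrelated to the CASIA subset.

```python
import math
import statistics


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def ball_radius(points):
    """Radius of the smallest centroid-centered ball holding all points."""
    c = centroid(points)
    return max(math.dist(p, c) for p in points)


def statistical_radius(training):
    """Mean ball radius plus one standard deviation (population form)."""
    radii = [ball_radius(pts) for pts in training.values()]
    return statistics.mean(radii) + statistics.pstdev(radii)


def evaluate(test_point, true_id, training, eps):
    """Return (overlap number, recognized?) for a single test pattern."""
    hits = [pid for pid, pts in training.items()
            if math.dist(test_point, centroid(pts)) <= eps]
    return len(hits), true_id in hits


# Hypothetical training patterns for three individuals.
training = {
    "A": [[0.0, 0.0], [2.0, 0.0]],
    "B": [[4.0, 0.0], [4.0, 2.0]],
    "C": [[10.0, 10.0], [10.0, 16.0]],
}

eps = statistical_radius(training)
overlap, recognized = evaluate([2.5, 0.5], "A", training, eps)
print(f"eps={eps:.3f} overlap={overlap} recognized={recognized}")
```

Averaging the overlap numbers and the recognition flags over all withheld test patterns gives quantities of the kind reported in the tables. Swapping `statistical_radius` for the maximum radius or for each individual's own radius would give rough counterparts of structures (1) and (3).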
The overlap numbers increase, but the increases are not significant given the large standard deviations. The recognition rates for all configurations of the ε-ball with statistical radius are above 98% but the best of the bounding box is only 86%. The bounding box, however, has a smaller overlap number, and its lead over the ε-ball increases with the number of bins per dimension. Due to the high feature space dimension of m = 20, we have problems using more than two bins per dimension: the size of the public table that maps the bin index to the cell index is 769,653 for the ε-ball with statistical radius, 686,867 for the ε-ball with different ε and 622,900 for the bounding box. Moving from two to four bins will raise these numbers by a factor on the order of 2^20, which is impractical in most situations. Therefore, we concentrate on the cases of two bins per dimension.

The previous experiments focus only on the relationship between bins and neighborhoods without the actual cell structure. The overlap numbers are upper bounds on the actual complexity of kAQ because overlapped neighborhoods can be mapped to the same cell and then do not increase the encrypted-domain search complexity. For the top three neighborhood structures at dimension m = 20 with two bins per dimension, we run the same greedy algorithm introduced in [7] to compute the cell structures for different values of k ranging from 50 to 300. The parameter k defines the size of each cell. Using two test patterns per individual, we compute the average complexity, defined as the actual number of patterns contained in the returned cells; the utility metric as defined in (1), which is a measurement of privacy; and the recognition rate. Figure 4 shows the complexities at different utility levels. The tradeoff between complexity and utility is obvious. For the same level of utility, the ε-ball with statistical radius has the highest average complexity and the bounding box has the lowest, as predicted by the overlap number measurements from the previous experiments. The differences, however, are not significant due to the large standard deviation. On the other hand, Figure 5 shows that the ε-ball with statistical radius has a much higher recognition rate, making it the best choice among all the tested schemes.

Fig. 5. Recognition versus utility (x-axis: Utility, ×10^5; y-axis: Recognition Rate (%); curves: ε-ball with a statistical radius, ε-ball with the actual radius, bounding box)

5. CONCLUSIONS

In this paper, we propose the framework of anonymous subject identification on biometric signals in a privacy-aware video surveillance system. The memberships of the authorized subjects are verified using an anonymous biometric access control system before the subjects enter the surveillance area. While this anonymous biometric matching system has been previously proposed, we have made two significant improvements. First, we have validated a core assumption used in the key step of k-anonymous quantization: privacy is enhanced by grouping together iris patterns that are far apart in the same cell. Second, we have proposed a new neighborhood structure based on the ε-ball that has a comparable complexity but a better recognition rate than the bounding box approach used in the original scheme. Our experiments

Table 1. Bins' overlap and recognition rate (%) in different dimensions (m). For each number of bins per dimension, the columns give the overlap number (Mean, Std) and the recognition rate (Rec., %).

                                             2 bins                  4 bins                  8 bins
Neighborhood structure                  Mean    Std    Rec.     Mean    Std    Rec.     Mean    Std    Rec.
m = 20
(1) ε-ball with a maximum radius       18.63   6.83   99.38    25.74  16.94  100      28.48  26.76  100
(2) ε-ball with a statistical radius    2.91   1.79   91.25     3.31   2.78   94.38    2.12   2.21   96.88
(3) ε-ball with different ε             2.27   1.43   80.00     2.36   2.00   78.13    1.64   1.38   77.50
(4) bounding box                        2.08   1.29   75.63     1.38   1.16   70.00    0.56   0.58   48.75
m = 10
(1) ε-ball with a maximum radius       63.98  10.25   99.38    51.48  26.82  100      76.30  36.85  100
(2) ε-ball with a statistical radius   18.06   5.56   92.50    18.80  12.38   93.75    9.99  10.08   96.88
(3) ε-ball with different ε            12.50   4.22   80.00    11.13   7.50   80.00    6.44   5.82   79.38
(4) bounding box                       14.17   3.93   83.75     9.76   6.69   80.00    2.48   2.64   66.25

Table 2. Bins' overlap and recognition rate (%) with 2 test patterns (m = 20). For each number of bins per dimension, the columns give the overlap number (Mean, Std) and the recognition rate (Rec., %).

                                             2 bins                  4 bins                  8 bins
Neighborhood structure                  Mean    Std    Rec.     Mean    Std    Rec.     Mean    Std    Rec.
(2) ε-ball with a statistical radius    3.88   2.20   98.13     4.24   3.28   97.50    2.55   2.46   98.75
(3) ε-ball with different ε             2.95   1.61   86.25     2.90   2.01   86.25    2.09   1.74   86.25
(4) bounding box                        2.54   1.31   86.25     1.69   1.13   81.88    0.70   0.56   61.88

indicate that both the recognition rate and the complexity decrease when the dimension of the feature space increases. Unfortunately, such an increase proves to be prohibitive as the kAQ scheme requires the use of a public table that grows exponentially with the dimension. A very interesting research direction is to investigate if such a table can be eliminated through a more structured approach in mapping bins to cells. To further test the generality of our proposed framework, we are currently extending our system to other types of biometric signals.

6. REFERENCES

[1] J. Stanley and B. Steinhardt, “Bigger Monster, Weaker Chains: The Growth of an American Surveillance Society,” New York: ACLU, 2003.
[2] A. M. Berger, US Patent 6,067,399: Privacy mode for acquisition cameras and camcorders, Sony Corporation, May 23, 2000.
[3] E. N. Newton, L. Sweeney, and B. Main, “Preserving privacy by de-identifying face images,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, pp. 232–243, February 2005.
[4] J. Schiff, M. Meingast, D. Mulligan, S. Sastry, and K. Goldberg, “Respectful cameras: Detecting visual markers in realtime to address privacy concerns,” in International Conference on Intelligent Robots and Systems (IROS). Springer, 2007, pp. 971–978.
[5] J. Wickramasuriya, M. Datt, S. Mehrotra, and N. Venkatasubramanian, “Privacy protecting data collection in media spaces,” in ACM International Conference on Multimedia, New York, NY, Oct. 2004.
[6] J. Zhao and S. Cheung, “Multi-camera surveillance with visual tagging and generic camera placement,” in Proceedings of ACM/IEEE International Conference on Distributed Smart Cameras, 2007.
[7] S. Ye, Y. Luo, J. Zhao, and S. Cheung, “Anonymous Biometric Access Control,” EURASIP Journal on Information Security, vol. 2009, Article ID 865259, 17 pages, 2009.
[8] J. Wilmer et al., “Human face recognition ability is specific and highly heritable,” Proceedings of the National Academy of Sciences, 2010.
[9] A. Jain, S. Prabhakar, and S. Pankanti, “On the similarity of identical twin fingerprints,” Pattern Recognition, vol. 35, no. 11, pp. 2653–2663, 2002.
[10] A. Kong, D. Zhang, and G. Lu, “A study of identical twins’ palmprints for personal verification,” Pattern Recognition, vol. 39, no. 11, pp. 2149–2156, 2006.
[11] T. Tan and Z. Sun, “CASIA-IrisV3,” Chinese Academy of Sciences Institute of Automation, http://www.cbsr.ia.ac.cn/IrisDatabase.htm, Tech. Rep., 2005.
[12] L. Masek and P. Kovesi, “Matlab source code for a biometric identification system based on iris patterns,” The School of Computer Science and Software Engineering, The University of Western Australia, Tech. Rep., 2003.
[13] J. Devore, “Probability and Statistics for Engineering and the Sciences,” Brooks/Cole Pub. Co., Monterey, California, vol. 704, 1991.