Hash Function Learning via Codewords
2015 ECML/PKDD, Porto, Portugal, September 7–11, 2015

Yinjie Huang¹  Michael Georgiopoulos¹  Georgios C. Anagnostopoulos²

¹ Machine Learning Laboratory, University of Central Florida, US
² ICE Laboratory, Florida Institute of Technology, US

September 9th, 2015

Table of Contents

1 Introduction
2 Formulation
3 Algorithm
4 Experiments
5 Concentration Guarantees
6 Summary
7 References
8 Back Up Slides



Section 1 Introduction



What is Content-Based Image Retrieval?

Figure: Content-Based Image Retrieval (CBIR) [Datta et al., 2008]



Challenges in CBIR

There are two main challenges in CBIR:

Search complexity: nearest neighbor search. In practical settings (large amounts of data), exhaustively comparing the query with every sample in the database is impractical.

Storage space: image feature vectors usually have hundreds or thousands of dimensions, so storing all raw features in the database also poses a problem.



Hash Function Learning

Figure: Hashing-based CBIR [Wang et al., 2012]



Hash Function Learning

Design hash functions that transform the original data into compact binary codes. The learned hash functions aim to map similar data to similar hash codes.

Benefits:
Approximate nearest neighbor (ANN) search [Datta et al., 2008] using binary codes was shown to achieve sub-linear search time (see the sketch below).
Storage requirement advantages. For example, a 10-dimensional real-valued vector needs 320 bits (at single precision), while a hash code representing it may need only 10 bits.
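To make the search benefit concrete, here is a minimal sketch (my own illustration, not from the slides) of Hamming-distance lookup over binary codes with NumPy; the database size and code length are arbitrary.

```python
# Minimal sketch: compact binary codes plus a single XOR-and-count pass
# give fast approximate nearest neighbor lookup.
import numpy as np

rng = np.random.default_rng(0)
database_codes = rng.integers(0, 2, size=(100_000, 10)).astype(np.uint8)  # 10-bit codes
query_code = rng.integers(0, 2, size=10).astype(np.uint8)

# Hamming distance from the query to every database item.
distances = np.count_nonzero(database_codes != query_code, axis=1)
top_k = np.argsort(distances)[:5]          # indices of the 5 closest codes
print(top_k, distances[top_k])
```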



Data-dependent hashing



Contributions

A new hashing framework, *Supervised Hash Learning (*SHL):
*SHL can naturally engage supervised, unsupervised and semi-supervised hash learning scenarios.
*SHL considers a set of Hamming-space codewords that are learned during training in order to capture the intrinsic similarities between the data.
The minimization problem of *SHL naturally leads to a set of Support Vector Machine (SVM) problems, which can be solved efficiently by LIBSVM [Chang and Lin, 2011].
Theoretical insight into *SHL's superior performance.

Results: we consider 5 benchmark datasets and compare against 6 other state-of-the-art methods. The results show that *SHL is highly competitive.



Section 2 Formulation



Formulation

*SHL utilizes codewords $\mu_g$, $g \in \mathbb{N}_G$. Each codeword is associated with a class.

Figure: Idea of *SHL


Formulation

By adjusting its parameters $\omega$, *SHL attempts to reduce the distortion measure:

$$E(\omega) \triangleq \sum_{n \in \mathbb{N}_L} d\big(h(x_n), \mu_{l_n}\big) + \sum_{n \in \mathbb{N}_U} \min_g d\big(h(x_n), \mu_g\big) \quad (1)$$

where $d$ is the Hamming distance, defined as $d(h, h') \triangleq \sum_b [h_b \neq h'_b]$.
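The distortion (1) reads directly as code. The sketch below is my own illustration under the definitions above (not the authors' implementation); `labels[n]` is taken to be None for unlabeled samples.

```python
# Sketch of the distortion E(omega): labeled samples are scored against their
# class codeword, unlabeled samples against the nearest codeword.
import numpy as np

def hamming(h, mu):
    """d(h, mu): number of bit positions where the two codes disagree."""
    return int(np.count_nonzero(h != mu))

def distortion(H, labels, codewords):
    """H: (N, B) binary codes; labels[n]: class index or None; codewords: (G, B)."""
    total = 0
    for h, l in zip(H, labels):
        if l is not None:                       # labeled term of Eq. (1)
            total += hamming(h, codewords[l])
        else:                                   # unlabeled term: min over codewords
            total += min(hamming(h, mu) for mu in codewords)
    return total
```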

Formulation

Hash code: $h(x) \triangleq \operatorname{sgn} f(x) \in \mathbb{H}^B$ for a sample $x \in \mathcal{X}$.

$f(x) \triangleq [f_1(x) \ldots f_B(x)]^T$, where $f_b(x) \triangleq \langle w_b, \phi(x) \rangle_{\mathcal{H}_b} + \beta_b$ with $w_b \in \Omega_{w_b} \triangleq \{ w_b \in \mathcal{H}_b : \| w_b \|_{\mathcal{H}_b} \leq R_b \}$, $R_b > 0$, and $\beta_b \in \mathbb{R}$ for all $b \in \mathbb{N}_B$.

$\mathcal{H}_b$ is a Reproducing Kernel Hilbert Space (RKHS) with inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}_b}$.
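For intuition only, a hedged sketch of how a hash code could be produced from such decision functions, using a finite RBF-kernel expansion in place of an abstract RKHS element; the names `anchors`, `alphas` and `betas` are illustrative assumptions, not notation from the slides.

```python
# Each bit b has a kernel expansion f_b(x) = sum_n alphas[b, n] * k(x_n, x) + betas[b];
# the hash code is the vector of signs of these decision values.
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def hash_code(x, anchors, alphas, betas):
    """anchors: (N, D) expansion points; alphas: (B, N); betas: (B,)."""
    k = rbf_kernel(anchors, x[None, :])[:, 0]   # k(x_n, x) for every anchor
    f = alphas @ k + betas                      # f_b(x) for b = 1..B
    return np.sign(f).astype(int)               # h(x) = sgn f(x) in {-1, +1}^B
```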


Section 3 Algorithm


Algorithm

Two key observations:

Bound on the Hamming distance via hinge ("hinge Hamming") distances:

$$d\big(h(x), \mu\big) = \sum_b [\mu_b f_b(x) < 0] \leq \bar{d}(f, \mu) \triangleq \sum_b \big[1 - \mu_b f_b(x)\big]_+$$

so that $E(\omega) \leq \bar{E}(\omega) \triangleq \sum_g \sum_n \gamma_{g,n}\, \bar{d}\big(f(x_n), \mu_g\big)$.

Majorization-Minimization (MM): for parameter values $\omega$ and $\omega'$, we have

$$\bar{E}(\omega) \leq \bar{E}(\omega \mid \omega') \triangleq \sum_g \sum_n \gamma'_{g,n}\, \bar{d}\big(f(x_n), \mu_g\big)$$
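As a quick sanity check on the first observation, a small sketch (my own, not the authors' code) of the hinge Hamming surrogate and the bound it provides:

```python
# d(h(x), mu) counts sign disagreements; d_bar(f, mu) replaces each indicator
# with a hinge term and therefore upper-bounds it.
import numpy as np

def hamming_from_scores(f, mu):
    return int(np.count_nonzero(mu * f < 0))

def hinge_hamming(f, mu):
    return float(np.sum(np.maximum(0.0, 1.0 - mu * f)))

f = np.array([0.7, -0.2, 1.5])       # real-valued bit scores f_b(x)
mu = np.array([1, 1, -1])            # a codeword in {-1, +1}^B
assert hamming_from_scores(f, mu) <= hinge_hamming(f, mu)
```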

Algorithm - First block minimization

Minimizing $\bar{E}(\cdot \mid \omega')$, one obtains $B$ independent, equivalent problems (minimization over $w_{b,m}$ and $\beta_b$, with $\theta_b$ and $\mu_{g,b}$ held fixed):

$$\inf_{w_{b,m},\, \beta_b} \; \sum_g \sum_n \gamma'_{g,n} \big[ 1 - \mu_{g,b} f_b(x_n) \big]_+ + \frac{1}{2} \sum_m \frac{\| w_{b,m} \|^2_{\mathcal{H}_m}}{\theta_{b,m}}, \quad b \in \mathbb{N}_B \quad (2)$$

By considering $w_{b,m}$ and $\beta_b$ for each $b$ as a single block, this leads to the dual form:

$$\sup_{\alpha_b \in \Omega_{\alpha_b}} \; \alpha_b^T \mathbf{1}_{NG} - \frac{1}{2} \alpha_b^T D_b \big[ (\mathbf{1}_G \mathbf{1}_G^T) \otimes K_b \big] D_b \alpha_b, \quad b \in \mathbb{N}_B \quad (3)$$


Algorithm - First block minimization

The dual (3) is an SVM training problem, solved with LIBSVM.

Figure: For each bit, one SVM problem.

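Below is a hedged sketch of the "one SVM per bit" view, using scikit-learn's SVC (a LIBSVM wrapper) as a stand-in for the exact dual (3). Here `bit_targets[b, n]` is assumed to be the b-th bit of the codeword currently assigned to sample n, and `sample_weight` only loosely mimics the role of the weights in (2); this is a simplification, not the authors' implementation.

```python
# One independent binary SVM per hash bit; the learned decision functions
# provide the f_b used to emit the hash code h(x) = sgn f(x).
import numpy as np
from sklearn.svm import SVC

def fit_bit_svms(X, bit_targets, sample_weight=None):
    """X: (N, D) features; bit_targets: (B, N) array with entries in {-1, +1}."""
    models = []
    for y_b in bit_targets:                    # one SVM problem per bit b
        clf = SVC(kernel="rbf", C=1.0)
        clf.fit(X, y_b, sample_weight=sample_weight)
        models.append(clf)
    return models

def hash_codes(models, X):
    # Stack the per-bit signs into B-bit codes, one row per sample.
    return np.stack([np.sign(m.decision_function(X)) for m in models], axis=1)
```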

Algorithm - Second block minimization

Second block minimization (over the kernel weights $\theta_b$, with the remaining variables held fixed):

$$\inf_{\theta_b} \; \sum_g \sum_n \gamma'_{g,n} \big[ 1 - \mu_{g,b} f_b(x_n) \big]_+ + \frac{1}{2} \sum_m \frac{\| w_{b,m} \|^2_{\mathcal{H}_m}}{\theta_{b,m}}, \quad b \in \mathbb{N}_B \quad (4)$$

This is an MKL step with a closed-form solution [Kloft et al., 2011]:

$$\theta_{b,m} = \frac{\| w_{b,m} \|_{\mathcal{H}_m}^{2/(p+1)}}{\Big( \sum_{m'} \| w_{b,m'} \|_{\mathcal{H}_{m'}}^{2p/(p+1)} \Big)^{1/p}}, \quad m \in \mathbb{N}_M, \; b \in \mathbb{N}_B. \quad (5)$$
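A NumPy sketch of the closed-form update (5); `w_norms[m]` stands for $\| w_{b,m} \|_{\mathcal{H}_m}$ for one fixed bit b, and `p` is the l_p-norm MKL parameter (illustrative values only).

```python
import numpy as np

def update_theta(w_norms, p=1.0):
    # Eq. (5): theta_{b,m} proportional to ||w_{b,m}||^{2/(p+1)}, normalized by
    # the (1/p)-power of the sum of ||w_{b,m'}||^{2p/(p+1)}.
    num = w_norms ** (2.0 / (p + 1.0))
    den = np.sum(w_norms ** (2.0 * p / (p + 1.0))) ** (1.0 / p)
    return num / den

print(update_theta(np.array([0.5, 1.2, 0.3]), p=2.0))   # kernel weights for this bit
```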

Algorithm - Third block minimization

Third block minimization (over the codeword bits $\mu_{g,b}$, with the remaining variables held fixed):

$$\inf_{\mu_{g,b}} \; \sum_g \sum_n \gamma'_{g,n} \big[ 1 - \mu_{g,b} f_b(x_n) \big]_+ + \frac{1}{2} \sum_m \frac{\| w_{b,m} \|^2_{\mathcal{H}_m}}{\theta_{b,m}}, \quad b \in \mathbb{N}_B \quad (6)$$

Optimize over the codewords by substitution:

$$\inf_{\mu_{g,b} \in \mathbb{H}} \; \sum_n \gamma_{g,n} \big[ 1 - \mu_{g,b} f_b(x_n) \big]_+, \quad g \in \mathbb{N}_G, \; b \in \mathbb{N}_B \quad (7)$$
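Since $\mu_{g,b}$ is a single $\pm 1$ bit, (7) reduces to comparing two weighted hinge losses. A small sketch of that update (my own illustration):

```python
import numpy as np

def update_codeword_bit(f_b, gamma_g):
    """f_b: (N,) bit scores f_b(x_n); gamma_g: (N,) weights for codeword g."""
    loss = {s: float(np.sum(gamma_g * np.maximum(0.0, 1.0 - s * f_b)))
            for s in (+1, -1)}
    return min(loss, key=loss.get)      # the sign giving the smaller hinge loss
```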


Section 4 Experiments



Supervised Hash Learning Results

Methods to compare with:
Supervised Hashing with Kernels (KSH) [Liu et al., 2012]
Binary Reconstructive Embedding (BRE) [Kulis and Darrell, 2009]
Single-layer Anchor Graph Hashing (1-AGH) and its two-layer version (2-AGH) [Liu et al., 2011]
Spectral Hashing (SPH) [Weiss et al., 2008]
Locality-Sensitive Hashing (LSH) [Gionis et al., 1999]

Performance metrics: precision (retrieval accuracy) and the Precision-Recall (PR) curve.

Datasets: Pendigits, USPS, Mnist, PASCAL07, CIFAR-10.


Pendigits

Figure: The top s retrieval results and Precision-Recall curve on the Pendigits dataset over *SHL and 6 other hashing algorithms (Our, KSH, LSH, SPH, BRE, 1-AGH, 2-AGH). Panels: top s retrieval precision (s = 10) vs. number of bits; top s retrieval precision vs. number of top s; precision vs. recall.

Mnist

Figure: The top s retrieval results and Precision-Recall curve on the Mnist dataset over *SHL and 6 other hashing algorithms (Our, KSH, LSH, SPH, BRE, 1-AGH, 2-AGH). Panels: top s retrieval precision (s = 10) vs. number of bits; top s retrieval precision vs. number of top s; precision vs. recall.

CIFAR-10

Figure: The top s retrieval results and Precision-Recall curve on the CIFAR-10 dataset over *SHL and 6 other hashing algorithms (Our, KSH, LSH, SPH, BRE, 1-AGH, 2-AGH). Panels: top s retrieval precision (s = 10) vs. number of bits; top s retrieval precision vs. number of top s; precision vs. recall.


Qualitative Results

Figure: Qualitative results on CIFAR-10. The query image is "Car". The rows correspond to *SHL, KSH, LSH, SPH, BRE, 1-AGH and 2-AGH; the remaining 15 images in each row were retrieved using 45-bit binary codes generated by the corresponding hashing algorithm.


The Codewords

Consider a small subset of Mnist, with bit length B = 25. Hamming distances between each pair of codewords after training:

      µ1   µ2   µ3   µ4   µ5   µ6   µ7   µ8   µ9   µ10
µ1     -   11   16    9   14   16   15   15   14   10
µ2          -   15   16   11   13   10   11   15   13
µ3               -   17   14   12   13   11   14   14
µ4                    -   13   13   12   16    9   11
µ5                         -   16   13   15   10   14
µ6                              -   13   15   16   16
µ7                                   -   12   13   11
µ8                                        -   11   17
µ9                                             -   10
µ10                                                 -


Section 5 Concentration Guarantees



Concentration Guarantees

With high probability, *SHL produces hash codes concentrated around the correct codeword. Images from the same class will be mapped closer to each other, which will benefit precision-recall performance.


Concentration Guarantees

Theorem
Assume the reproducing kernels of $\{\mathcal{H}_b\}_{b=1}^B$ satisfy $k_b(x, x') \leq r^2$ for all $x, x' \in \mathcal{X}$. Then, for a fixed value of $\rho > 0$, for any $f \in \bar{\mathcal{F}}$, any $\{\mu_l\}_{l=1}^G$ with $\mu_l \in \mathbb{H}^B$, and any $\delta > 0$, with probability $1 - \delta$ it holds that:

$$\mathrm{er}(f, \mu_l) \leq \widehat{\mathrm{er}}(f, \mu_l) + \frac{2r}{\rho B \sqrt{N}} \sum_b R_b + \sqrt{\frac{\log \frac{1}{\delta}}{2N}} \quad (8)$$

where $\mathrm{er}(f, \mu_l) \triangleq \frac{1}{B} E\{ d(h, \mu_l) \}$, $l \in \mathbb{N}_G$ is the true label of $x \in \mathcal{X}$, $\widehat{\mathrm{er}}(f, \mu_l) \triangleq \frac{1}{NB} \sum_{n,b} Q_\rho\big( f_b(x_n)\, \mu_{l_n,b} \big)$, and $Q_\rho(u) \triangleq \min\big\{ 1, \max\big\{ 0, 1 - \tfrac{u}{\rho} \big\} \big\}$.
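A small numerical sketch of the empirical margin error used in (8); it simply applies the ramp loss $Q_\rho$ to the margins $f_b(x_n)\,\mu_{l_n,b}$ and averages (my own illustration of the formula above).

```python
import numpy as np

def Q_rho(u, rho):
    # Ramp loss: 1 for u <= 0, decreasing linearly to 0 at u >= rho.
    return np.minimum(1.0, np.maximum(0.0, 1.0 - u / rho))

def empirical_margin_error(F, Mu, rho):
    """F: (N, B) bit scores f_b(x_n); Mu: (N, B) true-codeword bits in {-1, +1}."""
    return float(np.mean(Q_rho(F * Mu, rho)))   # average over both n and b
```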


Section 6 Summary



Summary

A novel hash learning framework, *Supervised Hash Learning (*SHL), is proposed. *SHL is able to address supervised, unsupervised and even semi-supervised learning tasks in a unified fashion. Its training algorithm is simple to implement. Experiments on 5 benchmark datasets, compared with 6 other state-of-the-art methods, show that *SHL is highly competitive.



Thank You! Thanks for your time. Code is available at: http://www.eecs.ucf.edu/~yhuang/



Section 7 References


References I

Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Datta, R., Joshi, D., Li, J., and Wang, J. Z. (2008). Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2):5:1–5:60.

Gionis, A., Indyk, P., and Motwani, R. (1999). Similarity search in high dimensions via hashing. In Proceedings of the International Conference on Very Large Data Bases, pages 518–529.

Kloft, M., Brefeld, U., Sonnenburg, S., and Zien, A. (2011). lp-norm multiple kernel learning. Journal of Machine Learning Research, 12:953–997.

References II

Kulis, B. and Darrell, T. (2009). Learning to hash with binary reconstructive embeddings. In Proceedings of Advances in Neural Information Processing Systems, pages 1042–1050.

Liu, W., Wang, J., Ji, R., Jiang, Y.-G., and Chang, S.-F. (2012). Supervised hashing with kernels. In Proceedings of Computer Vision and Pattern Recognition, pages 2074–2081.

Liu, W., Wang, J., Kumar, S., and Chang, S.-F. (2011). Hashing with graphs. In Proceedings of the International Conference on Machine Learning, pages 1–8.

References III

Wang, J., Kumar, S., and Chang, S.-F. (2012). Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12):2393–2406.

Weiss, Y., Torralba, A., and Fergus, R. (2008). Spectral hashing. In Proceedings of Advances in Neural Information Processing Systems, pages 1753–1760.


Section 8 Back Up Slides


Transductive Learning

Datasets: Vowel, Letter.

Figure: Accuracy results between Inductive and Transductive Learning on the Vowel and Letter datasets (accuracy/precision vs. number of bits).

Algorithm

Two key observations:

Bound on the Hamming distance via hinge Hamming distances:

$$d\big(h(x), \mu\big) = \sum_b [\mu_b f_b(x) < 0] \leq \bar{d}(f, \mu) \triangleq \sum_b \big[1 - \mu_b f_b(x)\big]_+$$

We have $E(\omega) \leq \bar{E}(\omega) \triangleq \sum_g \sum_n \gamma_{g,n}\, \bar{d}\big(f(x_n), \mu_g\big)$, by defining:

$$\gamma_{g,n} \triangleq \begin{cases} [g = l_n] & n \in \mathbb{N}_L \\ \big[ g = \arg\min_{g'} \bar{d}\big(f(x_n), \mu_{g'}\big) \big] & n \in \mathbb{N}_U \end{cases} \quad (9)$$

Majorization-Minimization (MM): for parameter values $\omega$ and $\omega'$, we have $\bar{E}(\omega) \leq \bar{E}(\omega \mid \omega') \triangleq \sum_g \sum_n \gamma'_{g,n}\, \bar{d}\big(f(x_n), \mu_g\big)$, where the primed quantities are evaluated on $\omega'$.

$$\gamma'_{g,n} \triangleq \begin{cases} [g = l_n] & n \in \mathbb{N}_L \\ \big[ g = \arg\min_{g'} \bar{d}\big(f'(x_n), \mu_{g'}\big) \big] & n \in \mathbb{N}_U \end{cases} \quad (10)$$
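A sketch of the assignment step in (9)-(10) (my own illustration): labeled samples keep their class codeword, while unlabeled samples are assigned to the codeword with the smallest hinge Hamming distance, evaluated at the previous iterate.

```python
import numpy as np

def assign_gamma(F, labels, codewords):
    """F: (N, B) bit scores f_b(x_n); labels[n]: class index or None; codewords: (G, B)."""
    N, G = F.shape[0], codewords.shape[0]
    gamma = np.zeros((G, N))
    for n, l in enumerate(labels):
        if l is not None:                            # labeled: indicator of its class
            gamma[l, n] = 1.0
        else:                                        # unlabeled: closest codeword in d_bar
            d_bar = [np.sum(np.maximum(0.0, 1.0 - mu * F[n])) for mu in codewords]
            gamma[int(np.argmin(d_bar)), n] = 1.0
    return gamma
```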

USPS

Figure: The top s retrieval results and Precision-Recall curve on the USPS dataset over *SHL and 6 other hashing algorithms (Our, KSH, LSH, SPH, BRE, 1-AGH, 2-AGH). Panels: top s retrieval precision (s = 10) vs. number of bits; top s retrieval precision vs. number of top s; precision vs. recall.

PASCAL07

Figure: The top s retrieval results and Precision-Recall curve on the PASCAL07 dataset over *SHL and 6 other hashing algorithms (Our, KSH, LSH, SPH, BRE, 1-AGH, 2-AGH). Panels: top s retrieval precision (s = 10) vs. number of bits; top s retrieval precision vs. number of top s; precision vs. recall.