Hash Function Learning via Codewords
ECML/PKDD 2015, Porto, Portugal, September 7–11, 2015

Yinjie Huang (1), Michael Georgiopoulos (1), Georgios C. Anagnostopoulos (2)

(1) Machine Learning Laboratory, University of Central Florida, US
(2) ICE Laboratory, Florida Institute of Technology, US

September 9, 2015
Table of Contents

1. Introduction
2. Formulation
3. Algorithm
4. Experiments
5. Concentration Guarantees
6. Summary
7. References
8. Back Up Slides
Section 1 Introduction
What is Content-Based Image Retrieval?

Figure: Content-Based Image Retrieval (CBIR) [Datta et al., 2008]
Challenges in CBIR

There are two main challenges in CBIR:
- Search complexity: nearest neighbor search. In practical settings with large amounts of data, exhaustively comparing the query with each sample in the database is impractical.
- Storage space: image feature vectors usually have hundreds or thousands of dimensions, and storing all raw images in the database also poses a problem.
Hash Function Learning

Figure: Hashing based CBIR [Wang et al., 2012]
Hash Function Learning

Design hash functions that transform the original data into compact binary codes; the learned hash functions aim to map similar data to similar hash codes. Benefits:
- Approximate nearest neighbor (ANN) search [Datta et al., 2008] using binary codes has been shown to achieve sub-linear search time.
- Storage savings: for example, a 10-dimensional real-valued vector needs 320 bits (single precision), while a hash code representing this vector may need only 10 bits (see the sketch below).
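To make the storage and search benefits concrete, here is a minimal sketch (illustrative only, not the authors' implementation) of Hamming-distance retrieval over bit-packed codes; all names and sizes are hypothetical:

```python
import numpy as np

# Hamming-distance retrieval over bit-packed binary codes (hypothetical
# sizes). A B-bit code occupies B/8 bytes, versus 4 bytes per dimension
# for single-precision features.
rng = np.random.default_rng(0)
B = 48                                          # bits per hash code
db_codes = rng.integers(0, 2, (10000, B)).astype(np.uint8)
query_code = rng.integers(0, 2, B).astype(np.uint8)

packed_db = np.packbits(db_codes, axis=1)       # 10000 x 6 bytes of storage
packed_q = np.packbits(query_code)              # 6 bytes

# Hamming distance = popcount of XOR, vectorized over the whole database.
xor = np.bitwise_xor(packed_db, packed_q)
dists = np.unpackbits(xor, axis=1).sum(axis=1)

top_s = np.argsort(dists)[:10]                  # indices of the 10 nearest codes
```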
Data-dependent hashing
Contributions

A new hashing framework, *Supervised Hash Learning (*SHL):
- *SHL can naturally engage supervised, unsupervised, and semi-supervised hash learning scenarios.
- *SHL considers a set of Hamming-space codewords, learned during training, that capture the intrinsic similarities between the data.
- The minimization problem of *SHL naturally leads to a set of Support Vector Machine (SVM) problems, which can be solved efficiently by LIBSVM [Chang and Lin, 2011].
- Theoretical insight into *SHL's performance (concentration guarantees).

Results: on 5 benchmark datasets, compared against 6 other state-of-the-art methods, *SHL is highly competitive.
Section 2 Formulation
Formulation

*SHL utilizes codewords $\mu_g$, $g \in \mathbb{N}_G$. Each codeword is associated with a class.

Figure: Idea of *SHL
Formulation

By adjusting its parameters $\omega$, *SHL attempts to reduce the distortion measure:

$$E(\omega) \triangleq \sum_{n \in N_L} d\big(h(x_n), \mu_{l_n}\big) \;+\; \sum_{n \in N_U} \min_g d\big(h(x_n), \mu_g\big) \quad (1)$$

where $d$ is the Hamming distance, defined as $d(h, h') \triangleq \sum_b [h_b \neq h'_b]$.
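As a small sketch of how the distortion in Eq. (1) could be evaluated, assuming codes and codewords are given as ±1 matrices (the helper below is illustrative, not the authors' code):

```python
import numpy as np

def distortion(H_L, labels, H_U, mu):
    """E(w) of Eq. (1). H_L: N_L x B codes of labeled samples; labels:
    their codeword indices; H_U: N_U x B codes of unlabeled samples;
    mu: G x B codewords. All codes/codewords are +/-1 valued."""
    # labeled term: distance to the codeword of each sample's own class
    labeled = np.sum(H_L != mu[labels])
    # unlabeled term: distance to the closest codeword, per sample
    per_pair = np.sum(H_U[:, None, :] != mu[None, :, :], axis=-1)  # N_U x G
    unlabeled = per_pair.min(axis=1).sum()
    return labeled + unlabeled
```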
Formulation

Hash code: $h(x) \triangleq \operatorname{sgn}\big(f(x)\big) \in \mathbb{H}^B$ for a sample $x \in \mathcal{X}$, with $f(x) \triangleq [f_1(x) \ldots f_B(x)]^T$, where $f_b(x) \triangleq \langle w_b, \phi(x) \rangle_{\mathcal{H}_b} + \beta_b$ with $w_b \in \Omega_{w_b} \triangleq \{ w \in \mathcal{H}_b : \| w \|_{\mathcal{H}_b} \leq R_b \}$, $R_b > 0$, and $\beta_b \in \mathbb{R}$ for all $b \in \mathbb{N}_B$. Each $\mathcal{H}_b$ is a Reproducing Kernel Hilbert Space (RKHS) with inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}_b}$.
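As an illustration, if $f_b$ takes the usual kernel-expansion form $f_b(x) = \sum_i \alpha_i k(x_i, x) + \beta_b$ over support samples (an assumption for this sketch; the kernel choice and all names are hypothetical), one bit of the hash code can be computed as:

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    # k(x, z) = exp(-gamma * ||x - z||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def hash_bit(X_query, X_support, alpha, beta):
    """One bit of h(x) = sgn(f(x)): f_b is a kernel expansion over
    support samples (alpha, beta assumed given by training)."""
    f = rbf_kernel(X_query, X_support) @ alpha + beta
    return np.where(f >= 0, 1, -1)   # sgn, with ties sent to +1
```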
Section 3 Algorithm
Algorithm

Two key observations:

Bounding the Hamming distance by a hinge ("hinge Hamming distance"):
$$d(h(x), \mu) = \sum_b [\mu_b f_b(x) < 0] \;\leq\; \bar{d}(f, \mu) \triangleq \sum_b [1 - \mu_b f_b(x)]_+$$
so that $E(\omega) \leq \bar{E}(\omega) \triangleq \sum_g \sum_n \gamma_{g,n}\, \bar{d}\big(f(x_n), \mu_g\big)$, with $\gamma_{g,n}$ as defined in Eq. (9) of the back-up slides.

Majorization-Minimization (MM): for parameter values $\omega$ and $\omega'$, we have:
$$\bar{E}(\omega) \;\leq\; \bar{E}(\omega \,|\, \omega') \triangleq \sum_g \sum_n \gamma'_{g,n}\, \bar{d}\big(f(x_n), \mu_g\big)$$
where the primed quantities are evaluated on $\omega'$ (Eq. (10) of the back-up slides). A small numeric check of the hinge bound follows.
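A tiny numeric check of the hinge bound (a sketch, not from the paper's code):

```python
import numpy as np

def hinge_hamming(f, mu):
    """d_bar(f, mu) = sum_b [1 - mu_b * f_b]_+ ; upper-bounds the
    Hamming distance between sgn(f) and mu."""
    return np.maximum(0.0, 1.0 - mu * f).sum()

f = np.array([0.7, -1.3, 0.2])       # real-valued outputs f_b(x)
mu = np.array([1, -1, -1])           # a codeword in {-1, +1}^3
d_true = np.sum(np.sign(f) != mu)    # Hamming distance: 1
d_bar = hinge_hamming(f, mu)         # 0.3 + 0.0 + 1.2 = 1.5 >= 1
```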
Algorithm - First block minimization

Minimizing $\bar{E}(\cdot \,|\, \omega')$ over the first block yields $B$ independent, equivalent problems:

$$\inf_{w_{b,m},\, \beta_b} \; C \sum_g \sum_n \gamma'_{g,n} \big[ 1 - \mu_{g,b} f_b(x_n) \big]_+ \;+\; \frac{1}{2} \sum_m \frac{\| w_{b,m} \|_{\mathcal{H}_m}^2}{\theta_{b,m}}, \quad b \in \mathbb{N}_B \quad (2)$$

By considering $w_{b,m}$ and $\beta_b$ for each $b$ as a single block, this leads to the dual form:

$$\sup_{\alpha_b \in \Omega_{\alpha_b}} \; \alpha_b^T \mathbf{1}_{NG} \;-\; \frac{1}{2} \alpha_b^T D_b \big[ (\mathbf{1}_G \mathbf{1}_G^T) \otimes K_b \big] D_b \alpha_b, \quad b \in \mathbb{N}_B \quad (3)$$
Algorithm - First block minimization

The dual (3) is a standard SVM training problem, solved here with LIBSVM.

Figure: For each bit, one SVM problem.
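A sketch of the per-bit SVM training step using scikit-learn's SVC (which wraps LIBSVM). For simplicity it assumes each sample is paired only with its assigned codeword (i.e., one-hot γ'), rather than the full γ'-weighted objective of Eq. (2); names are illustrative:

```python
import numpy as np
from sklearn.svm import SVC   # SVC wraps LIBSVM

def train_bit_svms(X, codeword_bits, C=1.0):
    """X: N x d data; codeword_bits: N x B matrix whose n-th row holds
    the bits mu_{g,b} of the codeword currently assigned to sample n."""
    svms = []
    for b in range(codeword_bits.shape[1]):   # one SVM per bit
        clf = SVC(C=C, kernel="rbf")
        clf.fit(X, codeword_bits[:, b])
        svms.append(clf)
    return svms
# hash code of x: np.sign([clf.decision_function([x])[0] for clf in svms])
```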
Algorithm - Second block minimization

Second block minimization (over $\theta_b$, with the remaining parameters fixed):

$$\inf_{\theta_b} \; C \sum_g \sum_n \gamma'_{g,n} \big[ 1 - \mu_{g,b} f_b(x_n) \big]_+ \;+\; \frac{1}{2} \sum_m \frac{\| w_{b,m} \|_{\mathcal{H}_m}^2}{\theta_{b,m}}, \quad b \in \mathbb{N}_B \quad (4)$$

This is an MKL step with a closed-form solution [Kloft et al., 2011]:

$$\theta_{b,m} = \frac{\| w_{b,m} \|_{\mathcal{H}_m}^{2/(p+1)}}{\Big( \sum_{m'} \| w_{b,m'} \|_{\mathcal{H}_{m'}}^{2p/(p+1)} \Big)^{1/p}}, \quad m \in \mathbb{N}_M, \; b \in \mathbb{N}_B \quad (5)$$
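The closed-form update in Eq. (5) is straightforward to implement; a sketch for one bit b, given the RKHS norms of the w_{b,m}:

```python
import numpy as np

def update_theta(w_norms, p=2.0):
    """Eq. (5): w_norms[m] = ||w_{b,m}||_{H_m}. The result satisfies
    sum_m theta_{b,m}^p = 1 (the lp-norm MKL constraint)."""
    numer = w_norms ** (2.0 / (p + 1.0))
    denom = np.sum(w_norms ** (2.0 * p / (p + 1.0))) ** (1.0 / p)
    return numer / denom

theta = update_theta(np.array([0.5, 1.0, 2.0]))
assert np.isclose(np.sum(theta ** 2.0), 1.0)
```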
Algorithm - Third block minimization

Third block minimization (over the codewords):

$$\inf_{\mu_{g,b}} \; C \sum_g \sum_n \gamma'_{g,n} \big[ 1 - \mu_{g,b} f_b(x_n) \big]_+ \;+\; \frac{1}{2} \sum_m \frac{\| w_{b,m} \|_{\mathcal{H}_m}^2}{\theta_{b,m}}, \quad b \in \mathbb{N}_B \quad (6)$$

Optimizing over the codewords by substitution decouples into per-bit problems:

$$\inf_{\mu_{g,b} \in \mathbb{H}} \; \sum_n \gamma'_{g,n} \big[ 1 - \mu_{g,b} f_b(x_n) \big]_+, \quad g \in \mathbb{N}_G, \; b \in \mathbb{N}_B \quad (7)$$
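Since each codeword bit lives in H = {-1, +1}, Eq. (7) can be minimized by simply evaluating both candidates; a sketch:

```python
import numpy as np

def update_codeword_bit(gamma_g, f_b):
    """Eq. (7) for one pair (g, b): gamma_g holds gamma'_{g,n}, f_b
    holds f_b(x_n). Returns the mu in {-1, +1} with the smaller loss."""
    loss = lambda mu: np.sum(gamma_g * np.maximum(0.0, 1.0 - mu * f_b))
    return min((-1, 1), key=loss)
```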
Section 4 Experiments
Supervised Hash Learning Results

Methods compared against:
- Kernel-based Supervised Hashing (KSH) [Liu et al., 2012]
- Binary Reconstructive Embedding (BRE) [Kulis and Darrell, 2009]
- Single-layer Anchor Graph Hashing (1-AGH) and its two-layer version (2-AGH) [Liu et al., 2011]
- Spectral Hashing (SPH) [Weiss et al., 2008]
- Locality-Sensitive Hashing (LSH) [Gionis et al., 1999]

Performance metrics: top-s retrieval precision (retrieval accuracy) and the Precision-Recall (PR) curve.

Datasets: Pendigits, USPS, Mnist, PASCAL07, CIFAR-10.
Pendigits

[Three panels: top-s retrieval precision (s = 10) vs. number of bits; top-s retrieval precision vs. s; and the Precision-Recall curve, comparing Our (*SHL), KSH, LSH, SPH, BRE, 1-AGH, and 2-AGH.]

Figure: The top-s retrieval results and Precision-Recall curve on the Pendigits dataset over *SHL and 6 other hashing algorithms.
Mnist

[Three panels: top-s retrieval precision (s = 10) vs. number of bits; top-s retrieval precision vs. s; and the Precision-Recall curve, comparing Our (*SHL), KSH, LSH, SPH, BRE, 1-AGH, and 2-AGH.]

Figure: The top-s retrieval results and Precision-Recall curve on the Mnist dataset over *SHL and 6 other hashing algorithms.
CIFAR-10

[Three panels: top-s retrieval precision (s = 10) vs. number of bits; top-s retrieval precision vs. s; and the Precision-Recall curve, comparing Our (*SHL), KSH, LSH, SPH, BRE, 1-AGH, and 2-AGH.]

Figure: The top-s retrieval results and Precision-Recall curve on the CIFAR-10 dataset over *SHL and 6 other hashing algorithms.
Qualitative Results

Figure: Qualitative results on CIFAR-10. The query image is "Car". For each row (*SHL, KSH, LSH, SPH, BRE, 1-AGH, 2-AGH), the remaining 15 images were retrieved using 45-bit binary codes generated by the corresponding hashing algorithm.
The Codewords

Consider a small subset of Mnist, with bit length B = 25. Hamming distances between each pair of codewords after training:

        µ1   µ2   µ3   µ4   µ5   µ6   µ7   µ8   µ9   µ10
µ1       -   11   16    9   14   16   15   15   14   10
µ2            -   15   16   11   13   10   11   15   13
µ3                 -   17   14   12   13   11   14   14
µ4                      -   13   13   12   16    9   11
µ5                           -   16   13   15   10   14
µ6                                -   13   15   16   16
µ7                                     -   12   13   11
µ8                                          -   11   17
µ9                                               -   10
µ10                                                   -

All pairwise distances lie between 9 and 17 bits (out of 25), i.e., the learned codewords are well separated from one another.
Section 5 Concentration Guarantees
Concentration Guarantees

With high probability, *SHL produces hash codes concentrated around the correct codeword: images from the same class are mapped close to each other, which benefits precision-recall performance.
Concentration Guarantees

Theorem. Assume the reproducing kernels of $\{\mathcal{H}_b\}_{b=1}^B$ satisfy $k_b(x, x') \leq r^2$ for all $x, x' \in \mathcal{X}$. Then, for a fixed value of $\rho > 0$, for any $f \in \bar{\mathcal{F}}$, any $\{\mu_l\}_{l=1}^G$ with $\mu_l \in \mathbb{H}^B$, and any $\delta > 0$, with probability $1 - \delta$ it holds that:

$$er(f, \mu_l) \;\leq\; \widehat{er}(f, \mu_l) \;+\; \frac{2r}{\rho B \sqrt{N}} \sum_b R_b \;+\; \sqrt{\frac{\log \frac{1}{\delta}}{2N}} \quad (8)$$

where $er(f, \mu_l) \triangleq \frac{1}{B} \mathbb{E}\{ d(h, \mu_l) \}$, $l \in \mathbb{N}_G$ is the true label of $x \in \mathcal{X}$, $\widehat{er}(f, \mu_l) \triangleq \frac{1}{NB} \sum_{n,b} Q_\rho\big( f_b(x_n)\, \mu_{l_n, b} \big)$, and $Q_\rho(u) \triangleq \min\big\{ 1, \max\big\{ 0, 1 - \frac{u}{\rho} \big\} \big\}$.
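For concreteness, the empirical margin-based error in Eq. (8) could be computed as below (a sketch; names are illustrative):

```python
import numpy as np

def ramp(u, rho):
    # Q_rho(u) = min{1, max{0, 1 - u/rho}}
    return np.minimum(1.0, np.maximum(0.0, 1.0 - u / rho))

def empirical_error(F, mu_true, rho=1.0):
    """er_hat(f, mu_l): F is N x B with F[n, b] = f_b(x_n); mu_true is
    N x B with the bits mu_{l_n, b} of each sample's true codeword."""
    return ramp(F * mu_true, rho).mean()   # 1/(N*B) * sum over n, b
```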
Section 6 Summary
Summary

- A novel hash learning framework, *Supervised Hash Learning (*SHL), is proposed.
- *SHL addresses supervised, unsupervised, and even semi-supervised learning tasks in a unified fashion.
- Its training algorithm is simple to implement.
- Experiments on 5 benchmark datasets, against 6 other state-of-the-art methods, show that *SHL is highly competitive.
Thank You!

Thanks for your time. Code is available at: http://www.eecs.ucf.edu/~yhuang/
Section 7 References
References I

Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Datta, R., Joshi, D., Li, J., and Wang, J. Z. (2008). Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2):5:1–5:60.

Gionis, A., Indyk, P., and Motwani, R. (1999). Similarity search in high dimensions via hashing. In Proceedings of the International Conference on Very Large Data Bases, pages 518–529.

Kloft, M., Brefeld, U., Sonnenburg, S., and Zien, A. (2011). lp-norm multiple kernel learning. Journal of Machine Learning Research, 12:953–997.
References II

Kulis, B. and Darrell, T. (2009). Learning to hash with binary reconstructive embeddings. In Proceedings of Advances in Neural Information Processing Systems, pages 1042–1050.

Liu, W., Wang, J., Ji, R., Jiang, Y.-G., and Chang, S.-F. (2012). Supervised hashing with kernels. In Proceedings of Computer Vision and Pattern Recognition, pages 2074–2081.

Liu, W., Wang, J., Kumar, S., and Chang, S.-F. (2011). Hashing with graphs. In Proceedings of the International Conference on Machine Learning, pages 1–8.
References III

Wang, J., Kumar, S., and Chang, S.-F. (2012). Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12):2393–2406.

Weiss, Y., Torralba, A., and Fergus, R. (2008). Spectral hashing. In Proceedings of Advances in Neural Information Processing Systems, pages 1753–1760.
Section 8 Back Up Slides
Transductive Learning

Datasets: Vowel, Letter.

[Two panels: accuracy (precision) vs. number of bits on Vowel and on Letter, comparing the Inductive and Transductive variants.]

Figure: Accuracy results of Inductive versus Transductive learning.
Algorithm

Two key observations:

Bounding the Hamming distance by a hinge ("hinge Hamming distance"):
$$d(h(x), \mu) = \sum_b [\mu_b f_b(x) < 0] \;\leq\; \bar{d}(f, \mu) \triangleq \sum_b [1 - \mu_b f_b(x)]_+$$
so that $E(\omega) \leq \bar{E}(\omega) \triangleq \sum_g \sum_n \gamma_{g,n}\, \bar{d}\big(f(x_n), \mu_g\big)$, by defining:

$$\gamma_{g,n} \triangleq \begin{cases} [g = l_n] & n \in N_L \\ \big[ g = \arg\min_{g'} \bar{d}\big(f(x_n), \mu_{g'}\big) \big] & n \in N_U \end{cases} \quad (9)$$

Majorization-Minimization (MM): for parameter values $\omega$ and $\omega'$, we have $\bar{E}(\omega) \leq \bar{E}(\omega \,|\, \omega') \triangleq \sum_g \sum_n \gamma'_{g,n}\, \bar{d}\big(f(x_n), \mu_g\big)$, where the primed quantities are evaluated on $\omega'$:

$$\gamma'_{g,n} \triangleq \begin{cases} [g = l_n] & n \in N_L \\ \big[ g = \arg\min_{g'} \bar{d}\big(f'(x_n), \mu'_{g'}\big) \big] & n \in N_U \end{cases} \quad (10)$$
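A sketch of the assignment in Eqs. (9)-(10): labeled samples keep their class codeword, while unlabeled ones take the codeword with the smallest hinge Hamming distance (for Eq. (10), F would hold the previous iterate's outputs f'). Names are illustrative:

```python
import numpy as np

def assign_gamma(F, mu, labels):
    """F: N x B outputs f_b(x_n); mu: G x B codewords in {-1, +1};
    labels: length-N ints, class index for labeled samples, -1 if
    unlabeled. Returns the G x N one-hot matrix gamma."""
    N, G = F.shape[0], mu.shape[0]
    # hinge Hamming distance of each sample to each codeword: N x G
    d_bar = np.maximum(0.0, 1.0 - F[:, None, :] * mu[None, :, :]).sum(-1)
    g_star = np.where(labels >= 0, labels, d_bar.argmin(axis=1))
    gamma = np.zeros((G, N))
    gamma[g_star, np.arange(N)] = 1.0
    return gamma
```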
USPS

[Three panels: top-s retrieval precision (s = 10) vs. number of bits; top-s retrieval precision vs. s; and the Precision-Recall curve, comparing Our (*SHL), KSH, LSH, SPH, BRE, 1-AGH, and 2-AGH.]

Figure: The top-s retrieval results and Precision-Recall curve on the USPS dataset over *SHL and 6 other hashing algorithms.
PASCAL07

[Three panels: top-s retrieval precision (s = 10) vs. number of bits; top-s retrieval precision vs. s; and the Precision-Recall curve, comparing Our (*SHL), KSH, LSH, SPH, BRE, 1-AGH, and 2-AGH.]

Figure: The top-s retrieval results and Precision-Recall curve on the PASCAL07 dataset over *SHL and 6 other hashing algorithms.