Reduced-Rank Local Distance Metric Learning
Yinjie Huang¹, Cong Li¹, Michael Georgiopoulos¹ and Georgios C. Anagnostopoulos²
¹ University of Central Florida
² Florida Institute of Technology
Presenter: Yinjie Huang
ECML/PKDD 2013, September 26, 2013
Main Contents
• Introduction
• Contributions
• Problem Formulation
• Algorithm
• Experiments
• Conclusions
Introduction
Introduction
• Many machine learning problems and algorithms entail the computation of distances, e.g., the k-nearest neighbor and k-means algorithms.
• A fixed metric (the Euclidean distance metric or a preset Mahalanobis metric) may not perform well for all problems.
• Metric learning: data-driven approaches to infer the best metric for a given dataset.
• Through side information (which pairs are similar and which are dissimilar), a weight matrix of a Mahalanobis metric can be learned that moves similar data points closer while mapping dissimilar points apart. This is done so that an eventual application of a k-NN decision rule exhibits improved performance; a small numerical illustration follows below.
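To make the Mahalanobis metric concrete, here is a minimal NumPy sketch (an illustration, not part of the slides); the weight matrix `A` below is a hypothetical learned metric, and with `A` equal to the identity the distance reduces to the squared Euclidean distance:

```python
import numpy as np

def mahalanobis_sq(x1, x2, A):
    """Squared Mahalanobis distance (x1 - x2)^T A (x1 - x2)."""
    d = x1 - x2
    return float(d @ A @ d)

x1 = np.array([1.0, 2.0])
x2 = np.array([2.0, 0.0])

# A = I recovers the ordinary squared Euclidean distance.
print(mahalanobis_sq(x1, x2, np.eye(2)))   # 5.0

# A hypothetical learned A stretches/shrinks directions, which can
# change which points count as "nearest" for a k-NN decision rule.
A = np.array([[4.0, 0.0],
              [0.0, 0.5]])
print(mahalanobis_sq(x1, x2, A))           # 6.0
```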
Introduction
• Many such algorithms show significant improvements:
  - Xing's algorithm [1]
  - Online Metric Learning [2]
  - Neighborhood Components Analysis (NCA) [3]
  - Large Margin Nearest Neighbor (LMNN) [4]
  - Information Theoretic Metric Learning (ITML) [5]
• Learning one single, global metric may not be well-suited in some settings, such as multimodal or non-linear data.
Introduction
Figure 1. A dataset that illustrates the potential advantages of learning a local metric instead of a global one. (a) Original data. (b) Data distribution in feature space after learning one global metric. (c) Data distribution in feature space after learning local metrics.
Introduction
• Much work has been performed on local metric learning:
  - Yang, L. [6]: treats 'local' as nearby pairs; a probabilistic framework solved using an EM algorithm.
  - Hastie, T. [7]: learns local metrics by shrinking neighborhood distances in directions orthogonal to the local decision boundaries.
  - Bilenko, M. [8]: defines a metric for each cluster.
  - LMNN-Multiple Metric (LMNN-MM) [9]: the number of metrics equals the number of classes.
  - Generative Local Metric Learning (GLML) [10]: learns local metrics by minimizing the NN classification error.
  - Parametric Local Metric Learning (PLML) [11]: each local metric is related to an anchor point of the instance space.
Contributions
Contributions
• A new local metric learning approach: Reduced-Rank Local Metric Learning (R2LML).
• R2LML models the metric as a conical combination of Mahalanobis metrics.
• R2LML is able to control the rank of the involved linear mappings through a sparsity-inducing matrix norm.
• We supply an algorithm for training the model and prove that its set of fixed points includes the Karush-Kuhn-Tucker (KKT) points.
• We run R2LML on 9 benchmarks; compared with other global and local metric learning methods, R2LML exhibits the best accuracy on 7 out of the 9 datasets.
Problem Formulation
Problem Formulation
• Let $\mathbb{N}_n \triangleq \{1, \ldots, n\}$ for any positive integer $n$.
• Training set: $\{x_n\}_{n \in \mathbb{N}_N} \subset \mathbb{R}^D$ with class labels, and side information: if $x_n$ and $x_m$ are similar, then $y_{n,m} = 1$; otherwise, $y_{n,m} = -1$.
• Mahalanobis distance: $d_A^2(x, x') = (x - x')^\top A (x - x')$, where the weight matrix of the metric is $A \succeq 0$.
• The weight matrix can be factored as $A = L^\top L$, where $L \in \mathbb{R}^{r \times D}$. Then the previous distance becomes $d_A^2(x, x') = \lVert L x - L x' \rVert_2^2$, i.e., a Euclidean distance in the mapped space (see the check below).
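A quick numerical check of that factorization (a sketch, not from the slides): with $A = L^\top L$, the Mahalanobis distance equals the squared Euclidean distance between the mapped points.

```python
import numpy as np

rng = np.random.default_rng(0)
D, r = 5, 2
L = rng.standard_normal((r, D))    # linear mapping, rank <= r
A = L.T @ L                        # PSD weight matrix A = L^T L

x1, x2 = rng.standard_normal(D), rng.standard_normal(D)

d_A = (x1 - x2) @ A @ (x1 - x2)        # (x1 - x2)^T A (x1 - x2)
d_L = np.sum((L @ x1 - L @ x2) ** 2)   # ||L x1 - L x2||^2
assert np.isclose(d_A, d_L)            # identical by construction
```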
Problem Formulation
• We assume the metric involved is expressed as a conical combination of Mahalanobis metrics.
• For each metric $k$, a vector $g_k \in \mathbb{R}_+^N$ is defined; its $n$-th element is a measure of how important the $k$-th metric is when computing distances involving the $n$-th training sample.
• Constraining the vectors to sum up to the all-ones vector, $\sum_{k \in \mathbb{N}_K} g_k = \mathbf{1}_N$, forces at least one metric to be relevant for every sample.
• The weight matrix for pair $(x_n, x_m)$ is $A_{n,m} = \sum_{k \in \mathbb{N}_K} g_{n,k}\, g_{m,k}\, L_k^\top L_k$.
Problem Formulation
• We have:
$$\min_{\{L_k\}, \{g_k\}} \; \sum_{n,m} \max\!\big\{0,\; y_{n,m}\big(d^2(x_n, x_m) - 1\big)\big\} \;+\; \lambda \sum_{k \in \mathbb{N}_K} \operatorname{rank}(L_k) \quad \text{s.t. } \sum_{k \in \mathbb{N}_K} g_k = \mathbf{1}_N,\; g_k \geq 0$$
Here $\lambda > 0$ controls the penalty for violating the previous desideratum (similar pairs should end up close, dissimilar pairs far apart). The final term controls the rank.
• The above formulation can be simplified.
Problem Formulation
• Let $h$ be the hinge function defined as $h(u) \triangleq \max\{0, u\}$. Thus, $h(u) = 0$ for $u \leq 0$.
• Besides, the final term can be replaced by the nuclear norm of $L_k$, $\lVert L_k \rVert_*$, which is the sum of $L_k$'s singular values and acts as a convex surrogate for the rank.
• Now:
$$\min_{\{L_k\}, \{g_k\}} \; \sum_{n,m} h\big(y_{n,m}(d^2(x_n, x_m) - 1)\big) \;+\; \lambda \sum_{k \in \mathbb{N}_K} \lVert L_k \rVert_* \qquad (3)$$
where $\lVert \cdot \rVert_*$ denotes the nuclear norm and $\sigma_i(L_k)$ is a singular value of $L_k$, so $\lVert L_k \rVert_* = \sum_i \sigma_i(L_k)$. Both ingredients are illustrated below.
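In code, both ingredients of the relaxed objective are one-liners (a sketch; `L_k` below is a stand-in for one of the learned mappings):

```python
import numpy as np

def hinge(u):
    """h(u) = max(0, u), applied elementwise."""
    return np.maximum(0.0, u)

L_k = np.random.default_rng(1).standard_normal((3, 5))

# Nuclear norm = sum of singular values; NumPy exposes it directly.
nuc = np.linalg.norm(L_k, ord='nuc')
assert np.isclose(nuc, np.linalg.svd(L_k, compute_uv=False).sum())
```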
Algorithm
Algorithm
• Two-step algorithm:
  1. The problem is minimized over $\{L_k\}$ while $\{g_k\}$ is fixed.
  2. The problem is minimized over $\{g_k\}$ while $\{L_k\}$ is fixed.
• For step one, (3) becomes an unconstrained minimization problem and has the form $\varphi(L_k) + \lambda \lVert L_k \rVert_*$, where $L_k$ is the parameter to minimize over.
• We use Proximal Subgradient Descent (PSD) to solve this problem; its key proximal step is sketched below.
• For the second step, we consider a matrix whose elements are derived from the pairwise distances associated to each metric. Then problem (3) becomes a quadratic problem in the importance vectors.
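The slides do not show the PSD update itself; below is a generic sketch of the standard proximal step for the nuclear norm (singular-value soft-thresholding), the building block a proximal subgradient solver would use here:

```python
import numpy as np

def prox_nuclear(M, t):
    """prox_{t||.||_*}(M): soft-threshold the singular values of M by t."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - t, 0.0)   # shrink; small singular values vanish
    return U @ np.diag(s_shrunk) @ Vt   # low rank when t is large enough

# One proximal-(sub)gradient step on phi(L) + lam * ||L||_*:
#   L <- prox_nuclear(L - eta * subgrad_phi(L), eta * lam)
```

Because small singular values are set exactly to zero, this step is what lets the nuclear-norm penalty reduce the rank of the mappings.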
Algorithm
• Define $g$ to be the vector concatenating all individual $g_k$ vectors into a single vector, $g \triangleq [g_1^\top, \ldots, g_K^\top]^\top \in \mathbb{R}^{KN}$.
• The cost function becomes a quadratic form in $g$, and the constraint becomes $(\mathbf{1}_K^\top \otimes I_N)\, g = \mathbf{1}_N$, where $\otimes$ denotes the Kronecker product; a small check of this identity follows below.
• Since the quadratic form's matrix is almost always indefinite (it is based on a Euclidean Distance Matrix), the above cost function is non-convex.
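The Kronecker-structured constraint simply says the $K$ importance vectors must sum, entrywise, to the all-ones vector; a small NumPy check (a sketch, not from the slides):

```python
import numpy as np

N, K = 4, 3
rng = np.random.default_rng(2)
g_ks = rng.dirichlet(np.ones(K), size=N).T   # K vectors of length N; sum_k g_k = 1_N
g = g_ks.reshape(-1)                         # stack g_1, ..., g_K into one vector

C = np.kron(np.ones((1, K)), np.eye(N))      # (1_K^T kron I_N) = [I_N  I_N ... I_N]
assert np.allclose(C @ g, np.ones(N))        # equals sum_k g_k = 1_N
```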
Algorithm
• We employ a Majorization-Minimization (MM) approach.
• Write the quadratic form's matrix as $P = P_+ + P_-$, where $P_+$ is positive semi-definite and $P_-$ is negative semi-definite. So we have, for all $g$ and $g'$,
$$g^\top P g \;\leq\; g^\top P_+ g \;+\; 2\, g'^\top P_- g \;-\; g'^\top P_- g',$$
with equality only if $g = g'$.
• Instead, we use the resulting convex problem (minimizing the right-hand-side majorizer under the same constraints) to approximate the true problem; the split is sketched below.
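The split $P = P_+ + P_-$ can be obtained from an eigendecomposition by separating nonnegative and negative eigenvalues (a sketch with a hypothetical symmetric $P$):

```python
import numpy as np

def psd_nsd_split(P):
    """Split symmetric P into P_plus (PSD) + P_minus (NSD) via eigenvalues."""
    w, V = np.linalg.eigh(P)
    P_plus = (V * np.maximum(w, 0.0)) @ V.T   # keep nonnegative eigenvalues
    P_minus = (V * np.minimum(w, 0.0)) @ V.T  # keep negative eigenvalues
    return P_plus, P_minus

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
P = (M + M.T) / 2                 # symmetric, typically indefinite
P_plus, P_minus = psd_nsd_split(P)
assert np.allclose(P_plus + P_minus, P)

# Majorizer at g': g^T P_plus g + 2 g'^T P_minus g - g'^T P_minus g'
```

Since $g \mapsto g^\top P_- g$ is concave, it lies below its tangent at $g'$, which is exactly what the majorizing inequality above expresses.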
Algorithm
• Theorem 1. The unique minimizer of the convex majorized subproblem has a closed form, in which each element is expressed through the corresponding element of the problem data and the Lagrange multiplier vector associated to the equality constraint.
• Theorem 3. Algorithm 1 yields a convergent, non-increasing sequence of cost function values relevant to Problem (3). Furthermore, the set of fixed points of the iterative map embodied by Algorithm 1 includes the KKT points of Problem (3).
Algorithm
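This slide presented Algorithm 1 as a pseudocode figure. As a rough schematic of the alternation described above (function names are hypothetical placeholders, not the authors' code):

```python
def r2lml_train(X, Y_pairs, K, lam, n_iters=50):
    """Schematic alternating minimization for Problem (3); helpers are placeholders."""
    Ls = init_mappings(X, K)       # hypothetical: one L_k per local metric
    gs = init_importances(X, K)    # hypothetical: g_k >= 0, sum_k g_k = 1
    for _ in range(n_iters):
        # Step 1: update the mappings by proximal subgradient descent, g fixed.
        Ls = psd_update(Ls, gs, X, Y_pairs, lam)
        # Step 2: update g by MM on the (indefinite) quadratic, L fixed.
        gs = mm_update(gs, Ls, X, Y_pairs)
    return Ls, gs
```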
Experiments
Experiments
• We assign to each test point the $g$ value of the vector associated with its nearest (in Euclidean distance) training sample; a sketch of this rule follows Table 1.

Table 1. Datasets from the UCI machine learning repository and the Delve dataset collection.

Dataset         #D   #Classes   #Train   #Validation   #Test
ROBOT            4       4         240        240        4976
LETTER A-D      16       4         200        400        2496
PENDIGITS 1-5   16       5         200       1800        3541
WINEQUALITY     12       2         150        150        6197
TELESCOPE       10       2         300        300       11400
IMGSEG          18       7         210        210        1890
TWONORM         20       2         250        250        6900
RINGNORM        20       2         250        250        6900
IONOSPHERE      34       2          80         50         221
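The $g$-assignment rule above can be written directly (a sketch; `G_train` is assumed to hold one $g$ vector per training sample):

```python
import numpy as np

def g_for_test_points(X_train, G_train, X_test):
    """Give each test point the g vector of its nearest training sample
    (nearest in plain Euclidean distance), as described above."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)    # index of each test point's nearest neighbor
    return G_train[nn]
```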
Experiments
A: Number of local metrics
• We show how the performance of R2LML varies with respect to the number of local metrics K.
• The results are shown in Figure 2.
• The best number of metrics is not necessarily equal to the number of classes of the dataset.
Experiments
A: Number of local metrics
Figure 2. R2LML classification results on 9 benchmark datasets for a varying number K of local metrics. #C indicates the number of classes.
Experiments
B: Comparisons
• We compared R2LML with other metric learning algorithms: Euclidean-metric k-NN, ITML [5], LMNN [4], LMNN-MM [9], GLML [10] and PLML [11].
• Both ITML and LMNN learn a global metric, while LMNN-MM, GLML and PLML are local metric learning algorithms.
• After the metrics are learned, the k-NN classifier is utilized for classification with k (the number of nearest neighbors) set to 5; see the sketch below.
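For a single global metric this step is just k-NN in the mapped space; a scikit-learn sketch (an illustration under that assumption, not the authors' pipeline):

```python
from sklearn.neighbors import KNeighborsClassifier

def knn_with_metric(L, X_train, y_train, X_test):
    """k-NN with k = 5 after mapping the data by a learned L (A = L^T L)."""
    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(X_train @ L.T, y_train)   # Euclidean distance in the mapped space
    return clf.predict(X_test @ L.T)
```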
Experiments
B: Comparisons
• For each dataset, we use K's optimal value as established in the previous series of experiments, while the regularization parameter was chosen via a validation procedure over the set {0.01, 0.1, 1, 10, 100}.
• For pair-wise model comparisons, we employed McNemar's test. Since there are 7 algorithms to be compared, we use Holm's step-down procedure as a multiple-hypothesis-testing method to control the Family-Wise Error Rate (FWER) of the resulting pair-wise McNemar's tests; a generic sketch of Holm's procedure follows below.
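Holm's step-down procedure itself is easy to state in code (a generic sketch, not tied to the slides' numbers): sort the p-values and compare them to increasingly lenient thresholds, stopping at the first non-rejection.

```python
import numpy as np

def holm_reject(p_values, alpha=0.05):
    """Holm's step-down: compare the i-th smallest p-value to alpha/(m - i)."""
    p = np.asarray(p_values)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for i, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - i):   # i = 0 gives the Bonferroni level alpha/m
            reject[idx] = True
        else:
            break                       # stop at the first non-rejection
    return reject
```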
Experiments
B: Comparisons

Table 2. Percent accuracy of the 7 algorithms on 9 benchmark datasets. The rank of each algorithm on each dataset, from best (1) to worst, is given in parentheses; algorithms sharing a rank perform statistically comparably.

Dataset         Euclidean    ITML         LMNN         LMNN-MM      GLML         PLML         R2LML
ROBOT           65.31 (2)    65.86 (2)    66.10 (2)    66.10 (2)    62.28 (3)    61.03 (3)    74.16 (1)
LETTER A-D      88.82 (2)    93.39 (1)    93.79 (1)    93.83 (1)    89.30 (2)    94.43 (1)    95.07 (1)
PENDIGITS 1-5   88.31 (4)    93.17 (2)    91.19 (3)    91.27 (3)    88.37 (4)    95.88 (1)    95.43 (1)
WINEQUALITY     86.12 (7)    96.11 (3)    94.43 (4)    93.38 (5)    91.79 (6)    98.55 (1)    97.53 (2)
TELESCOPE       70.31 (3)    71.42 (2)    72.16 (2)    71.45 (2)    70.31 (3)    77.52 (1)    77.97 (1)
IMGSEG          80.05 (4)    90.21 (2)    90.74 (2)    89.42 (2)    87.30 (3)    90.48 (2)    92.59 (1)
TWONORM         96.54 (2)    96.78 (1)    96.32 (2)    96.30 (2)    96.52 (2)    97.32 (1)    97.23 (1)
RINGNORM        55.84 (7)    77.35 (2)    59.36 (6)    59.75 (5)    97.09 (1)    75.68 (3)    73.73 (4)
IONOSPHERE      75.57 (3)    86.43 (1)    82.35 (2)    82.35 (2)    71.95 (3)    78.73 (3)    90.50 (1)
Conclusions
Conclusion
• We propose a new local metric learning model, namely Reduced-Rank Local Metric Learning (R2LML).
• In order to solve our proposed formulation, a two-step algorithm is showcased, which iteratively solves two sub-problems in an alternating fashion.
• We have demonstrated that our algorithm converges and that its fixed points include the Karush-Kuhn-Tucker (KKT) points of our proposed formulation.
• In the first experiment, we varied the number of local metrics K and discussed the influence of K on classification accuracy.
• In the second experiment, we compared R2LML with other metric learning algorithms and demonstrated that our proposed method is highly competitive.
References
[1] Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning with application to clustering with side-information. NIPS 2002.
[2] Shalev-Shwartz, S., Singer, Y., Ng, A.Y.: Online and batch learning of pseudo-metrics. ICML 2004.
[3] Goldberger, J., Roweis, S., Hinton, G., Salakhutdinov, R.: Neighbourhood components analysis. NIPS 2004.
[4] Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. NIPS 2006.
[5] Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. ICML 2007.
[6] Yang, L., Jin, R., Sukthankar, R., Liu, Y.: An efficient algorithm for local distance metric learning. AAAI 2006.
[7] Hastie, T., Tibshirani, R.: Discriminant adaptive nearest neighbor classification. TPAMI 1996.
[8] Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. ICML 2004.
[9] Weinberger, K., Saul, L.: Fast solvers and efficient implementations for distance metric learning. ICML 2008.
[10] Noh, Y.K., Zhang, B.T., Lee, D.D.: Generative local metric learning for nearest neighbor classification. NIPS 2010.
[11] Wang, J., Kalousis, A., Woznica, A.: Parametric local metric learning for nearest neighbor classification. NIPS 2012.