## Reduced-Rank Local Distance Metric Learning


Yinjie Huang¹, Cong Li¹, Michael Georgiopoulos¹ and Georgios C. Anagnostopoulos²

¹ University of Central Florida
² Florida Institute of Technology

Presenter: Yinjie Huang
ECML/PKDD 2013, September 26, 2013

MACHINE LEARNING LAB

9/19/13


Main Contents
- Introduction
- Contributions
- Problem Formulation
- Algorithm
- Experiments
- Conclusions


Introduction


- Many machine learning problems and algorithms entail the computation of distances, for example k-nearest neighbor (KNN) and the k-Means algorithm.
- A fixed metric (the Euclidean distance or a fixed Mahalanobis metric) may not perform well for all problems.
- Metric learning refers to data-driven approaches that infer the best metric for a given dataset.
- Through side information (pairs marked as similar or dissimilar), the weight matrix of a Mahalanobis metric can be learned so that similar data points move closer together while dissimilar points are mapped farther apart. This is done so that a subsequent application of a KNN decision rule exhibits improved performance.


- Many such algorithms show significant improvements:
  - Xing's algorithm [1]
  - Online Metric Learning [2]
  - Neighborhood Components Analysis (NCA) [3]
  - Large Margin Nearest Neighbor (LMNN) [4]
  - Information Theoretic Metric Learning (ITML) [5]
- Learning a single, global metric may not be well suited to settings that exhibit multimodality or non-linearity.


Figure 1. A dataset that illustrates the potential advantages of learning a local metric instead of a global one. (a) Original data. (b) Data distribution in the feature space after learning one global metric. (c) Data distribution in the feature space after learning local metrics.


- Much work has been performed on local metric learning:
  - Yang, L. [6]: defines 'local' as nearby pairs; a probabilistic framework solved using the EM algorithm.
  - Hastie, T. [7]: learns local metrics by reducing neighborhood distances in directions that are orthogonal to the local decision boundaries.
  - Bilenko, M. [8]: defines a metric for each cluster.
  - LMNN-Multiple Metric (LMNN-MM) [9]: the number of metrics equals the number of classes.
  - Generative Local Metric Learning (GLML) [10]: learns local metrics by minimizing the NN classification error.
  - Parametric Local Metric Learning (PLML) [11]: each local metric is related to an anchor point of the instance space.


Contributions


- A new local metric learning approach: Reduced-Rank Local Metric Learning (R2LML).
- R2LML is modeled as a conical combination of Mahalanobis metrics.
- R2LML is able to control the rank of the involved linear mappings through a sparsity-inducing matrix norm.
- We supply an algorithm for training the model and prove that its set of fixed points includes the Karush-Kuhn-Tucker (KKT) points.
- We run R2LML on 9 benchmark datasets. Compared with other global and local metric learning methods, R2LML exhibits the best accuracy on 7 out of the 9 datasets.


Problem Formulation


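- Let $\mathbb{N}_M \triangleq \{1, 2, \ldots, M\}$ for any positive integer $M$.
- Training set: $\{x_n \in \mathbb{R}^D\}_{n \in \mathbb{N}_N}$, and side information: $S \in \{0, 1\}^{N \times N}$. If $x_m$ and $x_n$ are similar, then $s_{mn} = 1$. Otherwise, $s_{mn} = 0$.
- Mahalanobis distance: $d_A(x_m, x_n) \triangleq \sqrt{(x_m - x_n)^T A (x_m - x_n)}$, where $A \in \mathbb{R}^{D \times D}$ with $A \succeq 0$ is the weight matrix of the metric.
- The weight matrix can be written as $A = L^T L$, where $L \in \mathbb{R}^{P \times D}$ with $P \le D$. Then the previous distance becomes $d_A(x_m, x_n) = \|L (x_m - x_n)\|_2$.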
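As a minimal sketch of the two equivalent forms of the Mahalanobis distance (via the weight matrix A and via a factor L with A = LᵀL), the snippet below uses NumPy on hypothetical random data. The function names and the toy sizes are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def mahalanobis_from_A(x_m, x_n, A):
    """Distance sqrt((x_m - x_n)^T A (x_m - x_n)) for a PSD weight matrix A."""
    diff = x_m - x_n
    # Clip tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(diff @ A @ diff, 0.0)))

def mahalanobis_from_L(x_m, x_n, L):
    """Same distance computed as ||L (x_m - x_n)||_2, where A = L^T L."""
    return float(np.linalg.norm(L @ (x_m - x_n)))

# Toy data (hypothetical): D = 4 features, P = 2 rows in L, so rank(A) <= 2.
rng = np.random.default_rng(0)
D, P = 4, 2
L = rng.normal(size=(P, D))
A = L.T @ L
x_m, x_n = rng.normal(size=D), rng.normal(size=D)

d_A = mahalanobis_from_A(x_m, x_n, A)
d_L = mahalanobis_from_L(x_m, x_n, L)
print(d_A, d_L)  # the two values agree up to round-off
```

Choosing P smaller than D makes the learned mapping rank-deficient, which is the property the rank penalty in the formulation is meant to encourage.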

- We assume that the metric involved is expressed as a conical combination of $K \ge 1$ Mahalanobis metrics.
- For each metric $k$, a vector $g^k \in \mathbb{R}^N$ is defined. Its $n$-th element $g_n^k$ measures how important metric $k$ is when computing distances involving the $n$-th training sample.
- Constraining the non-negative vectors $g^k$ to sum up to the all-ones vector, $\sum_k g^k = \mathbf{1}$ with $g^k \succeq 0$, forces at least one metric to be relevant for every training sample.
- The weight matrix for the pair $(m, n)$ is $\sum_k g_m^k g_n^k (L^k)^T L^k$.
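The pair-specific weight matrix can be illustrated with a short NumPy sketch. It assumes K factor matrices `Ls[k]` and per-sample importance vectors `g_m`, `g_n` of length K; all names and the toy values are hypothetical and only serve to show that the conical combination yields a positive semi-definite matrix whose quadratic form equals the weighted sum of the per-metric squared distances.

```python
import numpy as np

def pair_weight_matrix(g_m, g_n, Ls):
    """Weight matrix sum_k g_m[k] * g_n[k] * (L^k)^T L^k for one pair (m, n)."""
    return sum(g_m[k] * g_n[k] * (L.T @ L) for k, L in enumerate(Ls))

def local_sq_distance(x_m, x_n, g_m, g_n, Ls):
    """Squared distance sum_k g_m[k] * g_n[k] * ||L^k (x_m - x_n)||^2."""
    diff = x_m - x_n
    return float(sum(g_m[k] * g_n[k] * np.sum((L @ diff) ** 2)
                     for k, L in enumerate(Ls)))

# Toy setup (hypothetical): K = 2 local metrics in D = 3 dimensions.
rng = np.random.default_rng(1)
D, K = 3, 2
Ls = [rng.normal(size=(D, D)) for _ in range(K)]
g_m = np.array([0.7, 0.3])   # importance of each metric for sample m
g_n = np.array([0.2, 0.8])   # importance of each metric for sample n
x_m, x_n = rng.normal(size=D), rng.normal(size=D)

A_mn = pair_weight_matrix(g_m, g_n, Ls)
d2 = local_sq_distance(x_m, x_n, g_m, g_n, Ls)
```

Because every coefficient `g_m[k] * g_n[k]` is non-negative, `A_mn` stays positive semi-definite, so it defines a valid (pseudo-)metric for that pair.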

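- We have:

$$\min_{L^k,\, g^k,\, k \in \mathbb{N}_K} \; \sum_k \sum_{m,n} g_m^k g_n^k \Big[ s_{mn} \|L^k \Delta x_{mn}\|_2^2 + C (1 - s_{mn}) \max\big(1 - \|L^k \Delta x_{mn}\|_2^2,\, 0\big) \Big] + \lambda \sum_k \operatorname{rank}(L^k)$$

$$\text{s.t.} \quad g^k \succeq 0, \; k \in \mathbb{N}_K; \qquad \sum_k g^k = \mathbf{1}$$

  Here $\Delta x_{mn} \triangleq x_m - x_n$. $C > 0$ controls the penalty for violating the desideratum that dissimilar points be mapped at least one unit of distance apart. The final term controls the rank.
- The above formulation can be simplified.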


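- Let $[\cdot]_+$ be the hinge function defined as $[u]_+ \triangleq \max\{u, 0\}$ for all $u \in \mathbb{R}$. Thus, $\max(1 - \|L^k \Delta x_{mn}\|_2^2, 0) = [1 - \|L^k \Delta x_{mn}\|_2^2]_+$.
- Besides, the final term can be replaced by the nuclear norm of $L^k$, which is the sum of the singular values of $L^k$.
- Now:

$$\min_{L^k,\, g^k,\, k \in \mathbb{N}_K} \; \sum_k \sum_{m,n} g_m^k g_n^k \Big[ s_{mn} \|L^k \Delta x_{mn}\|_2^2 + C (1 - s_{mn}) \big[1 - \|L^k \Delta x_{mn}\|_2^2\big]_+ \Big] + \lambda \sum_k \|L^k\|_* \tag{3}$$

$$\text{s.t.} \quad g^k \succeq 0, \; k \in \mathbb{N}_K; \qquad \sum_k g^k = \mathbf{1}$$

  where $\|\cdot\|_*$ denotes the nuclear norm, $\|L^k\|_* = \sum_s \sigma_s(L^k)$, and $\sigma_s(L^k)$ is a singular value of $L^k$.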
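To make the structure of the simplified objective concrete, here is a minimal NumPy sketch that evaluates a cost of this form: pairwise terms weighted by the products of importance values, a hinge on dissimilar pairs, and a nuclear-norm penalty. The function name `r2lml_objective`, the array layouts, and the toy data are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def r2lml_objective(X, S, G, Ls, C, lam):
    """Evaluate the regularized pairwise cost.

    X:  (N, D) samples;  S: (N, N) similarity matrix with entries in {0, 1}
    G:  (K, N) importance vectors g^k stacked as rows
    Ls: list of K arrays of shape (P, D);  C, lam: positive scalars
    """
    diffs = X[:, None, :] - X[None, :, :]               # (N, N, D) pair differences
    total = 0.0
    for g, L in zip(G, Ls):
        sq = np.sum((diffs @ L.T) ** 2, axis=-1)         # ||L dx_mn||^2 for all pairs
        per_pair = S * sq + C * (1 - S) * np.maximum(1.0 - sq, 0.0)
        total += g @ per_pair @ g                        # sum_mn g_m g_n * per_pair_mn
        total += lam * np.linalg.norm(L, ord="nuc")      # sum of singular values
    return float(total)

# Toy data (hypothetical): two classes of three points each, K = 2 metrics.
rng = np.random.default_rng(2)
N, D, K = 6, 3, 2
X = rng.normal(size=(N, D))
labels = np.array([0, 0, 0, 1, 1, 1])
S = (labels[:, None] == labels[None, :]).astype(float)
G = np.full((K, N), 1.0 / K)                             # uniform importance values
Ls = [rng.normal(size=(D, D)) for _ in range(K)]
value = r2lml_objective(X, S, G, Ls, C=1.0, lam=0.1)
```

With all-zero mappings every dissimilar pair sits at distance zero, so each such pair contributes its full hinge penalty; this gives a quick sanity check of the implementation.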

Algorithm


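- Two-step algorithm:
  1. The problem is minimized over the $L^k$ while the $g^k$ are fixed.
  2. The problem is minimized over the $g^k$ while the $L^k$ are fixed.
- For step one, (3) becomes an unconstrained minimization problem of the form $f(w) + r(w)$, where $w$ is the parameter to minimize over, $f$ is the hinge-based loss term and $r$ is the nuclear-norm regularizer.
- We use Proximal Subgradient Descent (PSD) to solve this problem.
- For the second step, we consider a matrix $\bar{S}^k$ associated to the $k$-th metric, whose $(m, n)$ element is $\bar{s}_{mn}^k \triangleq s_{mn} \|L^k \Delta x_{mn}\|_2^2 + C (1 - s_{mn}) [1 - \|L^k \Delta x_{mn}\|_2^2]_+$. Then problem (3) becomes $\min_{g^k} \sum_k (g^k)^T \bar{S}^k g^k$, subject to the same constraints on the $g^k$.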

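One proximal subgradient step for a single mapping can be sketched as follows. The proximal operator of the nuclear norm is the standard singular-value soft-thresholding; the subgradient is derived here from the hinge-based loss for fixed importance values. The function names, the step size `eta`, and the toy data are hypothetical, and the step-size schedule used in the original work is not reproduced.

```python
import numpy as np

def prox_nuclear(L, tau):
    """Proximal operator of tau * ||L||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def loss_subgradient(L, X, S, g, C):
    """A subgradient, with respect to L, of
    sum_mn g_m g_n [s_mn ||L dx||^2 + C (1 - s_mn) max(1 - ||L dx||^2, 0)]."""
    diffs = X[:, None, :] - X[None, :, :]                 # (N, N, D)
    sq = np.sum((diffs @ L.T) ** 2, axis=-1)              # (N, N)
    # Derivative with respect to ||L dx||^2: +1 for similar pairs,
    # -C for dissimilar pairs whose hinge is active, 0 otherwise.
    coef = S - C * (1 - S) * (sq < 1.0)
    W = np.outer(g, g) * coef
    M = np.einsum("mn,mni,mnj->ij", W, diffs, diffs)      # sum_mn W_mn dx dx^T
    return 2.0 * L @ M

def psd_step(L, X, S, g, C, lam, eta):
    """Subgradient step on the loss, then the prox of the nuclear-norm term."""
    return prox_nuclear(L - eta * loss_subgradient(L, X, S, g, C), eta * lam)

# Toy data (hypothetical).
rng = np.random.default_rng(3)
N, D = 6, 3
X = rng.normal(size=(N, D))
labels = np.array([0, 0, 0, 1, 1, 1])
S = (labels[:, None] == labels[None, :]).astype(float)
g = np.full(N, 0.5)
L0 = rng.normal(size=(D, D))
L1 = psd_step(L0, X, S, g, C=1.0, lam=0.1, eta=0.01)
```

Soft-thresholding sets small singular values exactly to zero, which is how the nuclear-norm term can reduce the rank of the mapping during training.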

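- Define $g \in \mathbb{R}^{KN}$ to be the vector concatenating all individual $g^k$ vectors into a single vector. Define the block-diagonal matrix $\tilde{S} \triangleq \operatorname{diag}(\bar{S}^1, \ldots, \bar{S}^K) \in \mathbb{R}^{KN \times KN}$.
  The cost function becomes $g^T \tilde{S} g$. The constraint becomes $B g = \mathbf{1}_N$ (together with $g \succeq 0$), where $B \triangleq \mathbf{1}_K^T \otimes I_N$ and $\otimes$ denotes the Kronecker product.
- Since $\tilde{S}$ is almost always indefinite (it is based on a Euclidean Distance Matrix), the above cost function is non-convex.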

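The stacked form of the second sub-problem can be checked numerically. The sketch below assumes that the stacked matrix is block-diagonal with one block per metric and that the constraint matrix is the Kronecker product of an all-ones row vector with an identity matrix; `stack_problem` and the random blocks are hypothetical names and data used only for illustration.

```python
import numpy as np

def stack_problem(S_bars):
    """Build the block-diagonal cost matrix and the constraint matrix B.

    S_bars: list of K symmetric (N, N) matrices, one per local metric.
    Returns S_tilde of shape (K*N, K*N) and B of shape (N, K*N).
    """
    K, N = len(S_bars), S_bars[0].shape[0]
    S_tilde = np.zeros((K * N, K * N))
    for k, S_bar in enumerate(S_bars):
        S_tilde[k * N:(k + 1) * N, k * N:(k + 1) * N] = S_bar
    B = np.kron(np.ones((1, K)), np.eye(N))   # [I_N, I_N, ..., I_N]
    return S_tilde, B

# Toy blocks (hypothetical): symmetric, non-negative, zero diagonal.
rng = np.random.default_rng(4)
K, N = 2, 3
S_bars = []
for _ in range(K):
    M = rng.random((N, N))
    M = (M + M.T) / 2
    np.fill_diagonal(M, 0.0)
    S_bars.append(M)

G = rng.random((K, N))
G /= G.sum(axis=0, keepdims=True)   # each column sums to one
g = G.reshape(-1)                   # concatenation g^1, g^2, ..., g^K
S_tilde, B = stack_problem(S_bars)
```

Multiplying `B` by `g` adds up the K importance values of each sample, so `B @ g` equal to the all-ones vector is exactly the sum-to-one constraint.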

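- We employ a Majorization Minimization (MM) approach.
- Define $\mu \triangleq -\lambda_{\max}(\tilde{S})$ and $H \triangleq \tilde{S} + \mu I$, which is negative semi-definite. So we have $(g - g')^T H (g - g') \le 0$ for all $g$, $g'$, and equality only if $g = g'$. Expanding the inequality and using $\tilde{S} = H - \mu I$ gives the upper bound $g^T \tilde{S} g \le -g'^T H g' + 2 g'^T H g - \mu\, g^T g$, which is convex in $g$ because $-\mu \ge 0$.
- Instead, we use the following convex problem to approximate the true problem:

$$\min_g \; 2\, g'^T H g - \mu\, g^T g \quad \text{s.t.} \quad B g = \mathbf{1}_N, \; g \succeq 0$$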

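The majorization step can be verified numerically with a small sketch. It assumes the shift is chosen as the negated largest eigenvalue of the stacked cost matrix, so that the shifted matrix is negative semi-definite; `shifted_matrix`, `majorizer`, and the random test matrix are hypothetical names and data.

```python
import numpy as np

def shifted_matrix(S_tilde):
    """Return (mu, H) with mu = -lambda_max(S_tilde) and H = S_tilde + mu * I."""
    mu = -float(np.linalg.eigvalsh(S_tilde).max())
    H = S_tilde + mu * np.eye(S_tilde.shape[0])
    return mu, H

def majorizer(g, g_prev, mu, H):
    """Convex upper bound on g^T S_tilde g that is tight at g = g_prev."""
    return float(-g_prev @ H @ g_prev + 2.0 * g_prev @ H @ g - mu * g @ g)

# Toy indefinite matrix (hypothetical): symmetric with zero diagonal.
rng = np.random.default_rng(5)
n = 6
M = rng.random((n, n))
S_tilde = (M + M.T) / 2
np.fill_diagonal(S_tilde, 0.0)

mu, H = shifted_matrix(S_tilde)
g_prev, g = rng.random(n), rng.random(n)
true_cost = float(g @ S_tilde @ g)
upper = majorizer(g, g_prev, mu, H)   # never below true_cost
```

Minimizing the bound instead of the true cost cannot increase the true cost, since the bound touches it at the previous iterate; this is the usual argument behind MM schemes.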

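- Theorem 1. Let $g, d \in \mathbb{R}^{KN}$, $B \triangleq \mathbf{1}_K^T \otimes I_N$ and $c > 0$. The unique minimizer $g^*$ of

$$\min_g \; \frac{c}{2}\, g^T g + d^T g \quad \text{s.t.} \quad B g = \mathbf{1}_N, \; g \succeq 0$$

  has the form $g_i^* = \frac{1}{c} \big[ (B^T \alpha)_i - d_i \big]_+$, where $g_i$ is the $i$-th element of $g$ and $\alpha$ is the Lagrange multiplier vector associated to the equality constraint.
- Theorem 3. Algorithm 1 yields a convergent, non-increasing sequence of cost function values relevant to Problem (3). Furthermore, the set of fixed points of the iterative map embodied by Algorithm 1 includes the KKT points of Problem (3).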

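A quadratic problem of this kind, with a positive multiple of the squared norm plus a linear term, a sum-to-one constraint per training sample, and non-negativity, separates over samples, and each piece is a Euclidean projection onto the probability simplex. The sketch below uses the well-known sort-based projection to find the threshold that plays the role of the Lagrange multiplier. This is one standard way to compute such a minimizer and is an assumption here; the slides do not say how the multiplier is obtained. The names `project_simplex` and `solve_g` are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)          # threshold tied to the multiplier
    return np.maximum(v - theta, 0.0)

def solve_g(d, c, K, N):
    """Minimize (c/2) g^T g + d^T g subject to sum_k g_n^k = 1 and g >= 0.

    d and the result are stacked as g^1, ..., g^K (each of length N).
    """
    D = d.reshape(K, N)
    # For sample n the objective equals (c/2) ||v + D[:, n] / c||^2 + const,
    # so the minimizer is the projection of -D[:, n] / c onto the simplex.
    G = np.column_stack([project_simplex(-D[:, n] / c) for n in range(N)])
    return G.reshape(-1)

# Toy instance (hypothetical).
rng = np.random.default_rng(6)
K, N, c = 3, 4, 2.0
d = rng.normal(size=K * N)
g_star = solve_g(d, c, K, N)
```

The result has the thresholded form described by the theorem: each entry is a shifted copy of the negated linear coefficient, clipped at zero, with one shift per training sample.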


Experiments


- Each test point is assigned the g values of its nearest (in Euclidean distance) training sample.

Table 1. Datasets from the UCI machine learning repository and the Delve Dataset Collection.

| Dataset | #D | #CLASSES | #TRAIN | #VALIDATION | #TEST |
|---|---|---|---|---|---|
| ROBOT | 4 | 4 | 240 | 240 | 4976 |
| LETTER A-D | 16 | 4 | 200 | 400 | 2496 |
| PENDIGITS 1-5 | 16 | 5 | 200 | 1800 | 3541 |
| WINEQUALITY | 12 | 2 | 150 | 150 | 6197 |
| TELESCOPE | 10 | 2 | 300 | 300 | 11400 |
| IMGSEG | 18 | 7 | 210 | 210 | 1890 |
| TWONORM | 20 | 2 | 250 | 250 | 6900 |
| RINGNORM | 20 | 2 | 250 | 250 | 6900 |
| IONOSPHERE | 34 | 2 | 80 | 50 | 221 |
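The rule for test points, where each one inherits the importance values of its nearest training sample under the Euclidean distance, can be sketched in a few lines of NumPy. The function name `assign_test_g` and the (K, N) layout of the importance values are assumptions made for this illustration.

```python
import numpy as np

def assign_test_g(X_train, G_train, X_test):
    """Copy to each test point the g values of its nearest training sample.

    X_train: (N, D), G_train: (K, N), X_test: (T, D). Returns a (K, T) array.
    """
    # Squared Euclidean distances between every test and training point.
    d2 = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    nearest = np.argmin(d2, axis=1)       # index of the closest training sample
    return G_train[:, nearest]

# Tiny example (hypothetical values).
X_train = np.array([[0.0, 0.0], [10.0, 10.0]])
G_train = np.array([[1.0, 0.0],           # metric 1 matters only for sample 0
                    [0.0, 1.0]])          # metric 2 matters only for sample 1
X_test = np.array([[1.0, 1.0], [9.0, 8.0]])
G_test = assign_test_g(X_train, G_train, X_test)
```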

A: Number of local metrics
- We show how the performance of R2LML varies with respect to the number of local metrics K.
- The results are shown in Figure 2.
- The number of metrics need not be equal to the number of classes of the dataset.


Figure 2. R2LML classification results on 9 benchmark datasets for a varying number K of local metrics. #C indicates the number of classes.


B: Comparisons
- We compared R2LML with other metric learning algorithms, including Euclidean metric KNN, ITML [5], LMNN [4], LMNN-MM [9], GLML [10] and PLML [11].
- Both ITML and LMNN learn a global metric, while LMNN-MM, GLML and PLML are local metric learning algorithms.
- After the metrics are learned, a KNN classifier is used for classification, with k (the number of nearest neighbors) set to 5.

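The final classification stage can be illustrated with a generic k-nearest-neighbor majority vote over a precomputed distance matrix, using k = 5 as in the experiments. How the distances are produced (Euclidean, a global metric, or a combination of local metrics) is left outside this sketch; `knn_predict` and the one-dimensional toy data are hypothetical.

```python
import numpy as np

def knn_predict(dist, y_train, k=5):
    """Majority vote among the k nearest training samples.

    dist:    (T, N) distances from each test point to each training point
    y_train: (N,) non-negative integer class labels
    """
    nearest = np.argsort(dist, axis=1)[:, :k]      # indices of the k closest
    votes = y_train[nearest]                       # (T, k) neighbor labels
    return np.array([np.bincount(row).argmax() for row in votes])

# Toy one-dimensional data (hypothetical): two well-separated classes.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
x_test = np.array([2.0, 12.0])
dist = np.abs(x_test[:, None] - x_train[None, :])
y_pred = knn_predict(dist, y_train, k=5)
```

Ties in the vote are resolved in favor of the smallest label here, which is a property of `argmax` on the bin counts rather than a choice documented in the slides.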

- For each dataset, we use the optimal value of K established in the previous series of experiments, while the regularization parameter λ was chosen via a validation procedure over the set {0.01, 0.1, 1, 10, 100}.
- For pair-wise model comparisons, we employed McNemar's test. Since there are 7 algorithms to be compared, we use Holm's step-down procedure as a multiple hypothesis testing method to control the Family-Wise Error Rate (FWER) of the resulting pair-wise McNemar's tests.

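The statistical comparison can be sketched with the standard library alone. The snippet assumes the exact (binomial) two-sided variant of McNemar's test on the two discordant counts, followed by Holm's step-down procedure; which variant of McNemar's test and which significance level were used in the experiments is not stated, so `mcnemar_exact`, `holm`, and `alpha=0.05` are illustrative assumptions.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: test points classifier 1 gets right and classifier 2 gets wrong
    c: test points classifier 2 gets right and classifier 1 gets wrong
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def holm(p_values, alpha=0.05):
    """Holm's step-down procedure; returns reject flags in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break                      # stop at the first non-rejection
    return reject

# Hypothetical discordant counts for three pair-wise comparisons.
p_vals = [mcnemar_exact(0, 10), mcnemar_exact(5, 5), mcnemar_exact(3, 12)]
decisions = holm(p_vals, alpha=0.05)
```

Holm's procedure tests the smallest p-value against the strictest threshold and relaxes the threshold step by step, which controls the FWER while being less conservative than a plain Bonferroni correction.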

Table 2. Percent accuracy of 7 algorithms on 9 benchmark datasets. For each dataset, the algorithms are ranked from best to worst (rank in parentheses). Equal ranks mean that the performances are statistically comparable.

| Dataset | Euclidean | ITML | LMNN | LMNN-MM | GLML | PLML | R2LML |
|---|---|---|---|---|---|---|---|
| Robot | 65.31 (2nd) | 65.86 (2nd) | 66.10 (2nd) | 66.10 (2nd) | 62.28 (3rd) | 61.03 (3rd) | 74.16 (1st) |
| Letter A-D | 88.82 (2nd) | 93.39 (1st) | 93.79 (1st) | 93.83 (1st) | 89.30 (2nd) | 94.43 (1st) | 95.07 (1st) |
| Pendigits 1-5 | 88.31 (4th) | 93.17 (2nd) | 91.19 (3rd) | 91.27 (3rd) | 88.37 (4th) | 95.88 (1st) | 95.43 (1st) |
| Winequality | 86.12 (7th) | 96.11 (3rd) | 94.43 (4th) | 93.38 (5th) | 91.79 (6th) | 98.55 (1st) | 97.53 (2nd) |
| Telescope | 70.31 (3rd) | 71.42 (2nd) | 72.16 (2nd) | 71.45 (2nd) | 70.31 (3rd) | 77.52 (1st) | 77.97 (1st) |
| Imgseg | 80.05 (4th) | 90.21 (2nd) | 90.74 (2nd) | 89.42 (2nd) | 87.30 (3rd) | 90.48 (2nd) | 92.59 (1st) |
| Twonorm | 96.54 (2nd) | 96.78 (1st) | 96.32 (2nd) | 96.30 (2nd) | 96.52 (2nd) | 97.32 (1st) | 97.23 (1st) |
| Ringnorm | 55.84 (7th) | 77.35 (2nd) | 59.36 (6th) | 59.75 (5th) | 97.09 (1st) | 75.68 (3rd) | 73.73 (4th) |
| Ionosphere | 75.57 (3rd) | 86.43 (1st) | 82.35 (2nd) | 82.35 (2nd) | 71.95 (3rd) | 78.73 (3rd) | 90.50 (1st) |

Conclusions


- We propose a new local metric learning model, namely Reduced-Rank Local Metric Learning (R2LML).
- In order to solve our proposed formulation, a two-step algorithm is showcased, which iteratively solves two sub-problems in an alternating fashion.
- We have demonstrated that our algorithm converges and that its fixed points include the Karush-Kuhn-Tucker (KKT) points of our proposed formulation.
- In the first experiment, we varied the number of local metrics K and discussed the influence of K on classification accuracy.
- In the second experiment, we compared R2LML with other metric learning algorithms and demonstrated that our proposed method is highly competitive.


References

[1] Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning with application to clustering with side-information. NIPS 2002.
[2] Shalev-Shwartz, S., Singer, Y., Ng, A.Y.: Online and batch learning of pseudo-metrics. ICML 2004.
[3] Goldberger, J., Roweis, S., Hinton, G., Salakhutdinov, R.: Neighbourhood components analysis. NIPS 2004.
[4] Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. NIPS 2006.
[5] Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. ICML 2007.
[6] Yang, L., Jin, R., Sukthankar, R., Liu, Y.: An efficient algorithm for local distance metric learning. AAAI 2006.
[7] Hastie, T., Tibshirani, R.: Discriminant adaptive nearest neighbor classification. TPAMI 1996.
[8] Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. ICML 2004.
[9] Weinberger, K., Saul, L.: Fast solvers and efficient implementations for distance metric learning. ICML 2008.
[10] Noh, Y.K., Zhang, B.T., Lee, D.D.: Generative local metric learning for nearest neighbor classification. NIPS 2010.
[11] Wang, J., Kalousis, A., Woznica, A.: Parametric local metric learning for nearest neighbor classification. NIPS 2012.
