Semi-definite Manifold Alignment

Liang Xiong, Fei Wang, and Changshui Zhang
Department of Automation, Tsinghua University, Beijing, China

Abstract. In this paper, we study the problem of manifold alignment, which aims at "aligning" different data sets that share a similar intrinsic manifold, given some supervision. Unlike traditional methods that rely on pairwise correspondences between the two data sets, our method only needs relative comparison information of the form "A is more similar to B than A is to C". This provides a more flexible way to acquire the prior knowledge for alignment, and is thus able to handle situations where corresponding pairs are hard or impossible to identify. We optimize our objective on graphs that give discrete approximations of the manifold. The problem is then formulated as a semi-definite programming (SDP) problem, which can readily be solved. Finally, experimental results on aligning several different types of manifolds are presented to show the effectiveness of our method.

1 Introduction

In the field of machine learning, we are often faced with data of very high dimensionality (e.g., images and vector-space representations of text documents). Directly dealing with such data is usually intractable due to the high computational load and the curse of dimensionality. In recent years, researchers have realized that in many applications the samples of interest are actually confined to a low-dimensional manifold embedded in the high-dimensional feature space [1, 2]. This intrinsic structure carries a great amount of information, so algorithms that can effectively explore and exploit it are highly desirable for the analysis and learning of data. Consequently, many dimensionality reduction methods have been developed to characterize data manifolds, such as Locally Linear Embedding [2], Laplacian Eigenmaps [3] and Maximum Variance Unfolding [4].

However, all these algorithms are unsupervised; that is, no prior knowledge is used to guide the dimensionality reduction process. As a result, the low-dimensional representations they produce usually fail to explicitly reflect the samples' underlying parameters (for example, the pose parameters of head images), which are often of central interest during analysis and learning. Fortunately, given some kind of supervised information, we are able to develop methods that both discover manifold structures and reveal their underlying parameters.

In this paper, we focus on the problem of correspondence learning on manifolds, which is also called manifold alignment. More concretely, given some data sets sharing the same underlying manifold, we seek to learn the correspondences between samples from different data sets (e.g., finding different persons' face


images with the same pose). Besides its usage in data analysis and visualization, this problem also has wide potential applications in various fields. For instance, in facial expression recognition, one may have a set of standard labeled images of a particular person with known expressions, such as happiness, sadness, surprise, anger and fear. We can then recognize the expressions of another person simply by aligning his/her facial images to the standard image set. The problem also applies directly to pose estimation. One can refer to [5] for more details.

Fig. 1. An example of data sharing the same manifold: facial expression images from the JAFFE data set [6]. The top and bottom rows show selected pairs with the same underlying facial expression parameters.

Several methods have already been proposed to align manifolds in a semi-supervised way [7, 8, 5, 9, 10]. They usually assume that some pairwise correspondences between samples in different data sets are already known, and use this information to guide the alignment. However, in practice such information may be difficult to obtain and use, since:

1. The data sets can be very large, so finding high-quality correspondences between them can be very time consuming or even intractable.
2. There may be ambiguities in the images (see figure 2 for an example), which makes explicit matching a hard task. Bluntly determining and enforcing such unreliable constraints may lead to poor results.
3. It may be hard to find exact correspondences when the available samples are scarce. This can happen when the data source is restricted and users are only allowed to access a small subset, or at the early stage of an experiment when samples are still being collected.

Fig. 2. Three facial expression images. (a) and (c) are surprised and (b) is neutral. It is hard to make a confident decision that (a) and (c) possess the same expression parameters. However, it is obvious that (c) is more similar to (a) than (b) is.


To solve the above problems, we propose to use another type of supervised information to guide the manifold alignment process. In particular, we consider relative, qualitative supervision of the form "A is closer to B than A is to C". We believe that this type of information is more easily available in practice than traditional correspondence-based information. With its help, we show that the manifold alignment problem can be formulated as a Quadratically Constrained Quadratic Programming (QCQP) [11] problem. To make the optimization tractable, we further relax it to a Semi-Definite Programming (SDP) [11] problem, which can readily be solved by popular software packages such as Sedumi [12]. Moreover, under this formulation we can incorporate both relative relations and correspondences to align manifolds in a very flexible way. Finally, experimental results on aligning several different types of manifolds are presented to show the effectiveness of our method.

The rest of this paper is organized as follows. Section 2 introduces some basic notations and related works; the detailed algorithm is presented in section 3. Experimental results are provided in section 4, followed by discussions in section 5 and conclusions in section 6.

2 Notations and Related Works

As stated in the introduction, in this paper we study the problem of aligning different data sets that are characterized by the same underlying manifold. For convenience of presentation, let us first consider the case of two data sets. More formally, let X and Y be two data sets in high-dimensional vector spaces

X = {x_1, x_2, ..., x_N} ⊂ R^{D_x}, Y = {y_1, y_2, ..., y_N} ⊂ R^{D_y}    (1)

with D_x, D_y ≫ 1. When the data lie close to a low-dimensional manifold embedded in a high-dimensional Euclidean space, manifold learning algorithms such as Laplacian eigenmaps [3] can effectively learn the low-dimensional embeddings by constructing an undirected weighted graph that captures the local structure of the data. For example, for data set X, we can construct a graph G_X = (V_X, E_X), where V_X = X corresponds to the vertices of G_X, and E_X represents the edge set of G_X. Generally there is a nonnegative weight W_ij associated with each edge e_ij ∈ E_X, and we can aggregate all the edge weights to form an N × N weight matrix W_X with its (i, j)-th entry W_X(i, j) = W_ij. The degree matrix D_X is an N × N diagonal matrix with diagonal entries D_X(i, i) = Σ_j W_ij, and the combinatorial graph Laplacian is defined as the N × N matrix L_X = D_X − W_X. The low-dimensional embeddings of the data in X, say F = [f_1, f_2, ..., f_N] ∈ R^{D×N} (D ≪ D_x), can be achieved by minimizing the following criterion under certain constraints (e.g. scale and translational invariances)

S_X = tr(F L_X F^T),    (2)

where tr(·) represents the trace of a matrix. According to [3], S_X measures the smoothness of the low-dimensional embeddings of X over its underlying manifold.
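As an illustration, the following is a minimal sketch (ours, not from the paper) of how such a graph Laplacian and the smoothness criterion of Eq. (2) can be computed, assuming a symmetrized k-NN graph with Gaussian edge weights; the helper names are hypothetical.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_laplacian(X, k=5, beta=1.0):
    """X: (N, D) samples. Returns the combinatorial Laplacian L = D - W."""
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    A = np.maximum(A, A.T)                          # symmetrize the k-NN graph
    D2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)  # ||x_i - x_j||^2
    W = A * np.exp(-beta * D2)                      # Gaussian weights on edges
    return np.diag(W.sum(axis=1)) - W               # L = D - W

def smoothness(F, L):
    """S = tr(F L F^T) for an embedding F of shape (d, N), as in Eq. (2)."""
    return np.trace(F @ L @ F.T)
```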


Similarly, we can also define a graph G_Y = (V_Y, E_Y) for data set Y with its combinatorial graph Laplacian L_Y = D_Y − W_Y. Then the low-dimensional embeddings of Y, say G = [g_1, g_2, ..., g_N] ∈ R^{D×N}, can be achieved by minimizing S_Y = tr(G L_Y G^T) under certain constraints, and we can minimize the following combined smoothness criterion to achieve the common embeddings of both X and Y

S = tr(F L_X F^T) + tr(G L_Y G^T).    (3)

Now let us return to our manifold alignment problem. Assuming that we know some pairwise correspondences {x_i, y_i}_{i=1}^l between X and Y, we can embed X and Y into a common low-dimensional space by minimizing [8]

J = μ Σ_{i=1}^l ||f_i − g_i||^2 + tr(F L_X F^T) + tr(G L_Y G^T),    (4)

where μ is a regularization parameter that trades off the embedding smoothness against the matching precision. When μ = ∞, the pairwise correspondences become hard constraints which impose f_i = g_i after embedding [7, 10]. Finally, the matched sample for x_i ∈ X is the y_j ∈ Y whose embedding g_j has the minimum Euclidean distance to f_i in the embedded space. However, as explained in the introduction, it may sometimes be difficult to obtain pairwise correspondences. Hence in this paper we propose a novel scheme for manifold alignment based on relative comparisons among the data points.
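For concreteness, here is a hedged sketch of how the correspondence-regularized objective of Eq. (4) can be minimized in closed form: the penalty μ Σ_i ||f_i − g_i||^2 folds into a single joint quadratic form tr(H L_J H^T), whose constrained minimizers are bottom eigenvectors (dropping the trivial constant one, as is standard for Laplacian eigenmaps-style problems). The function name and the convention that the first l samples of each set are paired are our assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def align_with_pairs(LX, LY, l, mu=1.0, d=2):
    """Minimize Eq. (4), assuming the first l samples of X and Y are paired."""
    N = LX.shape[0]
    U = np.zeros((N, N))
    U[:l, :l] = np.eye(l)             # diagonal indicator of paired samples
    # mu * sum_i ||f_i - g_i||^2 expands into the off-diagonal coupling below.
    LJ = np.block([[LX + mu * U, -mu * U],
                   [-mu * U, LY + mu * U]])
    vals, vecs = eigh(LJ)
    H = vecs[:, 1:d + 1].T            # skip the trivial constant eigenvector
    return H[:, :N], H[:, N:]         # F and G, each of shape (d, N)
```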

3 Manifold Alignment via Semi-definite Programming

In this section we introduce our Semi-Definite Manifold Alignment (SDMA) algorithm in detail. First, let us look at the objective and problem formulation.

3.1 The Quadratic Formulation

As stated in section 2, the manifold alignment process is composed of two steps: (1) embedding the data points into a common low-dimensional space with the guidance of some prior knowledge; (2) finding the one-to-one correspondences based on the Euclidean distances in the embedding space. The main challenge is the first step, i.e., how to effectively embed the data from different data sets into a common low-dimensional space.

Co-Embedding without Prior Knowledge. Following [8], we also adopt the graph-based criterion as our optimization objective. We construct weighted undirected graphs G_X, G_Y for data sets X and Y respectively, and then seek an embedding that minimizes Eq.(3). To avoid ill-posedness of the optimization problem, we further impose the scale and translational invariance constraints

tr(F F^T) = 1, tr(G G^T) = 1, F e = 0, G e = 0,    (5)


on the optimization objective. Then the co-embedding problem can be formulated as

min_{F,G} tr(F L_X F^T) + tr(G L_Y G^T)
s.t. tr(F F^T) = 1, tr(G G^T) = 1,
     F e = 0, G e = 0,    (6)

which is a co-dimensionality reduction problem without any prior knowledge about the relationship between X and Y.

Another issue that should be addressed here is the construction of the Laplacian matrices L_X and L_Y. A common choice is to first compute the weight matrices using Gaussian functions, i.e. the similarity between x_i and x_j is computed as

W_X(i, j) = exp(−β ||x_i − x_j||^2),    (7)

and then calculate the Laplacian matrices in the standard way. However, the free parameter β is usually set empirically, and its value may affect the final results significantly [13]. Therefore we propose to use the iterated Laplacian [3] instead. Mathematically, the iterated Laplacian for data set X is defined as

M_X = (I − Q_X)^T (I − Q_X),    (8)

where Q_X is an N × N square matrix whose (i, j)-th entry Q_X(i, j) = q_ij is obtained by solving

min_{q_ij} || x_i − Σ_{x_j ∈ N(x_i)} q_ij x_j ||^2
s.t. Σ_j q_ij = 1,    (9)

where N(x_i) is the neighborhood of x_i (e.g. the k-nearest neighborhood or ε-ball neighborhood), and Q_X(i, j) = 0 for x_j ∉ N(x_i). Similarly, we can define the iterated Laplacian M_Y for data set Y. A sketch of this construction is given below.
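The following is a minimal sketch (ours) of the iterated Laplacian of Eqs. (8)-(9), assuming k-NN neighborhoods; the least-squares solve for the reconstruction weights follows the standard LLE recipe, and the regularizer `reg` for ill-conditioned local Gram matrices is our addition, not part of the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def iterated_laplacian(X, k=5, reg=1e-3):
    """M = (I - Q)^T (I - Q) with LLE-style reconstruction weights Q."""
    N = X.shape[0]
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    Q = np.zeros((N, N))
    for i in range(N):
        J = idx[i, 1:]                      # neighbors of x_i (idx[i, 0] is x_i)
        Z = X[J] - X[i]                     # shift the neighborhood to the origin
        C = Z @ Z.T                         # local Gram matrix, k x k
        C += reg * np.trace(C) * np.eye(k)  # regularize if C is singular
        w = np.linalg.solve(C, np.ones(k))
        Q[i, J] = w / w.sum()               # enforce sum_j q_ij = 1, Eq. (9)
    IQ = np.eye(N) - Q
    return IQ.T @ IQ                        # Eq. (8)
```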

Manifold Alignment by Incorporating the Prior Knowledge. Now let us take the relative comparison based constraints into account. As introduced in section 2, the knowledge "y_i is closer to x_j than to x_k" can be translated into the relative distance constraint

||g_i − f_j||^2 ≤ ||g_i − f_k||^2    (10)

in the embedded space. In the rest of this paper, for notational convenience, we denote the constraint in Eq.(10) as an ordered 3-tuple t_c = {y_i, x_j, x_k}, and use T = {t_c}_{c=1}^C to denote the set of constraints. Incorporating these constraints, our optimization problem becomes

min_{F,G} tr(F M_X F^T) + tr(G M_Y G^T)
s.t. ∀{y_i, x_j, x_k} ∈ T, ||g_i − f_j||^2 ≤ ||g_i − f_k||^2,
     tr(F F^T) = 1, tr(G G^T) = 1, F e = 0, G e = 0.    (11)


Let H = [F, G] and M = [M_X, 0; 0, M_Y]; then Eq.(11) can be simplified to

min_H tr(H M H^T)
s.t. ∀{y_i, x_j, x_k} ∈ T, ||h_{i+N} − h_j||^2 ≤ ||h_{i+N} − h_k||^2,
     tr(H_F H_F^T) = 1, tr(H_G H_G^T) = 1,
     H_F e = 0, H_G e = 0,    (12)

where H_F and H_G are the sub-matrices of H corresponding to F and G. We have now formulated our tuple-constrained optimization as a Quadratically Constrained Quadratic Programming (QCQP) [11] problem. However, since the relative distance constraints in Eq.(12) are not convex, 1) the solution is computationally difficult to derive and 2) the solution may be trapped in local minima. Therefore, a reformulation is needed to make this problem tractable.

3.2 A Semi-Definite Approach

In this section we present the details of how to relax the QCQP problem Eq.(12) to an SDP problem. Note that

||h_{i+N} − h_j||^2 ≤ ||h_{i+N} − h_k||^2 ⟺ −2 h_{i+N}^T h_j + 2 h_{i+N}^T h_k + h_j^T h_j − h_k^T h_k ≤ 0    (13)

and

tr(H M H^T) = tr(M H^T H).    (14)
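As a quick numerical sanity check (ours, not from the paper), the snippet below verifies both identities, i.e. that the objective and the relative distance constraints depend on H only through K = H^T H.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 6
H = rng.standard_normal((d, n))
M = rng.standard_normal((n, n))
M = M + M.T

# Eq. (14): cyclicity of the trace lets us swap H M H^T for M (H^T H).
assert np.isclose(np.trace(H @ M @ H.T), np.trace(M @ (H.T @ H)))

# Eq. (13): the difference of squared distances is linear in Gram entries.
i, j, k = 0, 1, 2
lhs = np.sum((H[:, i] - H[:, j]) ** 2) - np.sum((H[:, i] - H[:, k]) ** 2)
rhs = (-2 * H[:, i] @ H[:, j] + 2 * H[:, i] @ H[:, k]
       + H[:, j] @ H[:, j] - H[:, k] @ H[:, k])
assert np.isclose(lhs, rhs)
```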

These two facts motivate us to relax the problem and work with the Gram matrix instead of manipulating the data coordinates directly. The Gram matrix of the data is K = H^T H with its (p, q)-th entry K_pq = h_p^T h_q, and it can be divided into four blocks as

K = [K^{FF}, K^{FG}; K^{GF}, K^{GG}] = [F^T F, F^T G; G^T F, G^T G].    (15)

Using K, we can convert the formulas in Eq.(12) into linear forms as follows:

– The objective function is now

min_K tr(M K).    (16)

– The relative distance constraints are

∀{y_i, x_j, x_k} ∈ T, −2 K_{i+N,j} + 2 K_{i+N,k} + K_{j,j} − K_{k,k} ≤ 0.    (17)

– The scale invariance can be achieved by constraining the traces of the diagonal blocks of K, i.e.

1 = tr(F F^T) = tr(K^{FF}),    (18)
1 = tr(G G^T) = tr(K^{GG}).    (19)


– The translation invariance is achieved by the constraints

Σ_{i,j} K^{FF}_{i,j} = 0, Σ_{i,j} K^{GG}_{i,j} = 0.    (20)

To see this, consider the following fact for F (and similarly for G):

0 = Σ_i f_i ⟺ 0 = ||Σ_i f_i||^2 = Σ_{i,j} f_i^T f_j = Σ_{i,j} K^{FF}_{i,j}.    (21)

– Finally, to be a valid Gram matrix, K must be positive semi-definite, i.e.

K ⪰ 0.    (22)

Combining Eq.(16) to Eq.(22), we can write our new optimization problem as

min_K tr(M K)
s.t. ∀{y_i, x_j, x_k} ∈ T, −2 K_{i+N,j} + 2 K_{i+N,k} + K_{j,j} − K_{k,k} ≤ 0,
     tr(K^{FF}) = 1, tr(K^{GG}) = 1,
     Σ_{i,j} K^{FF}_{i,j} = 0, Σ_{i,j} K^{GG}_{i,j} = 0,
     K ⪰ 0.    (23)

In addition, to avoid the case where the feasible set is empty, and to encourage the influence of the prior knowledge, we relax the constraints by introducing slack variables E = {ε_c}_{c=1}^C and reformulate the optimization problem as

min_{K,ε} tr(M K) + α Σ_{c=1}^C ε_c
s.t. ∀{y_i, x_j, x_k} ∈ T, −2 K_{i+N,j} + 2 K_{i+N,k} + K_{j,j} − K_{k,k} ≤ ε_c,
     tr(K^{FF}) = 1, tr(K^{GG}) = 1,
     Σ_{i,j} K^{FF}_{i,j} = 0, Σ_{i,j} K^{GG}_{i,j} = 0,
     K ⪰ 0, ∀ε_c ∈ E, ε_c ≤ 0,    (24)

where α is a parameter that balances the data's inherent structure against the supervised information. When α is small, the embedding result is dominated by the manifold structure; otherwise, the prior knowledge (i.e., the relative distance constraints) plays a more important role. The constraint ε_c ≤ 0 can be removed if we allow the relative distance relations to be violated. Since Eq.(24) is a Semi-Definite Programming (SDP) problem [11], we call our method Semi-Definite Manifold Alignment (SDMA). Clearly, Eq.(24) is convex and thus free of local minima. Moreover, various software packages are available for solving it efficiently; in this paper we use the Sedumi [12] package. Once the Gram matrix K is solved, the embedded coordinates F, G can be recovered from the dominant eigenvectors of K^{FF} and K^{GG}. The number of embedded dimensions can be determined either by prior knowledge or from the eigenstructure of K.
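To make the pipeline concrete, here is a hedged sketch of Eq. (24) written with the cvxpy modelling package (the paper itself solves the SDP with Sedumi); the function name `sdma` and the eigenvector-based recovery of F and G follow the description above, but the API and solver choices are our assumptions, not the authors' implementation.

```python
import numpy as np
import cvxpy as cp

def sdma(MX, MY, tuples, alpha=1e-3, d=2):
    """tuples: list of (i, j, k) meaning 'y_i is closer to x_j than to x_k'."""
    N = MX.shape[0]
    M = np.block([[MX, np.zeros((N, N))],
                  [np.zeros((N, N)), MY]])
    K = cp.Variable((2 * N, 2 * N), PSD=True)        # Gram matrix, Eq. (22)
    eps = cp.Variable(len(tuples))
    cons = [cp.trace(K[:N, :N]) == 1, cp.trace(K[N:, N:]) == 1,  # Eqs. (18)-(19)
            cp.sum(K[:N, :N]) == 0, cp.sum(K[N:, N:]) == 0,      # Eq. (20)
            eps <= 0]
    for c, (i, j, k) in enumerate(tuples):           # relative distances, Eq. (17)
        cons.append(-2 * K[i + N, j] + 2 * K[i + N, k]
                    + K[j, j] - K[k, k] <= eps[c])
    cp.Problem(cp.Minimize(cp.trace(M @ K) + alpha * cp.sum(eps)), cons).solve()

    def embed(B):
        # Recover coordinates from a diagonal block's dominant eigenvectors.
        vals, vecs = np.linalg.eigh(B)
        return (vecs[:, -d:] * np.sqrt(np.maximum(vals[-d:], 0))).T

    return embed(K.value[:N, :N]), embed(K.value[N:, N:])
```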


Finally, we emphasize that SDMA can serve as a very flexible framework for manifold alignment and embedding. More concretely, Eq.(24) can be generalized (or specialized) in the following ways.

1. Flexible supervision. First, the form of the tuple constraints can be changed from t_c = {y_i, x_j, x_k} to t_c = {h_i, h_j, h_k}. This means that we do not have to specify which manifold the samples are chosen from; in fact, SDMA accepts relative distance constraints between any three samples (e.g. all from the same manifold, or from three different manifolds). Moreover, although we only present how to align manifolds based on relative comparisons, our formulation can also incorporate traditional correspondence information by adding constraints of the form K_{i,i} = K_{i,j} = K_{j,j}.
2. Multi-manifold alignment. This can be done straightforwardly by adding more manifold components to H and M, along with the corresponding constraints.
3. Semi-supervised embedding. When there is only one manifold component, SDMA provides a way to embed it under the guidance of flexible supervision.

3.3 Correspondence Acquisition

The only remaining problem is how to obtain the pairwise correspondences (i.e., align the manifolds) using the low-dimensional embeddings. More formally, given a sample x_i ∈ X, we want to find its corresponding y_j ∈ Y. If we seek the closest match, a straightforward choice is x_i's nearest neighbor within Y in the embedded space, i.e., j = arg min_k d(x_i, y_k), where d(x_i, y_k) = ||f_i − g_k||. On the other hand, if we are looking for a bijective mapping between the two data sets, i.e., if we require that each sample from one set is coupled with one and only one sample from the other set, the problem can be solved by minimizing the total distance between matched pairs. Specifically, we find matches by

arg min_P Σ_{(x_i, y_j) ∈ P} d(x_i, y_j),    (25)

where P = {(x_i, y_j) | x_i and y_j are coupled}. In its integer form, (25) is a combinatorial problem, so once again we relax it to make it tractable. Eq.(25) can be written as

min_{C_ij} Σ_{i,j} C_ij d(x_i, y_j)
s.t. Σ_i C_ij = 1, Σ_j C_ij = 1,
     C_ij ≥ 0,    (26)

where C_ij ∈ {0, 1}, with C_ij = 1 if (x_i, y_j) ∈ P and C_ij = 0 otherwise. To relax, we use continuous C_ij as fuzzy indicators: the larger C_ij is, the more probable it is that x_i and y_j are coupled. Then (26) becomes a linear program (LP) [11]. Once {C_ij} are obtained, the match y_j for x_i is found by solving arg max_j C_ij.
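As a sketch of this matching step (our implementation, not the paper's): for a square cost matrix the feasible set of the LP is the doubly-stochastic polytope, whose vertices are permutation matrices, so the Hungarian algorithm recovers the same bijection the LP relaxation yields. We show it next to the simpler nearest-neighbor rule.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match(F, G):
    """F, G: (d, N) embeddings. Returns NN matches and bijective matches."""
    D = cdist(F.T, G.T)                    # D[i, j] = ||f_i - g_j||
    nn = D.argmin(axis=1)                  # nearest-neighbor rule
    rows, cols = linear_sum_assignment(D)  # optimal bijection for Eq. (26)
    return nn, cols                        # rows == arange(N) for square D
```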

4 Experiments

In this section, we demonstrate the performance of our SDMA algorithm on several real-world data sets. We first describe the data sets and the experimental settings.


4.1 Data and Settings

– COIL-20 [14]: This data set contains images of 20 objects, each with 72 images taken from different view points. The underlying parameter is the view-point angle.
– Head pose [15]: This data set contains head images of 15 subjects, each with 2 series of 93 images of the same person at 93 different poses. We use a subset in which the horizontal pose angles vary from −90° to 90° and the vertical pose angles vary from −30° to 30°. The underlying parameters are the horizontal and vertical pose angles of the head.
– Facial expression [6]: This data set contains 213 images of 7 facial expressions posed by 10 Japanese female models. The actual underlying parameter is unknown.

The relative distance constraints are obtained as follows. First, samples are drawn randomly to form a tuple t = {y_i, x_j, x_k}, which asserts that "y_i is more similar to x_j than to x_k". Then a user, or a computer, judges whether this tuple is valid. Finally, the valid tuples are collected into T as the constraints (a toy sketch of this loop is given below). Since only "yes/no" questions are involved, this procedure is much easier for users than searching for correspondences. Specifically, validity is determined as follows. For the COIL data, we use the difference of view angles to determine similarity. For the head pose data, similarity is determined by the sum of the horizontal and vertical angle differences. For the facial expression data, a tuple is valid if y_i and x_j have the same expression while x_k has a different one. This strategy gives conservative yet reliable supervision.

In SDMA, the parameter α tunes the influence of the relative distance constraints. In our experiments it is chosen manually from the grid {10^{−5}, ..., 10^{−1}, 1}. After co-embedding, we find correspondences by solving Eq.(26). For all the data sets, the graphs are constructed using Eq.(8). All the results are obtained using only relative distance constraints.

4.2 Results

First we align the COIL data. We construct the graph with neighborhood size 4, and 60 tuples are provided to the algorithm; α is set to 10^{−4}. 144 images of 2 objects are embedded onto a 2-D plane. The pure co-embedding obtained by solving Eq.(6) and the alignment results are both shown in figure 3. By comparison we can see that the coordinates produced by plain co-embedding are not directly related to the manifold's underlying parameters, while after alignment we can infer those parameters from the samples' positions. In addition, we also demonstrate how SDMA aligns multiple manifolds in figure 4, where 216 images of 3 objects are embedded onto a 2-D plane. We use 150 tuples for supervision with the other settings unchanged. SDMA again finds most of the correspondences correctly.

Figure 5 shows the result of SDMA on the head pose data. We construct the graph with neighborhood size 7, and use 500 tuples and α = 10^{−3}. 130 samples from 2 subjects are embedded onto a 2-D plane. It can be seen that both of the underlying manifold parameters are successfully captured and aligned.

Figure 6 shows the alignment of the facial expression data. The graph is constructed with neighborhood size 5, and 50 tuples are used.
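Below is the toy sketch of the tuple-collection loop described in section 4.1 (ours, not from the paper), with a synthetic oracle standing in for the human judge; the angle-difference rule mirrors the COIL setting, and the ground-truth view angles `theta_x`, `theta_y` are assumed inputs that a real deployment would not have.

```python
import numpy as np

def collect_tuples(theta_x, theta_y, num_candidates=200, seed=0):
    """theta_x[j], theta_y[i]: ground-truth view angles of x_j and y_i.

    Assumes both sets have N samples; returns the set T of valid tuples."""
    rng = np.random.default_rng(seed)
    N, T = len(theta_x), []
    for _ in range(num_candidates):
        i, j, k = rng.integers(0, N, size=3)
        if j == k:
            continue
        # The oracle answers the yes/no question
        # "is y_i more similar to x_j than to x_k?"
        if abs(theta_y[i] - theta_x[j]) < abs(theta_y[i] - theta_x[k]):
            T.append((i, j, k))            # keep only valid tuples
    return T
```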


Fig. 3. Alignment of COIL data. (a) is the result of co-embedding; (b) is the alignment by SDMA. Lines indicate the true correspondences: in (a) they are skewed, while in (b) they are nearly straight, which implies that the correspondences are found. (c) shows some samples of images matched by SDMA.

Fig. 4. Alignment of 3 manifolds. (a) shows the true correspondences and (b) shows the matched pairs found by SDMA. (c) shows some samples of matched images.

Since its manifold structure is not evident, we set α = 10^{−1} to strengthen the influence of the relative distance constraints. 40 samples are embedded onto a 2-D plane, since only two eigenvalues of K are nonzero.

Fig. 5. Alignment of head pose images. (a) and (b) show the embedding by SDMA and the correspondences found. Points are colored according to horizontal angles in (a) and vertical angles in (b). (c) shows some samples of matched pairs.

5 Discussions

In machine learning problems, we usually need supervision on individual samples, like "x = a, y = b". However, this information may be difficult to obtain in some situations. To relax this requirement, ordinal approaches (e.g. [16]) only require binary order relations like "x > y". As a further generalization, methods using relative comparison information have been introduced¹. These methods enable us to supervise learning problems in a more flexible way.

¹ Order relations are a special form of relative comparisons, since the relation "x > y" is equivalent to "|x − r| > |y − r|" where r is a reference point at negative infinity.

Fig. 6. Alignment of facial expression images. (a) and (b) show the embedding, with points colored according to the true expressions. (a) shows the true correspondences, while (b) shows those found by SDMA. (c) shows some matched pairs.

The idea of learning with relative comparisons has also been used in other problems. [17] treats relative distance relations as a characteristic of the data, and uses AdaBoost to seek an embedding in which this characteristic is preserved; however, it does not utilize the data's intrinsic structure. [18] and [19] propose to learn distance metrics from relative comparisons. Both seek a distance measure d(x, y) = √((x − y)^T A (x − y)) and use the relative relations to constrain the feasible region of A through mathematical programming. In spirit, our method is similar to [18]: they learn a distance measure that preserves global distance relations, while we learn an embedding that preserves local manifold structures.

SDMA is closely related to kernel methods. Through the semi-definite relaxation, we first derive the data's Gram matrix (a.k.a. kernel matrix), and then calculate the low-dimensional coordinates by eigen-decomposition. This procedure is similar to kernel principal component analysis (KPCA) [20], except that our kernel matrix is learned by aligning manifolds. Therefore, SDMA can be considered a kernel learning method. From this perspective, SDMA is similar to [21]; the difference is that they only use a single manifold's structure, while we exploit multiple manifolds and their correspondences.

In SDMA, finding corresponding pairs is generally more difficult than in traditional methods, since relative distance constraints are "weaker" than correspondence constraints, which are imposed directly on the coordinates. Thus the embedded coordinates produced by SDMA are not very accurate when the number of constraints is small. To achieve similar performance, more relative distance constraints are required, or a few correspondence constraints need to be added. Nevertheless, we believe that this cost is acceptable considering the wide availability and applicability of this type of constraint.

One drawback of SDMA is that its computational cost is high when dealing with large data sets. Although the semi-definite relaxation makes the problem tractable, it inevitably increases the number of variables. In the future we aim to find more efficient solutions.

6 Conclusion

Traditional alignment algorithms rely on high-quality pairwise correspondences, which are difficult to acquire in many situations. In this paper, we study a new way of aligning manifolds based on smoothness over graphs. To achieve maximum applicability with minimum user effort, we introduce the novel relative distance constraint


to guide the alignment. Alignment using this type of prior knowledge is first formulated as a quadratically constrained quadratic programming (QCQP) problem. Then, by manipulating the Gram matrix of the data instead of the coordinates, we relax this problem to a semi-definite programming (SDP) problem, which can be solved readily. We also show that this semi-definite formulation can serve as a general framework for semi-supervised manifold alignment and embedding. Experiments on aligning various data sets demonstrate the effectiveness of our method.

References

1. Seung, H.S., Lee, D.D.: The manifold ways of perception. Science 290 (2000) 2268–2269
2. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290 (2000) 2323–2326
3. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 (2003) 1373–1396
4. Weinberger, K.Q., Saul, L.K.: Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision 70(1) (2006) 77–90
5. Ham, J., Ahn, I., Lee, D.: Learning a manifold-constrained map between image sets: Applications to matching and pose estimation. In: CVPR-06. (2006)
6. Lyons, M.J., Kamachi, M., Gyoba, J., Akamatsu, S.: Coding facial expressions with Gabor wavelets. In: Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition. (1998)
7. Ham, J., Lee, D., Saul, L.: Learning high dimensional correspondence from low dimensional manifolds. In: Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, ICML-03. (2003)
8. Ham, J., Lee, D., Saul, L.: Semisupervised alignment of manifolds. In: Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AISTATS 2005). (2005)
9. Verbeek, J., Roweis, S., Vlassis, N.: Non-linear CCA and PCA by alignment of local models. In: Advances in NIPS-04. (2004)
10. Verbeek, J., Vlassis, N.: Gaussian fields for semi-supervised regression and correspondence learning. Pattern Recognition 39(10) (2006) 1864–1875
11. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge, UK (2004)
12. Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software 11–12 (1999) 625–653
13. Wang, F., Zhang, C.: Label propagation through linear neighborhoods. In: ICML-06. (2006)
14. Nene, S.A., Nayar, S.K., Murase, H.: Columbia Object Image Library (COIL-20). Technical Report CUCS-005-96, Columbia University (1996)
15. Gourier, N., Hall, D., Crowley, J.L.: Estimating face orientation from robust detection of salient facial features. In: Proceedings of Pointing 2004, ICPR International Workshop on Visual Observation of Deictic Gestures, Cambridge, UK. (2004)
16. Cohen, W.W., Schapire, R.E., Singer, Y.: Learning to order things. In: Advances in NIPS-98. (1998)
17. Athitsos, V., Alon, J., Sclaroff, S., Kollios, G.: BoostMap: A method for efficient approximate similarity rankings. In: CVPR-04. (2004)
18. Schultz, M., Joachims, T.: Learning a distance metric from relative comparisons. In: Advances in NIPS-03. (2003)
19. Rosales, R., Fung, G.: Learning sparse metrics via linear programming. In: KDD-06. (2006)
20. Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10 (1998) 1299–1319
21. Weinberger, K.Q., Sha, F., Saul, L.K.: Learning a kernel matrix for nonlinear dimensionality reduction. In: ICML-04. (2004)