IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 17, NO. 3, MAY 2006


Gray-Scale Morphological Associative Memories

Peter Sussner and Marcos Eduardo Valle

Abstract—Neural models of associative memories are usually concerned with the storage and the retrieval of binary or bipolar patterns. Thus far, the emphasis in research on morphological associative memory systems has been on binary models, although a number of notable features of autoassociative morphological memories (AMMs) such as optimal absolute storage capacity and one-step convergence have been shown to hold in the general, gray-scale setting. In this paper, we make extensive use of minimax algebra to analyze gray-scale autoassociative morphological memories. Specifically, we provide a complete characterization of the fixed points and basins of attraction which allows us to describe the storage and recall mechanisms of gray-scale AMMs. Computer simulations using gray-scale images illustrate our rigorous mathematical results on the storage capacity and the noise tolerance of gray-scale morphological associative memories (MAMs). Finally, we introduce a modified gray-scale AMM model that yields a fixed point which is closest to the input pattern with respect to the Chebyshev distance and show how gray-scale AMMs can be used as classifiers.

Index Terms—Basin of attraction, fixed point, gray-scale morphological associative memory, minimax algebra, morphological neural network.

I. INTRODUCTION

IN RECENT years, several researchers have attempted to devise effective associative memory models for the storage and retrieval of gray-scale images in the presence of noise. To this end, binary models have been generalized to multistate models. Suppose the goal is to recall n-dimensional memory vectors (images) with k gray levels. An obvious attempt to accomplish this goal consists in replacing the conventional bistate activation function with a k-stage quantizer. Sophisticated and computationally expensive design procedures have been proposed to carve the desired equilibrium points into the energy landscape [1]–[3]. Another, similar approach is based on complex-valued neural networks and employs the complex-signum activation function [4]. For a complex input value, this k-stage phase quantizer yields one of k values located on the complex unit circle. The network is endowed with complex weights. Upon presentation to the network, an integer-valued input pattern with entries in {0, ..., k-1} is first transformed into a complex-valued pattern by means of a vector-valued function that maps the gray levels to points on the unit circle. Once the network has converged to an equilibrium point, the inverse transformation is applied to obtain an integer-valued pattern.

Manuscript received March 11, 2005; revised September 16, 2005. This work was supported in part by the Brazilian National Science Foundation (CNPq) under Grants 303362/03-0 and 142196/03-7. The authors are with the Institute of Mathematics, Statistics, and Scientific Computation, State University of Campinas, Campinas, CEP 13081-970, São Paulo, Brazil (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TNN.2006.873280

Instead of synthesizing the weight matrix using standard Hebbian learning, Müezzinoğlu et al. convert the problem of determining appropriate weights into an optimization problem [5]. The desired weight matrix can be found as a solution of a certain system of inequalities. If k = 2^m, then n-dimensional vectors with values in {0, ..., k-1} can also be stored using a binary neural network with mn neurons. Note however that this approach leads to the huge number of (mn)^2 interconnections. Therefore, Costantini et al. suggest using m independent binary neural networks [6]. This approach reduces the number of interconnections to mn^2. Moreover, this approach takes advantage of the robustness of binary neural networks with respect to noise and to inaccuracies in implementation and of the availability of proven learning algorithms. Specifically, Costantini et al. opt for the brain-state-in-a-box (BSB) model together with a certain iterative learning algorithm for the weights.

In contrast to the models cited above, gray-scale morphological associative memories (MAMs) were not conceived as generalizations of the corresponding binary models. Instead, the MAM model has been proposed from the outset as an associative memory model for the storage and the recall of real-valued patterns. The MAM model belongs to the class of morphological neural networks. In this setting, computing the next state of a neuron or performing the next layer computation involves the nonlinear operation of adding neural values and their synaptic strengths and taking the maximum or the minimum of the results ("additive maximum" or "additive minimum"). This operation can be viewed as a nonlinear matrix-vector product in the mathematical theory of minimax algebra [7], [8] as well as a nonlinear convolution-type operation that corresponds to an erosion or a dilation of mathematical morphology [9], [10]. Applications of morphological and hybrid morphological/linear neural nets include automatic target recognition, land mine detection, handwritten character recognition, control of vehicle suspension, and prediction of financial markets [11]–[14]. The MAMs [15] that are discussed in this paper have been applied to the problems of face localization, self-localization, and hyperspectral image analysis by Raducanu, Graña et al. [16], [17].

Although the focus in research on MAMs has been on the binary autoassociative case, a number of notable features of autoassociative morphological memories (AMMs) such as optimal absolute storage capacity and one-step convergence have been shown to hold in the general case for real-valued patterns [15]. More importantly, these results remain valid for integer-valued patterns since MAMs can be applied in this setting without any roundoff errors. As an additional advantage, MAMs can be very easily implemented in hardware [18]. The learning rule suggested for MAMs is a very simple minimax algebra analogue



of correlation recording. Thus, the information on the fundamental memory associations is well distributed over the weights, in contrast to models such as the exponential capacity associative memory (ECAM) and the kernel associative memory (KAM) [19], [20]. The latter models implicitly compare the input pattern to each one of the original patterns, and the information on each fundamental memory association is located in different column vectors of the weight matrix. Moreover, the generation of the MAM weight matrix does not cause any roundoff errors, unlike linear and semi-linear models such as the optimal linear associative memory (OLAM) [21] and the KAM that require calculating a pseudoinverse matrix or solving a number of least-squares problems [22]. On the downside, MAMs have a large number of spurious memories and a limited error correction capability [23].

We have previously characterized the fixed points and basins of attraction of binary AMMs [24], [23]. These results have led to a number of new binary models that exhibit a reduced number of spurious memories and an improved tolerance with respect to noise [25]. Gray-scale AMMs have not yet been thoroughly analyzed. In this paper, we make extensive use of theorems from the mathematical theory of minimax algebra [8], [7] to describe the fixed points and the basins of attraction of real-valued and integer-valued AMMs. To this end, we introduce the notions of eigenvalues and eigenvectors in minimax algebra since a fixed point of an AMM can be viewed as an eigenvector with corresponding eigenvalue 0. We will show that the set of fixed points of an AMM is given by the set of all linear combinations of the "fundamental" eigenvectors. In analogy to the binary case, we will prove that an input pattern is attracted to its supremum or its infimum, respectively, in the set of fixed points. Using another theorem of minimax algebra, we determine the fixed point that minimizes the Chebyshev distance to a given input pattern. We introduce a modified AMM model that yields this fixed point as an output upon presentation of the input. Finally, we confirm these exact mathematical results in a number of experiments concerning the tolerance with respect to noise using gray-scale images. Using these experiments, we compare the error correction capabilities of gray-scale AMM models and some other gray-scale models that have recently appeared in the literature.

The organization of this paper is as follows. First, we present some information on the minimax algebra background. Then we provide a brief introduction to MAMs and give an overview of the most important properties of MAMs. In Section IV, we employ some of the results stated in Sections II and III to prove some new theorems providing valuable insight into the storage and recall behavior of autoassociative morphological memories for real-valued and integer-valued patterns. We also introduce an improved gray-scale AMM model which produces as an output a fixed point that minimizes the Chebyshev distance to a given input pattern. Furthermore, we compare our gray-scale AMM models with the gray-scale models of Müezzinoğlu et al. and Costantini et al., and with the OLAM and KAM models in some simulations using gray-scale images. Finally, our last experiment indicates the utility of gray-scale AMMs for classification problems. Detailed proofs of the new mathematical theorems contained in this paper can be found in the Appendix.


II. SOME MINIMAX ALGEBRA BACKGROUND FOR MORPHOLOGICAL ASSOCIATIVE MEMORIES

MAMs can be described in terms of matrix products that are defined in the mathematical theory of minimax algebra [7], [8]. The following theorems and definitions will be used in this paper to gain valuable insight about the MAMs that we introduce in Section III. We would like to point out, however, that Sections III through V are self-contained. The reader who is primarily interested in applying our AMM models may choose to skip Section II or read this section later.

A. Max Algebra and Min Algebra

We distinguish between max algebra and min algebra. In max algebra, we employ the primal weighting system (R ∪ {-∞}, ∨, +), where the symbol "∨" denotes the operation "maximum." In min algebra, we employ the dual weighting system (R ∪ {+∞}, ∧, +), where the symbol "∧" denotes "minimum." In a general lattice algebra setting, the maximum "∨" represents the operation "join" and the minimum "∧" represents the operation "meet" [26]. Mathematically speaking, (R ∪ {-∞}, ∨, +) and (R ∪ {+∞}, ∧, +) represent linear commutative belts in minimax algebra [7]. Other important examples of linear commutative belts are given by (Z ∪ {-∞}, ∨, +) and (Z ∪ {+∞}, ∧, +), and we can also conduct max algebra and min algebra in the extended integers. Therefore, the results of this section remain valid in the (extended) integer-valued setting.

Suppose that B and C are belts. A function f: B → C is called a belt homomorphism if it is compatible with the operations, i.e., f(x ∨ y) = f(x) ∨ f(y) and f(x + y) = f(x) + f(y) for every x, y ∈ B. In analogy to the concept of a vector space over a field in linear algebra, Cuninghame-Green defines the concept of a band-space or briefly space over a belt [7]. A (left) band-space over a belt is equipped with a left scalar product. A number of axioms hold for all scalars λ, μ and for all vectors x, y of the space. In particular

    λ + (x ∨ y) = (λ + x) ∨ (λ + y)    (1)
    (λ ∨ μ) + x = (λ + x) ∨ (μ + x)    (2)

For now, let us consider discrete event systems (DES) in max algebra. Later, we will show that there is a duality relation between max algebra and min algebra. Every statement in max algebra, in particular each of the statements made below, is mirrored in min algebra via this duality relation.

A vector in (R ∪ {-∞})^n represents a certain state. A DES in max algebra is governed by a forward recursion using a system matrix D. The next state of the DES is given by the matrix-vector product D ⊞ x. In general, for an m × p matrix A and a p × n matrix B, we define the m × n matrix C = A ⊞ B, where c_ij = ∨_{k=1}^{p} (a_ik + b_kj). Note that the identity matrix has off-diagonal elements -∞ and diagonal elements 0 because -∞ is the null element and 0 is the neutral element with respect to the operation "∨." For matrices and for scalars, the maximum and the scalar "multiplication" are performed


elementwise. If x is a vector in (R ∪ {-∞})^n, then the scalar "multiplication" λ + x is likewise performed elementwise. Let x be a vector in (R ∪ {-∞})^n. We say that x is a linear combination of the vectors v^1, ..., v^k if and only if there exist finite scalars λ_1, ..., λ_k such that x = ∨_{ξ=1}^{k} (λ_ξ + v^ξ).

B. Graphs and Matrices in Max Algebra

We will see that there is a one-to-one correspondence between square matrices over R ∪ {-∞} and directed graphs. For convenience, we will sometimes choose to denote the entry d_ij of a matrix D by the symbol D_ij.

A graph is given by its node set N and its arc set A. An arc is an ordered pair of nodes. Hence, the arc set of a graph with n nodes is a subset of N × N. In the special case where A has its full complement of n^2 arcs, we speak of a complete graph. Directed graphs are usually arc-weighted. As mentioned before, we focus on the primal weighting system, in which each weight is an element of R ∪ {-∞} (in the dual weighting system, each weight is an element of R ∪ {+∞}). In this setting, a directed, weighted graph is a triple (N, A, w), where w is a function from A to R ∪ {-∞}. To save constantly repeating the adjectives "directed" and "weighted," we will from now on simply use the word "graph" whenever we are referring to a directed, weighted graph.

If G is a complete graph having n nodes, then D(G) denotes the n × n matrix whose entry d_ij equals the weight of the arc (i, j). We say that the graph G and the matrix D(G) are corresponding. Note that we have a bijective mapping from the set of complete graphs with n nodes to the set of n × n matrices over R ∪ {-∞}. Given a matrix D, the symbol G(D) denotes the corresponding complete graph, i.e., the graph which corresponds to D under the inverse mapping. A graph which is not complete can be completed by adjoining the "missing" arcs to its arc set and by attaching weights of -∞ (+∞ in the dual weighting system) to the new arcs. Conversely, deleting all arcs whose weights are infinite from a complete graph G yields the underlying finite graph of G.

A path in a graph is a sequence of nodes i_0, i_1, ..., i_l such that (i_{r-1}, i_r) is an arc for r = 1, ..., l. We say that the path contains the nodes i_0, ..., i_l and has length l. A graph is called strongly connected if for each ordered pair of nodes (i, j) there exists a path from i to j. The weight of a path is given by the sum of the weights of its arcs. There are some special types of paths which are important for the rest of this paper. If the first node and the last node of a path coincide, then the path is called a cycle. For any cycle, the cycle mean refers to the ratio of its weight divided by its length. A path is called elementary if its nodes are pairwise distinct.

Recall that graphs are closely related to the corresponding matrices. For a matrix D, the symbol λ(D) denotes the greatest cycle mean of all cycles in G(D), which equals the greatest cycle mean of all elementary cycles in G(D). We speak of a critical cycle if its cycle mean equals λ(D).


The term eigennode refers to a node that is contained in a critical cycle. In this case, we say that the index of the node is an eigenindex.

For a matrix D, we define D^r, the rth power of D, as the expression D ⊞ D ⊞ ... ⊞ D, where the "⊞"-symbol occurs r - 1 times. Note that the operation "⊞" is associative and thus there is no need for bracketing. The (max-algebraic) weak transitive closure Γ(D) and the strong transitive closure Δ(D) of D are given by the formal infinite matrix power sums Γ(D) = D ∨ D^2 ∨ D^3 ∨ ... and Δ(D) = I ∨ Γ(D), respectively. For r ∈ N, we denote the finite matrix power "sum" D ∨ D^2 ∨ ... ∨ D^r by Γ_r(D).

C. Special Types of Matrices

In this section, we will study three special types of primal-weighted matrices: finite matrices, r-regular matrices, and definite matrices. These types of matrices have a number of properties which will turn out to be useful to prove results on autoassociative morphological memories since they arise as weight matrices of AMMs.

We say that a matrix is finite if every row vector and every column vector has at least one finite entry. A square matrix D is called r-regular if Γ(D) = Γ_r(D).

Theorem 1: A matrix D is r-regular for some r ∈ N if and only if λ(D) ≤ 0.

Given a square matrix D, the Floyd–Warshall algorithm [8] constructs a sequence of matrices D = D^(0), D^(1), ..., D^(n). For any k = 1, ..., n, we compute the elements of D^(k) using the elements of the previous matrix D^(k-1) according to the following rule:

    d_ij^(k) = d_ij^(k-1) ∨ (d_ik^(k-1) + d_kj^(k-1))   if i ≠ k and j ≠ k
    d_ij^(k) = d_ij^(k-1)                               if i = k or j = k.    (3)

Theorem 2: Let D be an n × n matrix. If λ(D) ≤ 0, then Γ(D) = D^(n), and thus the Floyd–Warshall algorithm can be used to compute the transitive closure matrices Γ(D) and Δ(D) in O(n^3) steps. Recall that λ(D) ≤ 0 implies that D is r-regular for some r ≤ n. In the special case where λ(D) = 0 and G(D) is strongly connected, we say that D is definite [27].

D. Eigenvectors and Eigenvalues

In the eigenproblem for a square matrix D, we seek a vector x on which the action of the matrix is the same as that of a scalar λ, i.e., D ⊞ x = λ + x. In this case, we call x an eigenvector of D and we call λ the corresponding eigenvalue. We say that the eigenproblem for D is finitely soluble if there exist a finite vector x and a finite scalar λ such that D ⊞ x = λ + x.

Theorem 3: If the eigenproblem is finitely soluble for D, then the eigenvalue of D is unique and equals λ(D).

In the statement of the following theorems, we use the symbol γ^j to denote the jth column of Γ(D), where j = 1, ..., n.

Theorem 4: Let D be finite and definite. For every eigenindex j, the column γ^j of Γ(D) is a finite eigenvector of D with corresponding eigenvalue 0.
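For readers who want to experiment with these notions, the following short NumPy sketch (our own illustration, not code from the paper) computes Γ(D) for a small hypothetical definite matrix via a Floyd–Warshall-style update and then verifies Theorem 4: the columns of Γ(D) that belong to eigenindices, detected here through zero diagonal entries of Γ(D) (an eigennode lies on a zero-weight cycle), are eigenvectors of D with eigenvalue 0.

    import numpy as np

    def max_plus_prod(A, B):
        # Max-plus matrix product: C[i, j] = max_k (A[i, k] + B[k, j]).
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    def weak_transitive_closure(D):
        # Gamma(D) = D v D^2 v ... v D^n via a Floyd-Warshall-style update.
        G = D.astype(float).copy()
        for k in range(D.shape[0]):
            G = np.maximum(G, G[:, [k]] + G[[k], :])
        return G

    # Hypothetical definite matrix: zero diagonal, nonpositive cycle means, strongly connected.
    D = np.array([[ 0.0, -1.0, -3.0],
                  [-2.0,  0.0, -1.0],
                  [-1.0, -4.0,  0.0]])

    Gamma = weak_transitive_closure(D)
    eigen_idx = np.where(np.isclose(np.diag(Gamma), 0.0))[0]   # eigenindices
    for j in eigen_idx:
        v = Gamma[:, j]                                        # fundamental eigenvector
        assert np.allclose(max_plus_prod(D, v[:, None]).ravel(), v)   # D (max-plus) v = 0 + v

For this particular D every node lies on a zero-weight self-loop, so all three columns of Γ(D) pass the check.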



The columns of Γ(D) corresponding to eigenindices are called fundamental eigenvectors of D. For a finite, square matrix D with finitely soluble eigenproblem, the symbol V(D) denotes the set of all finite eigenvectors of D. We call V(D) the eigenspace of D.

Theorem 5: Let D be a finite, square matrix. Any linear combination ∨_ξ (λ_ξ + v^ξ), where λ_ξ ∈ R, of finite eigenvectors v^ξ of D is again a finite eigenvector. Moreover, if D has a strongly connected underlying finite graph, then V(D) consists precisely of all linear combinations of the fundamental eigenvectors of D.

E. Conjugation

The theory of minimax algebra includes a theory of conjugation. The conjugate of a certain algebraic structure S is denoted by S*. An element x ∈ S corresponds to an element x* ∈ S* under the isomorphism of conjugation. The element x* is called the conjugate of x. We have in particular the conjugacy of the belts (R ∪ {-∞}, ∨, +) and (R ∪ {+∞}, ∧, +). Combining these belts yields the algebraic structure R ∪ {-∞, +∞}, which represents a bounded lattice ordered group or blog. The operations "+" and "+'" act like the usual sum operation and are identical on R ∪ {-∞, +∞} with the following exceptions: (-∞) + (+∞) = (+∞) + (-∞) = -∞ and (-∞) +' (+∞) = (+∞) +' (-∞) = +∞. We say that the blog R ∪ {-∞, +∞} is self-conjugate, because the belt (R ∪ {+∞}, ∧, +) is the conjugate of the belt (R ∪ {-∞}, ∨, +) under the isomorphism given by

    x* = -x    if x ∈ R
    x* = +∞    if x = -∞
    x* = -∞    if x = +∞.    (4)

Note that the set of matrices over R ∪ {-∞, +∞} also represents a self-conjugate blog. An m × n matrix A corresponds to a conjugate n × m matrix A*. Each entry of A* is given by

    (A*)_ij = (a_ji)*.    (5)

Obviously, (A*)* = A for all A, and thus the isomorphism of conjugation is involutive. The maximum and the minimum of two matrices are performed elementwise. For matrices A and B, we have

    (A ∨ B)* = A* ∧ B*   and   (A ∧ B)* = A* ∨ B*.    (6)

There are two types of products of matrices with entries in R ∪ {-∞, +∞}. For an m × p matrix A and a p × n matrix B with entries in R ∪ {-∞, +∞}, the matrix A ⊞ B, also called the max product of A and B, and the matrix A ⊠ B, also called the min product of A and B, are defined by

    (A ⊞ B)_ij = ∨_{k=1}^{p} (a_ik + b_kj)   and   (A ⊠ B)_ij = ∧_{k=1}^{p} (a_ik +' b_kj).    (7)

For appropriately sized matrices A and B with entries in R ∪ {-∞, +∞}, we obtain

    (A ⊞ B)* = B* ⊠ A*   and   (A ⊠ B)* = B* ⊞ A*.    (8)
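The products (7) and the conjugation (5) translate directly into a few lines of NumPy. The sketch below is our own illustration for finite matrices (the ±∞ cases of the blog are ignored) and checks the duality (8) numerically.

    import numpy as np

    def max_prod(A, B):
        # Max product (7): C[i, j] = max_k (A[i, k] + B[k, j]).
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    def min_prod(A, B):
        # Min product (7): C[i, j] = min_k (A[i, k] + B[k, j]).
        return np.min(A[:, :, None] + B[None, :, :], axis=1)

    def conj(A):
        # Conjugate matrix (5): (A*)[i, j] = -A[j, i].
        return -A.T

    # Verify the duality (8) on random finite matrices.
    rng = np.random.default_rng(0)
    A, B = rng.normal(size=(4, 3)), rng.normal(size=(3, 5))
    assert np.allclose(conj(max_prod(A, B)), min_prod(conj(B), conj(A)))
    assert np.allclose(conj(min_prod(A, B)), max_prod(conj(B), conj(A)))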

Note that the second halves of (6) and (8) are the duals of the first halves. As another example of this duality relationship, the reader may find a true statement of minimax algebra in (9), as well as the corresponding dual statement in (10) [7]. The matrices A, B, and C are assumed to be appropriately sized:

    A ⊞ (B ∧ C) ≤ (A ⊞ B) ∧ (A ⊞ C)    (9)
    A ⊠ (B ∨ C) ≥ (A ⊠ B) ∨ (A ⊠ C)    (10)

Finally, note that (6) and (8) imply that every statement in minimax algebra induces a dual statement which simply arises by replacing each "∨" symbol with a "∧" symbol and vice versa, and by reversing each inequality. Taking advantage of this fact, we only present and prove primal statements on gray-scale associative memories in the following sections and we refrain from providing the corresponding dual statements.

F. Chebyshev Approximation

Approximation theory relies on certain measures of closeness of two vectors x, y ∈ R^n. This task can be accomplished by using the Chebyshev distance, which is given by the greatest componentwise absolute difference between the vectors. The Chebyshev distance between finite vectors x and y is denoted by d(x, y) and can be expressed in terms of ∨, ∧, and the conjugated vectors:

    d(x, y) = ∨_{i=1}^{n} |x_i - y_i| = (x* ⊞ y) ∨ (y* ⊞ x).    (11)

Let us consider the general constrained optimization problem: Minimize d(A ⊞ x, b) subject to A ⊞ x ≤ b. Here A is an m × n matrix and b ∈ R^m. A solution to this problem will be called a Chebyshev-best approximation of b by A ⊞ x (or simply a Chebyshev-best solution) subject to A ⊞ x ≤ b. In general, such a solution is not unique.

Given an inequality of the form A ⊞ x ≤ b, the symbol x̄ denotes A* ⊠ b. The vector x̄ represents the greatest solution of the inequality and is called the principal solution [8]. Using the isotonicity of the ⊞-product, we conclude that A ⊞ x̄ is the closest approximation of b in terms of A ⊞ x such that A ⊞ x satisfies the constraint A ⊞ x ≤ b. This fact is expressed in the following theorem.

Theorem 6: Given a matrix A and b ∈ R^m, a Chebyshev-best solution to the approximation of b by A ⊞ x subject to the constraint A ⊞ x ≤ b is given by x̄ = A* ⊠ b, and x̄ is the greatest such solution.

For the purposes of this paper, we need to solve the unconstrained optimization problem

    minimize d(A ⊞ x, b).    (12)

The following theorem provides the solution of this problem.

Theorem 7: Given a matrix A and b ∈ R^m, a Chebyshev-best solution to the approximation of b by A ⊞ x is given by x̂ = c + x̄, where x̄ = A* ⊠ b and c is such that 2c = d(A ⊞ x̄, b).
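To make Theorems 6 and 7 concrete, the following sketch (again our illustration with hypothetical data) computes the principal solution x̄ = A* ⊠ b and shifts it by half of the residual Chebyshev distance to obtain a Chebyshev-best solution.

    import numpy as np

    def max_prod(A, B):
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    def min_prod(A, B):
        return np.min(A[:, :, None] + B[None, :, :], axis=1)

    def conj(A):
        return -A.T

    def chebyshev(x, y):
        return np.max(np.abs(x - y))

    rng = np.random.default_rng(1)
    A = rng.normal(size=(5, 3))
    b = rng.normal(size=(5, 1))

    x_bar = min_prod(conj(A), b)              # principal solution (Theorem 6)
    assert np.all(max_prod(A, x_bar) <= b + 1e-12)

    c = chebyshev(max_prod(A, x_bar), b) / 2.0
    x_hat = x_bar + c                         # Chebyshev-best solution (Theorem 7)
    print(chebyshev(max_prod(A, x_hat), b))   # equals c up to roundoff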


III. INTRODUCTION TO MORPHOLOGICAL ASSOCIATIVE MEMORIES

MAMs were originally conceived as simple matrix memories endowed with recording recipes that are similar to correlation recording [28]–[30]. Suppose that we want to record k vector pairs (x^1, y^1), ..., (x^k, y^k) using a morphological associative memory [15]. Let X denote the matrix whose column vectors are the vectors x^1, ..., x^k and let Y denote the matrix whose column vectors are the vectors y^1, ..., y^k. Before we introduce the recording schemes of MAMs, we would like to remind the reader of the matrix products defined in (7) and the conjugate matrix defined in (5).

The first recording scheme consists of constructing the matrix W_XY as follows:

    W_XY = Y ⊠ X*.    (13)

In other words, the entry (W_XY)_ij of the matrix W_XY is given by the equation (W_XY)_ij = ∧_{ξ=1}^{k} (y_i^ξ - x_j^ξ). The second, dual scheme consists of constructing a matrix of the form M_XY = Y ⊞ X*. If the matrix W_XY receives a vector x as input, the product W_XY ⊞ x is formed. Dually, if the matrix M_XY receives a vector x as input, the product M_XY ⊠ x is formed. If X = Y (i.e., x^ξ = y^ξ for ξ = 1, ..., k), we obtain the autoassociative morphological memories (AMMs) W_XX and M_XX [24], [23]. Note that the identity M_XX = (W_XX)* can be deduced from (13) and (8).

Example 1: Consider a set of patterns whose number k exceeds their length n. Although the number of stored patterns exceeds the length of the patterns, both W_XX and M_XX provide perfect recall for undistorted patterns; a small numerical illustration is sketched below.
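In the spirit of Example 1, the following sketch (our own illustration with hypothetical patterns rather than the original numerical example) records a set of patterns whose number exceeds their length, verifies perfect recall of the undistorted patterns, and checks one-step convergence for an eroded input.

    import numpy as np

    def max_prod(A, B):
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    def min_prod(A, B):
        return np.min(A[:, :, None] + B[None, :, :], axis=1)

    # Hypothetical pattern matrix X: n = 3 components, k = 5 stored patterns (k > n).
    X = np.array([[2.0, 5.0, 1.0, 7.0, 4.0],
                  [0.0, 3.0, 6.0, 2.0, 8.0],
                  [9.0, 1.0, 4.0, 3.0, 5.0]])

    W = min_prod(X, -X.T)   # W_XX, entries w_ij = min over xi of (x_i^xi - x_j^xi), cf. (13)
    M = max_prod(X, -X.T)   # M_XX = (W_XX)*

    # Perfect recall of every undistorted pattern (optimal absolute storage capacity).
    assert np.allclose(max_prod(W, X), X)
    assert np.allclose(min_prod(M, X), X)

    # One-step convergence: the output of W_XX is already a fixed point.
    out = max_prod(W, X[:, [0]] - np.array([[1.0], [0.0], [2.0]]))   # eroded input
    assert np.allclose(max_prod(W, out), out)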

The recall phase of AMMs can be described exactly in terms of their fixed points and their basins of attraction [24], [23]. Fixed points of AMMs are defined as follows. A vector x is called a fixed point or stable state of W_XX if and only if W_XX ⊞ x = x. Similarly, x is a fixed point of M_XX if and only if M_XX ⊠ x = x. We say that a fixed point x is finite if and only if x ∈ R^n. We denote the sets of finite fixed points of W_XX and M_XX using the symbols F(W_XX) and F(M_XX). Theorem 8 and Corollary 1 imply that the absolute storage capacity of W_XX and M_XX is unlimited and that every output pattern remains stable under repeated applications of W_XX or M_XX [24], [15].

Theorem 8: For all X, the fixed points of both W_XX and M_XX include the fundamental memories x^1, ..., x^k. Moreover, for every x ∈ R^n, we have W_XX ⊞ x = x^∨ and M_XX ⊠ x = x^∧, where x^∨ denotes the supremum of x in the set of fixed points of W_XX and where x^∧ denotes the infimum of x in the set of fixed points of M_XX. As an immediate consequence expressed in the following corollary, we see that recall occurs in one step when using the autoassociative model W_XX or M_XX.


Corollary 1: Let x ∈ R^n. The set F(W_XX) consists of all patterns of the form W_XX ⊞ x with x ∈ R^n. Similarly, the set F(M_XX) consists of all patterns of the form M_XX ⊠ x with x ∈ R^n. Furthermore, y = W_XX ⊞ x implies that W_XX ⊞ y = y, and y = M_XX ⊠ x implies that M_XX ⊠ y = y.

Example 2: Let X be as in Example 1 and let x be an arbitrary input pattern. Calculating the max product W_XX ⊞ x yields the supremum x^∨ of x in F(W_XX); by Corollary 1, this output belongs to F(W_XX) and therefore remains stable under further applications of W_XX.

In the special case where X and Y consist of binary patterns, we speak of binary MAMs W_XY and M_XY. A detailed analysis of binary MAMs can be found in a recent paper that investigates binary autoassociative as well as heteroassociative morphological memories (HMMs) [31]. The results of that paper include the following fact. For binary patterns, both W_XY ⊞ x and M_XY ⊠ x yield expressions that combine the stored patterns with ∨ and ∧ operations. Expressions of this form are called lattice polynomials in [26]. More precisely, we adopt the following recursive definition. Every variable x_i, where i = 1, ..., n, is a lattice polynomial in x_1, ..., x_n. If p and q are lattice polynomials in x_1, ..., x_n, then p ∨ q and p ∧ q are also lattice polynomials in x_1, ..., x_n.

IV. GRAY-SCALE AUTOASSOCIATIVE MORPHOLOGICAL MEMORIES

In this section, we employ the results of Section II in order to analyze gray-scale associative memories. We begin by describing the fixed points of an autoassociative morphological memory W_XX in terms of max algebra. The fixed points of M_XX can be characterized in a similar fashion. We are particularly interested in the sets of finite fixed points of W_XX and M_XX. Note that a finite pattern x is a fixed point of W_XX if and only if x is an eigenvector of W_XX with corresponding eigenvalue 0.

Theorem 9: For any X, the matrix W_XX is definite and has the unique eigenvalue 0. The set of finite fixed points of W_XX is equal to the eigenspace of W_XX.

Lemma 1: Suppose that W = W_XX. The matrix W_XX has a zero diagonal. If x and y are fixed points of W_XX and λ ∈ R, then x ∨ y, x ∧ y, and λ + x are also fixed points of W_XX.

Recall that in particular every original pattern x^ξ represents a fixed point of both W_XX and M_XX. Therefore, Lemma 1 implies that every lattice polynomial in multiples of the original patterns remains fixed under applications of either W_XX or M_XX. More precisely, we obtain the following theorem.

Theorem 10: Let W = W_XX. The set of fixed points of W_XX includes all expressions corresponding to lattice polynomials in p_1, ..., p_m, where each p_l is a multiple λ_l + x^{ξ_l} of some original pattern. Specifically, the set of fixed points of W_XX includes the following expressions:

    ∨_{j ∈ J} ∧_{ξ ∈ L_j} (λ_{jξ} + x^ξ),  where λ_{jξ} ∈ R and J, L_j are finite index sets.    (16)

Theorem 10 provides sufficient conditions for fixed points of W_XX. Specifically, if x can be written in the form given by (16), then x is a fixed point of W_XX. Theorem 12 states that a fixed point of W_XX is necessarily of this form. In the proof of Theorem 12, we employ the fact that the transitive closure Γ(W_XX) equals W_XX (Theorem 11 below).



TABLE I: NMSEs PRODUCED BY AM MODELS IN APPLICATIONS TO INCOMPLETE PATTERNS OF FIG. 2

Fig. 1. Original images that were used in constructing the memories W_XX and M_XX. Presenting the corresponding patterns as inputs to either W_XX or M_XX results in perfect recall.

Theorem 11: The following statements hold for W = W_XX. For all r ≥ 1, we have W^r = W. Consequently, the transitive closure matrices Γ(W) and Δ(W) coincide and are equal to W, i.e., Γ(W) = Δ(W) = W.

Theorem 12: For W = W_XX, the set F(W) of the finite fixed points of W consists exactly of the linear combinations of the columns of Γ(W) = W. Alternatively, the set F(W) is given exactly by the expressions of the form

    x = ∨_{j=1}^{n} ∧_{ξ=1}^{k} (λ_{jξ} + x^ξ),  where λ_{jξ} ∈ R.    (17)

Theorem 12 shows that every finite fixed point of W_XX is given by a join of meets in multiples of x^1, ..., x^k. Corollary 2 arises as an immediate consequence due to the fact that a join of meets can be written as a meet of joins in the distributive lattice R. Corollary 3 combines the previous results in order to show that F(W_XX) = F(M_XX). The elements of this set are determined by (17) or, alternatively, by (18).

Corollary 2: Let W = W_XX. The set F(W) of the finite fixed points of W consists exactly of the following:

    x = ∧_{j=1}^{n} ∨_{ξ=1}^{k} (μ_{jξ} + x^ξ),  where μ_{jξ} ∈ R.    (18)
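The characterizations (17) and (18) are easy to test numerically. The sketch below (our illustration with hypothetical random patterns) forms a "linear combination", i.e., a join of meets of translated stored patterns, and confirms that it is a fixed point of both W_XX and M_XX.

    import numpy as np

    def max_prod(A, B):
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    def min_prod(A, B):
        return np.min(A[:, :, None] + B[None, :, :], axis=1)

    rng = np.random.default_rng(2)
    X = rng.integers(0, 256, size=(3, 5)).astype(float)    # hypothetical gray-scale patterns
    W = min_prod(X, -X.T)
    M = max_prod(X, -X.T)

    # Join of meets of translated stored patterns, cf. (17): x = max_j min_xi (lam[j, xi] + x^xi).
    lam = rng.normal(scale=10.0, size=(4, 5))               # arbitrary finite coefficients
    x = np.max(np.min(lam[:, None, :] + X[None, :, :], axis=2), axis=0)[:, None]

    assert np.allclose(max_prod(W, x), x)   # x is a fixed point of W_XX ...
    assert np.allclose(min_prod(M, x), x)   # ... and of M_XX (Corollary 3)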

Corollary 3: For W = W_XX and M = M_XX, the sets F(W) and F(M) coincide. If F(X) denotes this set, then F(X) consists exactly of the lattice polynomials in multiples of x^1, ..., x^k of the form given by (17). Alternatively, the set F(X) can be characterized as the set of all lattice polynomials of the following form:

    x = ∧_{j=1}^{n} ∨_{ξ=1}^{k} (μ_{jξ} + x^ξ),  where μ_{jξ} ∈ R.    (19)

Moreover, given an arbitrary pattern x ∈ R^n, we have

    M_XX ⊠ x = x^∧ ≤ x ≤ x^∨ = W_XX ⊞ x    (20)

where x^∨ is the supremum of x in F(X) and where x^∧ is the infimum of x in F(X).

TABLE II: NMSEs PRODUCED BY AM MODELS IN APPLICATIONS TO PATTERNS THAT WERE CORRUPTED WITH PEPPER NOISE OF PROBABILITY 0.3 AND DILATIVE GAUSSIAN NOISE WITH ZERO MEAN AND VARIANCE 0.1

Corollary 3 induces necessary and sufficient conditions for the perfect recall of an original pattern x^γ. These conditions are formulated in Theorem 13.

Theorem 13: Let W = W_XX and let x ∈ R^n. The equality W ⊞ x = x^γ holds if and only if x ≤ x^γ and there is no linear combination z = ∨_{ξ=1}^{k} (λ_ξ + x^ξ) of the fundamental memories such that x ≤ z and z_i < x_i^γ for some index i.

Example 3: Consider the images of size 64 × 64 with 256 gray levels displayed in Fig. 1 (these images represent

downsized versions of images that are contained in the database of the Computer Vision Group, University of Granada, Spain). For each of these images, we generated a vector of length 4096. We synthesized the weight matrices W_XX and M_XX of size 4096 × 4096, applied them to the fundamental memories, and we confirmed that perfect recall was achieved, as we had pointed out in Theorem 14. We also stored the vectors using the OLAM, the KAM, the generalized BSB model of Costantini et al., and the complex-valued Hopfield net of Müezzinoğlu et al.

Example 4: In this experiment, we probed the associative memory models under consideration with incomplete patterns which arose from leaving away substantial parts of the original images. The outcome of this experiment is visualized in Fig. 2. Table I lists the resulting normalized mean square errors (NMSEs) produced by the morphological memory W_XX, the OLAM, the KAM, and the generalized BSB model of Costantini et al. for each partial image.

Example 5: In another simulation, we introduced randomly generated erosive noise of probability 0.5 and intensity level 255 ("pepper noise") into the original images. Almost perfect recall is achieved when using the memory W_XX (in 100 experiments with erosive noise for each pattern). Corollary 3 and Theorem 13 provide some explanation for the success of this experiment concerning W_XX. The first column of Table II compares the performance of the AMM W_XX with the performances of the OLAM, the KAM, the generalized BSB of Costantini et al., and the complex-valued Hopfield net of Müezzinoğlu et al. in terms of the NMSE in 100 experiments for each pattern x^1, ..., x^k. Fig. 3 provides a visual interpretation of this experiment.

Example 6: Corollary 3 and the dual version of Theorem 13 indicate that the AMM M_XX exhibits tolerance with respect to dilative noise. In order to exemplify this type of noise tolerance, we added the absolute value of Gaussian noise with zero mean and with variance 0.1 to the original patterns x^ξ. We compared the patterns that were retrieved by the memory M_XX and the other associative memory models with the original patterns. The second column of Table II displays the



Fig. 2. Top row: Severely incomplete versions of original face images. The following rows show, from top to bottom, the corresponding recalled patterns using the morphological memory W_XX, the OLAM, and the generalized BSB model.

Fig. 3. Top row: Original tree image, eroded version ("pepper noise"), and image retrieved by the morphological memory W_XX. Bottom row: Images that were recalled using the OLAM, the generalized BSB, and the complex-valued Hopfield model.

Fig. 4. Top row: Original Lena image, corrupted image generated by adding dilative Gaussian noise, and output of the morphological memory M_XX. Bottom row: Corresponding recalled patterns using, from left to right, the OLAM, the generalized BSB, and the complex-valued Hopfield model.

resulting NMSEs for each model in 100 experiments for each pattern x^1, ..., x^k. Fig. 4 provides a visual interpretation of this simulation. We would like to point out that we did not conduct the experiment of Example 4 using the complex-valued Hopfield net for the following reasons. Due to computational limitations, the complex-valued Hopfield net can only store small segments of the images. In Example 4, we may have an input segment that contains no information at all, making it impossible to recover the desired image segment. We also owe the reader an explanation for the fact that Figs. 2–4 do not include the patterns that were retrieved by the KAM model. We also applied the KAM model to the corrupted patterns that we generated in Examples 4–6, but we refrained from exhibiting the results in Figs. 2–4 since the KAM model produced a constant pattern as an output in each case. This phenomenon occurs because the activity of each hidden unit corresponds to the similarity

between the input x and the stored pattern x^ξ. In Examples 4–6, the distances between x and the prototypes x^ξ exceed a certain threshold, which causes all hidden unit activations to vanish and ultimately produces the zero pattern as an output. For more details on the design of the KAM model, the complex-valued Hopfield net, and the generalized BSB model, including the choice of the parameters, we refer the reader to the Appendix.

Theorem 13 is concerned with the perfect recall of a fundamental memory x^γ given a corrupted (eroded or dilated) version x̃ of x^γ. Figs. 2–4 illustrate the almost perfect retrieval of the original face images from eroded or dilated versions. Note however that Theorem 13 cannot be applied to arbitrarily corrupted versions of the original patterns. In fact, a recipe for dealing with arbitrarily corrupted patterns using MAMs has not been found thus far. Our approach to this problem is based on the following idea. Suppose that x̃ is an arbitrarily corrupted version of x^γ. Let us assume that x̃ is closer to x^γ than to any other pattern x^ξ. Note that x^ξ ∈ F(X) for all ξ = 1, ..., k. Upon presentation of an arbitrary input pattern x̃, our new strategy will generate as an output the fixed point that is closest to x̃. To this end, we will employ the following corollary of Theorem 7.

Theorem 14: Given W_XX and an arbitrary pattern x ∈ R^n, a Chebyshev-best solution to the approximation of x by an element of F(X) is given by ŷ = c + (M_XX ⊠ x), where c is such that 2c = d(M_XX ⊠ x, x).

Theorem 14 and its dual version suggest the introduction of a correction factor after the application of the AMM W_XX. Specifically, after forming W_XX ⊞ x, we add a constant c to the output, where c is computed as c = (1/2) ∧_{i=1}^{n} (x_i - (W_XX ⊞ x)_i). The next theorem shows that this modification leaves the set of fixed points unaltered. Moreover, this model preserves two of the main characteristics of the original AMM models W_XX and M_XX: optimal absolute storage capacity and one-step convergence.

Theorem 15: Consider the morphological associative memory given by

    y = (W_XX ⊞ x) + c,  where c = (1/2) ∧_{i=1}^{n} (x_i - (W_XX ⊞ x)_i).    (21)

For brevity, we denote the model described by (21) using the notation W_XX^+ and we denote the corresponding dual model by M_XX^+.



TABLE III: NMSEs PRODUCED BY AM MODELS IN APPLICATIONS TO PATTERNS THAT WERE CORRUPTED BY INTRODUCING VARIOUS TYPES OF NOISE WITH ZERO MEAN

Fig. 5. Top row: Original cameraman image, version containing uniformly distributed random noise (variance 0.03), and outputs of the AMMs W_XX^+ and M_XX^+. Bottom row: Patterns that were recalled using the OLAM, the KAM, the generalized BSB, and the complex-valued Hopfield net.

Fig. 6. Top row: Original image, version containing uniformly distributed random noise (variance 0.12), and outputs of the AMMs W_XX^+ and M_XX^+. Bottom row: Patterns that were recalled using the OLAM, the KAM, the generalized BSB, and the complex-valued Hopfield net.

The memory W_XX^+ has the same set of fixed points as W_XX and M_XX. In addition, the memory W_XX^+ is endowed with optimal absolute storage capacity and one-step convergence.

Example 7: First, an input pattern was generated by corrupting one of the original patterns using uniformly distributed random noise with mean 0 and variance 0.03. We computed the outputs of the modified memories W_XX^+ and M_XX^+, where X consists of the patterns that correspond to the original images of Fig. 1. The first column of Table III compares the performance of the morphological models W_XX^+ and M_XX^+ with the performances of other associative memory models. Fig. 5 visualizes the results of this experiment. In a second, similar experiment, we corrupted the original patterns by introducing random noise with mean 0 and variance 0.12 and compared the performances of the associative memory models under consideration. The second column of Table III shows the NMSEs of these models in 100 experiments for each pattern x^1, ..., x^k. Fig. 6 illustrates this simulation.

Example 8: Finally, we probed the morphological models W_XX^+ and M_XX^+ as well as the other associative memories with corrupted patterns containing Gaussian noise with mean 0 and variance 0.01. We conducted 100 experiments for each original pattern and for each model under consideration. The NMSEs listed in the third column of Table III indicate that the semi-linear models outperform the morphological models concerning tolerance with respect to Gaussian (erosive and dilative) noise. Nevertheless, Fig. 7 indicates that the AMMs W_XX^+ and M_XX^+ exhibit a considerable amount of error correction in this simulation, in contrast to the basic AMM models W_XX and M_XX.
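The following sketch summarizes, under our reading of Theorem 14 and (21), how the modified recall rule and the NMSE figure of merit of Tables I–III can be implemented; the patterns, the noise level, and the exact NMSE normalization are assumptions on our part.

    import numpy as np

    def max_prod(A, B):
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    def min_prod(A, B):
        return np.min(A[:, :, None] + B[None, :, :], axis=1)

    def w_plus(W, x):
        # Modified AMM recall (21): shift W_XX (max-prod) x by half of the residual
        # Chebyshev gap so that the output is a fixed point closest to x.
        y = max_prod(W, x)
        c = 0.5 * np.min(x - y)          # c <= 0 since W_XX (max-prod) x >= x
        return y + c

    def nmse(recalled, original):
        # Normalized mean square error (our reading of the metric used in Tables I-III).
        return np.sum((recalled - original) ** 2) / np.sum(original ** 2)

    # Hypothetical patterns and a noisy probe of the first one.
    rng = np.random.default_rng(3)
    X = rng.integers(0, 256, size=(64, 4)).astype(float)
    W = min_prod(X, -X.T)

    x_noisy = X[:, [0]] + rng.normal(scale=10.0, size=(64, 1))
    print(nmse(max_prod(W, x_noisy), X[:, [0]]))   # basic AMM W_XX
    print(nmse(w_plus(W, x_noisy), X[:, [0]]))     # modified AMM W_XX^+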

Fig. 7. First two images: Original image and corrupted image containing Gaussian noise of zero mean and variance 0.02. Remaining images in the top row: outputs of the AMMs W_XX^+ and M_XX^+. Bottom row: Patterns that were recalled using the OLAM, the KAM, the generalized BSB, and the complex-valued Hopfield net.

Example 9: The following experiment indicates that the autoassociative morphological memories W_XX^+ and M_XX^+ can be applied to multiclass classification tasks. Like any autoassociator model, an AMM can act as a one-class learning machine by storing only training patterns that belong to a certain class. Suppose that X^c represents the matrix that consists of all training patterns belonging to class c. For a given test pattern x, we computed the Chebyshev distance ε_c = d(x, y^c), where y^c denotes the output of the AMM constructed from X^c upon presentation of x. The smallest error ε_c indicates the class that corresponds to x. Obviously, the same principle of classification can be applied using other AMM models. In a similar vein, we can employ autoassociators such as the KAM together with the Euclidean distance.

Let us consider a specific classification task. The glass recognition data can be found in the Repository of Machine Learning Databases of the University of California, Irvine (UCI) [32]. The data set consists of six types of glass. Each type has 70, 17, 76, 13, 9, or 27 instances. The goal is to determine the glass type from nine attributes. Zhang et al. have considered this problem in [33]. Several classifiers such as multilayer perceptrons and support vector machines were tested using two-fold cross-validation. The data were normalized to the range [0, 1] to remove the scale effect, and each network was fine-tuned. Table IV shows the results of the experiment. The acronym KAA-2 denotes an extension of the KAM that was introduced by H. Zhang et al. [33].
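A toy version of this one-class classification scheme is sketched below with synthetic data (not the glass data); the class score is the Chebyshev distance from the test pattern to the fixed point set of the class-specific AMM, which, in our reading of Theorem 14 and its dual, equals half of the Chebyshev gap between the input and the AMM output.

    import numpy as np

    def max_prod(A, B):
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    def min_prod(A, B):
        return np.min(A[:, :, None] + B[None, :, :], axis=1)

    def distance_to_fixed_points(Xc, x):
        # Chebyshev distance from x to the fixed point set of the AMM recorded with Xc:
        # half of the gap between x and the least fixed point above x.
        W = min_prod(Xc, -Xc.T)
        return 0.5 * np.max(max_prod(W, x) - x)

    def classify(class_matrices, x):
        # Assign x to the class whose one-class AMM yields the smallest error.
        return int(np.argmin([distance_to_fixed_points(Xc, x) for Xc in class_matrices]))

    # Hypothetical two-class toy data (columns are training patterns); the classes
    # differ in the shape of the feature profile, not merely in its mean level.
    rng = np.random.default_rng(4)
    up, down = np.linspace(0.0, 1.0, 9), np.linspace(1.0, 0.0, 9)
    class_0 = up[:, None] + rng.normal(scale=0.02, size=(9, 20))
    class_1 = down[:, None] + rng.normal(scale=0.02, size=(9, 20))
    x_test = down[:, None] + rng.normal(scale=0.02, size=(9, 1))
    print(classify([class_0, class_1], x_test))    # expected output: 1

Because the fixed point set of an AMM is invariant under adding a constant to a pattern, the classes in this toy example are distinguished by the shape of their profiles rather than by their overall gray level.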


TABLE IV RESULTS OF THE GLASS CLASSIFICATION PROBLEM


Table IV also displays the results obtained via applications of the morphological autoassociative memories. The modified AMM models W_XX^+ and M_XX^+ yield the same results. Note that the AMMs outperformed the other classifiers. Moreover, an application of the AMM model does not require any fine-tuning of the network.

V. CONCLUDING REMARKS

This paper is the first to present a profound analysis of gray-scale autoassociative memories. Due to their mathematical foundations in minimax algebra, MAMs differ drastically from other models of associative memory. Unlike other models that have been suggested for dealing with multivalued patterns, gray-scale MAMs have not been constructed as generalizations of the corresponding binary model. All operations can be executed within the discrete domain without any roundoff errors and the model allows for a convenient implementation in hardware. In contrast to many other associative memory models that require complicated algorithms to generate the weight matrix, MAMs involve a very low computational effort in synthesizing the weight matrix and the information on the associations is well distributed over the weights.

This paper provides an exact characterization of the fixed points and the basins of attraction of gray-scale AMMs in terms of the eigenvectors of the weight matrix. The set of fixed points consists exactly of all "linear" combinations of the fundamental eigenvectors. In view of these results, the recall phase, in particular the error correction capability, of gray-scale AMMs can be easily and completely understood, and it becomes evident that AMMs are suited for dealing with certain types of noise such as incomplete patterns or erosive noise when using the primal model. Arbitrary and uniformly distributed noise can be partially removed by means of a gray-scale AMM yielding the fixed point that minimizes the Chebyshev distance to the noisy input pattern. We compared the noise tolerance of AMMs and some of the most relevant gray-scale models in a number of simulations with gray-scale images. Models such as the KAM, the generalized BSB of Costantini et al., and the complex-valued Hopfield net reveal many of the disadvantages that we cited above and are not as tolerant with respect to incomplete patterns and salt/pepper noise. However, these models perform better in the presence of some other types of noise such as Gaussian noise.

In the future, we plan to use the exact mathematical insights on AMMs gained in this paper to develop new morphological models with improved error correction capabilities and a reduced number of fixed points. We will furthermore generalize the results of this paper to include the heteroassociative case, which will allow us to derive applications of HMMs as fuzzy associative memories.

APPENDIX

A. Proofs of Theorems, Corollaries, and Lemmas

Proof of Theorem 9: By Theorem 8, the fundamental are eigenvectors with eigenvalue 0 of memories , and thus the eigenproblem is finitely soluble for . Theorem 3 ensures that the eigenvalue 0 of is unique and . These facts imply that is definite and equal to equals the set of finite fixed points of . that Proof of Lemma 1: First note that and that each one of the matrices has a also has a zero diagonal. zero diagonal. Therefore, Let be fixed points of and let . are exactly the finite By Theorem 5, the fixed points of eigenvectors with eigenvalue 0. Thus, and are elements of . By Theorem 5, the linear combinations the eigenspace of and are also finite eigenvectors of if . has the unique eigenvalue 0, and are Since for finite . If then all entries of fixed points of and all entries of are equal to . Thus, is also a fixed point of . A similar argumentation is a fixed point of . shows that , where are fixed points of Finally, consider . Let denote , let denote , and let denote . On one hand, we have for all . Thus, . On the other hand, the inequality follows from (9) and the fact that and are fixed points of . Proof of Theorem 10: The proof of the first part of Theorem 10 follows immediately from Theorem 8 and Lemma 1. where Consider a lattice polynomial in for some . Since is a distributive lattice, this lattice polynomial can be written as a join of meets, i.e., in the form given by (16) [26]. is definite. ThereProof of Theorem 11: Recall that and we can compute fore, using the Floyd–Warshall algorithm according to Theorem 2. In fact, it is easy to show that for all by induction. . Assume The Floyd–Warshall algorithm sets that for all . For simplicity, let denote . By (3), the elements of are computed as follows: if if

and or (22)

Consider . The inequality below follows from applications of the principles of opening and closing in minimax and for all , we algebra [7]. For all have . implies that In view of (22), the inequality which completes the proof of for all and .



Finally, we show the identity . Note that holds because has a zero diagthe inequality onal. Therefore, . denote the set of fiProof of Theorem 12: Let nite fixed points of . Theorem 9 states that which consists precisely of all linear combinations of by Theorem 5. the fundamental eigenvectors of By definition, the fundamental eigenvectors are the columns of corresponding to eigenindices. We have already shown in Theorem 11, that . Thus, for the proof of the first part of the theorem, it only remains to show is an eigenindex. that every index We have seen in Theorem 9 that is definite and thus the greatest cycle mean of all cycles in is 0. Since has a zero diagonal by Lemma 1, every cycle is a critical one. Therefore, we have that every index is an eigenindex. Now, let us consider the second part of the theorem. On one hand, we know from Theorem 10 that every expression of the . On the other form given by (17) is contained in hand, we have already proven that , where denotes the set of all linear combinations of the columns of . Thus, it suffices to show that every element of can be written in the form given by (17). . Note that the th

Recall that

column of the matrix can be written as for some . Each element of is a linear combination of the columns of . Therefore for some

Proof of Corollary 3: On one hand, invoking the dual consists statement of Corollary 2, we obtain that of all expressions of the form given by (16) and, therefore, . On the other Theorem 10 implies that consists exactly hand, Theorem 12 states that the set of all expressions given in terms of (17). These expressions are according to the dual of Corollary 2 and contained in thus . , the next two Given the identity claims of Corollary 3 follow immediately from Theorem 12 and its dual statement. To conclude the proof of the corolis finite since lary, we apply Theorem 8. Note that and . Proof of Theorem 13: First, let us assume that . By Lemma 1 of [24], the inequality holds for all . Therefore, our assumpimplies that . Now consider tion such that there exists a linear combinaan arbitrary of the fundamental memories , tion satisfying . We infer from Corollary 3 that since contains and since the . upper bound of satisfies the inequalities The proof of the necessity direction of the theorem also fol. Suppose that there exists no lows from Corollary 3. Let of the fundamental memories such linear combination . By Corollary 3, we have where that represents the supremum of in . Therefore, the pattern can be written as in (19) and is bounded by and . Hence, such that there exist scalars

(23)

(26)

for an arbitrary Let us consider . Since is a band-space over [7], we may apply (1) and (2) yielding . Substituting with gives . Proof of Corollary 2: On one hand, Theorem 10 reveals that includes every expression given in (18). On the other hand, we will show that is contained in the set of expressions in (18). By Theorem 12, and the following set coincide:

. If denotes Consider an arbitrary index then satisfies . Thus, the equality holds due to our assumptions. Employing and substituting the the same procedure for all which concludes resulting equations into (26) yields the proof of the theorem. Proof of Theorem 14: Theorem 14 arises from an applica. tion of Theorem 7 where By Corollary 1, the set of all such that coincides with which in turn equals by Corollary 3. As before, we denote this set using the symbol . By definition, the principal solution of the inequality is given by . Using (8), we conclude and thus is contained in that . We have which implies that a Chebyshev-best solution to the approximation of by a fixed point of either or is given by where is . such that and be arProof of Theorem 15: Let . bitrary. We begin by showing the identity For every fixed point , the constant , which computes and , vanhalf of the Chebyshev-distance between . Therefore, we have ishes since . On the other hand, suppose that . We infer that . By definition, the constant

(24) of this set. Consider an arbitrary element An expression of this form represents a lattice polynomial in . Recall that every lattice polynomial in the distributive lattice can be written as a meet of joins. Using the fact that the set constitutes a chain for every , we deduce the following equation that concludes the proof of the corollary: (25) for some

and

.


is computed as follows: c = (1/2) ∧_{i=1}^{n} (x_i - (W_XX ⊞ x)_i), i.e., c equals minus one half of the Chebyshev distance between x and W_XX ⊞ x. Thus, the equality (W_XX ⊞ x) + c = x forces the Chebyshev distance between x and W_XX ⊞ x to be zero, which implies that x is a fixed point of W_XX.

We will now address the absolute storage capacity of W_XX^+. For arbitrary X and γ, every fundamental memory x^γ remains fixed under an application of W_XX by Theorem 8. Since c vanishes for fixed points of W_XX, every fundamental memory is a fixed point of W_XX^+. For the proof of one-step convergence, consider an arbitrary pattern x. Let y denote the result of the application of the new model, i.e., y = (W_XX ⊞ x) + c. Using the terminology of Theorem 14, the pattern y represents a Chebyshev-best approximation of x by a fixed point of W_XX. Therefore, y belongs to F(W_XX) = F(W_XX^+), which implies that applying W_XX^+ to y reproduces y, as we have already seen. This observation concludes the proof of the theorem.

B. The Choice of the Parameters for the Associative Memory Models Under Consideration

1) Kernel Associative Memory: Recall that the KAM model employs a kernel method that maps the input data into a high-dimensional feature space by means of an adequate mapping φ. The mapped patterns φ(x^ξ) do not need to be computed explicitly. It suffices to compute the dot product between the mapped patterns φ(x) and φ(y). In this paper, we adopted the popular Gaussian kernel function and thus the dot product is given by the Gaussian radial basis function

    k(x, y) = exp(-‖x - y‖² / (2σ²)).    (27)

The KAM disposes of k hidden units corresponding to the prototype patterns x^1, ..., x^k. The activity of the ξth hidden unit is computed as the Gaussian radial basis function of the distance between the input x and the prototype x^ξ. The resulting value, which indicates the similarity between x and x^ξ, depends on the choice of σ. We followed the suggestion of Stokbro et al. and computed σ by taking the average Euclidean distance between every pair of different RBF centers [34], [20]:

    σ = (1 / (k(k - 1))) Σ_{ξ ≠ η} ‖x^ξ - x^η‖.    (28)

2) Generalized BSB Model: The approach of Costantini et al. decomposes an image with 2^m gray levels into m binary images. If the usual binary coding strategy is used, small quantities of noise may result in large changes in bits. Therefore, we resorted to the reflected-binary or Gray code that avoids these problems [6]. Each resulting binary image is then stored in a separate binary BSB network described by the following difference equation:

    x(t + 1) = f(x(t) + β W x(t)),    (29)

where f denotes the activation function of the BSB model and β is a positive feedback factor. We computed the synaptic weight matrix W following the iterative method presented in equations 5 and 6 of [6]. This iterative approach was introduced in [35]. The chosen parameter value caused the weight matrix to converge within two iterations.
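As an illustration of (27) and (28) above (our own sketch; the exact normalization of the Gaussian exponent is an assumption on our part), the hidden-unit activities of the KAM can be computed as follows.

    import numpy as np

    def gaussian_kernel(x, y, sigma):
        # Gaussian radial basis function, cf. (27).
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

    def average_pairwise_distance(X):
        # Kernel width sigma chosen as the average Euclidean distance
        # between every pair of different RBF centers, cf. (28).
        k = X.shape[1]
        dists = [np.linalg.norm(X[:, i] - X[:, j]) for i in range(k) for j in range(k) if i != j]
        return float(np.mean(dists))

    # Hypothetical prototypes (columns) and an input pattern.
    rng = np.random.default_rng(5)
    X = rng.normal(size=(16, 4))
    x = rng.normal(size=16)

    sigma = average_pairwise_distance(X)
    activations = np.array([gaussian_kernel(x, X[:, j], sigma) for j in range(X.shape[1])])
    print(activations)   # hidden-unit activities of the KAM for input x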


3) Complex-Valued Hopfield Net: The complex-valued Hopfield associative memory is a fully connected Hopfield net whose neurons are equipped with the complex-signum activation function [4], [5]. The synaptic weight matrix was obtained following the design method proposed in [5], which consists in solving a certain system of inequalities. This design method carves local minima into the energy landscape at the desired locations and eliminates some of the spurious memories that would arise using Hebbian learning. We used the MATLAB linear programming solver to determine a solution of the system of inequalities mentioned above. In view of computational limitations, each image of size 64 × 64 with 256 gray levels was partitioned into 128 vectors of size 32. This procedure led to 128 different complex-valued Hopfield networks, each one storing four vectors of length 32.

REFERENCES

[1] J. Zurada, I. Cloete, and E. van der Poel, "Generalized Hopfield networks for associative memories with multi-valued stable states," Neurocomput., vol. 13, no. 2–4, pp. 135–149, 1996.
[2] B. Baird and F. Eeckman, "A normal form projection algorithm for associative memory," in Associative Neural Memories: Theory and Implementation, M. Hassoun, Ed. Oxford, U.K.: Oxford University Press, 1993, ch. 7, pp. 135–166.
[3] H. Rieger, "Storing an extensive number of gray-toned patterns in a neural network using multistate neurons," J. Phys. A, Math. Gen., vol. 23, no. 23, pp. L1273–L1279, Dec. 1990.
[4] S. Jankowski, A. Lozowski, and J. Zurada, "Complex-valued multi-state neural associative memory," IEEE Trans. Neural Netw., vol. 7, no. 6, pp. 1491–1496, 1996.
[5] M. Müezzinoğlu, C. Güzeliş, and J. Zurada, "A new design method for the complex-valued multistate Hopfield associative memory," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 891–899, Jul. 2003.
[6] G. Costantini, D. Casali, and R. Perfetti, "Neural associative memory storing gray-coded gray-scale images," IEEE Trans. Neural Netw., vol. 14, no. 3, pp. 703–707, May 2003.
[7] R. Cuninghame-Green, Minimax Algebra: Lecture Notes in Economics and Mathematical Systems 166. New York: Springer-Verlag, 1979.
[8] ——, "Minimax algebra and applications," in Advances in Imaging and Electron Physics, P. Hawkes, Ed. New York: Academic Press, 1995, vol. 90, pp. 1–121.
[9] J. Davidson, "Foundation and applications of lattice transforms in image processing," in Advances in Electronics and Electron Physics, P. Hawkes, Ed. New York: Academic Press, 1992, vol. 84, pp. 61–130.
[10] H. Heijmans, Morphological Image Operators. New York: Academic Press, 1994.
[11] P. D. Gader, M. Khabou, and A. Koldobsky, "Morphological regularization neural networks," Pattern Recognit., vol. 33, no. 6, pp. 935–945, Jun. 2000, Special Issue on Mathematical Morphology and Its Applications.
[12] M. Khabou and P. Gader, "Automatic target detection using entropy optimized shared-weight neural networks," IEEE Trans. Neural Netw., vol. 11, no. 1, pp. 186–193, Jan. 2000.
[13] L. Pessoa and P. Maragos, "Neural networks with hybrid morphological/rank/linear nodes: A unifying framework with applications to handwritten character recognition," Pattern Recognit., vol. 33, pp. 945–960, Jun. 2000.
[14] W. Armstrong and M. Thomas, "Adaptive logic networks," in Handbook of Neural Computation, E. Fiesler and R. Beale, Eds. IOP Publishing and Oxford University Press, 1997, pp. C1.8:1–C1.8:14.
[15] G. X. Ritter, P. Sussner, and J. L. D. de Leon, "Morphological associative memories," IEEE Trans. Neural Netw., vol. 9, no. 2, pp. 281–293, Mar. 1998.
[16] B. Raducanu, M. Graña, and X. F. Albizuri, "Morphological scale spaces and associative morphological memories: Results on robustness and practical applications," J. Math. Imaging and Vision, vol. 19, no. 2, pp. 113–131, 2003.
[17] M. Graña, J. Gallego, F. J. Torrealdea, and A. D'Anjou, "On the application of associative morphological memories to hyperspectral image analysis," Lecture Notes in Comp. Sci., vol. 2687, pp. 567–574, 2003.

[18] J. Stright, P. Coffield, and G. Brooks, "An analog VLSI implementation of a morphological associative memory," in Proc. SPIE Parallel and Distributed Methods for Image Process. II, vol. 3452, San Diego, CA, Jul. 1998.
[19] T. Chiueh and R. Goodman, "Recurrent correlation associative memories," IEEE Trans. Neural Netw., vol. 2, no. 2, pp. 275–284, Mar. 1991.
[20] B.-L. Zhang, H. Zhang, and S. S. Ge, "Face recognition by applying wavelet subband representation and kernel associative memory," IEEE Trans. Neural Netw., vol. 15, no. 1, pp. 166–177, Jan. 2004.
[21] T. Kohonen, Self-Organization and Associative Memory. New York: Springer-Verlag, 1984.
[22] G. Golub and C. van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins University Press, 1996.
[23] P. Sussner, "Generalizing operations of binary morphological autoassociative memories using fuzzy set theory," J. Math. Imaging and Vision, vol. 9, no. 2, pp. 81–93, Sep. 2003, Special Issue on Morphological Neural Networks.
[24] ——, "Fixed points of autoassociative morphological memories," in Proc. Int. Joint Conf. Neural Networks, Como, Italy, Jul. 2000, pp. 611–616.
[25] ——, "Associative morphological memories based on variations of the kernel and dual kernel methods," Neural Netw., vol. 16, no. 5, pp. 625–632, Jul. 2003.
[26] G. Birkhoff, Lattice Theory, 3rd ed. Providence, RI: American Mathematical Society, 1993.
[27] B. A. Carré, "An algebra for network routing problems," J. Inst. Math. Appl., vol. 7, no. 3, pp. 273–294, 1971.
[28] J. Anderson, "A simple neural network generating interactive memory," Math. Biosci., vol. 14, pp. 197–220, Aug. 1972.
[29] T. Kohonen, "Correlation matrix memory," IEEE Trans. Comput., vol. C-21, no. 4, pp. 353–359, Apr. 1972.
[30] K. Nakano, "Associatron: A model of associative memory," IEEE Trans. Syst., Man, Cybern., vol. SMC-2, no. 3, pp. 380–388, 1972.
[31] P. Sussner, "New results on binary auto- and heteroassociative morphological memories," in Proc. Int. Joint Conf. Neural Networks, Montreal, QC, Canada, Aug. 2005, pp. 1199–1204.
[32] UCI Repository of Machine Learning Databases [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[33] H. Zhang, W. Huang, Z. Huang, and B. Zhang, "A kernel autoassociator approach to pattern classification," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 3, pp. 593–606, Jun. 2005.


[34] K. Stokbro, D. Umberger, and J. Hertz, "Exploiting neurons with localized receptive fields to learn chaos," Complex Syst., vol. 4, no. 6, pp. 603–622, 1990.
[35] R. Perfetti and G. Costantini, "Multiplierless digital learning algorithm for cellular networks," IEEE Trans. Circuits Syst. I, vol. 48, pp. 630–635, May 2001.

Peter Sussner received the Ph.D. degree in mathematics, partially supported by a Fulbright Scholarship, from the University of Florida, Gainesville, in 1996. Then, he worked at the Center of Computer Vision and Visualization at the University of Florida, as a Researcher. Currently, he is an Assistant Professor at the Department of Applied Mathematics of the State University of Campinas, Campinas, Brazil. He also is a Researcher of the Brazilian National Science Foundation (CNPq). He has regularly published articles in refereed international journals, book chapters, and conference proceedings in the areas of artificial neural networks, fuzzy systems, computer vision, mathematical imaging, and global optimization. His current research interests include neural networks, fuzzy systems, mathematical morphology, and lattice algebra. Dr. Sussner is a member of the International Neural Networks Society.

Marcos Eduardo Valle is working toward the Ph.D. degree at the Department of Applied Mathematics of the State University of Campinas, Campinas, Brazil. He holds a scholarship of Brazilian National Science Foundation (CNPq). His current research interests include fuzzy set theory, neural networks, and mathematical morphology.