Information Affinity: A New Similarity Measure for Possibilistic Uncertain Information

Ilyes Jenhani¹, Nahla Ben Amor¹, Zied Elouedi¹, Salem Benferhat², and Khaled Mellouli¹

¹ LARODEC, Institut Supérieur de Gestion de Tunis, Tunisia
² CRIL, Université d’Artois, Lens, France

Abstract. This paper addresses the issue of measuring similarity between pieces of uncertain information in the framework of possibility theory. First, natural properties of such functions are proposed and a survey of the few existing measures is presented. Then, a new measure, called Information Affinity, is proposed to overcome the limits of the existing ones. The proposed function combines two measures: a classical informative distance, e.g., the Manhattan distance, which evaluates the difference, degree by degree, between two normalized possibility distributions, and the well-known inconsistency measure, which assesses the conflict between the two distributions. Some potential applications of the proposed measure are also discussed.

Keywords: Possibility theory, Similarity, Divergence measure, Distance, Inconsistency measure.

1 Introduction

Most real-world decision problems involve uncertainty. Uncertainty about the values of given variables (e.g., the type of a detected target in military applications, or the disease affecting a patient in medical applications) can result from errors, and hence from a lack of reliability (in the case of sensors), or from differing background knowledge (in the case of agents such as doctors). As a consequence, different sources may provide different uncertain pieces of information about the same value, and comparing these pieces of information can support decision making.

Comparing pieces of uncertain information given by several sources has long attracted attention. For probability distributions, we can mention the well-known Euclidean distance and the Kullback–Leibler (KL) divergence [17]; Chan and Darwiche [4] proposed another distance for bounding probabilistic belief change. In belief function theory [20], several distance measures between bodies of evidence deserve mention. Some have been proposed as measures of performance (MOP) of identification algorithms [8] [14]; another was used to optimize the parameters of a belief k-nearest neighbor classifier [26]; and in [21], the author proposed a distance for quantifying the errors resulting from approximations of basic probability assignments. Similarity measures between two fuzzy sets A and B have also been proposed [6] [9] [23] [24]. For instance, Bouchon-Meunier et al. [3] proposed a similarity measure between fuzzy sets as an extension of Tversky’s model on crisp sets [22]. The measure was then used to develop an image search engine.

K. Mellouli (Ed.): ECSQARU 2007, LNAI 4724, pp. 840–852, 2007. © Springer-Verlag Berlin Heidelberg 2007

Unlike probability, belief function, and fuzzy set theories, possibility theory has seen few works dedicated to distance measures, despite its popularity. In this paper, we therefore focus on measures for comparing uncertain information represented by possibility distributions. We first study the few existing measures and show their limits; we then propose a new similarity measure, called Information Affinity, which satisfies a set of natural properties. Such a measure could be useful in many real-world applications where uncertainty is modeled by possibility theory: for instance, as a key component of distance-based possibilistic machine learning algorithms, for the evaluation of possibilistic classifiers, or for the comparison of expert opinions.

The rest of the paper is organized as follows. Section 2 gives the necessary background on possibility theory. Section 3 provides the properties that a similarity measure should satisfy. Section 4 presents an overview of the existing similarity measures in the possibilistic setting, with detailed examples and criticism. Section 5 defines the new Information Affinity measure and contrasts it with the existing measures. Section 6 outlines some potential applications of the proposed measure, and Section 7 concludes the paper.

2 Possibility Theory

Possibility theory is a non-classical uncertainty theory, distinct from probability theory, first introduced by Zadeh [25] and then developed by several authors (e.g., Dubois and Prade [7]). In this section, we briefly recall its basic notions.

Possibility distribution. Given a universe of discourse Ω = {ω1, ω2, ..., ωn}, a fundamental concept of possibility theory is the possibility distribution, denoted by π. π is a function that associates with each element ωi of Ω a value from a bounded and linearly ordered valuation set (L, <) […] s(π3, π4). Suppose that ∀j = 1..4 and ωp, ωq ∈ Ω, we have π′j(ωp) = πj(ωq) and π′j(ωq) = πj(ωp); hence we should obtain s(π′1, π′2) > s(π′3, π′4).
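As a concrete reference point for the notation used in the rest of the paper, the sketch below stores a possibility distribution on a finite Ω as a Python list of degrees in [0, 1], checks normalization, computes the min-based conjunction π1 ∧ π2 and its inconsistency degree, and applies the pairwise state permutation used in Property 6. The form Inc(π) = 1 − max_ω π(ω) is the usual inconsistency degree and is assumed here, since Equation (1) is not reproduced in this section; all function names are hypothetical.

```python
from typing import List, Sequence

# A possibility distribution on a finite universe {omega_1, ..., omega_n} is stored
# as a list of degrees pi(omega_1), ..., pi(omega_n), each in [0, 1].


def is_normalized(pi: Sequence[float], tol: float = 1e-9) -> bool:
    """Normalized means at least one state is fully possible (max degree is 1)."""
    return abs(max(pi) - 1.0) <= tol


def conjunction(pi1: Sequence[float], pi2: Sequence[float]) -> List[float]:
    """Min-based conjunction pi1 ∧ pi2, computed state by state."""
    return [min(a, b) for a, b in zip(pi1, pi2)]


def inconsistency(pi: Sequence[float]) -> float:
    """Inc(pi) = 1 - max_omega pi(omega) (assumed standard form of Equation (1))."""
    return 1.0 - max(pi)


def swap_states(pi: Sequence[float], p: int, q: int) -> List[float]:
    """Exchange the degrees of two states (0-based indices p and q), as in Property 6."""
    swapped = list(pi)
    swapped[p], swapped[q] = swapped[q], swapped[p]
    return swapped


if __name__ == "__main__":
    pi1 = [1.0, 0.5, 0.3, 0.7]
    pi4 = [0.0, 1.0, 0.3, 0.7]
    print(is_normalized(pi1))                    # True
    print(inconsistency(conjunction(pi1, pi4)))  # approximately 0.3
    print(swap_states(pi1, 0, 1))                # [0.5, 1.0, 0.3, 0.7]
```

Applying `swap_states` with the same p and q to every distribution being compared is exactly the transformation under which Property 6 requires the similarity order to be preserved.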

4 Measuring Similarity of Possibilistic Uncertain Information

Measuring the similarity of uncertain information has attracted a lot of attention in probability theory [4,17], in belief function theory [8,14,21,26], in fuzzy set theory [3,6,9,23,24] and in credal set theory [1]. This is not the case for possibilistic uncertain information, for which few works have been done. Let us present, in chronological order, some of these measures and show their weaknesses in expressing the divergence between the information provided by two agents (or sensors) who express their opinions (or measurements) in the form of possibility distributions.

4.1 Information Closeness

The first paper specifically dedicated to measuring the similarity between two possibility distributions was that of Higashi and Klir in 1983 [11]. They proposed a measure based on information variation, which they called information closeness and denoted by G. G is computed using their U-uncertainty measure [10] (Equation (2)) and applies to any pair of normalized possibility distributions. The smaller the value of G, the more similar the two pieces of information (G behaves as a distance measure).

Definition 1. Let π1 and π2 be two possibility distributions on the same universe of discourse Ω. The information closeness G between π1 and π2 is defined as:

$$G(\pi_1,\pi_2) = g(\pi_1, \pi_1 \vee \pi_2) + g(\pi_2, \pi_1 \vee \pi_2) \qquad (3)$$

where $g(\pi_i,\pi_j) = U(\pi_j) - U(\pi_i)$, ∨ is taken as the maximum operator, and U is the non-specificity measure given by Equation (2). Consequently, G can be written as $G(\pi_1,\pi_2) = 2\,U(\pi_1 \vee \pi_2) - U(\pi_1) - U(\pi_2)$.

Example 2. Consider the following distributions π1, π2, π3 and π4 over Ω = {ω1, ω2, ω3, ω4}: π1 = [1, 0.5, 0.3, 0.7], π2 = [1, 0, 0, 0], π3 = [0.9, 1, 0.3, 0.7], π4 = [0, 1, 0.3, 0.7]. Let us try to find an order expressing which of the pieces of information given by π2, π3 and π4 is closest to π1. We obtain G(π1, π2) = 1.12, G(π1, π3) = 0.52 and G(π1, π4) = 1.08. According to G, π3 is the closest to π1, and π4 is closer to π1 than π2 is.

The dissimilarity measure G does not satisfy Property 4. Indeed, G(πi, πj) should take its maximum value for all πi, πj satisfying items i) to iv) (see Property 4).

Example 3. Let us consider the distributions π1 = [1, 0, 0, 0], π2 = [0, 1, 1, 1], π3 = [0, 1, 0, 1] and π4 = [1, 0, 1, 0]. Clearly, π1 = 1 − π2 and π3 = 1 − π4. Hence, G should take its maximum value both when comparing π1 with π2 and when comparing π3 with π4. Nevertheless, we obtain G(π1, π2) = 2 · log2(4) − log2(3) = 2.41 and G(π3, π4) = 2 · log2(4) − 2 · log2(2) = 2. This means that π3 and π4 are more similar to each other than π1 and π2 are, contrary to what we expect: G(π1, π2) should be maximal and equal to G(π3, π4).

4.2 Sangüesa et al. Distance

In a work on learning possibilistic causal networks, Sangüesa et al. [19] proposed a modified version of a distance measure [18] between two possibility distributions for DAG (Directed Acyclic Graph) learning and evaluation. This is done by measuring the distance, which must be minimized, between the possibility distribution implied by a DAG and the one underlying the database. The idea relies on interpreting independence as information similarity.

Definition 2. Given two possibility distributions π1 and π2 on the same universe of discourse Ω, the distance between π1 and π2 is defined as the non-specificity of the distribution difference:

$$\mathrm{distance}(\pi_1,\pi_2) = U(|\pi_1 - \pi_2|) \qquad (4)$$
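Definitions 1 and 2 both rest on the U-uncertainty of Equation (2), which is not reproduced in this section. The sketch below assumes the standard Higashi–Klir form for normalized distributions and, for sub-normalized ones, the shift-to-normal rule discussed after Example 4 (adding 1 − max π to every degree). Under those assumptions it reproduces the G values of Examples 2 and 3 and the distances of Examples 4 and 5, up to rounding in the last digit; the function names are illustrative.

```python
import math
from typing import Sequence


def u_uncertainty(pi: Sequence[float]) -> float:
    """Higashi-Klir U-uncertainty (non-specificity) in bits.

    Assumed form of Equation (2): a sub-normalized distribution is first shifted by
    1 - max(pi) so that its top degree becomes 1; then, with the degrees sorted so that
    p_1 >= ... >= p_n and p_{n+1} = 0, U = sum_{i=2..n} (p_i - p_{i+1}) * log2(i).
    """
    shift = 1.0 - max(pi)
    p = sorted((x + shift for x in pi), reverse=True) + [0.0]
    # p[i - 1] is p_i and p[i] is p_{i+1} in the 1-based notation above.
    return sum((p[i - 1] - p[i]) * math.log2(i) for i in range(2, len(pi) + 1))


def information_closeness(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Definition 1: G = 2 U(pi1 ∨ pi2) - U(pi1) - U(pi2), with ∨ = max."""
    union = [max(a, b) for a, b in zip(pi1, pi2)]
    return 2.0 * u_uncertainty(union) - u_uncertainty(pi1) - u_uncertainty(pi2)


def sanguesa_distance(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Definition 2: U-uncertainty of the degree-wise difference |pi1 - pi2|."""
    return u_uncertainty([abs(a - b) for a, b in zip(pi1, pi2)])


if __name__ == "__main__":
    pi1, pi2 = [1, 0.5, 0.3, 0.7], [1, 0, 0, 0]
    pi3, pi4 = [0.9, 1, 0.3, 0.7], [0, 1, 0.3, 0.7]
    for other in (pi2, pi3, pi4):  # G of Example 2 and distance of Example 4
        print(round(information_closeness(pi1, other), 2),
              round(sanguesa_distance(pi1, other), 2))
```

The shift step is what makes `sanguesa_distance(pi1, pi1)` equal to log2(n) rather than 0: an all-zero difference is shifted to total ignorance, which is the weakness illustrated in Example 5.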

This measure gives different results from the previous one.

Example 4. Taking the distributions π1, π2, π3 and π4 of Example 2, we obtain: distance(π1, π2) = U([0, 0.5, 0.3, 0.7]) = 1.27, distance(π1, π3) = U([0.1, 0.5, 0, 0]) = 1.1 and distance(π1, π4) = U([1, 0.5, 0, 0]) = 0.5. Hence, according to this measure, π2 remains the farthest, but π4 becomes the closest to π1.

This measure has a serious problem when the distribution difference |π1 − π2| is sub-normalized, which occurs most of the time. It is precisely in this situation that the second term of Equation (2) comes into play. Looking closely at Equation (2), we notice that measuring the non-specificity of a sub-normalized distribution π amounts to measuring the non-specificity of the normalized distribution π′ such that $\pi'(\omega_i) = \pi(\omega_i) + 1 - \max_{\omega \in \Omega} \pi(\omega)$. This normalization scheme is clearly not suited to the proposed distance, as the following example shows.

Example 5. Let us consider the following four possibility distributions: π1 = [1, 0, 0, 0], π2 = [1, 0, 0, 0], π3 = [0, 1, 1, 1] and π4 = [1, 1, 0, 0]. Clearly, π2 is the closest possible distribution to π1 (the best case), while π3 is the farthest (the worst case). Nevertheless, the distance measure disagrees:
distance(π1, π2) = U([0, 0, 0, 0]) = 2 (maximum)
distance(π1, π3) = U([1, 1, 1, 1]) = 2 (maximum)
distance(π1, π4) = U([0, 1, 0, 0]) = 0 (minimum)
Hence, π1 and π2 are maximally distant from each other, which violates Property 4. Property 3 is also violated since, according to this example, π1 and π4 are maximally similar to each other.

4.3 Information Divergence

Kroupa [16] proposed a possibilistic analogue of the probabilistic measure of divergence. The author used the Choquet integral [5] as an operator aggregating the possibility degrees of the, generally sub-normalized, distribution difference πd = |π1(ωi) − π2(ωi)|, i = 1..n, of any two normalized distributions π1 and π2.

Definition 3. Given two possibility distributions π1 and π2 on the same universe of discourse Ω, the measure of divergence D(π1|π2) is defined as the discrete Choquet integral of the degrees of πd:

$$D(\pi_1 \mid \pi_2) = \sum_{i=1}^{n} \pi_d(\omega_{\sigma(i)}) \left[\Pi_1(A_{\sigma(i)}) - \Pi_1(A_{\sigma(i+1)})\right] \qquad (5)$$

where Π1 is the possibility measure induced by π1, i.e., $\Pi_1(A) = \max_{\omega \in A} \pi_1(\omega)$, σ is a permutation of the indices such that $\pi_d(\omega_{\sigma(1)}) \le \dots \le \pi_d(\omega_{\sigma(n)})$, $A_{\sigma(i)} = \{\omega_{\sigma(i)}, \dots, \omega_{\sigma(n)}\}$ for i = 1..n, and $A_{\sigma(n+1)} = \emptyset$.

Example 6. Applying the divergence measure to the distributions of Example 2 gives D(π1|π2) = 0.49, D(π1|π3) = 0.3 and D(π1|π4) = 1. Again, we obtain an order different from those of Examples 2 and 4: π3 is the closest to π1 and π4 is the farthest.

Clearly, the measure D is not symmetric. Moreover, given any possibility distribution πi, the divergence measure gives the maximum degree (equal to 1) to every distribution πj that is fully impossible on some state that πi considers fully possible (πi(ω) = 1 and πj(ω) = 0, so that the distribution difference πd is normalized). Hence, we can no longer discriminate between these πj’s, as Example 7 shows.

Example 7. Let us consider the distributions π1 and π4 of the previous example, together with π5 = [0, 1, 1, 1]. We obtain D(π1|π5) = D(π1|π4) = 1. This measure is therefore not discriminating enough: intuitively, π4 should appear closer to π1 than π5 does.
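Equation (5) can be checked numerically. The sketch below is a direct, unoptimized transcription of Definition 3, with Π1 taken as the max-based possibility measure induced by π1 and A_σ(n+1) as the empty set. Ties in πd are broken by state index, which does not change the value of a Choquet integral. The function names are illustrative.

```python
from typing import Iterable, Sequence


def possibility_measure(pi: Sequence[float], states: Iterable[int]) -> float:
    """Pi(A) = max of pi over the states of A; the empty set gets 0."""
    return max((pi[k] for k in states), default=0.0)


def kroupa_divergence(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Definition 3: discrete Choquet integral of pi_d = |pi1 - pi2| with respect to Pi1."""
    pi_d = [abs(a - b) for a, b in zip(pi1, pi2)]
    # sigma: state indices sorted by increasing pi_d (ties kept in index order).
    sigma = sorted(range(len(pi_d)), key=lambda k: pi_d[k])
    total = 0.0
    for i, k in enumerate(sigma):
        a_i = sigma[i:]         # A_sigma(i) = {omega_sigma(i), ..., omega_sigma(n)}
        a_next = sigma[i + 1:]  # A_sigma(i+1); empty at the last step
        total += pi_d[k] * (possibility_measure(pi1, a_i) - possibility_measure(pi1, a_next))
    return total
```

With the distributions of Example 6, swapping the arguments of the first pair gives D(π2|π1) = 0 instead of D(π1|π2) = 0.49, a concrete instance of the asymmetry noted above.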

5 Information Affinity: A New Possibilistic Similarity Measure

Considering the aforementioned weaknesses of the existing measures of divergence between possibility distributions, we propose a new measure that overcomes these drawbacks. The proposed measure combines a classical informative distance with the well-known inconsistency measure. Among the classical informative distances (Manhattan, Euclidean, Chebyshev, Sorensen, etc.), we choose the Manhattan distance: a simple distance which, combined with the inconsistency measure, satisfies the expected properties listed in Section 3. Combining these two criteria is justified by the fact that neither the distance nor the inconsistency measure, taken separately, can always decide which distribution is closest to a given one (Example 8 illustrates this problem).

More formally, let us consider three possibility distributions π1, π2 and π3, and suppose we want to determine which of π2 and π3 is closer to π1. In the case of equal conflict, i.e., Inc(π1 ∧ π2) = Inc(π1 ∧ π3), the classical distance decides which distribution is closest. Likewise, in the case of equal distances, i.e., d(π1, π2) = d(π1, π3), the conflict (inconsistency) measure decides: the less conflicting distribution is the closer one.

Example 8. Let us consider the following possibility distributions: π1 = [1, 0, 0, 0], π2 = [0.4, 1, 0.8, 0.5], π3 = [0.2, 1, 1, 0.7]. Using a classical distance measure (e.g., the Manhattan distance), we obtain d(π1, π2) = d(π1, π3) = 2.9/4 = 0.725. Hence, we cannot decide whether π2 or π3 is closer to π1. Similar situations arise with other distances (Euclidean, Chebyshev, etc.). Let us now consider the distributions π1 = [1, 0, 0, 0], π2 = [0, 1, 0, 0] and π3 = [0, 1, 1, 1]. We have Inc(π1 ∧ π2) = Inc(π1 ∧ π3) = 1. Again, we cannot decide which of π2 and π3 is closer to π1.

Definition 4. Let π1 and π2 be two possibility distributions on the same universe of discourse Ω.
We define the measure InfoAff(π1, π2) as follows:

$$\mathrm{InfoAff}(\pi_1,\pi_2) = 1 - \frac{d(\pi_1,\pi_2) + \mathrm{Inc}(\pi_1 \wedge \pi_2)}{2} \qquad (6)$$

where $d(\pi_1,\pi_2) = \frac{1}{n}\sum_{i=1}^{n} |\pi_1(\omega_i) - \pi_2(\omega_i)|$ is the (normalized) Manhattan distance between π1 and π2, and Inc(π1 ∧ π2) measures the degree of conflict between the two distributions (see Equation (1)). The factor 1/2 is needed to keep the result in the range [0, 1]. Two possibility distributions π1 and π2 are said to have a strong affinity (resp. weak affinity) if InfoAff(π1, π2) = 1 (resp. InfoAff(π1, π2) = 0).

Proposition 1. The InfoAff measure satisfies the six properties of Section 3.

Proofs.

Property 1. Non-negativity: By definition, 0 ≤ d(a, b) ≤ 1. Moreover, 0 ≤ Inc(a ∧ b) ≤ 1, since possibility degrees lie in [0, 1]. Hence $0 \le \frac{d(a,b) + \mathrm{Inc}(a \wedge b)}{2} \le 1$, so $0 \le 1 - \frac{d(a,b) + \mathrm{Inc}(a \wedge b)}{2} \le 1$, i.e., InfoAff(a, b) ≥ 0.

Property 2. Symmetry: $\mathrm{InfoAff}(b,a) = 1 - \frac{d(b,a) + \mathrm{Inc}(b \wedge a)}{2} = 1 - \frac{d(a,b) + \mathrm{Inc}(a \wedge b)}{2} = \mathrm{InfoAff}(a,b)$, since both d and the min-based conjunction ∧ are symmetric.

Property 3. Upper bound and non-degeneracy: If b = a, then, a being normalized, $\mathrm{InfoAff}(a,b) = \mathrm{InfoAff}(a,a) = 1 - \frac{d(a,a) + \mathrm{Inc}(a \wedge a)}{2} = 1 - \frac{0 + 0}{2} = 1$. If b ≠ a, Inc(a ∧ b) may be equal to 0, but in any case d(a, b) ≠ 0; consequently, InfoAff(a, b) cannot be equal to 1. More precisely, InfoAff(a, b) = 1 requires d(a, b) + Inc(a ∧ b) = 0, which could happen in only two ways. Case 1: d(a, b) = 0 and Inc(a ∧ b) = 0, which occurs only when a = b. Case 2: d(a, b) = −Inc(a ∧ b) with both terms nonzero, which is impossible because d(a, b) ≥ 0 and Inc(a ∧ b) ≥ 0.

Property 4. Lower bound: $\mathrm{InfoAff}(a,b) = 0 \Leftrightarrow \frac{d(a,b) + \mathrm{Inc}(a \wedge b)}{2} = 1 \Leftrightarrow d(a,b) + \mathrm{Inc}(a \wedge b) = 2$. Since the maximum of d(a, b) is 1 and the maximum of Inc(a ∧ b) is 1, we must have d(a, b) = 1 and Inc(a ∧ b) = 1. These two equalities hold simultaneously only when a and b are maximally contradictory, i.e., when a and b satisfy all the following conditions: i) a and b are binary possibility distributions, ii) neither a nor b represents total ignorance, iii) a and b are normalized, and iv) b is the negation (complement) of a (see Property 4).

Property 5. Inclusion: If a is more specific than b, which is in turn more specific than c, then a, b and c are fully consistent with each other: since a is normalized, some state is fully possible for a, and hence for b and c as well. Thus Inc(a ∧ b) = Inc(a ∧ c) = Inc(b ∧ c) = 0. Moreover, since a(ω) ≤ b(ω) ≤ c(ω) for every ω, we have |a(ω) − b(ω)| ≤ |a(ω) − c(ω)|, hence d(a, b) ≤ d(a, c). So $1 - \frac{d(a,b) + 0}{2} \ge 1 - \frac{d(a,c) + 0}{2}$, i.e., InfoAff(a, b) ≥ InfoAff(a, c).

Property 6. Permutation: Suppose that InfoAff(a, b) > InfoAff(c, d), and let a′, b′, c′ and d′ be the possibility distributions obtained by permuting the same pair of elements in a, b, c and d. Since d and Inc are computed degree by degree, such a permutation has no effect on them: d(a, b) = d(a′, b′), Inc(a ∧ b) = Inc(a′ ∧ b′), and likewise for c and d. Hence InfoAff(a′, b′) = InfoAff(a, b) > InfoAff(c, d) = InfoAff(c′, d′).

Example 9. Let us revisit each of the examples above and see the results given by our measure.

Examples 2, 4 and 6: π1 = [1, 0.5, 0.3, 0.7], π2 = [1, 0, 0, 0], π3 = [0.9, 1, 0.3, 0.7], π4 = [0, 1, 0.3, 0.7]. InfoAff(π1, π2) = 0.81, InfoAff(π1, π3) = 0.88 and InfoAff(π1, π4) = 0.66. Hence, π3 is the closest to π1 and π4 is the farthest: an order different from those obtained in Examples 2 and 4. Note that, for this example, our measure gives the same order as the divergence measure.

Example 3: π1 = [1, 0, 0, 0], π2 = [0, 1, 1, 1], π3 = [0, 1, 0, 1], π4 = [1, 0, 1, 0]. InfoAff(π1, π2) = 0 and InfoAff(π3, π4) = 0. InfoAff is minimal in both cases, unlike the result obtained in Example 3.

Example 5: π1 = [1, 0, 0, 0], π2 = [1, 0, 0, 0], π3 = [0, 1, 1, 1]. InfoAff(π1, π2) = 1 and InfoAff(π1, π3) = 0. Hence, π2 is the closest possible distribution to π1 and π3 represents the worst case, again unlike the result of Example 5. Still with Example 5, if we take the possibility distributions π4 = [0, 1, 1, 0] and π5 = [0, 1, 0, 0], we obtain InfoAff(π1, π4) = 0.125 and InfoAff(π1, π5) = 0.25. Hence, π5 is closer to π1 than π4 is.

Finally, Example 8: π1 = [1, 0, 0, 0], π2 = [0.4, 1, 0.8, 0.5], π3 = [0.2, 1, 1, 0.7]. InfoAff(π1, π2) = 0.33 and InfoAff(π1, π3) = 0.16, so π2 is closer to π1 than π3 is. For the second set, π1 = [1, 0, 0, 0], π2 = [0, 1, 0, 0], π3 = [0, 1, 1, 1], we obtain InfoAff(π1, π2) = 0.25 and InfoAff(π1, π3) = 0, so π2 is closer to π1 than π3 is.
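The computations of Example 9 can be reproduced with a short sketch of Equation (6). Inc(π1 ∧ π2) is taken to be 1 − max_ω min(π1(ω), π2(ω)), the usual inconsistency degree of the min-based conjunction; this form is assumed here because Equation (1) is not reproduced in this section. Function names are illustrative. The exact values behind the two-decimal figures of Examples 2, 4 and 6 are 0.8125, 0.875 and 0.6625.

```python
from typing import Sequence


def manhattan(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """d(pi1, pi2) = (1/n) * sum_i |pi1(omega_i) - pi2(omega_i)|, a value in [0, 1]."""
    return sum(abs(a - b) for a, b in zip(pi1, pi2)) / len(pi1)


def conflict(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Inc(pi1 ∧ pi2) = 1 - max_omega min(pi1(omega), pi2(omega)) (assumed Equation (1))."""
    return 1.0 - max(min(a, b) for a, b in zip(pi1, pi2))


def info_aff(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Equation (6): InfoAff = 1 - (d + Inc) / 2."""
    return 1.0 - (manhattan(pi1, pi2) + conflict(pi1, pi2)) / 2.0


if __name__ == "__main__":
    pi1 = [1.0, 0.5, 0.3, 0.7]
    candidates = {"pi2": [1, 0, 0, 0], "pi3": [0.9, 1, 0.3, 0.7], "pi4": [0, 1, 0.3, 0.7]}
    ranking = sorted(candidates, key=lambda name: info_aff(pi1, candidates[name]), reverse=True)
    print(ranking)  # ['pi3', 'pi2', 'pi4']
```

Because both terms are averaged or maximized degree by degree, the function is symmetric in its arguments and invariant under a common permutation of the states, in line with Properties 2 and 6.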

6 Practical Applications of Information Affinity

We now mention some fields in which the Information Affinity measure could be useful.

6.1 Machine Learning: Classification and Clustering

The proposed information affinity measure could be used in many classification and clustering algorithms, especially in those using possibility theory to deal with the uncertainty present in the learning process [12] [13]. For instance, InfoAff could serve as the basis of an attribute selection measure for inducing decision trees from imprecisely labeled data. More precisely, it would allow selecting the attribute that yields partitions of the training set containing maximally similar instances, i.e., instances whose possibility distributions over the classes are as similar as possible. Still in classification, InfoAff could also be used in most distance-based classifiers induced from imprecise data, e.g., k-nearest neighbor classifiers, genetic algorithms, artificial immune recognition systems, etc. Likewise, InfoAff could be used in possibilistic clustering [15] as the distance criterion for deciding whether an instance belongs to a given cluster characterized by a possibility distribution.

6.2 Evaluation of Possibilistic Classifiers

Our measure is useful not only for learning but also for evaluating possibilistic classifiers. Recall that a possibilistic classifier gives its result in the form of a possibility distribution (π^res) over the possible classes of the problem (Ω = {C1, C2, ..., Cn}). Classifiers are generally evaluated with the well-known percentage of correct classification (PCC):

$$\mathrm{PCC} = \frac{\text{nbr\_well\_classified\_inst}}{\text{total\_nbr\_classified\_inst}} \times 100$$

In the possibilistic setting, it is used as follows: for each classified instance, the class with the highest possibility degree (equal to 1) is chosen; if more than one class is obtained, one of them is chosen randomly. The chosen class is considered as the class of the testing instance. Consequently, nbr_well_classified_inst corresponds to the number of testing instances for which the class obtained by the possibilistic classifier (the most plausible class) is the same as the real class. The limitation of this adaptation of the PCC criterion to the possibilistic setting is that it randomly chooses one of the most plausible classes, which may misclassify some instances. Moreover, even when there is only one most plausible class, focusing on that class and ignoring the other classes (those with possibility degrees different from 1) is problematic, since it ignores part of the information given by the resulting possibility distribution π^res.

A solution is to define an affinity-based criterion PCC_Aff (Equation (7)) which takes into account the mean affinity over all classified testing instances: the average of the similarities between the resulting possibility distribution (π^res_j) and the real (completely sure) possibility distribution (π_j) of each classified instance I_j, j = 1..n. When PCC_Aff is close to 100%, the classifier is good, whereas when it falls toward 0%, it is considered a bad classifier.

$$\mathrm{PCC\_Aff} = \frac{\sum_{j=1}^{n} \mathrm{InfoAff}(\pi^{res}_j, \pi_j)}{\text{total\_nbr\_classified\_inst}} \times 100 \qquad (7)$$

Note that an alternative PCC criterion for possibilistic classifiers, more precisely for possibilistic decision trees, was proposed in [2]. This so-called Qualitative PCC, denoted by Q_PCC, differs from PCC_Aff: it is based on a Euclidean distance between the real (completely sure) possibility distribution of each classified instance and its resulting qualitative possibility distribution, which is induced from the leximin-leximax ordering of the classes given by the tree.

6.3 Comparing Opinions and Sensor Measures

In many situations, comparing the opinions of different agents supports decision making. For instance, suppose that a group of candidates takes part in a competitive entrance examination in which each candidate is asked questions. Some flexibility is offered to the candidates: instead of giving a precise answer, they may give a possibility degree for each proposed answer. The best candidate is then the one whose possibility distributions are the most similar to the true answers (possibility distributions corresponding to completely sure knowledge).

Another interesting use of the Information Affinity measure is sensor diagnosis. Suppose that many sensors measure a given variable. These sensors may produce measurements with some errors; consequently, their outputs can be represented as possibility distributions over the possible values of the variable under study. Suppose that we are sure that a given sensor s0 is reliable (e.g., a newly installed sensor). One can then compare the measurements (possibility distributions) given by the different sensors with the one given by s0, and reject or replace those whose measurements differ beyond a certain extent.
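Both the candidate-ranking and the sensor-screening procedures above reduce to averaging or thresholding InfoAff. The sketch below shows one way to do this, assuming Equation (6) with Inc(π1 ∧ π2) = 1 − max_ω min(π1(ω), π2(ω)). `mean_affinity_percent` follows the form of PCC_Aff in Equation (7), with each true answer or class encoded as a completely sure distribution. The rejection threshold in `flag_sensors` and all names are hypothetical, since the text leaves the rejection criterion ("to a certain extent") open.

```python
from typing import Dict, List, Sequence


def info_aff(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Equation (6): 1 - (d + Inc) / 2 with d the normalized Manhattan distance."""
    d = sum(abs(a - b) for a, b in zip(pi1, pi2)) / len(pi1)
    inc = 1.0 - max(min(a, b) for a, b in zip(pi1, pi2))  # assumed form of Inc(pi1 ∧ pi2)
    return 1.0 - (d + inc) / 2.0


def sure_distribution(true_index: int, n: int) -> List[float]:
    """Completely sure knowledge: degree 1 on the true state, 0 elsewhere."""
    return [1.0 if k == true_index else 0.0 for k in range(n)]


def mean_affinity_percent(outputs: Sequence[Sequence[float]], truths: Sequence[int]) -> float:
    """Average InfoAff with the sure distributions, times 100 (the form of Equation (7))."""
    scores = [info_aff(out, sure_distribution(t, len(out))) for out, t in zip(outputs, truths)]
    return 100.0 * sum(scores) / len(scores)


def flag_sensors(readings: Dict[str, Sequence[float]], reference: Sequence[float],
                 threshold: float) -> List[str]:
    """Names of sensors whose affinity with the trusted reading is below the threshold."""
    return sorted(name for name, pi in readings.items() if info_aff(pi, reference) < threshold)
```

Ranking the candidates of the examination scenario then amounts to sorting them by `mean_affinity_percent` over their answers; for sensors, a call such as `flag_sensors(readings, reading_of_s0, 0.5)` would list those to inspect, with 0.5 chosen purely for illustration.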

7 Conclusion

This paper focuses on measuring the similarity between pieces of possibilistic uncertain information. Unlike other uncertainty formalisms, possibility theory has seen few works in this direction. After proposing some natural properties of a similarity measure between possibility distributions, and after studying the few existing measures and showing their limits through examples, we have proposed a new similarity measure that draws on both the inconsistency measure and a classical distance. We have contrasted our measure with the existing ones and shown that it is a reliable measure that overcomes their limits. Finally, potential applications of the proposed measure have been outlined.

References

1. Abellan, J., Gomez, M.: Measures of divergence on credal sets. Fuzzy Sets and Systems 157, 1514–1531 (2006)
2. Ben Amor, N., Benferhat, S., Elouedi, Z.: Qualitative classification and evaluation in possibilistic decision trees. In: FUZZ-IEEE’04 (2004)
3. Bouchon-Meunier, B., Rifqi, M., Bothorel, S.: Towards general measures of comparison of objects. Fuzzy Sets and Systems 84(2), 143–153 (1996)
4. Chan, H., Darwiche, A.: A distance measure for bounding probabilistic belief change. International Journal of Approximate Reasoning 38, 149–174 (2005)
5. Choquet, G.: Theory of capacities. Annales de l’Institut Fourier 54, 131–295 (1953)
6. De Baets, B., De Meyer, H.: Transitivity-preserving fuzzification schemes for cardinality-based similarity measures. EJOR 160(1), 726–740 (2005)
7. Dubois, D., Prade, H.: Possibility theory: An approach to computerized processing of uncertainty. Plenum Press, New York (1988)
8. Fixsen, D., Mahler, R.P.S.: The modified Dempster-Shafer approach to classification. IEEE Trans. Syst. Man Cybern. 27, 96–104 (1997)
9. Fono, L.A., Gwet, H., Bouchon-Meunier, B.: Fuzzy implication operators for difference operations for fuzzy sets and cardinality-based measures of comparison. EJOR 183, 314–326 (2007)
10. Higashi, M., Klir, G.J.: Measures of uncertainty and information based on possibility distributions. Int. J. General Systems 9(1), 43–58 (1983)
11. Higashi, M., Klir, G.J.: On the notion of distance representing information closeness: Possibility and probability distributions. IJGS 9, 103–115 (1983)
12. Hüllermeier, E.: Possibilistic instance-based learning. AI 148(1-2), 335–383 (2003)
13. Jenhani, I., Ben Amor, N., Elouedi, Z., Mellouli, K.: Decision Trees as Possibilistic Classifiers (paper submitted)
14. Jousselme, A.L., Grenier, D., Bossé, E.: A new distance between two bodies of evidence. Information Fusion 2, 91–101 (2001)
15. Krishnapuram, R., Keller, J.M.: A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems 1(2), 98–110 (1993)
16. Kroupa, T.: Measure of divergence of possibility measures. In: Proceedings of the 6th Workshop on Uncertainty Processing, Prague, pp. 173–181 (2003)
17. Kullback, S., Leibler, R.A.: On information and sufficiency. Annals of Mathematical Statistics 22, 79–86 (1951)
18. Sanguesa, R., Cabos, J., Cortes, U.: Possibilistic conditional independence: a similarity based measure and its application to causal network learning. IJAR (1997)
19. Sanguesa, R., Cortes, U.: Prior knowledge for learning networks in non-probabilistic settings. IJAR 24, 103–120 (2000)
20. Shafer, G.: A mathematical theory of evidence. Princeton University Press, Princeton (1976)
21. Tessem, B.: Approximations for efficient computation in the theory of evidence. Artificial Intelligence 61, 315–329 (1993)
22. Tversky, A.: Features of similarity. Psychological Review 84, 327–352 (1977)
23. Wang, X., De Baets, B., Kerre, E.: A comparative study of similarity measures. Fuzzy Sets and Systems 73(2), 259–268 (1995)
24. Williams, M-A.: An Operational Measure of Similarity for Belief Revision Systems (1997)
25. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1, 3–28 (1978)
26. Zouhal, L.M., Denoeux, T.: An evidence-theoretic k-NN rule with parameter optimization. IEEE Trans. Syst. Man Cybern. C 28(2), 263–271 (1998)

LARODEC, Institut Supérieur de Gestion de Tunis, Tunisia 2 CRIL, Université d’Artois, Lens, France [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract. This paper addresses the issue of measuring similarity between pieces of uncertain information in the framework of possibility theory. In a ﬁrst part, natural properties of such functions are proposed and a survey of the few existing measures is presented. Then, a new measure so-called Information Aﬃnity is proposed to overcome the limits of the existing ones. The proposed function is based on two measures, namely, a classical informative distance, e.g. Manhattan distance which evaluates the diﬀerence, degree by degree, between two normalized possibility distributions and the well known inconsistency measure which assesses the conﬂict between the two possibility distributions. Some potential applications of the proposed measure are also mentioned in this paper. Keywords: Possibility theory, Similarity, Divergence measure, Distance, Inconsistency measure.

1

Introduction

Most of real-world decision problems are faced with uncertainty. Uncertainty about values of given variables (e.g. the type of a detected target in military applications, the disease aﬀecting a patient in medical applications, etc.) can result from some errors and hence from non-reliability (in the case of sensors) or from diﬀerent background knowledge (in the case of agents: doctors, etc.). As a consequence, it is possible to obtain diﬀerent uncertain pieces of information about a given value from diﬀerent sources. Obviously, comparing these pieces of information could be very interesting to support decision making. Comparing pieces of uncertain information given by several sources has attracted a lot of attention for a long time. For instance, we can mention the well-known Euclidean and KL-divergence [17] for comparing probability distributions. Another distance has been proposed by Chan and al. [4] for bounding probabilistic belief change. Moving to belief function theory [20], several distance measures between bodies of evidence deserve to be mentioned. Some distances K. Mellouli (Ed.): ECSQARU 2007, LNAI 4724, pp. 840–852, 2007. c Springer-Verlag Berlin Heidelberg 2007

Information Aﬃnity: A New Similarity Measure

841

have been proposed as measures of performance (MOP) of identiﬁcation algorithms [8] [14]. Another distance was used for the optimization of the parameters of a belief k -nearest neighbor classiﬁer [26]. In [21], the authors proposed a distance for the quantiﬁcation of errors resulting from basic probability assignment approximations. Similarity measures between two fuzzy sets A and B have been also proposed in the literature [6] [9] [23] [24]. For instance, in the work by Bouchon-Meunier and al. [3], the authors proposed a similarity measure between fuzzy sets as an extension of Tversky’s model on crisp sets [22]. The measure was then used to develop an image search engine. Contrary to probability, belief function and fuzzy set theories, few works are dedicated to distance measures in possibility theory despite its popularity. Hence, in this paper, we will focus on measures for the comparison of uncertain information represented by possibility distributions. In a ﬁrst part, we will study the few existing works and show their limits, then, we will propose a new similarity measure, so-called Information Aﬃnity which satisﬁes very natural properties. Our measure would be useful in many real-world applications where the uncertainty is modeled by means of possibility theory. For instance, it could be used as a critical parameter for distance based possibilistic machine learning algorithms, it could also be used for the evaluation of possibilistic classiﬁers, for the comparison of expert opinions, etc. The rest of the paper is organized as follows: Section 2 starts by giving the necessary background concerning possibility theory. Section 3 provides diﬀerent properties that a similarity measure should satisfy. Section 4 represents an overview of the existing similarity measures within the possibilistic setting with detailed examples and critics. The deﬁnition and the contrast of the new Information Aﬃnity measure with existing measures are proposed in Section 5. 
Some potential applications of the proposed measure are shown in Section 6. Finally, Section 7 concludes the paper.

2

Possibility Theory

Possibility theory is a non-classical uncertainty theory (distinct from probability theory), first introduced by Zadeh [25] and then developed by several authors (e.g., Dubois and Prade [7]). In this section, we briefly recall its basic notions.

Possibility distribution. Given a universe of discourse Ω = {ω1, ω2, ..., ωn}, a fundamental concept of possibility theory is the possibility distribution, denoted by π. π is a function which associates with each element ωi of the universe of discourse Ω a value from a bounded and linearly ordered valuation set (L, […] s(π3, π4). Suppose that ∀j = 1..4 and ωp, ωq ∈ Ω, we have π′j(ωp) = πj(ωq) and π′j(ωq) = πj(ωp); then we should obtain s(π′1, π′2) > s(π′3, π′4).

4

Measuring Similarity of Possibilistic Uncertain Information

Measuring the similarity of uncertain information has attracted a lot of attention in probability theory [4,17], in belief function theory [8,14,21,26], in fuzzy set theory [3,6,9,23,24] and in credal set theory [1]. This is not the case for possibilistic uncertain information: few works have been done in this direction. Let us present, chronologically, some of these measures and show their weaknesses in expressing the information divergence between two agents (or sensors) expressing their opinions (or measurements) in the form of possibility distributions.

4.1

Information Closeness

The first paper specifically dedicated to measuring the information similarity between two possibility distributions was that of Higashi and Klir in 1983 [11]. They proposed a measure based on information variation, which they called information closeness and denoted by G. Function G is computed using their U-uncertainty measure [10] (Equation (2)) and is applicable to any pair of normalized possibility distributions. The smaller the value of G, the more similar the two pieces of information (G behaves as a distance measure).

Definition 1. Let π1 and π2 be two possibility distributions on the same universe of discourse Ω. The information closeness G between π1 and π2 is defined as:

$$G(\pi_1, \pi_2) = g(\pi_1, \pi_1 \vee \pi_2) + g(\pi_2, \pi_1 \vee \pi_2) \tag{3}$$

where g(πi, πj) = U(πj) − U(πi), ∨ is taken as the maximum operator and U is the non-specificity measure given by Equation (2). Consequently, G can be written as G(π1, π2) = 2·U(π1 ∨ π2) − U(π1) − U(π2).

Example 2. Consider the following distributions π1, π2, π3 and π4 over Ω = {ω1, ω2, ω3, ω4}: π1 [1, 0.5, 0.3, 0.7], π2 [1, 0, 0, 0], π3 [0.9, 1, 0.3, 0.7], π4 [0, 1, 0.3, 0.7]. Let us try to find an order expressing which of the pieces of information given by π2, π3 and π4 is closest to π1. We obtain G(π1, π2) = 1.12, G(π1, π3) = 0.52 and G(π1, π4) = 1.08. According to G, π3 is the closest to π1, and π4 is closer to π1 than π2 is.

The dissimilarity measure G does not satisfy Property 4. In fact, G(πi, πj) should take its maximum value for all πi, πj satisfying items i) to iv) (see Property 4).

Example 3. Let us consider the distributions π1 [1, 0, 0, 0], π2 [0, 1, 1, 1], π3 [0, 1, 0, 1] and π4 [1, 0, 1, 0]. Clearly, π1 = 1 − π2 and π3 = 1 − π4. Hence, G should take its maximum value when comparing π1 and π2 as well as π3 and π4. Nevertheless, according to G, we obtain G(π1, π2) = 2·log2(4) − log2(3) = 2.41 and G(π3, π4) = 2·log2(4) − 2·log2(2) = 2. This means that π3 and π4 are more similar to each other than π1 and π2 are, which is contrary to what we expect: G(π1, π2) should be maximal and equal to G(π3, π4).

4.2

Sangüesa et al. Distance

In a work by Sangüesa et al. [19] on learning possibilistic causal networks, the authors proposed a modified version of a distance measure [18] between two possibility distributions for DAG (Directed Acyclic Graph) learning and evaluation. This is done by measuring the distance (to be minimized) between the possibility distribution implied by a DAG and the one underlying the database. This idea relies on interpreting independence as information similarity.

Definition 2. Given two possibility distributions π1 and π2 on the same universe of discourse Ω, the distance between π1 and π2 is defined as the non-specificity of their distribution difference:

$$\mathrm{distance}(\pi_1, \pi_2) = U(|\pi_1 - \pi_2|) \tag{4}$$
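Both the information closeness G and the Sangüesa et al. distance are built on the U-uncertainty measure of Equation (2), which is not reproduced in this section. The Python sketch below assumes its usual Higashi-Klir form for a normalized distribution, U(π) = Σ_{i=1..n} (π(i) − π(i+1)) log2 i, where the degrees are sorted in decreasing order and π(n+1) = 0, and it handles a sub-normalized distribution through the shift π′(ωi) = π(ωi) + 1 − max π mentioned in this subsection. The function names are our own, and the code is an illustration under these assumptions rather than the authors' implementation.

```python
import math


def u_uncertainty(pi):
    """U-uncertainty (non-specificity) of a possibility distribution given as a list.

    Assumed form: sort the degrees decreasingly, append 0, and sum
    (pi_(i) - pi_(i+1)) * log2(i). Sub-normalized inputs are first shifted so
    that their maximum becomes 1: pi'(w) = pi(w) + 1 - max(pi).
    """
    shift = 1.0 - max(pi)
    ranked = sorted((p + shift for p in pi), reverse=True) + [0.0]
    return sum((ranked[i - 1] - ranked[i]) * math.log2(i)
               for i in range(1, len(pi) + 1))


def info_closeness(pi1, pi2):
    """Higashi-Klir information closeness: G = 2*U(pi1 v pi2) - U(pi1) - U(pi2)."""
    union = [max(a, b) for a, b in zip(pi1, pi2)]  # v taken as the maximum
    return 2 * u_uncertainty(union) - u_uncertainty(pi1) - u_uncertainty(pi2)


def sanguesa_distance(pi1, pi2):
    """Sanguesa et al. distance: non-specificity of the difference |pi1 - pi2|."""
    return u_uncertainty([abs(a - b) for a, b in zip(pi1, pi2)])


if __name__ == "__main__":
    pi1 = [1, 0.5, 0.3, 0.7]
    others = {"pi2": [1, 0, 0, 0], "pi3": [0.9, 1, 0.3, 0.7], "pi4": [0, 1, 0.3, 0.7]}
    for name, pi in others.items():
        print(name, round(info_closeness(pi1, pi), 2), round(sanguesa_distance(pi1, pi), 2))
```

Under these assumptions, the printed values match those reported in Examples 2 and 4 to within 0.01, and the same functions reproduce the degenerate values used in Examples 3 and 5 (for instance, U([0, 0, 0, 0]) = 2).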

This measure gives different results from the previous one.

Example 4. Taking the same distributions π1, π2, π3 and π4 as in Example 2, we obtain distance(π1, π2) = U([0, 0.5, 0.3, 0.7]) = 1.27, distance(π1, π3) = U([0.1, 0.5, 0, 0]) = 1.1 and distance(π1, π4) = U([1, 0.5, 0, 0]) = 0.5. Hence, according to this measure, π2 remains the farthest, but π4 becomes the closest to π1.

This measure has a serious problem when the distribution difference |π1 − π2| is sub-normalized (which occurs most of the time), since it is in this situation that the second term of Equation (2) comes into play. Looking at Equation (2), measuring the non-specificity of a sub-normalized distribution π comes down to measuring the non-specificity of its normalized version π′, where π′(ωi) = π(ωi) + 1 − max_{ω∈Ω} π(ω). This normalization scheme is clearly not suited to the proposed distance, as the following example shows.

Example 5. Let us consider the following possibility distributions: π1 [1, 0, 0, 0], π2 [1, 0, 0, 0], π3 [0, 1, 1, 1], π4 [1, 1, 0, 0]. Clearly, π2 is the closest possible distribution to π1 (the best case), while π3 is the farthest one (the worst case). Nevertheless, the distance measure disagrees: distance(π1, π2) = U([0, 0, 0, 0]) = 2 (maximum), distance(π1, π3) = U([1, 1, 1, 1]) = 2 (maximum),


I. Jenhani et al.

distance(π1, π4) = U([0, 1, 0, 0]) = 0 (minimum). Hence, π1 and π2 are maximally distant from each other, which violates Property 4. Property 3 is also violated since, according to this example, π1 and π4 are maximally similar to each other.

4.3

Information Divergence

A possibilistic analogue of the probabilistic divergence measure was proposed by Kroupa [16]. The author used the Choquet integral [5] as an aggregation operator of the possibility degrees characterizing the (generally sub-normalized) distribution difference πd(ωi) = |π1(ωi) − π2(ωi)|, i = 1..n, of any two normalized distributions π1 and π2.

Definition 3. Given two possibility distributions π1 and π2 on the same universe of discourse Ω, the measure of divergence D(π1|π2) is defined as the discrete Choquet integral of the degrees of πd:

$$D(\pi_1 | \pi_2) = \sum_{i=1}^{n} \pi_d(\omega_{\sigma(i)}) \left[ \Pi_1(A_{\sigma(i)}) - \Pi_1(A_{\sigma(i+1)}) \right] \tag{5}$$

where σ is a permutation of indices such that πd(ωσ(1)) ≤ ... ≤ πd(ωσ(n)), Aσ(i) = {ωσ(i), ..., ωσ(n)} for i = 1..n, and Aσ(n+1) = ∅.

Example 6. Applying the divergence measure to the distributions of Example 2 gives D(π1|π2) = 0.49, D(π1|π3) = 0.3 and D(π1|π4) = 1. Again, we obtain an order different from those of Examples 2 and 4: π3 is the closest to π1 and π4 is the farthest.

Clearly, the measure D is not symmetric. Moreover, given any possibility distribution πi, the proposed information divergence measure gives the maximum divergence degree (equal to 1) for all possibility distributions πj for which πd reaches 1 on a state that is fully possible for πi (for instance, whenever Inc(πi ∧ πj) = 1). Hence, we can no longer discriminate between these πj's, as Example 7 shows.

Example 7. Let us consider the same distributions π1 and π4 as in the previous example, together with π5 [0, 1, 1, 1]. We obtain D(π1|π5) = D(π1|π4) = 1. This measure is therefore not discriminating enough, since π4 appears closer to π1 than π5 does.
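Equation (5) can be sketched in Python as follows, assuming that Π1 is the possibility measure induced by π1, i.e., Π1(A) = max_{ω∈A} π1(ω) with Π1(∅) = 0, and using 0-based state indices. Ties in πd may be broken arbitrarily, since the value of the Choquet integral does not depend on how tied states are ordered. The function names are hypothetical.

```python
def possibility_measure(pi, states):
    """Pi(A) = max of pi over the states in A, with Pi(empty set) = 0."""
    return max((pi[k] for k in states), default=0.0)


def kroupa_divergence(pi1, pi2):
    """D(pi1 | pi2): discrete Choquet integral of pi_d = |pi1 - pi2| with respect to Pi1."""
    diff = [abs(a - b) for a, b in zip(pi1, pi2)]
    # sigma: state indices sorted so that the differences are non-decreasing
    sigma = sorted(range(len(diff)), key=lambda k: diff[k])
    total = 0.0
    for i, k in enumerate(sigma):
        a_i = sigma[i:]         # A_sigma(i)
        a_next = sigma[i + 1:]  # A_sigma(i+1), empty after the last state
        total += diff[k] * (possibility_measure(pi1, a_i) - possibility_measure(pi1, a_next))
    return total


if __name__ == "__main__":
    pi1 = [1, 0.5, 0.3, 0.7]
    for pi in ([1, 0, 0, 0], [0.9, 1, 0.3, 0.7], [0, 1, 0.3, 0.7], [0, 1, 1, 1]):
        print(pi, round(kroupa_divergence(pi1, pi), 2))
```

Swapping the arguments illustrates the asymmetry noted above: with π1 and π2 of Example 2, the sketch gives D(π1|π2) = 0.49 but D(π2|π1) = 0, because ω1, the only state that π2 considers possible, is also the state where πd = 0.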

5

Information Affinity: A New Possibilistic Similarity Measure

Considering the aforementioned weaknesses of the existing measures of divergence between possibility distributions, we propose a new measure that overcomes these drawbacks. The proposed measure takes into account a classical informative distance along with the well-known inconsistency measure. Among the classical informative distance functions (Manhattan, Euclidean, Chebyshev, Sorensen, etc.), we choose the Manhattan distance: a simple distance which, when combined with the inconsistency measure, satisfies the expected properties listed in Section 3. The choice of combining these two criteria is justified by the fact that neither the distance measure nor the inconsistency measure, taken separately, allows us to decide which distribution is the closest to a given one (Example 8 illustrates this problem). More formally, let us consider three possibility distributions π1, π2 and π3. Our aim is to determine which of π2 and π3 is closer to π1. In the case of equal conflict, i.e., Inc(π1 ∧ π2) = Inc(π1 ∧ π3), it is the classical distance that decides which distribution is the closest. In the same way, when the distances are equal, i.e., d(π1, π2) = d(π1, π3), it is the conflict (inconsistency) measure that decides: the less conflicting distribution is the closest.

Example 8. Let us consider the following possibility distributions: π1 [1, 0, 0, 0], π2 [0.4, 1, 0.8, 0.5], π3 [0.2, 1, 1, 0.7]. If we use a classical distance measure (e.g., the Manhattan distance), we obtain d(π1, π2) = d(π1, π3) = 2.9/4 = 0.725. Hence, we cannot decide whether π2 or π3 is closer to π1. Similar situations can arise with other distances (Euclidean, Chebyshev, etc.). Let us now consider the possibility distributions π1 [1, 0, 0, 0], π2 [0, 1, 0, 0], π3 [0, 1, 1, 1]. We have Inc(π1 ∧ π2) = Inc(π1 ∧ π3) = 1. Again, we cannot decide which of π2 and π3 is closer to π1.

Definition 4. Let π1 and π2 be two possibility distributions on the same universe of discourse Ω.
We define the measure InfoAff(π1, π2) as follows:

$$\mathrm{InfoAff}(\pi_1, \pi_2) = 1 - \frac{d(\pi_1, \pi_2) + \mathrm{Inc}(\pi_1 \wedge \pi_2)}{2} \tag{6}$$

where d(π1, π2) = (1/n) Σ_{i=1}^{n} |π1(ωi) − π2(ωi)| is the (normalized) Manhattan distance between π1 and π2, and Inc(π1 ∧ π2) measures the degree of conflict between the two distributions (see Equation (1)). The factor 1/2 is necessary to obtain the required range [0, 1]. Two possibility distributions π1 and π2 are said to have a strong affinity (resp. weak affinity) if InfoAff(π1, π2) = 1 (resp. InfoAff(π1, π2) = 0).

Proposition 1. The InfoAff measure satisfies the six properties of Section 3.

Proofs. Property 1. Non-negativity: By definition, 0 ≤ d(a, b) ≤ 1. Moreover, 0 ≤ Inc(a ∧ b) ≤ 1 (possibility degrees belong to [0, 1]). Hence 0 ≤ (d(a, b) + Inc(a ∧ b))/2 ≤ 1 ⇒ 0 ≤ 1 − (d(a, b) + Inc(a ∧ b))/2 ≤ 1 ⇒ InfoAff(a, b) ≥ 0.



Property 2. Symmetry: InfoAff(b, a) = 1 − (d(b, a) + Inc(b ∧ a))/2 = 1 − (d(a, b) + Inc(a ∧ b))/2 = InfoAff(a, b).

Property 3. Upper bound and non-degeneracy: ∀ b = a, InfoAff(a, b) = InfoAff(a, a) = 1 − (d(a, a) + Inc(a ∧ a))/2 = 1 − (0 + 0)/2 = 1. Note that when b ≠ a, Inc(a ∧ b) may be equal to 0, but in any case d(a, b) ≠ 0; consequently, InfoAff(a, b) cannot be equal to 1. Indeed, InfoAff(a, b) = 1 could only occur in the following two cases. Case 1: d(a, b) = 0 and Inc(a ∧ b) = 0, which occurs only when a = b. Case 2: d(a, b) = −Inc(a ∧ b) ≠ 0, which is impossible because d(a, b) ≥ 0 and Inc(a ∧ b) ≥ 0.

Property 4. Lower bound: InfoAff(a, b) = 0 ⇔ (d(a, b) + Inc(a ∧ b))/2 = 1 ⇔ d(a, b) + Inc(a ∧ b) = 2. Since d(a, b) ≤ 1 and Inc(a ∧ b) ≤ 1, this requires d(a, b) = 1 and Inc(a ∧ b) = 1. These two equalities hold simultaneously only when a and b are maximally contradictory, i.e., when a and b satisfy all of the following conditions: i) a and b are binary possibility distributions, ii) neither a nor b represents total ignorance, iii) a and b are normalized and iv) b is the negation (complement) of a (see Property 4).

Property 5. Inclusion: If a is more specific than b, which is in turn more specific than c (i.e., a(ω) ≤ b(ω) ≤ c(ω) for all ω ∈ Ω), then a, b and c are fully consistent with each other (since a is normalized, they all share at least one fully possible state), i.e., Inc(a ∧ b) = Inc(a ∧ c) = Inc(b ∧ c) = 0. Moreover, since a ≤ b ≤ c pointwise, d(a, b) ≤ d(a, c). So 1 − (d(a, b) + 0)/2 ≥ 1 − (d(a, c) + 0)/2 ⇒ InfoAff(a, b) ≥ InfoAff(a, c).

Property 6. Permutation: Suppose that InfoAff(a, b) > InfoAff(c, d), and let a′, b′, c′ and d′ be the possibility distributions obtained by permuting the elements having the same indexes in a, b, c and d. Since d and Inc are computed degree by degree, such a permutation has no effect on them: d(a, b) = d(a′, b′), Inc(a ∧ b) = Inc(a′ ∧ b′), d(c, d) = d(c′, d′) and Inc(c ∧ d) = Inc(c′ ∧ d′) ⇒ InfoAff(a′, b′) > InfoAff(c′, d′).

Example 9.
Let us revisit each of the examples above and see the results given by our measure. Examples 2, 4 and 6: π1 [1, 0.5, 0.3, 0.7], π2 [1, 0, 0, 0], π3 [0.9, 1, 0.3, 0.7], π4 [0, 1, 0.3, 0.7]. InfoAff(π1, π2) = 0.81, InfoAff(π1, π3) = 0.88, InfoAff(π1, π4) = 0.66. Hence, π3 is the closest to π1 and π4 is the farthest: an order different from those obtained in Examples 2 and 4. Note that, for this example, our measure gives the same order as the divergence measure.



Example 3: π1 [1, 0, 0, 0], π2 [0, 1, 1, 1], π3 [0, 1, 0, 1], π4 [1, 0, 1, 0]. InfoAff(π1, π2) = 0 and InfoAff(π3, π4) = 0. InfoAff is minimal in both cases, unlike the result obtained in Example 3. Example 5: π1 [1, 0, 0, 0], π2 [1, 0, 0, 0], π3 [0, 1, 1, 1]. InfoAff(π1, π2) = 1 and InfoAff(π1, π3) = 0. Hence, π2 is the closest possible distribution to π1 and π3 represents the worst case, again unlike the result of Example 5. Still with Example 5, if we instead take the possibility distributions π4 [0, 1, 1, 0] and π5 [0, 1, 0, 0], we obtain InfoAff(π1, π4) = 0.125 and InfoAff(π1, π5) = 0.25. Hence, π5 is closer to π1 than π4. Finally, Example 8: π1 [1, 0, 0, 0], π2 [0.4, 1, 0.8, 0.5], π3 [0.2, 1, 1, 0.7]. InfoAff(π1, π2) = 0.33 and InfoAff(π1, π3) = 0.16 ⇒ π2 is closer to π1 than π3. If we take π1 [1, 0, 0, 0], π2 [0, 1, 0, 0], π3 [0, 1, 1, 1], then InfoAff(π1, π2) = 0.25 and InfoAff(π1, π3) = 0 ⇒ π2 is closer to π1 than π3.
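The computations of Example 9 can be reproduced with a short Python sketch of Equation (6). Equation (1) is not reproduced in this section, so the sketch assumes the standard possibilistic inconsistency degree with the minimum as conjunction, Inc(π1 ∧ π2) = 1 − max_ω min(π1(ω), π2(ω)); distributions are plain lists of degrees over the same ordered Ω, and the function names are our own.

```python
def manhattan(pi1, pi2):
    """Normalized Manhattan distance d(pi1, pi2) = (1/n) * sum_i |pi1(w_i) - pi2(w_i)|."""
    return sum(abs(a - b) for a, b in zip(pi1, pi2)) / len(pi1)


def inconsistency(pi1, pi2):
    """Inc(pi1 ^ pi2) = 1 - max_w min(pi1(w), pi2(w)) (assumed form of Equation (1))."""
    return 1.0 - max(min(a, b) for a, b in zip(pi1, pi2))


def info_aff(pi1, pi2):
    """Information Affinity, Equation (6): 1 - (d + Inc) / 2, a value in [0, 1]."""
    return 1.0 - (manhattan(pi1, pi2) + inconsistency(pi1, pi2)) / 2.0


if __name__ == "__main__":
    pi1 = [1, 0.5, 0.3, 0.7]
    for pi in ([1, 0, 0, 0], [0.9, 1, 0.3, 0.7], [0, 1, 0.3, 0.7]):
        print(pi, round(info_aff(pi1, pi), 4))
    # The two complementary pairs of Example 3 both reach the lower bound.
    print(info_aff([1, 0, 0, 0], [0, 1, 1, 1]), info_aff([0, 1, 0, 1], [1, 0, 1, 0]))
```

With these definitions, the three affinities of Examples 2, 4 and 6 come out as 0.8125, 0.875 and 0.6625, i.e., the ranking π3, π2, π4 from closest to farthest. The same functions can be used to spot-check Properties 2 and 6, e.g., by comparing info_aff(a, b) with info_aff(b, a), or with the value obtained after applying the same permutation to the positions of both lists.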

6

Practical Applications of Information Affinity

We now mention some fields in which the Information Affinity measure could be useful.

6.1

Machine Learning: Classification and Clustering

The proposed Information Affinity measure could be used in many classification and clustering algorithms, especially those using possibility theory to deal with the uncertainty present in the learning process [12] [13]. For instance, InfoAff could serve as the basis of an attribute selection measure for inducing decision trees from imprecisely labeled data. More precisely, it would allow selecting the attribute that provides partitions of the training set containing maximally similar instances, i.e., instances whose possibility distributions over the classes are as similar as possible. Still in classification, InfoAff could also be used in most distance-based classifiers induced from imprecise data, e.g., k-nearest neighbor classifiers, genetic algorithms, artificial immune recognition systems, etc. Likewise, InfoAff could be used in possibilistic clustering [15] as the distance criterion for deciding whether an instance belongs to a given cluster characterized by a possibility distribution.

6.2

Evaluation of Possibilistic Classifiers

The use of our measure is not restricted to learning: it could also be used to evaluate possibilistic classifiers. Recall that, within a possibilistic classifier, the classification result is given in the form of a possibility distribution (π^res) on the different possible classes of the problem (Ω = {C1, C2, ..., Cn}). Classifiers are generally evaluated with the well-known percentage of correct classification:

$$PCC = \frac{\mathrm{nbr\_well\_classified\_inst}}{\mathrm{total\_nbr\_classified\_inst}} \times 100$$

In the possibilistic setting, it is used as follows: for each classified instance, it chooses the class having the highest possibility degree (equal to 1). If more than one class is obtained, one of them is chosen randomly. The obtained class is considered as the class of the testing instance. Consequently, nbr_well_classified_inst is the number of testing instances for which the class obtained by the possibilistic classifier (the most plausible class) is the same as the real class. The limitation of this adaptation of the PCC criterion to the possibilistic setting is that randomly choosing one of the most plausible classes may misclassify some instances. Moreover, even when there is only one most plausible class, focusing on that class and ignoring the others (classes with possibility degrees different from 1) is problematic: ignoring the rest of the degrees means ignoring part of the information given by the resulting possibility distribution π^res. A solution is to define an affinity-based criterion PCC_Aff (Equation (7)) which takes into account the mean affinity over all the classified testing instances, i.e., the average of the similarities between the resulting possibility distribution π^res_j and the real (completely sure) possibility distribution π_j of each classified instance Ij, j = 1..n. When PCC_Aff is close to 100%, the classifier is good, whereas when it falls to 0%, it is considered a bad classifier.

$$PCC\_Aff = \frac{\sum_{j=1}^{n} \mathrm{InfoAff}(\pi^{res}_{j}, \pi_{j})}{\mathrm{total\_nbr\_classified\_inst}} \times 100 \tag{7}$$

Note that an alternative PCC criterion for possibilistic classifiers, more precisely for possibilistic decision trees, was proposed in [2]. The so-called Qualitative PCC, denoted Q_PCC, differs from PCC_Aff: the former is based on a Euclidean distance between the real (completely sure) possibility distribution of each classified instance and its resulting qualitative possibility distribution, which is induced from the leximin-leximax ordering on the different classes given by the tree.

6.3

Comparing Opinions and Sensor Measures

In many situations, comparing the opinions of different agents supports decision making. For instance, suppose that a group of candidates takes part in a competitive entry examination in which each candidate is asked questions. The candidates are given some flexibility: for each question, they may assign a possibility degree to each proposed answer instead of giving a single precise answer. The best candidate is then the one whose possibility distributions are the most similar to the true answers (possibility distributions corresponding to completely sure knowledge). Another interesting use of the Information Affinity measure is sensor diagnosis. Suppose that several sensors measure a given variable. These sensors may produce measurements with some errors; consequently, their outputs can be represented as possibility distributions over the different possible values of the variable under study. Suppose also that a given sensor s0 is known to be reliable (e.g., a newly installed sensor). One can then compare the measurements (the possibility distributions) given by the different sensors with the one given by s0, and reject or replace the sensors whose measurements differ from it beyond a certain extent.
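The affinity-based criterion of Equation (7) in Section 6.2 and the sensor-screening idea above can be sketched together. This is a minimal Python illustration under stated assumptions: class labels are 0-based indices, the real distribution of an instance is the completely sure distribution putting degree 1 on its class, Inc uses the min-based conjunction, and the acceptance threshold in `flag_sensors` is a hypothetical parameter that the text leaves unspecified. All function names are our own.

```python
def info_aff(pi1, pi2):
    """Information Affinity: 1 - (d + Inc) / 2, with d the normalized Manhattan distance."""
    d = sum(abs(a - b) for a, b in zip(pi1, pi2)) / len(pi1)
    inc = 1.0 - max(min(a, b) for a, b in zip(pi1, pi2))  # assumed min-based Inc
    return 1.0 - (d + inc) / 2.0


def sure_distribution(true_index, n_classes):
    """Completely sure possibility distribution of the real class (0-based index)."""
    return [1.0 if k == true_index else 0.0 for k in range(n_classes)]


def pcc_aff(predicted, true_classes):
    """PCC_Aff (Equation (7)): mean affinity between predicted and real distributions, in %."""
    n_classes = len(predicted[0])
    total = sum(info_aff(pi_res, sure_distribution(c, n_classes))
                for pi_res, c in zip(predicted, true_classes))
    return 100.0 * total / len(predicted)


def flag_sensors(readings, reference, threshold):
    """Hypothetical screening rule: sensors whose affinity to the reference s0 is below threshold."""
    return [name for name, pi in readings.items() if info_aff(pi, reference) < threshold]


if __name__ == "__main__":
    print(pcc_aff([[1, 0, 0], [1, 1, 0], [0, 1, 0]], [0, 0, 0]))
    print(flag_sensors({"s1": [1, 0.2, 0], "s2": [0, 0, 1]}, [1, 0.3, 0], threshold=0.5))
```

For example, a prediction [1, 1, 0] for an instance whose real class is C1 contributes 1 − (1/3)/2 ≈ 0.83 to the sum in Equation (7), rather than the all-or-nothing outcome of PCC with random tie-breaking.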

7

Conclusion

This paper focuses on measuring the similarity between pieces of possibilistic uncertain information. Unlike in other uncertainty formalisms, few works have addressed this problem in possibility theory. After proposing some natural properties of a similarity measure between possibility distributions, studying the few existing measures and showing their limits through examples, we proposed a new similarity measure rooted in both the inconsistency measure and a classical distance. We contrasted our measure with the existing ones and showed that it is a reliable measure which overcomes their limits. Potential applications of the proposed measure were mentioned at the end of the paper.

References

1. Abellan, J., Gomez, M.: Measures of divergence on credal sets. Fuzzy Sets and Systems 157, 1514–1531 (2006)
2. Ben Amor, N., Benferhat, S., Elouedi, Z.: Qualitative classification and evaluation in possibilistic decision trees. In: FUZZ-IEEE'04 (2004)
3. Bouchon-Meunier, B., Rifqi, M., Bothorel, S.: Towards general measures of comparison of objects. Fuzzy Sets and Systems 84(2), 143–153 (1996)
4. Chan, H., Darwiche, A.: A distance measure for bounding probabilistic belief change. International Journal of Approximate Reasoning 38, 149–174 (2005)
5. Choquet, G.: Theory of capacities. Annales de L'Institut Fourier 54, 131–295 (1953)
6. De Baets, B., De Meyer, H.: Transitivity-preserving fuzzification schemes for cardinality-based similarity measures. EJOR 160(1), 726–740 (2005)
7. Dubois, D., Prade, H.: Possibility theory: An approach to computerized processing of uncertainty. Plenum Press, New York (1988)
8. Fixsen, D., Mahler, R.P.S.: The modified Dempster-Shafer approach to classification. IEEE Trans. Syst. Man and Cybern. 27, 96–104 (1997)
9. Fono, L.A., Gwet, H., Bouchon-Meunier, B.: Fuzzy implication operators for difference operations for fuzzy sets and cardinality-based measures of comparison. EJOR 183, 314–326 (2007)
10. Higashi, M., Klir, G.J.: Measures of uncertainty and information based on possibility distributions. Int. J. General Systems 9(1), 43–58 (1983)
11. Higashi, M., Klir, G.J.: On the notion of distance representing information closeness: Possibility and probability distributions. IJGS 9, 103–115 (1983)
12. Hüllermeier, E.: Possibilistic instance-based learning. AI 148(1-2), 335–383 (2003)
13. Jenhani, I., Ben Amor, N., Elouedi, Z., Mellouli, K.: Decision Trees as Possibilistic Classifiers (paper submitted)
14. Jousselme, A.L., Grenier, D., Bossé, E.: A new distance between two bodies of evidence. Information Fusion 2, 91–101 (2001)
15. Krishnapuram, R., Keller, J.M.: A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems 1(2), 98–110 (1993)
16. Kroupa, T.: Measure of divergence of possibility measures. In: Proceedings of the 6th Workshop on Uncertainty Processing, Prague, pp. 173–181 (2003)
17. Kullback, S., Leibler, R.A.: On information and sufficiency. Annals of Mathematical Statistics 22, 79–86 (1951)
18. Sanguesa, R., Cabos, J., Cortes, U.: Possibilistic conditional independence: a similarity based measure and its application to causal network learning. IJAR (1997)
19. Sanguesa, R., Cortes, U.: Prior knowledge for learning networks in non-probabilistic settings. IJAR 24, 103–120 (2000)
20. Shafer, G.: A mathematical theory of evidence. Princeton University Press, Princeton (1976)
21. Tessem, B.: Approximations for efficient computation in the theory of evidence. Artificial Intelligence 61, 315–329 (1993)
22. Tversky, A.: Features of similarity. Psychological Review 84, 327–352 (1977)
23. Wang, X., De Baets, B., Kerre, E.: A comparative study of similarity measures. Fuzzy Sets and Systems 73(2), 259–268 (1995)
24. Williams, M-A.: An Operational Measure of Similarity for Belief Revision Systems (1997)
25. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1, 3–28 (1978)
26. Zouhal, L.M., Denoeux, T.: An evidence-theoretic k-NN rule with parameter optimization. IEEE Trans. Syst. Man Cybern. C 28(2), 263–271 (1998)