Bertoluzza et al. - Repositorio de la Universidad de Oviedo

METRON

Bertoluzza et al.’s metric as a basis for analyzing fuzzy data∗

María Rosa Casals · Norberto Corral · María Ángeles Gil · María Teresa López · María Asunción Lubiano · Manuel Montenegro · Gloria Naval · Antonia Salas


Abstract Since Bertoluzza et al.’s metric between fuzzy numbers was introduced, several studies involving it have been developed. Some of these studies concern equivalent expressions for the metric which are useful for theoretical, practical or simulation purposes. Other studies concern the potential of Bertoluzza et al.’s metric for establishing statistical methods for the analysis of fuzzy data. This paper briefly reviews such studies and examines part of the scientific impact of the metric.

Keywords Bertoluzza et al.’s metric · fuzzy numbers · random fuzzy sets

1 Introduction

When analyzing fuzzy-valued data from a statistical perspective, the use of suitable metrics between fuzzy data plays a crucial role. On the one hand, some of the main drawbacks associated with the difference operation can often be overcome by using distances. Since the usual fuzzy arithmetic has no difference operation that is always well defined and preserves the main properties of the real/vector-valued case, this operation can be replaced by a distance between fuzzy data whenever the ‘sign’ of the deviation is not relevant. On the other hand, distances are also essential to formalize estimation errors, to state convergence in limit results, and so on.

∗ This paper is a modest tribute to our admired and beloved friend Professor Carlo Bertoluzza from the University of Pavia.

Departamento de Estadística e I.O. y D.M., Universidad de Oviedo, 33007 Spain
E-mail: [email protected]


Casals, Corral, Gil, López, Lubiano, Montenegro, Naval, Salas

When quantifying the distance between fuzzy data, two relevant features ought to be integrated, namely, ease of handling and intuitiveness of interpretation. In this regard, Bertoluzza et al. [1] introduced a generalized metric on the space of fuzzy numbers which is friendly to use and can be intuitively supported. This paper aims to review Bertoluzza et al.’s generalized metric between fuzzy numbers, some equivalent expressions, and some topological properties. The choice of the particular parameters/functions characterizing the metric is discussed. A concise review of some of the statistical methods for fuzzy data developed in this century and based on Bertoluzza et al.’s metric and the notion of random fuzzy sets [29] (originally coined as fuzzy random variables) is also given. The paper ends with some statistics on the scientific impact of Bertoluzza et al.’s metric [1].

2 Original definition, interpretation and metric properties

In the course of some studies on fuzzy regression analysis, Bertoluzza, Corral and Salas (Bertoluzza et al.) introduced in [1] a distance between fuzzy numbers extending the Euclidean one between real numbers. By a fuzzy number (sometimes referred to as Zadeh’s fuzzy number; see, for instance, Herencia and Lamata [18,19]) we mean (see Goetschel and Voxman [11]) a fuzzy subset of the space of real numbers $\mathbb{R}$, that is, a mapping $\widetilde U: \mathbb{R} \to [0,1]$ which is convex, normal and upper semi-continuous with compact support. Equivalently, a fuzzy number is a mapping $\widetilde U: \mathbb{R} \to [0,1]$ such that for each $\alpha \in [0,1]$ the $\alpha$-level set (given by $\widetilde U_\alpha = \{x \in \mathbb{R} : \widetilde U(x) \ge \alpha\}$ if $\alpha > 0$, and by $\widetilde U_0 = \mathrm{cl}\{x \in \mathbb{R} : \widetilde U(x) > 0\}$) is a nonempty compact interval. $\widetilde U(x)$ is usually interpreted as the ‘degree of compatibility of the real number $x$ with the property associated with $\widetilde U$’, or the ‘degree of truth of the assertion “$x$ is $\widetilde U$”’.

Alternatively, Goetschel and Voxman [12] have proven that a fuzzy number is a mapping $\widetilde U: \mathbb{R} \to [0,1]$ such that
– $\inf \widetilde U_{(\cdot)}: [0,1] \to \mathbb{R}$ is a bounded non-decreasing function,
– $\sup \widetilde U_{(\cdot)}: [0,1] \to \mathbb{R}$ is a bounded non-increasing function,
– $\inf \widetilde U_1 \le \sup \widetilde U_1$,
– $\inf \widetilde U_{(\cdot)}$ and $\sup \widetilde U_{(\cdot)}$ are left-continuous on $(0,1]$ and right-continuous at $0$.

Let $\mathcal{F}_c(\mathbb{R})$ denote the space of fuzzy numbers. Bertoluzza et al. suggested computing the distance between two elements of $\mathcal{F}_c(\mathbb{R})$ “... as a suitable weighted mean of the distances between the α-levels of the fuzzy numbers.” Consequently, “... the main difficulty is concerned with the definition of the distance between intervals,... so our first task consists on defining a measure of the distance between two intervals.”

Bertoluzza et al.’s metric and fuzzy data analysis


Bertoluzza et al. pointed out some concerns related to the use of well-known distances on the space $\mathcal{K}_c(\mathbb{R})$ of nonempty compact intervals, like the Hausdorff $L^\infty$-metric, which for $A, B \in \mathcal{K}_c(\mathbb{R})$ is given by
$$d_H(A,B) = \max\{|\inf A - \inf B|, |\sup A - \sup B|\},$$
or the $L^p$-metrics (see, for instance, Vitale [33]), which for $A, B \in \mathcal{K}_c(\mathbb{R})$ and $1 \le p < \infty$ are given by
$$\delta_p(A,B) = \left(\frac{1}{2}\,|\inf A - \inf B|^p + \frac{1}{2}\,|\sup A - \sup B|^p\right)^{1/p}.$$
Indeed, $d_H([0,5],[6,7]) = d_H([0,5],[6,10]) = 6$ and $\delta_p([-2,2],[-1,1]) = \delta_p([-2,1],[-1,2]) = 1$, although in both cases the second pair of intervals intuitively appears to be more distant; this prevents using these metrics in the statistical setting (especially for quantifying estimation errors). To overcome these drawbacks, when defining a new $L^2$-metric on $\mathcal{K}_c(\mathbb{R})$, Bertoluzza et al. suggest involving not only the distances between the extreme values of the intervals, $|\inf A - \inf B|$ and $|\sup A - \sup B|$, but also those between other points of the intervals. More concretely, to quantify the distance between intervals $A$ and $B$
– a bijection between them is first established by associating, for any $t \in [0,1]$, $A^{[t]} \leftrightarrow B^{[t]}$ (where $A^{[t]} = t \cdot \sup A + (1-t) \cdot \inf A$);
– the root mean square of the Euclidean distances $|A^{[t]} - B^{[t]}|$ between the points associated through the bijection (see Figure 1) is then computed.

[Figure 1: the intervals $A$ and $B$ on the real line, with $\inf A$, $\sup A$, $\inf B$, $\sup B$ and the associated points $A^{[t]}$, $B^{[t]}$ marked]

Fig. 1 The $d_W$-distance is given by a root mean square distance, the distance graphically displayed being $|A^{[t]} - B^{[t]}|$

The suggested $L^2$-distance on $\mathcal{K}_c(\mathbb{R})$ is stated as follows:

Definition 1 Let $W$ be a normalized weighting measure on the measurable space $([0,1], \mathcal{B}_{[0,1]})$, formalized as a probability measure associated with a non-degenerate distribution. The proposed distance is given for $A, B \in \mathcal{K}_c(\mathbb{R})$ by
$$d_W(A,B) = \sqrt{\int_{[0,1]} \left|A^{[t]} - B^{[t]}\right|^2 dW(t)}.$$
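To make Definition 1 concrete, the following Python sketch evaluates $d_W$ for a discrete weighting measure $W$ and approximates $d_\ell$ ($W$ uniform on $[0,1]$) with a midpoint rule. The function names and the `(inf, sup)` tuple encoding of intervals are illustrative assumptions, not part of [1]. With $W$ uniform on $\{0,1\}$ the same function returns $\delta_2$, which makes it easy to compare $d_H$, $\delta_2$ and $d_\ell$ on the intervals discussed above.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (inf A, sup A); this encoding is an assumption of the sketch


def point(a: Interval, t: float) -> float:
    """A^[t] = t * sup A + (1 - t) * inf A."""
    return t * a[1] + (1.0 - t) * a[0]


def d_w_discrete(a: Interval, b: Interval,
                 support: Sequence[float], weights: Sequence[float]) -> float:
    """d_W(A, B) for a discrete weighting measure W given by support points and weights."""
    total = sum(w * (point(a, t) - point(b, t)) ** 2
                for t, w in zip(support, weights))
    return math.sqrt(total)


def d_ell(a: Interval, b: Interval, n: int = 10_000) -> float:
    """d_W with W = Lebesgue measure on [0, 1], approximated by an n-cell midpoint rule."""
    support = [(i + 0.5) / n for i in range(n)]
    return d_w_discrete(a, b, support, [1.0 / n] * n)


def d_hausdorff(a: Interval, b: Interval) -> float:
    """Hausdorff metric between intervals: the larger endpoint deviation."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


if __name__ == "__main__":
    pairs = [((0, 5), (6, 7)), ((0, 5), (6, 10)),
             ((-2, 2), (-1, 1)), ((-2, 1), (-1, 2))]
    for a, b in pairs:
        delta2 = d_w_discrete(a, b, [0.0, 1.0], [0.5, 0.5])  # W uniform on {0, 1}
        print(a, b, d_hausdorff(a, b), round(delta2, 4), round(d_ell(a, b), 4))
```

Running it shows that $d_H$ coincides on the first two pairs and $\delta_2$ on the last two, whereas $d_\ell$ separates both ($4.1633 < 5.5076$ and $0.5774 < 1$).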


Although the weighting measure $W$ is formally associated with a probability measure, its meaning and role are weighting rather than stochastic. In particular, if $W$ is associated with the uniform distribution on $\{0,1\}$, then $d_W$ reduces to $\delta_2$. On the other hand, if $W$ is associated with the uniform distribution on $[0,1]$, which will be denoted throughout the paper by $\ell$, then
$$d_\ell([0,5],[6,7]) = 4.1633 < 5.5076 = d_\ell([0,5],[6,10]), \qquad d_\ell([-2,2],[-1,1]) = 0.5774 < 1 = d_\ell([-2,1],[-1,2]).$$
On extending this metric from $\mathcal{K}_c(\mathbb{R})$ to $\mathcal{F}_c(\mathbb{R})$, to quantify the distance between fuzzy numbers $\widetilde U$ and $\widetilde V$
– a double bijection between $\widetilde U$ and $\widetilde V$ is first established by associating
  - for any $\alpha \in [0,1]$: $\widetilde U_\alpha \leftrightarrow \widetilde V_\alpha$, and
  - for any $t \in [0,1]$: $\widetilde U_\alpha^{[t]} \leftrightarrow \widetilde V_\alpha^{[t]}$;
– the root mean square of the Euclidean distances $|\widetilde U_\alpha^{[t]} - \widetilde V_\alpha^{[t]}|$ between the points associated through the double bijection (see Figure 2) is then computed.

Fig. 2 The $(W, \varphi)$-distance is given by a root mean square distance, the distance graphically displayed being $|\widetilde U_\alpha^{[t]} - \widetilde V_\alpha^{[t]}|$

The suggested $L^2$-distance on $\mathcal{F}_c(\mathbb{R})$ is stated as follows:

Definition 2 Let $W$ be a normalized weighting measure on the measurable space $([0,1], \mathcal{B}_{[0,1]})$, formalized as a probability measure associated with a non-degenerate distribution, and let $\varphi$ be a normalized weighting measure on $([0,1], \mathcal{B}_{[0,1]})$, formalized as a probability measure associated with an absolutely continuous distribution function which is strictly increasing on $[0,1]$. If $\widetilde U, \widetilde V \in \mathcal{F}_c(\mathbb{R})$, then the $(W,\varphi)$-distance between these two fuzzy numbers is given by
$$D_W^\varphi(\widetilde U, \widetilde V) = \sqrt{\int_{[0,1]} \left[d_W(\widetilde U_\alpha, \widetilde V_\alpha)\right]^2 d\varphi(\alpha)} = \sqrt{\int_{[0,1]} \left[\int_{[0,1]} \left|\widetilde U_\alpha^{[t]} - \widetilde V_\alpha^{[t]}\right|^2 dW(t)\right] d\varphi(\alpha)}.$$
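A minimal Python sketch of Definition 2, under several assumptions that are not part of the original text: $W = \ell$, so that the inner $t$-integral has the closed form $(i^2 + is + s^2)/3$ with $i$ and $s$ the deviations of the infima and suprema; fuzzy numbers restricted to trapezoidal ones encoded as `(a, b, c, d)`; and a midpoint rule for the outer $\alpha$-integral. The weighting $\varphi$ is passed as a density; `beta_density` is a hypothetical helper for Beta weightings.

```python
import math
from typing import Callable, Tuple

Trapezoid = Tuple[float, float, float, float]  # (a, b, c, d), a <= b <= c <= d (assumed encoding)
Density = Callable[[float], float]


def level(u: Trapezoid, alpha: float) -> Tuple[float, float]:
    """alpha-level [inf U_alpha, sup U_alpha] of a trapezoidal fuzzy number."""
    a, b, c, d = u
    return a + alpha * (b - a), d - alpha * (d - c)


def d_ell_sq(x: Tuple[float, float], y: Tuple[float, float]) -> float:
    """[d_W]^2 for W = Lebesgue: integral of ((1 - t) i + t s)^2 dt = (i^2 + i s + s^2) / 3."""
    i = x[0] - y[0]  # deviation of the infima
    s = x[1] - y[1]  # deviation of the suprema
    return (i * i + i * s + s * s) / 3.0


def uniform_density(alpha: float) -> float:
    """phi = Lebesgue measure on [0, 1]."""
    return 1.0


def beta_density(p: float, q: float) -> Density:
    """Beta(p, q) density, used only as a weighting function phi on the levels."""
    const = math.gamma(p + q) / (math.gamma(p) * math.gamma(q))
    return lambda alpha: const * alpha ** (p - 1) * (1 - alpha) ** (q - 1)


def D(u: Trapezoid, v: Trapezoid, phi: Density = uniform_density, n: int = 2000) -> float:
    """D_W^phi with W = Lebesgue; the outer alpha-integral uses an n-cell midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        alpha = (k + 0.5) * h
        total += d_ell_sq(level(u, alpha), level(v, alpha)) * phi(alpha) * h
    return math.sqrt(total)
```

For instance, with `U = (0, 1, 2, 3)`, `V1 = (0, 1.5, 2.5, 3)` (modified around the core) and `V2 = (-0.5, 1, 2, 3.5)` (modified around the support), weighting with `beta_density(5, 1)` makes `V1` the farther one, while `beta_density(1, 5)` makes `V2` the farther one.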

Although the weighting measure $\varphi$ is formally associated with a probability measure, as happens for $W$, its meaning and role are simply weighting, not stochastic. Actually, $\varphi$ weights the influence or importance of each level (i.e., degree of ‘vagueness’, ‘fuzziness’, ...). Thus,
– if $\varphi \equiv \ell$, $D_W^\varphi$ will be mainly sensitive to ‘location’ changes;
– if, for instance, $\varphi = \mathrm{Beta}(1, p)$ with $p \gg 1$, the lower the degree of compatibility, the higher the weight, whence $D_W^\varphi$ will be very sensitive to changes at the lowest levels of compatibility;
– if, for instance, $\varphi = \mathrm{Beta}(p, 1)$ with $p \gg 1$, the higher the degree of compatibility, the higher the weight, whence $D_W^\varphi$ will be very sensitive to changes at the highest levels of compatibility.
Bertoluzza et al. recommended that levels with a high degree of compatibility should count more in the distance than those with a low degree.

In fact, $D_W^\varphi$ satisfies all the metric axioms:

Proposition 1 $D_W^\varphi$ is a metric on $\mathcal{F}_c(\mathbb{R})$.

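Proof Indeed,
• Non-negativity: trivial to prove.
• Identity of indiscernibles: As $W$ is not associated with a degenerate distribution, $D_W^\varphi(\widetilde U, \widetilde V) = 0$ if, and only if,
$$\int_{[0,1]} \left|\widetilde U_\alpha^{[t]} - \widetilde V_\alpha^{[t]}\right|^2 dW(t) = 0 \quad \text{a.s. } [\varphi],$$
and, as $\varphi$ is associated with an absolutely continuous distribution function which is strictly increasing on $[0,1]$, and $\int_{[0,1]} |\widetilde U_\alpha^{[t]} - \widetilde V_\alpha^{[t]}|^2 dW(t)$ is left-continuous at $\alpha \in (0,1]$ and right-continuous at $\alpha = 0$, then
$$\int_{[0,1]} \left|\widetilde U_\alpha^{[t]} - \widetilde V_\alpha^{[t]}\right|^2 dW(t) = 0 \quad \text{for all } \alpha \in [0,1].$$
For any $\alpha \in [0,1]$, since $W$ is associated with a non-degenerate distribution, $\int_{[0,1]} |\widetilde U_\alpha^{[t]} - \widetilde V_\alpha^{[t]}|^2 dW(t) = 0$ implies that there exist two values $t_1(\alpha), t_2(\alpha) \in [0,1]$, $t_1(\alpha) < t_2(\alpha)$, such that
$$(1 - t_1(\alpha))(\inf \widetilde U_\alpha - \inf \widetilde V_\alpha) + t_1(\alpha)(\sup \widetilde U_\alpha - \sup \widetilde V_\alpha) = 0,$$
$$(1 - t_2(\alpha))(\inf \widetilde U_\alpha - \inf \widetilde V_\alpha) + t_2(\alpha)(\sup \widetilde U_\alpha - \sup \widetilde V_\alpha) = 0,$$
and hence
$$\left[t_2(\alpha) - t_1(\alpha)\right] \cdot \left[(\sup \widetilde U_\alpha - \sup \widetilde V_\alpha) - (\inf \widetilde U_\alpha - \inf \widetilde V_\alpha)\right] = 0.$$
In case either $t_1(\alpha)$ or $t_2(\alpha)$ belongs to $(0,1)$, the only possibility for the three preceding equalities to hold is that $\inf \widetilde U_\alpha = \inf \widetilde V_\alpha$ and $\sup \widetilde U_\alpha = \sup \widetilde V_\alpha$. In case $t_1(\alpha) = 0$ and $t_2(\alpha) = 1$, the only possibility for the first two preceding equalities to hold is also that $\inf \widetilde U_\alpha = \inf \widetilde V_\alpha$ and $\sup \widetilde U_\alpha = \sup \widetilde V_\alpha$. Consequently, $\widetilde U = \widetilde V$.
• Symmetry: trivial to prove.
• Triangle inequality: quite trivial to prove, because for all $\alpha \in [0,1]$ and $\widetilde U, \widetilde V, \widetilde T \in \mathcal{F}_c(\mathbb{R})$
$$\left\|\widetilde U_\alpha^{[\cdot]} - \widetilde V_\alpha^{[\cdot]}\right\|_2 \le \left\|\widetilde U_\alpha^{[\cdot]} - \widetilde T_\alpha^{[\cdot]}\right\|_2 + \left\|\widetilde T_\alpha^{[\cdot]} - \widetilde V_\alpha^{[\cdot]}\right\|_2,$$
where $\|\cdot\|_2$ denotes the norm of $L^2([0,1], W)$, that is, $d_W(\widetilde U_\alpha, \widetilde V_\alpha) \le d_W(\widetilde U_\alpha, \widetilde T_\alpha) + d_W(\widetilde T_\alpha, \widetilde V_\alpha)$; Minkowski’s inequality in $L^2([0,1], \varphi)$ then yields the triangle inequality for $D_W^\varphi$.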



Remark 1 It should be pointed out that for the developments in [1] the authors restrict $W$ to be a mixture of a discrete finite distribution and a continuous one, but in fact there is no need for such a constraint in the general setting. Similarly, the absolute continuity of $\varphi$ could be weakened by simply demanding on $\varphi$ a condition guaranteeing the identity of indiscernibles for $D_W^\varphi$, but the assumed condition seems easy to use and rather natural in practice.

3 Definitional and topological equivalences

As has been detailed in previous studies (see, for instance, Blanco-Fernández et al. [3]), the metric $D_W^\varphi$ can be expressed in several alternative ways. The expression originally introduced by Bertoluzza et al. is the easiest version to interpret, as it involves the choice of the weighting measure $W$. Nevertheless, for computations, simulations, theoretical developments and the extension to higher-dimensional spaces, some equivalent expressions become more appropriate. These ‘definitional’ equivalences have also been described in Blanco-Fernández et al. [3].

3.1 Equivalent definition based on weighting extremes and a relevant location point of the α-levels

As often happens in mathematics, the generalized metric in Definition 2 can be equivalently characterized by means of one of its particular cases. Thus, $D_W^\varphi$ can be fully characterized (see Bertoluzza et al. [1], Lubiano et al. [23]) by particularizing the general weighting measure $W$ to a discrete one that weights (for each level) three points: the two extremes and an intermediate one (often the mid-point). Thus, given $W$ and $\varphi$, if one denotes $t_W = \int_{[0,1]} t \, dW(t)$ and $\lambda = (\lambda_1, \lambda_2, \lambda_3)$ with





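$$\lambda_1 = \frac{\int_{[0,1]} (t - t_W)^2 \, dW(t)}{1 - t_W}, \qquad \lambda_2 = \frac{\int_{[0,1]} t(1-t) \, dW(t)}{t_W(1 - t_W)}, \qquad \lambda_3 = \frac{\int_{[0,1]} (t - t_W)^2 \, dW(t)}{t_W},$$
then
$$D_W^\varphi(\widetilde U, \widetilde V) = D_\lambda^\varphi(\widetilde U, \widetilde V) = \sqrt{\int_{[0,1]} \left(\lambda_1 \left[\widetilde U_\alpha^{[1]} - \widetilde V_\alpha^{[1]}\right]^2 + \lambda_2 \left[\widetilde U_\alpha^{[t_W]} - \widetilde V_\alpha^{[t_W]}\right]^2 + \lambda_3 \left[\widetilde U_\alpha^{[0]} - \widetilde V_\alpha^{[0]}\right]^2\right) d\varphi(\alpha)}.$$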

It can easily be verified that $\lambda_1 > 0$, $\lambda_2 \ge 0$, $\lambda_3 > 0$ and $\lambda_1 + \lambda_2 + \lambda_3 = 1$. Moreover, if $t_W = 0.5$ (as happens, for instance, if $W$ is associated with a distribution symmetric w.r.t. $t = 0.5$, which is often a reasonable selection), then $\widetilde U_\alpha^{[t_W]} = \mathrm{mid}\, \widetilde U_\alpha$ = centre of $\widetilde U_\alpha$ = $(\inf \widetilde U_\alpha + \sup \widetilde U_\alpha)/2$. Although choosing $W$ is more intuitive than choosing the weighting vector $\lambda = (\lambda_1, \lambda_2, \lambda_3)$, the latter is easier to handle in many other developments. Some possible selections for the weighting vector, with the corresponding measure $W$, are gathered in Table 1.

W | λ = (λ1, λ2, λ3)
Beta(p, q) | $\left(\frac{p}{(p+q)(p+q+1)}, \frac{p+q}{p+q+1}, \frac{q}{(p+q)(p+q+1)}\right)$
Uniform{0, 1/k, ..., (k−1)/k, 1} | $\left(\frac{k+2}{6k}, \frac{2k-2}{3k}, \frac{k+2}{6k}\right)$
Binom(k, p)/k | $\left(\frac{p}{k}, \frac{k-1}{k}, \frac{1-p}{k}\right)$

Table 1 Some possible choices for λ = (λ1, λ2, λ3) which are based on choices for W
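The three-point representation can be checked with exact rational arithmetic. The Python sketch below, a minimal illustration assuming a discrete $W$ given by hypothetical `support`/`weights` lists and intervals encoded as `(inf, sup)`, computes $(t_W, \lambda_1, \lambda_2, \lambda_3)$ and compares the squared interval distance of Definition 1 with the one obtained from the weights at $t = 1$, $t_W$ and $0$.

```python
from fractions import Fraction as F
from typing import Sequence, Tuple

Interval = Tuple[F, F]  # (inf A, sup A); assumed encoding


def point(a: Interval, t: F) -> F:
    """A^[t] = t * sup A + (1 - t) * inf A."""
    return t * a[1] + (1 - t) * a[0]


def lambda_weights(support: Sequence[F], weights: Sequence[F]) -> Tuple[F, F, F, F]:
    """Return (t_W, lambda_1, lambda_2, lambda_3) for a discrete weighting measure W."""
    t_w = sum(w * t for t, w in zip(support, weights))
    var = sum(w * (t - t_w) ** 2 for t, w in zip(support, weights))
    mixed = sum(w * t * (1 - t) for t, w in zip(support, weights))
    return t_w, var / (1 - t_w), mixed / (t_w * (1 - t_w)), var / t_w


def d_w_sq(a: Interval, b: Interval, support: Sequence[F], weights: Sequence[F]) -> F:
    """[d_W(A, B)]^2 straight from Definition 1."""
    return sum(w * (point(a, t) - point(b, t)) ** 2 for t, w in zip(support, weights))


def d_lambda_sq(a: Interval, b: Interval, support: Sequence[F], weights: Sequence[F]) -> F:
    """The same quantity through the three-point weighting at t = 1, t_W and 0."""
    t_w, l1, l2, l3 = lambda_weights(support, weights)
    return (l1 * (point(a, F(1)) - point(b, F(1))) ** 2
            + l2 * (point(a, t_w) - point(b, t_w)) ** 2
            + l3 * (point(a, F(0)) - point(b, F(0))) ** 2)
```

For example, `lambda_weights([F(0), F(1, 2), F(1)], [F(1, 3)] * 3)` returns $t_W = 1/2$ and $\lambda = (1/3, 1/3, 1/3)$, the Uniform{0, 1/2, 1} case ($k = 2$) of Table 1. Fractions are used so that the identity can be checked exactly rather than up to rounding.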

3.2 Equivalent definition based on weighting the centers and radii of the α-levels

It is well known that Hausdorff’s metric can be equivalently expressed in the interval-valued case as $d_H(A,B) = |\mathrm{mid}\, A - \mathrm{mid}\, B| + |\mathrm{spr}\, A - \mathrm{spr}\, B|$, where $\mathrm{mid}\, A = (\inf A + \sup A)/2$ is the centre (mid-point) of $A$ and $\mathrm{spr}\, A = (\sup A - \inf A)/2$ is the radius (spread) of $A$.

In a similar way, $D_W^\varphi$ can also be expressed (see, for instance, Gil et al. [8,9], Trutschnig et al. [32]) by replacing the general weighting measure $W$ by a non-stochastic weighting of the squared distances between the intermediate points associated with $t_W$ and the squared distances between the radii. Thus, given $W$ and $\varphi$, if one denotes $\theta = 4 \int_{[0,1]} (t - t_W)^2 \, dW(t) = 4 \lambda_1 (1 - t_W) \in (0,1]$, then
$$D_W^\varphi(\widetilde U, \widetilde V) = D_\theta^\varphi(\widetilde U, \widetilde V) = \sqrt{\int_{[0,1]} \left(\left[\widetilde U_\alpha^{[t_W]} - \widetilde V_\alpha^{[t_W]}\right]^2 + \theta \left[\mathrm{spr}\, \widetilde U_\alpha - \mathrm{spr}\, \widetilde V_\alpha\right]^2\right) d\varphi(\alpha)}.$$
If $t_W = 0.5$ (in particular, if $W$ is associated with a distribution symmetric w.r.t. 0.5), then
$$D_\theta^\varphi(\widetilde U, \widetilde V) = \sqrt{\int_{[0,1]} \left(\left[\mathrm{mid}\, \widetilde U_\alpha - \mathrm{mid}\, \widetilde V_\alpha\right]^2 + \theta \left[\mathrm{spr}\, \widetilde U_\alpha - \mathrm{spr}\, \widetilde V_\alpha\right]^2\right) d\varphi(\alpha)}.$$

Consequently, the choices of $W$ and $\theta$ allow us to weight, for each $\alpha$, the effect of the deviation in ‘shape/imprecision’ against the effect of the deviation in ‘location/position’. From a theoretical perspective we could extend the parameter $\theta$ to range over $(0, \infty)$, but in practice it seems more reasonable to constrain $\theta$ to $(0,1]$, so that the deviation in shape/imprecision is given at most the same weight as the deviation in location. Although choosing $W$ is more intuitive than choosing the weighting parameter $\theta$, the latter is easier to handle in many other developments. Some possible selections for the weighting parameter, with the corresponding measure $W$, are gathered in Table 2.

W | θ
Beta(p, q) | $\frac{4pq}{(p+q)^2(p+q+1)}$
Uniform{0, 1/k, ..., (k−1)/k, 1} | $\frac{k+2}{3k}$
Binom(k, p)/k | $\frac{4p(1-p)}{k}$

Table 2 Some possible choices for θ which are based on choices for W
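The mid/spread form can also be verified exactly. The following Python sketch, assuming a discrete $W$ given by hypothetical `support`/`weights` lists and `(inf, sup)` intervals, computes $\theta = 4\int(t - t_W)^2 dW(t)$, reproduces the discrete rows of Table 2, and checks that the point at $t_W$ plus $\theta$ times the squared spread deviation gives the same squared distance as Definition 1, even for a non-symmetric $W$.

```python
from fractions import Fraction as F
from math import comb
from typing import List, Sequence, Tuple

Interval = Tuple[F, F]  # (inf A, sup A); assumed encoding


def theta_of(support: Sequence[F], weights: Sequence[F]) -> Tuple[F, F]:
    """Return (t_W, theta) with theta = 4 * integral of (t - t_W)^2 dW(t)."""
    t_w = sum(w * t for t, w in zip(support, weights))
    return t_w, 4 * sum(w * (t - t_w) ** 2 for t, w in zip(support, weights))


def d_w_sq(a: Interval, b: Interval, support: Sequence[F], weights: Sequence[F]) -> F:
    """[d_W(A, B)]^2 from Definition 1, with A^[t] = t sup A + (1 - t) inf A."""
    return sum(w * ((1 - t) * (a[0] - b[0]) + t * (a[1] - b[1])) ** 2
               for t, w in zip(support, weights))


def d_theta_sq(a: Interval, b: Interval, support: Sequence[F], weights: Sequence[F]) -> F:
    """Same quantity via the point at t_W plus theta times the squared spread deviation."""
    t_w, theta = theta_of(support, weights)
    point_dev = (1 - t_w) * (a[0] - b[0]) + t_w * (a[1] - b[1])
    spr_dev = (a[1] - a[0]) / 2 - (b[1] - b[0]) / 2
    return point_dev ** 2 + theta * spr_dev ** 2


def uniform_grid(k: int) -> Tuple[List[F], List[F]]:
    """W uniform on {0, 1/k, ..., (k - 1)/k, 1}."""
    return [F(j, k) for j in range(k + 1)], [F(1, k + 1)] * (k + 1)


def binomial_grid(k: int, p: F) -> Tuple[List[F], List[F]]:
    """W = Binom(k, p)/k."""
    return ([F(j, k) for j in range(k + 1)],
            [comb(k, j) * p ** j * (1 - p) ** (k - j) for j in range(k + 1)])
```

With `uniform_grid(1)` (that is, $W$ uniform on $\{0,1\}$) one gets $\theta = 1$ and recovers $\delta_2$; for instance, $[\delta_2([0,5],[6,7])]^2 = 16 + 4 = 20$.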

3.3 Equivalent definition based on the support functions of the fuzzy numbers

Fuzzy numbers (in general, convex fuzzy sets) can also be characterized by means of the so-called support function (see Puri and Ralescu [28]). If $\widetilde U \in \mathcal{F}_c(\mathbb{R})$, the support function of $\widetilde U$ is the real-valued function $s_{\widetilde U}$ on $\{-1,1\} \times [0,1]$ such that $s_{\widetilde U}(-1, \alpha) = -\inf \widetilde U_\alpha$ and $s_{\widetilde U}(1, \alpha) = \sup \widetilde U_\alpha$. By using this function, $D_W^\varphi$ can be expressed (see Näther [27], Körner and Näther [21]) by replacing the general weighting measure $W$ by a positive definite and symmetric kernel $K$ defined on $\{-1,1\}^2 \times [0,1]^2$ such that
$$dK(u, v, \alpha, \beta) = \begin{cases} K^0(u,v) \, d\varphi(\alpha) & \text{if } \beta = \alpha, \\ 0 & \text{otherwise,} \end{cases}$$
with
$$K^0(1,1) = \int_{[0,1]} t^2 \, dW(t) = \lambda_1(1 - t_W) + t_W^2,$$
$$K^0(-1,-1) = \int_{[0,1]} (1-t)^2 \, dW(t) = \lambda_1(1 - t_W) + (1 - t_W)^2,$$
$$K^0(1,-1) = K^0(-1,1) = -\int_{[0,1]} t(1-t) \, dW(t) = -(t_W - \lambda_1)(1 - t_W),$$
the minus sign in the last entry compensating for the sign in $s_{\widetilde U}(-1, \alpha) = -\inf \widetilde U_\alpha$.

Thus, given $W$ and $\varphi$, by considering the inner product $\langle \cdot, \cdot \rangle_K$ associated with the $L^2$-distance on the space of Lebesgue integrable functions on $\{-1,1\} \times [0,1]$ w.r.t. the above positive definite and symmetric kernel $K$, we have that
$$D_W^\varphi(\widetilde U, \widetilde V) = D_K^\varphi(\widetilde U, \widetilde V) = \sqrt{\langle s_{\widetilde U} - s_{\widetilde V}, s_{\widetilde U} - s_{\widetilde V} \rangle_K} = \sqrt{\int_{(\{-1,1\} \times [0,1])^2} \left(s_{\widetilde U}(u,\alpha) - s_{\widetilde V}(u,\alpha)\right)\left(s_{\widetilde U}(v,\beta) - s_{\widetilde V}(v,\beta)\right) dK(u, v, \alpha, \beta)}.$$

Although choosing $W$ is more intuitive than choosing the positive definite and symmetric kernel $K$, the latter is convenient for certain developments, as we will see in the next section. Some possible selections for the kernel, with the corresponding measure $W$, are gathered in Table 3.

W | K⁰(1,1) | K⁰(1,−1) = K⁰(−1,1) | K⁰(−1,−1)
Beta(p, q) | $\frac{p(1+p)}{(p+q)(p+q+1)}$ | $-\frac{pq}{(p+q)(p+q+1)}$ | $\frac{q(1+q)}{(p+q)(p+q+1)}$
Uniform{0, 1/k, ..., (k−1)/k, 1} | $\frac{2k+1}{6k}$ | $-\frac{k-1}{6k}$ | $\frac{2k+1}{6k}$
Binom(k, p)/k | $\frac{p[(1-p)+kp]}{k}$ | $-\frac{p(1-p)(k-1)}{k}$ | $\frac{(1-p)[p+k(1-p)]}{k}$

Table 3 Some possible choices for the positive definite and symmetric kernel which are based on choices for W

As a summary of the equivalences just stated, Table 4 jointly collects the weighting vector $\lambda$, the parameter $\theta$ and the matrix $\begin{pmatrix} K^0(1,1) & K^0(1,-1) \\ K^0(-1,1) & K^0(-1,-1) \end{pmatrix}$ for certain symmetric selections of $W$ (symmetric ones usually being the most natural). On the other hand, Bertoluzza et al.’s metric is topologically equivalent to well-known separable metrics, which leads to valuable features for $D_W^\varphi$, as can be seen in the following subsection.

W | λ | θ | K⁰ matrix
Beta(1, 1) = ℓ | (1/6, 2/3, 1/6) | 1/3 | $\begin{pmatrix} 1/3 & -1/6 \\ -1/6 & 1/3 \end{pmatrix}$
Beta(2, 2) | (1/10, 4/5, 1/10) | 1/5 | $\begin{pmatrix} 3/10 & -1/5 \\ -1/5 & 3/10 \end{pmatrix}$
Beta(1/4, 1/4) | (1/3, 1/3, 1/3) | 2/3 | $\begin{pmatrix} 5/12 & -1/12 \\ -1/12 & 5/12 \end{pmatrix}$
Beta(1/8, 1/8) | (2/5, 1/5, 2/5) | 4/5 | $\begin{pmatrix} 9/20 & -1/20 \\ -1/20 & 9/20 \end{pmatrix}$
Uniform{0, 1/2, 1} | (1/3, 1/3, 1/3) | 2/3 | $\begin{pmatrix} 5/12 & -1/12 \\ -1/12 & 5/12 \end{pmatrix}$
Binom(4, 1/2)/4 | (1/8, 3/4, 1/8) | 1/4 | $\begin{pmatrix} 5/16 & -3/16 \\ -3/16 & 5/16 \end{pmatrix}$

Table 4 Some particular choices for the weighting vector, parameter and positive definite and symmetric kernel which are based on choices for W
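The kernel representation is, level by level, a quadratic form in the support-function differences. The Python sketch below, a minimal illustration assuming a discrete $W$ given by hypothetical `support`/`weights` lists and `(inf, sup)` intervals, builds $K^0$ and compares the resulting quadratic form with $[d_W]^2$ from Definition 1. With the convention $s(-1,\alpha) = -\inf \widetilde U_\alpha$, the off-diagonal entry has to be $-\int t(1-t)\,dW(t)$ for the two quantities to agree, and that is the sign used here.

```python
from fractions import Fraction as F
from typing import Sequence, Tuple

Interval = Tuple[F, F]  # (inf A, sup A); assumed encoding


def kernel(support: Sequence[F], weights: Sequence[F]) -> Tuple[F, F, F]:
    """Return (K0(1,1), K0(1,-1) = K0(-1,1), K0(-1,-1)) for a discrete W."""
    k_pp = sum(w * t * t for t, w in zip(support, weights))
    k_pm = -sum(w * t * (1 - t) for t, w in zip(support, weights))  # note the minus sign
    k_mm = sum(w * (1 - t) ** 2 for t, w in zip(support, weights))
    return k_pp, k_pm, k_mm


def support_fn(a: Interval) -> Tuple[F, F]:
    """(s_A(1), s_A(-1)) = (sup A, -inf A)."""
    return a[1], -a[0]


def d_k_sq(a: Interval, b: Interval, support: Sequence[F], weights: Sequence[F]) -> F:
    """Quadratic form of the support-function difference w.r.t. the kernel K0."""
    k_pp, k_pm, k_mm = kernel(support, weights)
    (sa_p, sa_m), (sb_p, sb_m) = support_fn(a), support_fn(b)
    dp, dm = sa_p - sb_p, sa_m - sb_m  # differences at u = 1 and u = -1
    return k_pp * dp * dp + 2 * k_pm * dp * dm + k_mm * dm * dm


def d_w_sq(a: Interval, b: Interval, support: Sequence[F], weights: Sequence[F]) -> F:
    """[d_W(A, B)]^2 from Definition 1."""
    return sum(w * ((1 - t) * (a[0] - b[0]) + t * (a[1] - b[1])) ** 2
               for t, w in zip(support, weights))
```

For $W$ uniform on $\{0, 1/2, 1\}$, `kernel` returns $(5/12, -1/12, 5/12)$; for two crisp points at distance 1, `d_k_sq` returns 1, as any metric extending the Euclidean one should.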

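3.4 Topological properties

Bertoluzza et al.’s metric is topologically equivalent to the $L^2$-metric $\rho_2$ between fuzzy numbers based on $\delta_2$ (Diamond and Kloeden [7]), which is given by
$$\rho_2(\widetilde U, \widetilde V) = \sqrt{\int_{[0,1]} \left[\delta_2(\widetilde U_\alpha, \widetilde V_\alpha)\right]^2 d\alpha} = \sqrt{\int_{[0,1]} \left[\frac{1}{2} \left|\inf \widetilde U_\alpha - \inf \widetilde V_\alpha\right|^2 + \frac{1}{2} \left|\sup \widetilde U_\alpha - \sup \widetilde V_\alpha\right|^2\right] d\alpha},$$
and can easily be extended to
$$\rho_2^\varphi(\widetilde U, \widetilde V) = \sqrt{\int_{[0,1]} \left[\frac{1}{2} \left|\inf \widetilde U_\alpha - \inf \widetilde V_\alpha\right|^2 + \frac{1}{2} \left|\sup \widetilde U_\alpha - \sup \widetilde V_\alpha\right|^2\right] d\varphi(\alpha)}.$$
Whenever $\theta \in (0,1]$, the last metric is equivalent to $D_\theta^\varphi$. Thus,

Proposition 2 Let $\varphi$ be a normalized weighting measure on the measurable space $([0,1], \mathcal{B}_{[0,1]})$, formalized as a probability measure associated with an absolutely continuous distribution function which is strictly increasing on $[0,1]$, and let $\theta \in (0,1]$. The metric $D_\theta^\varphi$ (in its mid/spread form, i.e., with $t_W = 0.5$) is topologically equivalent to the metric $\rho_2^\varphi$ on $\mathcal{F}_c(\mathbb{R})$. More precisely,
$$\sqrt{\theta} \cdot \rho_2^\varphi(\widetilde U, \widetilde V) \le D_\theta^\varphi(\widetilde U, \widetilde V) \le \rho_2^\varphi(\widetilde U, \widetilde V)$$
for all $\widetilde U, \widetilde V \in \mathcal{F}_c(\mathbb{R})$.

Proof Indeed, since
$$\delta_2(\widetilde U_\alpha, \widetilde V_\alpha) = d_{\mathrm{Uniform}\{0,1\}}(\widetilde U_\alpha, \widetilde V_\alpha) = \sqrt{\left|\mathrm{mid}\, \widetilde U_\alpha - \mathrm{mid}\, \widetilde V_\alpha\right|^2 + \left|\mathrm{spr}\, \widetilde U_\alpha - \mathrm{spr}\, \widetilde V_\alpha\right|^2},$$
for each $\alpha \in [0,1]$ and $\widetilde U, \widetilde V \in \mathcal{F}_c(\mathbb{R})$ it is obvious that
$$\theta \cdot \left[\delta_2(\widetilde U_\alpha, \widetilde V_\alpha)\right]^2 = \theta \left|\mathrm{mid}\, \widetilde U_\alpha - \mathrm{mid}\, \widetilde V_\alpha\right|^2 + \theta \left|\mathrm{spr}\, \widetilde U_\alpha - \mathrm{spr}\, \widetilde V_\alpha\right|^2 \le \left|\mathrm{mid}\, \widetilde U_\alpha - \mathrm{mid}\, \widetilde V_\alpha\right|^2 + \theta \left|\mathrm{spr}\, \widetilde U_\alpha - \mathrm{spr}\, \widetilde V_\alpha\right|^2$$
$$\le \left|\mathrm{mid}\, \widetilde U_\alpha - \mathrm{mid}\, \widetilde V_\alpha\right|^2 + \left|\mathrm{spr}\, \widetilde U_\alpha - \mathrm{spr}\, \widetilde V_\alpha\right|^2 = \left[\delta_2(\widetilde U_\alpha, \widetilde V_\alpha)\right]^2.$$
Integrating w.r.t. $\varphi$ and taking square roots,
$$\sqrt{\theta} \cdot \rho_2^\varphi(\widetilde U, \widetilde V) \le D_\theta^\varphi(\widetilde U, \widetilde V) \le \rho_2^\varphi(\widetilde U, \widetilde V).$$
Therefore, $D_\theta^\varphi$ and $\rho_2^\varphi$ are topologically equivalent.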
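Since the inequality of Proposition 2 is proved level-wise and then integrated, checking the level-wise step on random intervals exercises its core. The Python sketch below does exactly that; the interval generator, sample size and tolerance are arbitrary choices made for illustration, and a numerical check of this kind is of course no substitute for the proof.

```python
import math
import random
from typing import Tuple

Interval = Tuple[float, float]  # (inf, sup); assumed encoding


def mid_spr(a: Interval) -> Tuple[float, float]:
    """Centre and radius of an interval."""
    return (a[0] + a[1]) / 2.0, (a[1] - a[0]) / 2.0


def delta2(a: Interval, b: Interval) -> float:
    """delta_2: root of the average squared deviation of infima and suprema."""
    return math.sqrt(0.5 * (a[0] - b[0]) ** 2 + 0.5 * (a[1] - b[1]) ** 2)


def d_theta(a: Interval, b: Interval, theta: float) -> float:
    """Level-wise mid/spread distance used in D_theta^phi (t_W = 0.5)."""
    (ma, sa), (mb, sb) = mid_spr(a), mid_spr(b)
    return math.sqrt((ma - mb) ** 2 + theta * (sa - sb) ** 2)


def random_interval(rng: random.Random) -> Interval:
    lo = rng.uniform(-10.0, 10.0)
    return lo, lo + rng.uniform(0.0, 5.0)


def bounds_hold(n_pairs: int = 10_000, seed: int = 0, tol: float = 1e-12) -> bool:
    """Check sqrt(theta) * delta_2 <= d_theta <= delta_2 on random interval pairs."""
    rng = random.Random(seed)
    for _ in range(n_pairs):
        a, b = random_interval(rng), random_interval(rng)
        theta = rng.uniform(1e-3, 1.0)
        d, ref = d_theta(a, b, theta), delta2(a, b)
        if not (math.sqrt(theta) * ref <= d + tol and d <= ref + tol):
            return False
    return True
```

Note that $\theta = 1$ turns the mid/spread distance into $\delta_2$, while $\theta = 1/3$ gives $d_\ell$; for $[0,5]$ and $[6,7]$ the latter is $\sqrt{16 + 4/3} = \sqrt{52/3}$.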



Given that $\rho_2^\varphi$ is topologically equivalent to $d_2^\varphi$, which extends the $L^2$ metric $d_2$ in Diamond and Kloeden as follows:
$$d_2^\varphi(\widetilde U, \widetilde V) = \sqrt{\int_{[0,1]} \left[d_H(\widetilde U_\alpha, \widetilde V_\alpha)\right]^2 d\varphi(\alpha)},$$
$D_\theta^\varphi$, $\rho_2^\varphi$ and $d_2^\varphi$ share all the topological advantages of the last one, separability among them. Thus, by following arguments similar to those in Diamond and Kloeden [7], the separability of the metric space $(\mathcal{F}_c(\mathbb{R}), d_2^\varphi)$ can be proved and, hence,

Proposition 3 $(\mathcal{F}_c(\mathbb{R}), D_W^\varphi)$ is a separable metric space.

Although $D_W^\varphi$ and $D_\lambda^\varphi$ were the first versions of Bertoluzza et al.’s metric, $D_K^\varphi$ and $D_\theta^\varphi$ have been preferred for most statistical developments. Some of the arguments supporting such a preference (see [3]) are the following:

– the mid/spread representation of fuzzy numbers provides some valuable results, especially in connection with regression studies;
– $D_K^\varphi$ and $D_\theta^\varphi$ can be extended to fuzzy sets in higher-dimensional Euclidean spaces (see Näther [27], Körner and Näther [21] for the extension of $D_K^\varphi$, and Trutschnig et al. [32] for the extension of $D_\theta^\varphi$) on the basis of the support function of fuzzy sets (Puri and Ralescu [28]), which is an alternative characterization of fuzzy sets with compact convex levels through their boundaries;
– the covariance of two random mechanisms producing fuzzy data can be formalized following the ideas developed for random elements taking values in generalized spaces.


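4 Some applications to the statistical analysis of fuzzy data

The arithmetic of fuzzy numbers is a basic tool for statistically analyzing fuzzy data. More concretely, the sum of fuzzy numbers and the product of a real number by a fuzzy number are the key operations in this setting. The usual arithmetic on $\mathcal{F}_c(\mathbb{R})$ is the one based on Zadeh’s extension principle [34], which level-wise inherits the usual and natural interval arithmetic; that is, for $\widetilde U, \widetilde V \in \mathcal{F}_c(\mathbb{R})$ and $\gamma \in \mathbb{R}$, the sum $\widetilde U + \widetilde V$ is the fuzzy number such that for each $\alpha \in [0,1]$
$$(\widetilde U + \widetilde V)_\alpha = \text{Minkowski sum of } \widetilde U_\alpha \text{ and } \widetilde V_\alpha = \left[\inf \widetilde U_\alpha + \inf \widetilde V_\alpha, \sup \widetilde U_\alpha + \sup \widetilde V_\alpha\right],$$
and the product by the scalar, $\gamma \cdot \widetilde U$, is the fuzzy number such that for each $\alpha \in [0,1]$
$$(\gamma \cdot \widetilde U)_\alpha = \gamma \cdot \widetilde U_\alpha = \begin{cases} \left[\gamma \cdot \inf \widetilde U_\alpha, \gamma \cdot \sup \widetilde U_\alpha\right] & \text{if } \gamma \ge 0, \\ \left[\gamma \cdot \sup \widetilde U_\alpha, \gamma \cdot \inf \widetilde U_\alpha\right] & \text{otherwise.} \end{cases}$$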

When the metric $D_W^\varphi$ is combined with the usual fuzzy arithmetic, it can be concluded that it is translation invariant, i.e.,
$$D_W^\varphi(\widetilde U + \widetilde T, \widetilde V + \widetilde T) = D_W^\varphi(\widetilde U, \widetilde V),$$
and, in case $W$ is associated with a symmetric distribution on $[0,1]$ (more generally, if and only if $t_W = 0.5$), it is also rotation invariant, i.e.,
$$D_W^\varphi((-1) \cdot \widetilde U, (-1) \cdot \widetilde V) = D_W^\varphi(\widetilde U, \widetilde V).$$
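Both invariance properties can be observed numerically. The Python sketch below is a minimal illustration under stated assumptions: fuzzy numbers are stored as lists of $\alpha$-levels on a midpoint grid (so $\varphi = \ell$ is approximated), $W$ is discrete, and the helper names (`trapezoid`, `add`, `scale`, `dist`) are hypothetical. It implements the level-wise sum and scalar product defined above and compares a symmetric $W$ ($t_W = 0.5$) with a skewed one ($t_W = 0.2$).

```python
import math
from typing import List, Sequence, Tuple

Levels = List[Tuple[float, float]]  # [(inf U_alpha, sup U_alpha)] on a fixed alpha grid

N = 200
ALPHAS = [(k + 0.5) / N for k in range(N)]  # midpoint grid, i.e. phi = Lebesgue measure


def trapezoid(a: float, b: float, c: float, d: float) -> Levels:
    """alpha-levels of the trapezoidal fuzzy number with a <= b <= c <= d."""
    return [(a + al * (b - a), d - al * (d - c)) for al in ALPHAS]


def add(u: Levels, v: Levels) -> Levels:
    """Level-wise Minkowski sum."""
    return [(ui + vi, us + vs) for (ui, us), (vi, vs) in zip(u, v)]


def scale(g: float, u: Levels) -> Levels:
    """Level-wise product by a real scalar; the endpoints swap when g < 0."""
    return [(g * i, g * s) if g >= 0 else (g * s, g * i) for i, s in u]


def dist(u: Levels, v: Levels, support: Sequence[float], weights: Sequence[float]) -> float:
    """D_W^phi with a discrete W (support, weights) and phi = Lebesgue (midpoint rule)."""
    total = 0.0
    for (ui, us), (vi, vs) in zip(u, v):
        total += sum(w * ((1 - t) * (ui - vi) + t * (us - vs)) ** 2
                     for t, w in zip(support, weights))
    return math.sqrt(total / N)
```

For `U = trapezoid(0, 1, 2, 3)` and `V = trapezoid(-1, 2, 2, 5)`, translating both by the same fuzzy number leaves the distance unchanged for either choice of $W$, whereas multiplying both by $-1$ leaves it unchanged only for the symmetric $W$.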

Random fuzzy sets are another basic tool for the analysis of fuzzy data, especially to appropriately support the methods of analysis within a probabilistic framework. This concept was originally coined by Puri and Ralescu [29] as fuzzy random variables. Random fuzzy sets are a mathematical model for the random mechanism generating fuzzy data. In the one-dimensional case, $\mathcal{F}_c(\mathbb{R})$, a random fuzzy set (or random fuzzy number) is formalized as follows:

Definition 3 Given a probability space $(\Omega, \mathcal{A}, P)$, a random fuzzy number associated with it is a mapping $\mathcal{X}: \Omega \to \mathcal{F}_c(\mathbb{R})$ such that, for every $\alpha \in [0,1]$, the $\alpha$-level mapping $\mathcal{X}_\alpha: \Omega \to \mathcal{K}_c(\mathbb{R})$, with $\mathcal{X}_\alpha(\omega) = (\mathcal{X}(\omega))_\alpha$, is a random interval.


A random fuzzy number can be proven to be Borel-measurable w.r.t. the Borel σ-field generated by the topology induced by $D_W^\varphi$ (see, for instance, González-Rodríguez et al. [15]). Consequently, one can directly refer to the induced distribution of a random fuzzy number, the independence of random fuzzy numbers, and so on.

The Aumann-type mean of $\mathcal{X}$ is one of the most valuable measures to summarize the information in the distribution of a random fuzzy number. If it exists, it is defined as the unique $\widetilde E(\mathcal{X}) \in \mathcal{F}_c(\mathbb{R})$ such that for each $\alpha \in [0,1]$
$$\left(\widetilde E(\mathcal{X})\right)_\alpha = \left[E(\inf \mathcal{X}_\alpha), E(\sup \mathcal{X}_\alpha)\right].$$

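This notion is coherent with fuzzy arithmetic: if $\mathcal{X} = 1_{\{\mathcal{X} = \widetilde x_1\}} \cdot \widetilde x_1 + \dots + 1_{\{\mathcal{X} = \widetilde x_r\}} \cdot \widetilde x_r$, where $\widetilde x_i \in \mathcal{F}_c(\mathbb{R})$ ($i = 1, \dots, r$) and $1$ denotes the indicator function on $\Omega$, then
$$\widetilde E(\mathcal{X}) = P(\mathcal{X} = \widetilde x_1) \cdot \widetilde x_1 + \dots + P(\mathcal{X} = \widetilde x_r) \cdot \widetilde x_r.$$
Moreover, $\widetilde E(\mathcal{X})$ is coherent with Fréchet’s principle for $D_W^\varphi$, that is,
$$\widetilde E(\mathcal{X}) = \arg\min_{\widetilde U \in \mathcal{F}_c(\mathbb{R})} E\left(\left[D_W^\varphi(\mathcal{X}, \widetilde U)\right]^2\right).$$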
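A sample version of these two properties can be illustrated in a few lines of Python. The sketch assumes $W = \varphi = \ell$ (closed form in $t$, midpoint rule in $\alpha$), trapezoidal data stored as lists of $\alpha$-levels, and hypothetical helper names. It computes the sample Aumann-type mean level-wise and checks that moving it (shifting it or widening it) increases the mean squared $D$-distance to the data, as Fréchet’s principle predicts.

```python
import random
from typing import List, Tuple

Levels = List[Tuple[float, float]]  # [(inf, sup)] on a fixed alpha grid

N = 100
ALPHAS = [(k + 0.5) / N for k in range(N)]  # midpoint grid, phi = Lebesgue (assumed)


def trapezoid(a: float, b: float, c: float, d: float) -> Levels:
    return [(a + al * (b - a), d - al * (d - c)) for al in ALPHAS]


def aumann_mean(sample: List[Levels]) -> Levels:
    """Sample Aumann-type mean: level-wise averages of infima and suprema."""
    n = len(sample)
    return [(sum(u[k][0] for u in sample) / n, sum(u[k][1] for u in sample) / n)
            for k in range(N)]


def d_sq(u: Levels, v: Levels) -> float:
    """[D_W^phi]^2 with W = phi = Lebesgue: closed form in t, midpoint rule in alpha."""
    total = 0.0
    for (ui, us), (vi, vs) in zip(u, v):
        i, s = ui - vi, us - vs
        total += (i * i + i * s + s * s) / 3.0
    return total / N


def mean_sq_dist(sample: List[Levels], centre: Levels) -> float:
    """Average squared D-distance from the data to a candidate centre."""
    return sum(d_sq(u, centre) for u in sample) / len(sample)
```

The level-wise construction also makes the discrete case visible: the sample mean of `[x1, x1, x2]` coincides, up to rounding, with $(2/3)\cdot x_1 + (1/3)\cdot x_2$ computed with the fuzzy arithmetic.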

$\widetilde E(\mathcal{X})$ is supported by different Strong Laws of Large Numbers (see, for instance, Colubi et al. [6]). Other relevant summary measures of the distribution of a random fuzzy number are the Fréchet variance based on $D_W^\varphi$ (see, for instance, Lubiano et al. [23], Blanco-Fernández et al. [2]) and the $L^1$-medians by Sinova et al. [30,31]. The covariance of two random fuzzy numbers can also be introduced (see González-Rodríguez et al. [13], Blanco-Fernández et al. [2]) in connection with the simple linear regression analysis between random fuzzy sets, although in this case it does not involve $D_W^\varphi$ but is based on the support function.

Estimating the population fuzzy-valued Aumann-type mean of $\mathcal{X}$ on the basis of a sample of independent observations from it is one of the statistical problems in which Bertoluzza et al.’s metric is involved. More concretely (see, for instance, Lubiano and Gil [22], González-Rodríguez et al. [17], Blanco-Fernández et al. [2]),
– regarding the ‘point’ estimation of $\widetilde E(\mathcal{X})$, the metric $D_W^\varphi$ is used to quantify the estimation error;
– regarding the ‘confidence’ estimation of $\widetilde E(\mathcal{X})$, $D_W^\varphi$ is used to construct a confidence ball for this fuzzy-valued parameter.
Another statistical problem involving Bertoluzza et al.’s metric is that of testing hypotheses about the population fuzzy-valued Aumann-type mean of one or more random fuzzy numbers on the basis of a sample of independent observations from it or them. More concretely (see Körner [20], Montenegro et al. [25,26], González-Rodríguez et al. [16,15], Gil et al. [10], and Blanco-Fernández et al. [2]),


– regarding the one-sample testing of the null hypothesis $H_0: \widetilde E(\mathcal{X}) = \widetilde U$, the metric $D_W^\varphi$ is used to test the equivalent null hypothesis $H_0: D_W^\varphi(\widetilde E(\mathcal{X}), \widetilde U) = 0$;
– regarding the two-sample testing of the null hypothesis $H_0: \widetilde E(\mathcal{X}) = \widetilde E(\mathcal{Y})$, the metric $D_W^\varphi$ is used to test the equivalent null hypothesis $H_0: D_W^\varphi(\widetilde E(\mathcal{X}), \widetilde E(\mathcal{Y})) = 0$, for both independent and dependent samples;
– regarding the multi-sample testing of the null hypothesis $H_0: \widetilde E(\mathcal{X}_1) = \dots = \widetilde E(\mathcal{X}_m)$, the metric $D_W^\varphi$ is used to test the equivalent null hypothesis $H_0: \sum_{i=1}^m \left[D_W^\varphi\left(\widetilde E(\mathcal{X}_i), \widetilde E\left(\frac{1}{m}(\mathcal{X}_1 + \dots + \mathcal{X}_m)\right)\right)\right]^2 = 0$, for both independent and dependent samples.

A third statistical problem in which Bertoluzza et al.’s metric has been shown to be useful is the linear regression analysis between two random fuzzy numbers (see, for instance, González-Rodríguez et al. [13], Blanco-Fernández et al. [2]). $D_W^\varphi$ has been employed to develop a least squares approach to the linear regression problem when the usual fuzzy arithmetic is considered.

A fourth problem using $D_W^\varphi$ is the classification of fuzzy data (see Colubi et al. [5], Blanco-Fernández et al. [2]). The metric has been used to compute the distance between the fuzzy data to be classified and the set of training fuzzy data.

An R package, SAFD (http://cran.r-project.org/web/packages/SAFD/index.html), has been designed, and is continuously updated, by Lubiano and Trutschnig (see, for instance, Lubiano and Trutschnig [24]). It provides several basic functions to carry out statistics with one-dimensional fuzzy data in accordance with the statistical methodology based on Bertoluzza et al.’s metric.
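To illustrate how $D_W^\varphi$ enters the one-sample testing problem, here is a minimal, unstudentized bootstrap sketch in Python, written in the spirit of the procedures cited above but not claimed to reproduce any specific published test. Assumptions: $W = \varphi = \ell$ on a midpoint grid, triangular data stored as $\alpha$-levels, the statistic $n\,[D_W^\varphi(\overline{\mathcal X}_n, \widetilde U)]^2$, and bootstrap replicates centred at the sample mean; all names are hypothetical.

```python
import random
from typing import List, Tuple

Levels = List[Tuple[float, float]]  # [(inf, sup)] on a fixed alpha grid

N = 20
ALPHAS = [(k + 0.5) / N for k in range(N)]  # phi = Lebesgue via a midpoint grid (assumed)


def triangular(a: float, b: float, c: float) -> Levels:
    """alpha-levels of the triangular fuzzy number with a <= b <= c."""
    return [(a + al * (b - a), c - al * (c - b)) for al in ALPHAS]


def mean(sample: List[Levels]) -> Levels:
    """Level-wise sample mean (sample Aumann-type mean)."""
    n = len(sample)
    return [(sum(u[k][0] for u in sample) / n, sum(u[k][1] for u in sample) / n)
            for k in range(N)]


def d_sq(u: Levels, v: Levels) -> float:
    """[D_W^phi]^2 with W = phi = Lebesgue (closed form in t, midpoint rule in alpha)."""
    total = 0.0
    for (ui, us), (vi, vs) in zip(u, v):
        i, s = ui - vi, us - vs
        total += (i * i + i * s + s * s) / 3.0
    return total / N


def one_sample_pvalue(sample: List[Levels], target: Levels,
                      n_boot: int = 300, seed: int = 0) -> float:
    """Bootstrap p-value for H0: E(X) = target, using T = n * D^2(sample mean, target).

    Resamples are centred at the sample mean, so n * D^2(resample mean, sample mean)
    mimics the distribution of T under H0.
    """
    rng = random.Random(seed)
    n = len(sample)
    xbar = mean(sample)
    observed = n * d_sq(xbar, target)
    exceed = 0
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        if n * d_sq(mean(resample), xbar) >= observed:
            exceed += 1
    return exceed / n_boot
```

The same pattern extends to the two-sample and multi-sample statistics by replacing the distance to a fixed target with distances between (or among) sample means; the centring of the bootstrap replicates is the design choice that makes the resampling mimic the null hypothesis.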

5 Analyzing the impact of Bertoluzza et al.’s distance

To end this paper, an elementary statistical analysis of the impact of Bertoluzza et al.’s distance is carried out. For this purpose, we have examined three scientific databases, namely, the Web of Science (Thomson Reuters), SCOPUS (Elsevier) and Google Scholar. It should be highlighted that Mathware & Soft Computing, the journal in which Bertoluzza et al.’s distance was published, is not yet indexed in the first two databases. However, all three databases record the number of citations the paper has received, this number varying among the databases because of the type of documents they cover. Table 5 shows the number of citations per three-year period according to the three databases (notice that the number of citations in the last period is likely to increase, since the period has not yet ended). The table shows that this number has generally increased over time, which indicates that the metric is being widely used.


Database | 1999–2001 | 2002–2004 | 2005–2007 | 2008–2010 | 2011–2013 | Total
Web of Science | 10 | 8 | 14 | 20 | 17 | 69
SCOPUS | 10 | 6 | 15 | 23 | 18 | 72
Google Scholar | 11 | 13 | 24 | 30 | 40 | 118

Table 5 Citations received by [1] Bertoluzza, C., Corral, N., Salas, A.: On a new class of distances between fuzzy numbers. Mathw. & Soft Comp. 2 71–84 (1995), according to Web of Science, SCOPUS and Google Scholar

The citing papers have been classified into categories; the first eight (by number of citations) according to the Web of Science classification are shown in Figure 3. Most of them correspond to Statistics & Probability, the branch that originally motivated the introduction of the distance. The distance has also been widely applied in Computer Science and Mathematics.

Fig. 3 Distribution (percentages) of the papers citing [1] Bertoluzza, C., Corral, N., Salas, A.: On a new class of distances between fuzzy numbers. Mathw. & Soft Comp. 2 71–84 (1995) by Web of Science categories (first eight categories) (Source: Web of Science)

Figure 4 shows the first six journals (by number of citations) in which the citing papers have been published. The paper has also been cited in chapters of multi-author books published by Springer and included in the WoS.

Fig. 4 Distribution of the journal papers citing [1] Bertoluzza, C., Corral, N., Salas, A.: On a new class of distances between fuzzy numbers. Mathw. & Soft Comp. 2 71–84 (1995) by journal (first six) (Source: Web of Science)


Finally, Figure 5 shows the first eight countries (by number of citations) to which the citing authors’ institutions belong.

Fig. 5 Distribution of the papers citing [1] Bertoluzza, C., Corral, N., Salas, A.: On a new class of distances between fuzzy numbers. Mathw. & Soft Comp. 2 71–84 (1995) by country (first eight) (Source: Web of Science)

These figures show a growing interest in [1], so we expect the numbers in the last two figures to increase substantially in the next few years.

6 Concluding remarks

In this paper we have reviewed how Bertoluzza et al.’s metric has been applied to analyze fuzzy data generated by a random process. This metric between fuzzy numbers can also be used to test hypotheses about the distributions of real-valued random variables. A statistical distance between probability distributions of real-valued random variables can be defined on the basis of the so-called characterizing fuzzy representation of a random variable (see González-Rodríguez et al. [14], and also Blanco-Fernández et al. [4] in this issue). This distance is Bertoluzza et al.’s distance between the Aumann-type means of the characterizing fuzzy representations of the distributions being compared. It can be used to estimate the distribution of a random variable, to test goodness of fit, and to test the equality of two or more distributions. The corresponding estimation and testing procedures follow by particularizing the estimators of, and tests about, means of random fuzzy numbers briefly recalled in Section 4.

Acknowledgements

This research has been partially supported by the Spanish Ministry of Science and Innovation Grant MTM2009-09440-C02-01 and the Principality of Asturias Grant SV-PA-13-ECOEMP-66. Their financial support is gratefully acknowledged. The authors also thank the referees of the original manuscript for their helpful comments. The authors wish to mention that they are the ‘oldest’ representatives of the research group SMIRE of the University of Oviedo (Spain), all of whose members have, sooner or later, used Bertoluzza et al.’s metric in their research. Therefore, this paper can also be regarded as a tribute from the ‘youngest’ part of the group, namely, Ángela Blanco-Fernández, Ana Colubi, Sara de la Rosa de Sáa, Marta García-Bárzana, Gil González-Rodríguez, Ana Belén Ramos-Guajardo and Beatriz Sinova.
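To make the distance invoked in the concluding remarks more concrete, the following is a minimal sketch of Bertoluzza et al.’s metric in its usual form, D(U, V)^2 = ∫∫ [λ(sup U_α − sup V_α) + (1 − λ)(inf U_α − inf V_α)]^2 dW(λ) dφ(α), with both integrals over [0, 1]. It assumes, purely for illustration, that W and φ are both the Lebesgue measure on [0, 1] and that the fuzzy numbers are trapezoidal; the names `Trapezoidal` and `bertoluzza_distance` are hypothetical and the midpoint rule over α is an arbitrary numerical choice, not the procedure used in the works reviewed above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoidal:
    """Hypothetical helper: trapezoidal fuzzy number Tra(a, b, c, d), a <= b <= c <= d."""
    a: float
    b: float
    c: float
    d: float

    def __post_init__(self) -> None:
        if not (self.a <= self.b <= self.c <= self.d):
            raise ValueError("expected a <= b <= c <= d")

    def alpha_cut(self, alpha: float) -> tuple[float, float]:
        """Return the alpha-level [inf U_alpha, sup U_alpha] for alpha in [0, 1]."""
        return (self.a + alpha * (self.b - self.a),
                self.d - alpha * (self.d - self.c))


def bertoluzza_distance(u: Trapezoidal, v: Trapezoidal, n: int = 200) -> float:
    """Sketch of Bertoluzza et al.'s distance, ASSUMING W and phi are Lebesgue on [0, 1].

    For each alpha, with a = inf u_alpha - inf v_alpha and
    b = sup u_alpha - sup v_alpha, the lambda-integral has the closed form
        int_0^1 (lam * b + (1 - lam) * a)^2 dlam = (a^2 + a*b + b^2) / 3.
    The alpha-integral is approximated by the midpoint rule with n nodes.
    """
    total = 0.0
    for k in range(n):
        alpha = (k + 0.5) / n  # midpoint of the k-th subinterval of [0, 1]
        inf_u, sup_u = u.alpha_cut(alpha)
        inf_v, sup_v = v.alpha_cut(alpha)
        a = inf_u - inf_v  # difference of infima
        b = sup_u - sup_v  # difference of suprema
        total += (a * a + a * b + b * b) / 3.0
    return math.sqrt(total / n)
```

Under this choice of W, the inner integral equals (mid difference)^2 + (1/3)(spread difference)^2 at each α, so, for instance, two crisp numbers x and y (degenerate trapezoids) are at distance |x − y|, while the interval [−1, 1] and the crisp 0 are at distance √(1/3) ≈ 0.577. For trapezoids the α-integral could also be computed exactly; the midpoint rule simply keeps the sketch easy to adapt to other shapes of α-levels.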

References
1. Bertoluzza, C., Corral, N., Salas, A.: On a new class of distances between fuzzy numbers. Mathw. & Soft Comp. 2 71–84 (1995)
2. Blanco-Fernández, A., Casals, M.R., Colubi, A., Corral, N., García-Bárzana, M., Gil, M.A., González-Rodríguez, G., López, M.T., Lubiano, M.A., Montenegro, M., Ramos-Guajardo, A.B., de la Rosa de Sáa, S., Sinova, B.: Random fuzzy sets: a mathematical tool to develop statistical fuzzy data analysis. Iran. J. Fuzzy Syst. 10 1–28 (2013)
3. Blanco-Fernández, A., Casals, M.R., Colubi, A., Corral, N., García-Bárzana, M., Gil, M.A., González-Rodríguez, G., López, M.T., Lubiano, M.A., Montenegro, M., Ramos-Guajardo, A.B., de la Rosa de Sáa, S., Sinova, B.: A distance-based statistical analysis of fuzzy number-valued data. Int. J. Approx. Reas. (doi:10.1016/j.ijar.2013.09.020)
4. Blanco-Fernández, A., Ramos-Guajardo, A.B., Colubi, A.: Fuzzy representations of real-valued random variables: Applications to exploratory and inferential studies. Metron 71 (3), in this issue (doi:10.1007/s40300-013-0019-7)
5. Colubi, A., González-Rodríguez, G., Gil, M.A., Trutschnig, W.: Nonparametric criteria for supervised classification of fuzzy data. Int. J. Approx. Reas. 52 1272–1282 (2011)
6. Colubi, A., López-Díaz, M., Domínguez-Menchero, J.S., Gil, M.A.: A generalized Strong Law of Large Numbers. Prob. Theor. Rel. Fields 114 401–417 (1999)
7. Diamond, P., Kloeden, P.: Metric spaces of fuzzy sets. Fuzzy Sets Syst. 100 63–71 (1999)
8. Gil, M.A., González-Rodríguez, G., Colubi, A., Montenegro, M.: Testing linear independence in linear models with interval-valued data. Comp. Stat. Data Anal. 51 3002–3015 (2007)
9. Gil, M.A., Lubiano, M.A., Montenegro, M., López, M.T.: Least squares fitting of an affine function and strength of association for interval-valued data. Metrika 56 97–111 (2002)
10. Gil, M.A., Montenegro, M., González-Rodríguez, G., Colubi, A., Casals, M.R.: Bootstrap approach to the multi-sample test of means with imprecise data. Comp. Stat. Data Anal. 51 148–162 (2006)
11. Goetschel, R. Jr., Voxman, W.: Topological properties of fuzzy numbers. Fuzzy Sets Syst. 10 87–99 (1983)
12. Goetschel, R. Jr., Voxman, W.: Elementary fuzzy calculus. Fuzzy Sets Syst. 18 31–43 (1986)
13. González-Rodríguez, G., Blanco, A., Colubi, A., Lubiano, M.A.: Estimation of a simple linear regression model for fuzzy random variables. Fuzzy Sets Syst. 160 357–370 (2009)
14. González-Rodríguez, G., Colubi, A., Gil, M.A.: A fuzzy representation of random variables: an operational tool in exploratory analysis and hypothesis testing. Comp. Stat. Data Anal. 51 (1) 163–176 (2006)
15. González-Rodríguez, G., Colubi, A., Gil, M.A.: Fuzzy data treated as functional data. A one-way ANOVA test approach. Comp. Stat. Data Anal. 56 943–955 (2012)
16. González-Rodríguez, G., Montenegro, M., Colubi, A., Gil, M.A.: Bootstrap techniques and fuzzy random variables: Synergy in hypothesis testing with fuzzy data. Fuzzy Sets Syst. 157 2608–2613 (2006)
17. González-Rodríguez, G., Trutschnig, W., Colubi, A.: Confidence regions for the mean of a fuzzy random variable. In: Abstracts of IFSA-EUSFLAT 2009, http://www.eusflat.org/publications/proceedings/IFSA-EUSFLAT 2009/pdf/tema 1433.pdf
18. Herencia, J.A., Lamata, M.T.: Solving a decision problem with graded rewards. Int. J. Intel. Syst. 14 (1) 21–44 (1999)
19. Herencia, J.A., Lamata, M.T.: A total order for the graded numbers used in decision problems. Int. J. Unc. Fuzz. Knowl. Based Syst. 7 267–276 (1999)
20. Körner, R.: An asymptotic α-test for the expectation of random fuzzy variables. J. Stat. Plann. Infer. 83 331–346 (2000)
21. Körner, R., Näther, W.: On the variance of random fuzzy variables. In: Bertoluzza, C., Gil, M.A., Ralescu, D.A. (eds.), Statistical Modeling, Analysis and Management of Fuzzy Data. Physica-Verlag, Heidelberg (2002)
22. Lubiano, M.A., Gil, M.A.: Estimating the expected value of fuzzy random variables in random samplings from finite populations. Stat. Pap. 40 277–295 (1999)
23. Lubiano, M.A., Gil, M.A., López-Díaz, M., López, M.T.: The λ-mean squared dispersion associated with a fuzzy random variable. Fuzzy Sets Syst. 111 307–317 (2000)

24. Lubiano, M.A., Trutschnig, W.: ANOVA for fuzzy random variables using the R-package SAFD. In: Borgelt, C., González-Rodríguez, G., Trutschnig, W., Lubiano, M.A., Gil, M.A., Grzegorzewski, P., Hryniewicz, O. (eds.), Combining Soft Computing and Statistical Methods in Data Analysis. Springer-Verlag, Heidelberg (2010)
25. Montenegro, M., Casals, M.R., Lubiano, M.A., Gil, M.A.: Two-sample hypothesis tests of means of a fuzzy random variable. Inform. Sci. 133 89–100 (2001)
26. Montenegro, M., Colubi, A., Casals, M.R., Gil, M.A.: Asymptotic and Bootstrap techniques for testing the expected value of a fuzzy random variable. Metrika 59 31–49 (2004)
27. Näther, W.: Random fuzzy variables of second order and applications to statistical inference. Inform. Sci. 133 69–88 (2001)
28. Puri, M.L., Ralescu, D.A.: The concept of normality for fuzzy random variables. Ann. Probab. 11 1373–1379 (1985)
29. Puri, M.L., Ralescu, D.A.: Fuzzy random variables. J. Math. Anal. Appl. 114 409–422 (1986)
30. Sinova, B., Gil, M.A., Colubi, A., Van Aelst, S.: The median of a random fuzzy number. The 1-norm distance approach. Fuzzy Sets Syst. 200 99–115 (2012)
31. Sinova, B., de la Rosa de Sáa, S., Gil, M.A.: A generalized L1-type metric between fuzzy numbers for an approach to central tendency of fuzzy data. Inform. Sci. 242 22–34 (2013)
32. Trutschnig, W., González-Rodríguez, G., Colubi, A., Gil, M.A.: A new family of metrics for compact, convex (fuzzy) sets based on a generalized concept of mid and spread. Inform. Sci. 179 3964–3972 (2009)
33. Vitale, R.A.: Lp Metrics for compact, convex Sets. J. Approx. Th. 45 280–287 (1985)
34. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning, Part 1. Inform. Sci. 8 199–249 (1975); Part 2. Inform. Sci. 8 301–353 (1975); Part 3. Inform. Sci. 9 43–80 (1975)