A clustering-based visualization of colocation patterns

Elise Desmier (1), Frédéric Flouvat (2), Dominique Gay (3) and Nazha Selmaoui-Folcher (2)

(1) Université de Lyon, LIRIS, UMR5205 CNRS, Villeurbanne, France
(2) University of New Caledonia, PPME, EA3325, Nouméa, New Caledonia
(3) TECH/ASAP/PROF, Orange Labs, Lannion, France

IDEAS’11, Lisboa

Context

Spatial pattern mining and visualization

Visualization of colocations

Application

Conclusion

Toward a better visualization of spatial patterns

One of the major issues in data mining (Han and Kamber 06)
"the presentation and visualization of discovered knowledge expressed in high-level languages, visual representations, or other expressive forms so that the knowledge can be easily understood and directly usable by humans"

Problem with existing solutions
• No existing solution displays spatial patterns (colocations) in a simple, concise and intuitive way for experts

Contribution
• A new visualization of colocations based on a heuristic clustering method
• Easily usable and interpretable by domain experts
• Additional spatial and thematic information w.r.t. "classical" colocations

Frédéric Flouvat

A clustering-based visualization of colocations


Outline

1. Context
2. Spatial pattern mining and visualization
3. Visualization of colocations
4. Application
5. Conclusion


Application Context

New Caledonia
• Exceptional biodiversity, and Caledonian lagoons declared a World Heritage site by UNESCO
• But important mining projects (25% of world nickel resources), a tropical climate with cyclones and bush fires

Important soil erosion
• Strong impact on terrestrial and littoral ecosystems

➫ FO.S.T.ER. project (financed by the French government)
• A multidisciplinary consortium composed of specialists in data mining, image processing and geology
• Providing geologists with a semi-automatic and complete process for monitoring soil erosion


Data

Complex data
• Heterogeneous data: DEM, vegetation, land use, climate, ...
• Large and spatial data
➫ Need for advanced analysis and modeling methods to assist experts

Spatial data mining
• Extracting interesting, useful and unexpected knowledge from spatial data
• A large number of descriptive and/or predictive methods
  • e.g. spatial decision trees, clustering, spatial pattern mining, ...
• Focus on colocations (spatial patterns)


What is a colocation?

First, the data
• Spatial objects associated with different features
  • e.g. object 1 is characterized as "sparse vegetation" (A), object 7 as "mine" (C), and object 8 as "river erosion" (B) ➫ A1, C7 and B8

Then, the pattern
• Colocation = subset of features whose objects are "often" located close to each other
  • e.g. {A, C, B}, i.e. {sparse vegetation, mine, river erosion}
• Colocation instance = subset of objects having the features of the colocation and close to each other
  • set of all instances of a colocation = table instance TI
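
The definitions of colocation instance and table instance can be illustrated with a minimal Python sketch. It assumes point objects and a Euclidean distance threshold as the neighborhood relationship; the coordinates, the extra objects (A5, B2, C3), the threshold and the function names are all hypothetical and only serve the illustration.

```python
from itertools import combinations, product
from math import dist

# Hypothetical spatial objects: id -> (feature, (x, y)). Coordinates are invented.
OBJECTS = {
    "A1": ("A", (0.0, 0.0)),
    "B8": ("B", (1.0, 0.0)),
    "C7": ("C", (0.5, 0.8)),
    "A5": ("A", (10.0, 10.0)),
    "B2": ("B", (10.5, 10.0)),
    "C3": ("C", (30.0, 30.0)),
}

def neighbors(p, q, max_dist):
    """Neighborhood relationship assumed here: Euclidean distance <= max_dist."""
    return dist(p, q) <= max_dist

def table_instance(colocation, objects, max_dist):
    """All instances of a colocation: one object per feature, pairwise neighbors."""
    per_feature = [
        sorted(oid for oid, (feat, _) in objects.items() if feat == f)
        for f in colocation
    ]
    instances = []
    for candidate in product(*per_feature):
        if all(neighbors(objects[a][1], objects[b][1], max_dist)
               for a, b in combinations(candidate, 2)):
            instances.append(candidate)
    return instances

ti_ab = table_instance(("A", "B"), OBJECTS, max_dist=1.5)
ti_abc = table_instance(("A", "B", "C"), OBJECTS, max_dist=1.5)
print(ti_ab)   # instances of {A, B}
print(ti_abc)  # instances of {A, B, C}
```

This brute-force enumeration is only meant to make the definitions concrete; it does not reflect how a real miner builds table instances.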


Mining colocations (Shekhar et al. 01)

Two important aspects
• The neighborhood relationship
  • e.g. Euclidean distance, intersection, ...
• The measure of "often located close to each other"
  • participation index (anti-monotone)

Mining
• Input: a set of spatial objects, each one associated with a feature, a neighborhood relationship, and a threshold for the measure
  • data stored in a GIS
• Output: "frequent" colocations, i.e. those whose participation index is greater than the threshold
• Algorithm: classical levelwise mining algorithm
  • such as Apriori for itemset mining
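
As a hedged illustration of the measure, the sketch below uses the usual definition of the participation index from the colocation literature: for each feature of the colocation, the fraction of its objects that take part in at least one instance, and then the minimum of these fractions. The table instances, object counts and threshold are hypothetical, and the inclusive comparison with the threshold is an assumption.

```python
def participation_index(colocation, instances, feature_counts):
    """Minimum, over the features, of (participating objects / all objects of the feature)."""
    ratios = []
    for position, feature in enumerate(colocation):
        participating = {instance[position] for instance in instances}
        ratios.append(len(participating) / feature_counts[feature])
    return min(ratios)

# Hypothetical data: number of objects per feature and two table instances.
FEATURE_COUNTS = {"A": 2, "B": 2, "C": 2}
TABLE_INSTANCES = {
    ("A", "B"): [("A1", "B8"), ("A5", "B2")],
    ("A", "B", "C"): [("A1", "B8", "C7")],
}

pi = {
    colocation: participation_index(colocation, instances, FEATURE_COUNTS)
    for colocation, instances in TABLE_INSTANCES.items()
}

MIN_PI = 0.6  # hypothetical threshold; inclusive comparison is an assumption
frequent = [colocation for colocation, value in pi.items() if value >= MIN_PI]

print(pi)
print(frequent)
```

The superset {A, B, C} never scores higher than its subset {A, B}, which is the anti-monotone property that lets a levelwise algorithm prune candidates.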


Methods unsuited to expert needs

Many works on colocations
• Improving algorithm performance
• Extracting local patterns
• Reducing the number of colocations
• ...

Problem
• No visualization of colocations adapted to expert needs and practices
  • yet such a visualization is necessary to extract relevant information


Visualizing data mining results

Three main approaches to visualize data mining results:

1. Textual representation
  • basically a list of patterns with interestingness measures
  • e.g. textual visualization of colocation patterns
➫ simple but not easily understandable by domain experts


Visualizing data mining results

Three main approaches to visualize data mining results:

2. Abstract representation (e.g. plots, matrices, graphs, trees or cubes)
  • condensed and informative visual representations of the solutions with statistics
    • e.g. grid representation of association rules in MineSet (Brunk et al. 97)
    • e.g. radial hierarchical layout to represent frequent itemsets (Keim et al. 05)
    • e.g. orthogonal graphs to represent frequent itemsets (Leung et al. 08)
➫ not really adapted to spatial patterns
  • in spatial pattern mining, spatiality is not just another dimension of analysis
  • for domain experts, the spatial dimension is the basis of their interpretation


Visualizing data mining results

Three main approaches to visualize data mining results:

3. Cartographic representation
  • first solution: visualization of spatial pattern instances on a map
    • e.g. classical cartographic visualization of spatial clusters with colors
    • e.g. select an association rule and visualize its interestingness measure for each country (Andrienko 99)
➫ not possible to display all colocation instances (as is done in spatial cluster analysis)
➫ "select a pattern and display its instances" gives only a local view of one pattern


Visualizing data mining results

Three main approaches to visualize data mining results:

3. Cartographic representation
  • second solution: generating visual representations of the solutions
    • e.g. clusters of trajectories summarized by "representative trajectories" using a classifier and visual refinement (Andrienko 09)
➫ not directly usable for colocation patterns, but an interesting approach


Our approach

Problem
• How to visualize interesting colocations on a map?

Motivations
• Have an easily usable and interpretable visual representation for experts
• Give additional spatial and thematic information
• Give a global cartographic view of the solutions


A colored and labeled clique representation of colocations

A natural visual representation of a colocation
• A clique
  • node = object-type (i.e. feature)
  • edge = neighborhood relationship

Example: colocation {mining zone, sparse vegetation, sensitive trail, river erosion}
Visual representation: [figure]


A colored and labeled clique representation of colocations

Additional information
• Node coloration to represent thematic information
• Edge coloration to visualize the interestingness measure, i.e. the prevalence of the colocation

Example: colocation {mining zone, sparse vegetation, sensitive trail, river erosion} with participation index = 0.8
Visual representation: [figure]
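
A small Python sketch can make this representation concrete: one node per feature, colored by theme, and one edge per pair of features, colored by a level derived from the participation index. The theme assignment, the color names, the number of color levels and the function name are hypothetical; the slides do not specify how the prevalence is mapped to a color.

```python
from itertools import combinations

def clique_representation(colocation, pi, theme_of, theme_colors, n_levels=5):
    """Build node colors (by theme) and edge color levels (by prevalence)."""
    nodes = {feature: theme_colors[theme_of[feature]] for feature in colocation}
    # Map pi in [0, 1] to a color level in {1, ..., n_levels} (assumed mapping).
    level = max(1, round(pi * n_levels))
    edges = {pair: level for pair in combinations(colocation, 2)}
    return nodes, edges

# Hypothetical thematic layers and colors.
THEME_OF = {
    "mining zone": "human activity",
    "sensitive trail": "human activity",
    "sparse vegetation": "vegetation",
    "river erosion": "erosion",
}
THEME_COLORS = {"human activity": "red", "vegetation": "green", "erosion": "brown"}

colocation = ("mining zone", "sparse vegetation", "sensitive trail", "river erosion")
nodes, edges = clique_representation(colocation, 0.8, THEME_OF, THEME_COLORS)

print(nodes)
print(len(edges), "edges, color level", set(edges.values()))
```

A colocation of four features yields a clique with six undirected edges, all sharing the same prevalence color.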


Spatial representation of colocations

How to position the visual representations of colocations on the map? In other words, how to position the clique nodes?

➫ Using a "spatialization" function
• Summarizes the spatial information of the colocation instances
  • only spatial objects (instances) have spatial information
• Makes it possible to visualize where and how instances of an interesting colocation are generally located


A first basic spatialization function

A centroid-based spatialization function
• The centroid = a basic approach to summarize a set of points
  • "average" of all points
➫ For each clique node, generate the centroid of its feature instances
  • e.g. for colocation {A, B, C}, node A is the centroid of spatial objects {A1, A5} (i.e. the objects with feature A belonging to the table instance of {A, B, C})
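
The centroid-based positioning can be sketched in a few lines of Python. The coordinates, the objects other than A1 and A5, and the function names are hypothetical; the sketch assumes point objects and takes each distinct object only once per feature.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def spatialize(colocation, instances, coords):
    """Position each clique node at the centroid of the objects of its feature."""
    positions = {}
    for position, feature in enumerate(colocation):
        objects = sorted({instance[position] for instance in instances})
        positions[feature] = centroid([coords[o] for o in objects])
    return positions

# Hypothetical coordinates and table instance of {A, B, C}.
COORDS = {
    "A1": (0.0, 0.0), "A5": (4.0, 2.0),
    "B2": (1.0, 0.0), "B6": (5.0, 2.0),
    "C7": (2.0, 4.0),
}
INSTANCES = [("A1", "B2", "C7"), ("A5", "B6", "C7")]

positions = spatialize(("A", "B", "C"), INSTANCES, COORDS)
print(positions)
```

Node A lands halfway between A1 and A5, as in the example on the slide.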


A first basic spatialization function

Problem with this centroid-based spatialization function: [figure]
• a single centroid per feature cannot represent instances that are spread over several distant areas

➫ Solution: use clustering to allow several representations for each colocation


A clustering-based spatial representation of colocations

Principle
For each interesting colocation:
1. Cluster its instances
2. Compute the position of each colocation feature in each cluster, using the centroid-based spatialization function
3. Draw the colored and labeled clique representation
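
The first two steps of this principle can be sketched as follows. The clustering used here is a simple stand-in (instances are chained together when their centroids are within a distance eps), not the heuristic method of the talk; the data, eps and the function names are hypothetical.

```python
from math import dist

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def cluster_instances(instances, coords, eps):
    """Stand-in clustering: connected components of instances whose centroids are within eps."""
    centers = [centroid([coords[o] for o in inst]) for inst in instances]
    parent = list(range(len(instances)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if dist(centers[i], centers[j]) <= eps:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j

    groups = {}
    for i, inst in enumerate(instances):
        groups.setdefault(find(i), []).append(inst)
    return list(groups.values())

def clustered_positions(colocation, instances, coords, eps):
    """One set of clique node positions per cluster of instances."""
    representations = []
    for group in cluster_instances(instances, coords, eps):
        positions = {}
        for position, feature in enumerate(colocation):
            objects = {inst[position] for inst in group}
            positions[feature] = centroid([coords[o] for o in objects])
        representations.append(positions)
    return representations

# Hypothetical data: two instances in the north-west, one in the south-east.
COORDS = {
    "A1": (0.0, 10.0), "B1": (1.0, 10.0),
    "A2": (1.0, 11.0), "B2": (2.0, 11.0),
    "A3": (20.0, 0.0), "B3": (21.0, 0.0),
}
INSTANCES = [("A1", "B1"), ("A2", "B2"), ("A3", "B3")]

reps = clustered_positions(("A", "B"), INSTANCES, COORDS, eps=3.0)
for rep in reps:
    print(rep)
```

The colocation {A, B} gets two clique representations, one per area, instead of a single one placed somewhere between the two areas.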


A clustering-based spatial representation of colocations

Interest of this approach: a better representation of colocation instances
• Shows where an interesting colocation is generally located
  • e.g. colocation {A, B, C} is generally located in the north-west (and mainly in this area)
• Shows how the features of a colocation are positioned w.r.t. each other
  • e.g. objects in colocation {A, B, C} are relatively far from each other
  • shows for example that mines and sparse vegetation have an indirect impact on erosion (colocation {mine, sparse vegetation, erosion})
➫ Difficult to obtain such information with "classical" approaches

One major problem: scalability
• Running all the clusterings (one for each colocation) may be computationally expensive


Improving performance

Optimizing memory occupation by combining the colocation mining algorithm and the visualization
• Clustering and visualization are not done in a post-processing step
  • avoids storing all colocation instances in memory
• Colocations are mined one by one and each is visualized at the same time
  • integrates visualization into the mining algorithm

Optimizing execution time using a heuristic clustering method


Heuristic clustering approach

Observation: colocation instances share many spatial objects
• e.g. colocations {A, B} and {A, B, C} share spatial objects A1 and A5
➫ If clustering is done in a post-processing step, some processing will be done several times
• e.g. computing distances between A1 and A5

Proposition
A two-step clustering approach integrated in the mining algorithm:
1. a clustering of the instances of each feature, run once at the beginning of the algorithm
  • i.e. one clustering for A objects, one clustering for B objects, ...
2. a clustering of the instances of each colocation based on the previous clusters, using a merge and split approach


Focus on the Merge and Split approach

Principle:
• Select the feature f of the current colocation C having the highest number of clusters
• Split the instances of C w.r.t. the clusters of f

Problem: "conflicting clusters", i.e. object instances belonging to several partitions
• e.g. Y2 is in the first instance partition and in the second one

Solution: merge the clusters leading to a conflict
• e.g. merge the first and second clusters of Z

➫ Merge and split approach: alternate merge and split until no change
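
The sketch below is one simplified reading of this slide, not the authors' implementation: it splits the instances once on the feature with the most clusters, then merges any two partitions that share an object, and repeats the merge until nothing changes. The per-feature clusters, the instances and the function name are hypothetical.

```python
def merge_and_split(colocation, instances, cluster_of):
    """Partition colocation instances from precomputed per-feature clusters.

    cluster_of maps feature -> {object id -> cluster id}, assumed to come from
    the per-feature clustering run once at the beginning.
    """
    # Select the feature with the highest number of clusters.
    pivot = max(colocation, key=lambda f: len(set(cluster_of[f].values())))
    index = colocation.index(pivot)

    # Split: group instances by the cluster of their pivot-feature object.
    parts = {}
    for inst in instances:
        parts.setdefault(cluster_of[pivot][inst[index]], []).append(inst)
    groups = list(parts.values())

    # Merge: fuse partitions that share an object, until no conflict remains.
    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            objects_i = {o for inst in groups[i] for o in inst}
            for j in range(i + 1, len(groups)):
                objects_j = {o for inst in groups[j] for o in inst}
                if objects_i & objects_j:
                    groups[i] = groups[i] + groups.pop(j)
                    changed = True
                    break
            if changed:
                break
    return groups

# Hypothetical per-feature clusters: Z has three clusters, Y has two.
CLUSTER_OF = {
    "Y": {"Y1": 0, "Y2": 0, "Y3": 1},
    "Z": {"Z1": 0, "Z2": 1, "Z3": 2},
}
INSTANCES = [("Y1", "Z1"), ("Y2", "Z1"), ("Y2", "Z2"), ("Y3", "Z3")]

groups = merge_and_split(("Y", "Z"), INSTANCES, CLUSTER_OF)
for group in groups:
    print(group)
```

Here Y2 appears in the partitions of the first and second clusters of Z, so these two partitions are merged, while the third one stays separate.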


Experimentations

Data
• Studied area: mountainous watershed of 9 km²
• 3 thematic layers:
  • erosion: "not bare ground" or different types of "bare ground" (6 features)
  • nature of the ground: lithology (13 features)
  • vegetation: types of vegetation (13 features)
➫ 32 features and more than 7000 objects

Experimental protocol
• Spatial relationship: Euclidean distance between areas
• Several participation index thresholds
➫ Results studied by a geologist, expert in soil erosion of the studied area


Experimentations: Map readability

Number of patterns displayed to users
• an important indicator for visualization methods
• if too many patterns are displayed, then interpretation is difficult

                                                    Participation index threshold
Distance  Measure                                            0.5         0.3         0.1
200m      nb colocations                                      21          68         266
          avg nb instances for a colocation               16 478      11 974       8 365
          total nb instances for all colocations         346 046     814 263   2 225 118
          nb colocations displayed by our approach            31         112         510
300m      nb colocations                                      55         163         711
          avg nb instances for a colocation               50 803      78 347      87 100
          total nb instances for all colocations       2 794 205  12 770 670  61 928 727
          nb colocations displayed by our approach            84         258       1 349

➫ No more than twice the number of colocations
• If there are too many, the zoom functionality of the GIS can be used to filter
➫ Makes it possible to compare our approach with classical visualization approaches
• "select a pattern and display its instances" approach = average number of instances for a colocation
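
The two observations of this slide can be checked with a few lines of arithmetic on the reported figures. The numbers below are transcribed from the slide; the variable names and the layout of the dictionary are only illustrative.

```python
# (nb colocations, avg nb instances per colocation, nb representations displayed),
# indexed by distance (m) and participation index threshold.
RESULTS = {
    200: {0.5: (21, 16478, 31), 0.3: (68, 11974, 112), 0.1: (266, 8365, 510)},
    300: {0.5: (55, 50803, 84), 0.3: (163, 78347, 258), 0.1: (711, 87100, 1349)},
}

ratios = {}
for distance, by_threshold in RESULTS.items():
    for threshold, (nb_coloc, avg_instances, displayed) in by_threshold.items():
        # Representations drawn on the map per mined colocation.
        ratios[(distance, threshold)] = displayed / nb_coloc

max_ratio = max(ratios.values())
fewer_than_one_pattern = all(
    displayed < avg_instances
    for by_threshold in RESULTS.values()
    for (_, avg_instances, displayed) in by_threshold.values()
)

for key, ratio in ratios.items():
    print(key, round(ratio, 2))
print("max ratio:", round(max_ratio, 2))
print("fewer objects on the map than instances of one average colocation:",
      fewer_than_one_pattern)
```

In every configuration the whole map contains fewer representations than the number of instances that a "select a pattern and display its instances" view would draw for a single average colocation.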


Experimentations: Performance evaluation

Execution time of our approach versus a "basic" post-processing clustering approach
• post-processing approach = executing a DBSCAN clustering on each table instance after colocation extraction

[Figure: two panels, "5802 objects and 18 features" and "7642 objects and 32 features". x-axis: minimum participation index, from 0.8 to 0. y-axis: total time (sec), with ticks from 1 to 100000 in powers of ten. Curves: "Spatial clustering-based colocation mining" and "Colocation mining then DBScan clustering".]

➫ Our approach is more efficient than the basic approach


Experimentations: Expert feedback

Example of result provided to our expert by our prototype: [figure]

➫ Points out known correlations about soil erosion in this area
• e.g. highlights the environmental damage near areas with human activities
➫ Interest of our approach for experts
• Gives a global picture of where and how colocations are generally located
• Quickly identify new patterns, then focus on some of these patterns and study their instances in more depth


Conclusion & Perspectives

Conclusion
• Proposition of a new clustering-based visualization of colocations
  • A colored and labeled clique representation with thematic and prevalence information
  • A spatialization of colocations using a heuristic clustering method and a centroid-based positioning
➫ An easily usable and interpretable global picture of the solutions
➫ Good scalability

Main perspectives
• Improving algorithm performance with dedicated data structures, spatial indexes, or new mining strategies
• Improving our prototype
• Extending our approach to other patterns, e.g. sequential spatio-temporal patterns


Questions?

Thank you


Our approach

Problem
• How to visualize interesting colocations on a map?

Principle of our solution
• Generate a clique representation of each colocation and georeference this representation on a map using clustering


Formal definition of a visual colocation representation

A colored and labeled clique representation of a colocation $C$ = a colored and labeled clique $G^{col}_C = (V_C, E_C, L_{type}, L_{pi}, L_{theme})$, where
• $V_C$ is the set of vertices,
• $E_C = \{(u, v) \in V_C \times V_C \mid u \neq v\}$ is the set of edges,
• $L_{type} : V_C \to C$ is a labelling function that assigns an object-type $f \in C$ to a vertex $v \in V_C$,
• $L_{pi} : \bigcup_C E_C \to Col$ is a coloring function that assigns a color $k \in Col = \{1, 2, \dots, m\}$ ($m \geq 1$) to a colocation edge based on the prevalence measure $pi(C) \in [0, 1]$ (saturation factor), and
• $L_{theme} : V_C \to Col_{theme}$ is a coloring function that assigns the thematic color $k' \in Col_{theme} = \{1, 2, \dots, m'\}$ ($m' \geq 1$) of object-type $L_{type}(v)$ to a vertex $v \in V_C$.
