Comparing Meta-Learning Algorithms

Pontifical Catholic University of Paraná – PUCPR, Curitiba – PR, Brazil
Graduate Program in Computer Science – PPGIA

Fabrício Enembreck and Bráulio Coelho Ávila


Outline
- Introduction
  - The problem
  - Motivation
  - Distributed Data Mining concepts
  - Hypothesis
- Our Approach
- Results
- Discussion

Introduction
- The Problem
  - How to discover correct and understandable knowledge from different sites?
  - Low communication rates
  - Low data transfer rates
  - How to integrate knowledge acquired from distributed sites?
- Motivations
  - Domains with inherently distributed data (e.g. a network of stores)
  - Processing partitioned data improves performance
  - Confidentiality and data volume inhibit data transfer

Introduction: The Distributed Data-Mining Process
[Diagram: data sets at each site are mined by learning algorithms, producing rule sets (classifiers); a combination strategy merges their outputs to classify a new instance]
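The distributed data-mining process above can be sketched in a few lines. This is a minimal illustration, not the paper's method: majority voting is used here as one common combination strategy, and the learner and data are toy stand-ins.

```python
# Minimal sketch of the distributed data-mining process: each partition
# trains its own classifier, and a combination strategy (here, majority
# vote -- one common choice) merges their predictions. All names and the
# toy "learner" are illustrative; the slides do not fix a specific one.
from collections import Counter

def train_on_partitions(partitions, learn):
    """Apply a learning algorithm to each local data set."""
    return [learn(part) for part in partitions]

def majority_vote(classifiers, instance):
    """Combine base predictions for a new instance by majority vote."""
    votes = Counter(clf(instance) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy demonstration: three "sites" and a trivial threshold learner.
partitions = [[(0, "a"), (1, "b")], [(0, "a"), (2, "b")], [(0, "b")]]
learn = lambda data: (lambda x: "b" if x > 0 else "a")
classifiers = train_on_partitions(partitions, learn)
print(majority_vote(classifiers, 2))  # all three predict "b"
```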

Introduction: The Arbiter Combination Strategy
[Diagram: base classifiers learned from the local data sets each predict the class of a new instance; an arbiter applies an arbitration rule to the predictions to produce the final classification]
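A hedged sketch of the arbiter strategy: the arbitration rule used below (defer to the arbiter whenever the base classifiers disagree) is one common variant, not necessarily the exact rule on the slide, and all classifiers are hand-written toys.

```python
# Arbiter combination sketch: base classifiers predict independently;
# when they disagree, an arbiter classifier makes the final call.
from collections import Counter

def arbitrate(base_classifiers, arbiter, instance):
    predictions = [clf(instance) for clf in base_classifiers]
    top_class, top_votes = Counter(predictions).most_common(1)[0]
    if top_votes == len(predictions):   # unanimous: no arbitration needed
        return top_class
    return arbiter(instance)            # disagreement: arbiter decides

# Toy demonstration with hand-written threshold classifiers.
base = [lambda x: "pos" if x > 0 else "neg",
        lambda x: "pos" if x > 1 else "neg"]
arbiter = lambda x: "pos" if x >= 1 else "neg"
print(arbitrate(base, arbiter, 5))   # unanimous "pos"
print(arbitrate(base, arbiter, 1))   # base classifiers disagree -> arbiter: "pos"
```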

Introduction: The Combiner Strategy
[Diagram: the base classifiers' predictions are fed to a combiner, which applies its combination strategy to classify the new instance]
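The combiner strategy differs from the arbiter in that the base predictions become the *inputs* of a second-level learner (stacking). The lookup-table "combiner" below is an illustrative stand-in for whatever learner a real combiner would use.

```python
# Combiner strategy sketch: learn a mapping from the tuple of base
# predictions to the true class, then use it to classify new instances.
def make_combiner(base_classifiers, training_data):
    """training_data: list of (instance, true_class) pairs."""
    table = {}
    for instance, true_class in training_data:
        key = tuple(clf(instance) for clf in base_classifiers)
        table[key] = true_class
    def combiner(instance):
        key = tuple(clf(instance) for clf in base_classifiers)
        # Fall back to the first base prediction for unseen combinations.
        return table.get(key, key[0])
    return combiner

# Toy demonstration with two threshold base classifiers.
base = [lambda x: x > 0, lambda x: x > 2]
train = [(1, "low"), (3, "high"), (-1, "neg")]
combiner = make_combiner(base, train)
print(combiner(2))   # base outputs (True, False) -> "low"
```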

Introduction
- Hypothesis
  - Well-known meta-learning approaches can partially satisfy our expectations
  - Knowledge is expected to be understandable
  - Knowledge integration can be accomplished as a search through a hypothesis space
  - Any rule-based learning algorithm already searches a hypothesis space!
  - A global classifier is a composition of local classifiers

Our Approach - KNOMA
[Diagram: learning algorithms mine the distributed data sets into rule sets Ri (classifiers); a preprocessing step turns the rule sets into a meta-training set; a learning algorithm then induces a single rule set R, the meta-classifier, which classifies new instances]

Our Approach: Binary Attributes
Rule sets R1, R2, R3, ... contain rules such as:
  IF test1 AND test2 THEN class1
  IF test3 THEN class1
  IF test4 AND test5 AND test6 THEN class2
  IF test7 AND test8 THEN class3
  IF test1 THEN class1
  IF test5 THEN class2
  IF test2 THEN class4
  IF test3 AND test9 AND test10 THEN class2
  IF test11 AND test12 THEN class1
  ...
Each rule becomes one meta-instance over binary attributes A_test1, A_test2, ..., A_test-m plus a class attribute A_class; together these meta-instances form the meta-training data from which the single rule set R is learned.
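The preprocessing step can be sketched as follows. This is a minimal illustration of the rule-to-binary-vector encoding described above; the rule representation (a set of tests plus a class label) and all names are assumptions, not the paper's actual data structures.

```python
# KNOMA preprocessing sketch: each rule from the local rule sets Ri
# becomes one meta-instance -- a binary vector over the union of all
# tests (A_test1 ... A_test-m) plus the rule's class (A_class).
def rules_to_meta_instances(rule_sets):
    """rule_sets: list of rule sets; each rule is (set_of_tests, class_label)."""
    # Collect the union of tests across all rule sets, in a stable order.
    all_tests = sorted({t for rules in rule_sets
                          for tests, _ in rules
                          for t in tests})
    meta = []
    for rules in rule_sets:
        for tests, label in rules:
            row = {f"A_{t}": int(t in tests) for t in all_tests}
            row["A_class"] = label
            meta.append(row)
    return meta

# Toy rule sets mirroring the slide's examples.
r1 = [({"test1", "test2"}, "class1"), ({"test3"}, "class1")]
r2 = [({"test1"}, "class1"), ({"test5"}, "class2")]
meta = rules_to_meta_instances([r1, r2])
print(meta[0])
```

A rule-based learner run on `meta` would then produce the single rule set R.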

Experiments
- Evaluate the accuracy of the knowledge integration process
- Use RIPPER and C4.5Rules as base-learning algorithms
- Evaluate sensitivity to the stability of the different algorithms (RIPPER and C4.5)
- Protocol: 10 repetitions with random 90% training / 10% test splits; KNOMA is compared against RIPPER, Bagging, Boosting, and C4.5Rules
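The evaluation protocol (10 repetitions of a random 90/10 split, reported as mean ± standard deviation, matching the entries in the result tables) can be sketched as below; the learner and data are placeholders.

```python
# Sketch of the slides' protocol: 10 repetitions of a random
# 90% train / 10% test split, reporting mean accuracy and std. dev.
import random
import statistics

def evaluate(data, learn, repetitions=10, test_fraction=0.10, seed=0):
    rng = random.Random(seed)
    scores = []
    for _ in range(repetitions):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_fraction)
        test, train = shuffled[:cut], shuffled[cut:]
        clf = learn(train)
        correct = sum(clf(x) == y for x, y in test)
        scores.append(100.0 * correct / len(test))
    return statistics.mean(scores), statistics.stdev(scores)

# Toy demonstration with a perfect "learner", for illustration only.
data = [(i, i % 2) for i in range(100)]
learn = lambda train: (lambda x: x % 2)
mean, std = evaluate(data, learn)
print(f"{mean:.2f} +- {std:.2f}")  # perfect learner: 100.00 +- 0.00
```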

Results: RIPPER as base learner (accuracy %, mean ± std. dev.)

DB \ Alg.     KNOMA-RIP       RIPPER          Bagging-RIP     Boosting-RIP
Monk2         66.67 ± 6.71    58.58 ± 7.72    56.80 ± 9.11    62.13 ± 6.33
Audiology     79.72 ± 6.07    73.45 ± 11.30   77.87 ± 11.28   82.74 ± 6.18
Monk3         89.42 ± 9.65    86.06 ± 11.16   90.98 ± 6.94    89.34 ± 4.87
Ionosphere    92.72 ± 5.05    90.31 ± 5.14    92.02 ± 5.97    92.59 ± 4.43
Thyroid       98.16 ± 0.48    97.08 ± 0.51    97.85 ± 0.42    98.00 ± 0.49
Tic-Tac-Toe   98.33 ± 1.33    97.07 ± 1.13    98.22 ± 1.32    98.64 ± 1.69
Iris          98.00 ± 3.05    95.33 ± 6.70    96.00 ± 4.42    94.67 ± 6.53
Soybean       86.37 ± 2.58    91.21 ± 3.71    92.02 ± 2.41    92.59 ± 2.78
Monk1         68.59 ± 7.37    82.25 ± 13.75   97.58 ± 4.92    86.29 ± 11.24
Glass         68.92 ± 9.37    70.56 ± 8.49    74.76 ± 7.96    74.30 ± 10.50

Results: C4.5Rules as base learner (accuracy %, mean ± std. dev.)

DB \ Alg.     KNOMA-C45R      C45Rules        C45
Monk2         62.13 ± 2.77    62.12 ± 2.78    61.56 ± 5.79
Audiology     67.31 ± 7.13    51.51 ± 13.37   77.91 ± 10.27
Monk3         93.29 ± 6.90    81.51 ± 10.79   93.47 ± 7.23
Ionosphere    96.02 ± 3.63    93.84 ± 2.24    92.83 ± 3.13
Thyroid       97.03 ± 0.75    96.41 ± 1.89    97.85 ± 0.48
Tic-Tac-Toe   73.16 ± 2.40    70.42 ± 7.22    93.42 ± 2.28
Iris          98.00 ± 3.05    95.99 ± 4.42    96.29 ± 4.56
Soybean       96.19 ± 2.38    13.47 ± 0.66    91.52 ± 4.85
Monk1         68.72 ± 15.18   81.21 ± 10.56   71.30 ± 11.44
Glass         47.62 ± 4.47    66.20 ± 5.83    67.49 ± 5.27

Results: KNOMA vs. C4.5 ensembles (accuracy %, mean ± std. dev.)

DB \ Alg.     KNOMA-C45R      C45             Bagging-C45     Boosting-C45
Monk2         62.13 ± 2.77    61.56 ± 5.79    60.96 ± 9.85    64.48 ± 5.30
Audiology     67.31 ± 7.13    77.91 ± 10.27   81.97 ± 8.22    85.02 ± 7.02
Monk3         93.29 ± 6.90    93.47 ± 7.23    92.63 ± 6.91    89.42 ± 8.90
Ionosphere    96.02 ± 3.63    92.83 ± 3.13    93.18 ± 4.00    93.17 ± 3.38
Thyroid       97.03 ± 0.75    97.85 ± 0.48    97.98 ± 0.45    97.56 ± 0.53
Tic-Tac-Toe   73.16 ± 2.40    93.42 ± 2.28    92.17 ± 2.92    96.65 ± 1.47
Iris          98.00 ± 3.05    96.29 ± 4.56    95.33 ± 5.20    93.33 ± 6.67
Soybean       96.19 ± 2.38    91.52 ± 4.85    91.03 ± 3.10    92.97 ± 2.34
Monk1         68.72 ± 15.18   71.30 ± 11.44   73.53 ± 10.64   86.34 ± 8.12
Glass         47.62 ± 4.47    67.49 ± 5.27    72.79 ± 10.40   73.29 ± 9.11

Discussion
- KNOMA improved the performance of RIPPER and C4.5Rules
- With RIPPER, KNOMA is very close to Bagging and Boosting
- With C4.5Rules, Boosting seems to be better
- Boosting is better for unstable algorithms like C4.5 and C4.5Rules

Discussion
- KNOMA depends on the stability of the base-learning algorithm: error is high when the base classifiers are very different
- Improvements to be made:
  - Meta-Attributes: merge nearly identical tests, e.g. (RI >= 1.5172) and (RI >= 1.517)
  - Meta-Instances: rules can be given different weights (accuracy/support)
  - Experiments: vary the % of data vs. the number of partitions