Pontifical Catholic University of Paraná – PUCPR
Curitiba – PR, Brazil
Graduate Program in Computer Science - PPGIA
Comparing Meta-Learning Algorithms
Fabrício Enembreck
Bráulio Coelho Ávila
Outline
Introduction
– The problem
– Motivation
– Distributed Data Mining Concepts
– Hypothesis
Our Approach
Results
Discussion
Introduction
The Problem
– How to discover correct and understandable knowledge from different sites?
– Low communication rates
– Low data transfer rates
– How to integrate knowledge acquired from distributed sites?
Motivations
– Domains with distributed data (e.g. network of stores)
– Processing of partitioned data improves performance
– Confidentiality and volume inhibit data transfer
Introduction
[Figure: Classification in the Distributed Data-Mining Process. Data Sets -> Learning Algorithms -> Rules Sets (Classifiers) -> Combination Strategy; a New Instance enters and leaves as New Instance Classified.]
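The combination strategy in the process above is not specified on this slide. As a minimal sketch, assuming the simplest possible strategy (plain majority voting over the local classifiers), the classification step could look as follows. The function `majority_vote` and the three site classifiers are hypothetical names introduced only for illustration.

```python
from collections import Counter

def majority_vote(classifiers, instance):
    """Return the label predicted by most classifiers.

    Assumption: each classifier is a callable mapping an instance to a label.
    Ties are resolved in favour of the label that was predicted first.
    """
    predictions = [clf(instance) for clf in classifiers]
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical local classifiers, one per site, written as plain functions.
site_a = lambda x: "class1" if x["test1"] else "class2"
site_b = lambda x: "class1" if x["test3"] else "class2"
site_c = lambda x: "class2"

new_instance = {"test1": True, "test3": True}
label = majority_vote([site_a, site_b, site_c], new_instance)
```

Voting only needs the predictions of each site, so no training data has to be transferred between sites.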
Introduction
[Figure: Classification with the Arbiter Combination Strategy. Data Sets -> Learning Algorithms -> Base Classifiers and Arbiter -> Predictions -> Arbitration Rule; a New Instance enters and leaves as New Instance Classified.]
Introduction
[Figure: Classification with the Combiner Strategy. Data Sets -> Learning Algorithms -> Base Classifiers -> Combiner (Combination Strategy); a New Instance enters and leaves as New Instance Classified.]
Introduction
Hypothesis
– Well-known meta-learning approaches can partially satisfy our expectations
– Knowledge is supposed to be understandable
– Knowledge integration can be accomplished with a search in a space of hypotheses
– Any rule-based learning algorithm searches a space of hypotheses!
– A global classifier is a composition of local classifiers
Our Approach - KNOMA
[Figure: Classification with KNOMA. Data Sets -> Learning Algorithms -> Rules Sets Ri (Classifiers) -> Preprocessing -> Meta-Training Set -> Learning Algorithm -> Single Rule Set R (MetaClassifier); a New Instance enters and leaves as New Instance Classified.]
Our Approach
[Figure: The rule sets R1, R2, R3, ... are turned into Meta-Training Data with Binary Attributes Atest1, Atest2, ..., Atest-m and Aclass, from which the single rule set R is obtained.]
Example rules from the local rule sets:
IF test1 and test2 Then class1
IF test3 Then class1
IF test4 and test5 and test6 Then class2
...
IF test7 and test8 Then class3
IF test1 Then class1
IF test5 Then class2
...
IF test2 Then class4
IF test3 and test9 and test10 Then class2
IF test11 and test12 Then class1
...
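The conversion of rule sets into binary meta-training data can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it assumes that each rule becomes one meta-instance, that an attribute is 1 when the corresponding test occurs in the rule and 0 otherwise, and that the rule's conclusion fills Aclass. The function name `build_meta_training` and the tuple encoding of rules are hypothetical.

```python
def build_meta_training(rule_sets):
    """Turn local rule sets into a binary table with one row per rule.

    Assumed rule encoding: (tuple_of_tests, class_label).
    Returns the ordered list of tests (the binary attributes) and the rows.
    """
    # Collect every distinct test, in order of first appearance.
    tests = []
    for rules in rule_sets:
        for conditions, _ in rules:
            for test in conditions:
                if test not in tests:
                    tests.append(test)

    # One row per rule: 1 if the test occurs in the rule, else 0, then the class.
    rows = []
    for rules in rule_sets:
        for conditions, label in rules:
            row = [1 if t in conditions else 0 for t in tests]
            rows.append(row + [label])
    return tests, rows

# Two small hypothetical rule sets built from rules shown on the slide.
r1 = [(("test1", "test2"), "class1"), (("test3",), "class1")]
r2 = [(("test1",), "class1"), (("test5",), "class2")]

tests, rows = build_meta_training([r1, r2])
for row in rows:
    print(row)
```

Any rule learner can then be trained on `rows` to produce a single rule set, which keeps the integrated knowledge in the same readable IF-THEN form as the local classifiers.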
Experiments
Evaluate the accuracy of the knowledge integration process
Use RIPPER and C4.5Rules as base-learning algorithms
Evaluate sensitivity to the stability of different algorithms (RIPPER and C4.5)
[Figure: Evaluation protocol comparing KNOMA with RIPPER, Bagging, Boosting and C45Rules; labels: 90%, 10%, 10 times.]
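The labels "90%", "10%" and "10 times" suggest a repeated 90/10 split of each data set. A minimal sketch follows, assuming this means standard 10-fold cross-validation (the slide does not say whether the folds are disjoint or drawn at random each time). The function name `ten_fold_splits` and the fixed seed are illustrative choices.

```python
import random

def ten_fold_splits(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Example with 150 instances (the size of the Iris data set).
splits = list(ten_fold_splits(150))
for train, test in splits:
    print(len(train), len(test))
```

Each algorithm would be trained on the 90% part and scored on the 10% part, with the mean and spread over the 10 runs reported per data set.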
Results
DB \ Alg.     KNOMA-RIP       RIPPER          Bagging-RIP     Boosting-RIP
Monk2         66.67 ± 6.71    58.58 ± 7.72    56.8 ± 9.11     62.13 ± 6.33
Audiology     79.72 ± 6.07    73.45 ± 11.30   77.87 ± 11.28   82.74 ± 6.18
Monk3         89.42 ± 9.65    86.06 ± 11.16   90.98 ± 6.94    89.34 ± 4.87
Ionosphere    92.72 ± 5.05    90.31 ± 5.14    92.02 ± 5.97    92.59 ± 4.43
Thyroid       98.16 ± 0.48    97.08 ± 0.51    97.85 ± 0.42    98.00 ± 0.49
Tic-Tac-Toe   98.33 ± 1.33    97.07 ± 1.13    98.22 ± 1.32    98.64 ± 1.69
Iris          98.00 ± 3.05    95.33 ± 6.70    96.00 ± 4.42    94.67 ± 6.53
Soybean       86.37 ± 2.58    91.21 ± 3.71    92.02 ± 2.41    92.59 ± 2.78
Monk1         68.59 ± 7.37    82.25 ± 13.75   97.58 ± 4.92    86.29 ± 11.24
Glass         68.92 ± 9.37    70.56 ± 8.49    74.76 ± 7.96    74.30 ± 10.50
Results
DB \ Alg.     KNOMA-C45R      C45Rules        C45
Monk2         62.13 ± 2.77    62.12 ± 2.78    61.56 ± 5.79
Audiology     67.31 ± 7.13    51.51 ± 13.37   77.91 ± 10.27
Monk3         93.29 ± 6.9     81.51 ± 10.79   93.47 ± 7.23
Ionosphere    96.02 ± 3.63    93.84 ± 2.24    92.83 ± 3.13
Thyroid       97.03 ± 0.75    96.41 ± 1.89    97.85 ± 0.48
Tic-Tac-Toe   73.16 ± 2.40    70.42 ± 7.22    93.42 ± 2.28
Iris          98.00 ± 3.05    95.99 ± 4.42    96.29 ± 4.56
Soybean       96.19 ± 2.38    13.47 ± 0.66    91.52 ± 4.85
Monk1         68.72 ± 15.18   81.21 ± 10.56   71.30 ± 11.44
Glass         47.62 ± 4.47    66.2 ± 5.83     67.49 ± 5.27
Results
DB \ Alg.     KNOMA-C45R      C45             Bagging-C45     Boosting-C45
Monk2         62.13 ± 2.77    61.56 ± 5.79    60.96 ± 9.85    64.48 ± 5.30
Audiology     67.31 ± 7.13    77.91 ± 10.27   81.97 ± 8.22    85.02 ± 7.02
Monk3         93.29 ± 6.9     93.47 ± 7.23    92.63 ± 6.91    89.42 ± 8.90
Ionosphere    96.02 ± 3.63    92.83 ± 3.13    93.18 ± 4.00    93.17 ± 3.38
Thyroid       97.03 ± 0.75    97.85 ± 0.48    97.98 ± 0.45    97.56 ± 0.53
Tic-Tac-Toe   73.16 ± 2.40    93.42 ± 2.28    92.17 ± 2.92    96.65 ± 1.47
Iris          98.00 ± 3.05    96.29 ± 4.56    95.33 ± 5.20    93.33 ± 6.67
Soybean       96.19 ± 2.38    91.52 ± 4.85    91.03 ± 3.10    92.97 ± 2.34
Monk1         68.72 ± 15.18   71.30 ± 11.44   73.53 ± 10.64   86.34 ± 8.12
Glass         47.62 ± 4.47    67.49 ± 5.27    72.79 ± 10.40   73.29 ± 9.11
Discussion
KNOMA has improved the performance of RIPPER and C45Rules
With RIPPER, KNOMA is very close to Bagging and Boosting
With C45Rules, Boosting seems to be better
Boosting is better for unstable algorithms like C45 and C45Rules
Discussion
KNOMA depends on the stability of the base-learning algorithm: error is high when the base classifiers are very different
Improvements to be made:
– Meta-Attributes:
  • (RI >= 1.5172) (RI >= 1.517)
– Meta-Instances:
  • Rules can have different weights (accuracy/support)
– Experiments:
  • % of data vs number of partitions
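The meta-attribute example, (RI >= 1.5172) next to (RI >= 1.517), reads as two nearly identical tests that end up as separate binary attributes. One possible remedy, offered here only as an assumption about what the improvement could look like, is to round numeric thresholds before building the meta-attributes so that near-duplicates collapse into one. The function name `normalize_test` and the choice of three decimals are hypothetical.

```python
def normalize_test(attribute, operator, threshold, decimals=3):
    """Return a canonical key for a numeric test, with a rounded threshold.

    Assumption: tests whose thresholds agree after rounding are treated
    as the same meta-attribute. This slightly changes the rule's meaning.
    """
    return (attribute, operator, round(threshold, decimals))

a = normalize_test("RI", ">=", 1.5172)
b = normalize_test("RI", ">=", 1.517)
same_attribute = a == b
print(a, b, same_attribute)
```

The number of decimals trades fidelity to the local rules against the number of meta-attributes, so it would need to be tuned per attribute or per data set.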