Mining Hyperclique Patterns with Confidence Pruning

Hui Xiong, Pang-Ning Tan, Vipin Kumar

Computer Science Department, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA

% X  /x.+^ Bjo+  $(*X + ) l%4[ &!  "yYz DR{DG| DE}D z)~D9z5D_z Dq 5€7!:+ +  "[    +?@=+ &, l *+ $(*F,  / f  F:% F+ 6/  *  o,   6 B ,G . $/ <  +)79   # $"* $% !9  l%&‚8:,  jy | €- &S„ƒ„ " 5H Ol'M  '6 $(E,  / +>7

ABSTRACT

        !  "#"  $% &'% )(* +  % # , ,-* . 0/ +1,    "  +"234 5(*+ 6$+ +. " ,  +  )798:$%   " %;% +  J  + + 7LKM% ;,  / N/-++ !+O(*+1& = $)CB /-^+ ‚% 4z5·=7„s:)HMl( +D %= 5Hn  , ,G .C% + % *$#  "  + +&)(O ,       +  @$(* $( ";$+! Be *N $QR+ +*& , ,- .#$(*+)7]Kq3 Be2L%  J  & +a   &)D-%  +%  a 66 .^/-O/  #b&+$+$2> 5(*+ ^ + +. "^,    ‚j(*+ .2X5HT , ,- .„% + % *$ )7jKq^$B2 % C ++  6 a  +!+*)D% 9+l%  $a C6  q/-M/ ‚',    )HM)2=& .% Bp .+ O%      4Be +a +^,  + [" + +  "="  $% !' %  ;8:,    IrF%  [D:,  .+  $2n>5H $(*+> B  , ,-* .)7



}7 c !   + /GOdHF;,-+9, , )  @ BC!  ";%2 ,G+ a „,    ` +   "&)  ;O   ++T J  T% >! 6  +  J  + +? B`G,G  / :  +C% M+ #/G^o  l+=Be  † " $(*+O,   E79µ^O% .   "*Y2;$&, $+ % »¤$&¼l7LKBe     !(*+ .lo  [„% + ?(*+ .++@$   J  +LE  /l9PmRþ  «  ´   ooo,  v ´  s‹ ´ qp 7

2.2 Problem Statement

c M)!Be #F% M%*2,-+ +a /G@+ %  @ Bq% @/ +$  " * Y% [7 c 6  &?% :,    "=   "&%  %+  J  + + % +%   „)!/G', , &‚?,G . , +  "@.+,!Be j% + " * Y% &+7

s:5HF(*+ +D   ?)> % 5H…% M%   +$*=  ^HNþ    ooo  p 7VK
».z5¼

±u²³,³´+µ ®‚2=  J  $ 3z DHF@% )(*^% @Be*$5H'$ " Å

z

ooo,  |$} } i» þ  p ¼ ¼ |$} } »iþ  ´   ooo qp   ? ÿ p þ |$ »iþ   ¼® o   } } ´®° °

´



K  +"2@ &$ ^=%   , ,- . / \,    "T  +"2¹ +…/*2no $ "1Bp a +* ,  + n!  "1" * Y% &+7UK*Bq% @%+  J  +  &)  [+   #% +$,V,   >% * [YoT)   [,   +  )D`  +6(*+ .2 /  @ B9%  O,  +  @ O%2,-+ . +a @,  +  +7

%¢ú®ì–í ÁsÁ •ÆÅ  •Ç %

»iþ  ´ ¼öõ |$} } »iþ   ¼lD+HF„"*lOMl  9P7 »B' ¼sõVì–í ÁhÁ î ¢• ð Å òh% õ j+á 7 ì–í ÁsÁ î ®ò % S  .+, 7%*2,-+ +$a >,  +  /X)©O™ / +P P% =Be )H' ">% +=+   $$* +Å>z5¼@ *!#A*z# / [ BXL[ 4$  O™ v 7œ»¤t'++ $?% >% [,   L +,\ ´ r:7KM%  %  X*   + +3% &  + Be* ?%  $+!@>/G;þ8XD_®™ «üþN r 6»0~7 |¼lDþ8XDN r @«üþN ® 6»0~7 ›*¼lD*  =þ®^DN r @«üþ58 8 6»0~7 ¶*¼lD H'%  ?% ?(  @$;% @,  +% +"  B )*$2T/- + !% ]+ .+  "L/  +¹ ]Be +a +*#,  +  )7 ++  EDR&%*2,-+ +a /  ++$  +  "&% +!  ^ ^ l a $ = !+   $i23 +  $*\y ¸€0D„/-+) #$6 X% =%2 ,-+ "  , %#! +  +, + +F +  C !  " $+!)7qhi#+ 













  



  

4.2 Proofs of Completeness and Correctness

 ¤Â³²¤‚¥

™-•’˜d’e

z 7„¸`‰ ’P‰¥™ ’“ +• Š¤¦+” ’T¢6ŠeG’“T£• ‹*Ž“lŠe˜¤‰¢­Še–1Ž¢6Œ

±u²³,³´+µ KM% :+ &, l+ +‚ BR% :%2,G+ a :&$ + C"  

$%  + >/G % )H'[/*2;%  Be 5H+  J  +*:%2,G +a  ,  +  )79KM% 'Be *†O " @&$  + M*!, 2*7 Confidence-Pruning Effect Number of Hyperclique Patterns

1e+08

6. EXPERIMENTAL EVALUATION

c @, +:o+ Y(@lo,-+ !+


[Figure residue: x-axis "Minimum Support Thresholds", 0.01 to 0.025; the remaining axis labels, legend, and caption are illegible in the source.]

µ: ‚o,-+ !+9HF+ 

[Figure: The execution time of the hyperclique miner and CHARM (data set name illegible in the source). x-axis: Minimum Support Thresholds.]

Sj$"* ;}>% 5H! + ! VrFs:8:t:u Be O% #¯*° ±G²³T Z )7T8:

% )H'Z4%  J "   DE% O 6/-+ ? B‚,  +  ? 5(*+ +Z/*2   !"  $% œ6(*+ F  + O*B¸| 7 *·#7Lt'l )qBe * Kj/ =z % ^ ) $24¸ › 7 ›·†*Bj% 6$+!:% )(*6  , ,-* . + ?%  L~7 | 7 c $%L; , ,- . % + %  L" )+ ?%  ~7 | DCrCs:8:t:u )T  $2P$ +B21 +  6&  "Z[( .2 &CBe  P BM% =$&+7Zu4 +5(*+ +Dj% =lo++$*P! Be 'rFs:8:t:u M+     / Y2=% " % + F% ;%2,G +a @! + +D 6 % 5H'LLSj"  ;½7 c Y%P%+  J  + ;,    " DqHF#)  :%*2,-+ +$a :! + ‚  +)( C%2,G +a :,    Fl( X  , ,G . % +%  3+a 9[+  7;Sj$  $$2*D_H&$  "! $"* $% !F  ?O% + M5HI , ,- .o !,  D HF^ + J +>6%*2,-+ +a ?,  + ;*( $( "6+  +$2& ++ $+!q %O  a: A'/  + .2* D a: A:  , +D* T  a: A^þ5)  " )D "  ;  " D / ++l 7„K, , *l%Z) +5(*+ C$+!M +[!& $6 † , ,- .:% + %   +a E#{·Ç [*/ Pz)¸ › ~6Be +a :$+!'5(*+  " | |~= BC% O$+!)7!KM% +D`% + @ $ „%*2,-+ "  , %3+  + | |~6(*+ .+: Lz)¸*› ~*6%*2,-+ + " +)7Mµ^ 'o,G+ $&+*' + $   )!%  % & 6/-+ X BMBe +a +*OY+&  $ + )+  z){*¶*½ ¸*[ L% = 6/G 6 B%*2,-+ " , % X&z5·¡! 6 ¾ , ,- .@% +%  E7 c $% 3& $6 ü%* J   +4% +%  ¹ ;;~*·=DCHM[ / z)~ |} }X%2,G+ a @,  +  % 7 KM% %2,-+ +$a Z,  + V;P 5(*+'d2,- B  + &,   =H~ ~ ~ 7 y  5€?t Ò 7 a:"  D ‚ 7 _A% & ED È 7 s^ED- >8X7 ƒ„ " 7

o,   .2=!  "6(6+     +;Bp a +*:  a + )7 hi C“ Ž)G ebd çE­ Œ ¨ c D, " +:}*}› } } ¶D È   !z)¸ ¸ ¸ 7 y {€ 7-µ^!+ A07 8:$+  $(*? + b&+  $"* $% ˆBp M+ +;$+!   È 7- 7-®F)2*+ ; J )  »0+  l¼lÅq% + .2> > + $+ 7 ç'‘š£-l’–^Še -Ž—M•’ ‘)‹’ c Še–Žš’“l¥#£G‘ c £˜£ Še Še‹D , " + z5}{ -z)¶ ~DEz)¸*¸ ›7 y ¶€ 7-rF % +ED uP7 Ÿ@ +D-G7 S x.$HM  D8X7R:  )D ƒq7 hi 2AGD t 7 u4dHM 0D È Â 7  :$#ED  4r:7 ‚ " 7 Sj$   " *+ + $ "&+  FH'Y% *% " %0 &   E Ol)ń8  !& .2# Bq  $+7 ”•e•’l˜0Še3”Ž «?˜¤‰ V ’ ¸G’‰* Š¤£• ­9Ž¢6¢6Še˜0˜i’’6Ž c £˜i£ w ‹ ŠeG’’“lŠe‹D z*».z5¼lD u4 %3z)¸ ¸ ¶7 yYz+|€ 7 s^ EDG67  .2, $)Dž67 ? # +D  [®:7-u[*/  % + +7 rF$  +  "O/   +> [ + =  @%2,G "* , % +7 hi


B. THE COMPLETE LIST OF CLUSTERS

[Figure: Confidence-Pruning Effect. Number of patterns generated by the hyperclique miner and CHARM (data set name illegible in the source). x-axis: Minimum Support Thresholds, 0.0001 to 0.0003; y-axis: Number of Hyperclique Patterns, 10 to 1e+06; series: min_conf = 90%, min_conf = 50%, CHARM.]

[Figure: The execution time of the hyperclique miner and CHARM (data set name illegible in the source). x-axis: Minimum Support Thresholds, 0.0001 to 0.0003; y-axis: Execution Time (sec), 0 to 10; series: min_conf = 90%, min_conf = 50%, CHARM.]

[Figure: Confidence-Pruning Effect. Number of patterns generated by the hyperclique miner and CHARM (data set name illegible in the source). x-axis: Minimum Support Thresholds, 0 to 0.15; y-axis: Number of Hyperclique Patterns, 100 to 1e+09; series: min_conf = 95%, min_conf = 85%, min_conf = 70%, CHARM.]

[Figure: The execution time of the hyperclique miner and CHARM (data set name illegible in the source). x-axis: Minimum Support Thresholds, 0 to 0.15; y-axis: Execution Time (sec), 0.1 to 10000; series: min_conf = 95%, min_conf = 85%, min_conf = 70%, CHARM.]

z5{

a:

z

Ÿ^ +5(*+ +[rF .+  ®F$! :? DrChqa:+ "2#rF , D8:&+ +l :ƒj)HM -D*Ÿ^ A ^ƒq5HF+ DrF    +   D *+ "2#rF , D :+  ƒ9 / $8   '$ D s:  .*;hi    D-ƒ rhi  D9rF !6  $d2Zƒ9.2%  D  ?S ¡rC* , D9^ F"    D _    ˜  _  -D„µ: .2o

  " 2>rC D- $ @t'$ D Kqlo  >hi  rFƒCr hi* DX?   ZrF D u4: )H^  !&  D*K7 ®C " "     D‚Ÿ^5(*+ &rF , Ds:)$% ) D c  ! ^hi    D /-  rM+   rC + $i2$d2 -DEŸ^" $ a , !+ D uZr rF , Dj^ 9hi .  & DEhd®Mu -D`8:( ++Lu4$+ >Ÿ Dq %HF+  D u[+ 

 `2 % D 8:, , $XrF &,  -D u4 * D 8:  lH rF , DG+ J 8F -D Kq+A*   $o=hi  rC &, + 8:Thi  D!8:  D!{*rF  D!rC &, avrF &,  -DO+ / $   2 + D&rC+I2 +! D6Ÿ@ r rC &6  +$* Ds^ƒ -Dhi+_rF , 8:, , $+4&  Ò D j§ h _ "  DGu4$+  4Kql%    "2  D a^  j+!+     DEµ: +OrC , -DR$+ 3: , % + D  [u[+   2 +! DK_+/ rF ,  D  ?8: ^: * , D-+ "  rF 7 ` D c  $HF .%[rF , rF  =rC DqŸ^5(*+ 6rF , D9^ +[» c 7 t 7 ¼ D_s^ * .!:+ +  DqhiKMK†rC* ,¹¢» a:H^¼ Dju +9hi  D„u4   XrM  D a:h - D a: .% +  D t'$?8: D t')2% + rF , D KMt'qh a?µMžj8frF , D c " +4rF ®ClA fŸ^+A + D-S  $' Bq% 8  _ D s:  2HF+$_hi  D @ Be# m®F * D @ " c  [ƒ9  -D !$,G  rF , D ƒ9 $ rF , D t