This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON JOURNAL KNOWLEDGE AND DATA ENGINEERING, MANUSCRIPT ID

Query Planning for Continuous Aggregation Queries over a Network of Data Aggregators

Rajeev Gupta and Krithi Ramamritham, Fellow, IEEE

Abstract—Continuous queries are used to monitor changes to time-varying data and to provide results useful for online decision making. Typically, a user desires to obtain the value of some aggregation function over distributed data items, for example, the value of a client's portfolio or the AVG of temperatures sensed by a set of sensors. In these queries, a client specifies a coherency requirement as part of the query. We present a low-cost, scalable technique to answer continuous aggregation queries using a network of aggregators of dynamic data items. In such a network of data aggregators, each data aggregator serves a set of data items at specific coherencies. Just as various fragments of a dynamic web page are served by one or more nodes of a content distribution network, our technique involves decomposing a client query into sub-queries and executing the sub-queries on judiciously chosen data aggregators with their individual sub-query incoherency bounds. We provide a technique for obtaining the optimal set of sub-queries, with their incoherency bounds, that satisfies the client query's coherency requirement with the least number of refresh messages sent from aggregators to the client. To estimate the number of refresh messages, we build a query cost model that estimates the number of messages required to satisfy the client-specified incoherency bound. Performance results using real-world traces show that our cost-based query planning leads to queries being executed using less than one third the number of messages required by existing schemes.

Index Terms—Algorithms, Continuous queries, Distributed query processing, Data dissemination, Coherency, Performance.


1 INTRODUCTION

Applications such as auctions, personal portfolio valuations for financial decisions, sensor-based monitoring, and route planning based on traffic information make extensive use of dynamic data. For such applications, data from one or more independent data sources may be aggregated to determine whether some action is warranted. Given the increasing number of applications that use highly dynamic data, there is significant interest in systems that can efficiently deliver the relevant updates automatically. As an example, consider a user who wants to track a portfolio of stocks held in different brokerage accounts. Stock data values from possibly different sources must be aggregated to satisfy the user's requirement. These aggregation queries are long-running, because the data changes continuously and the user wants notifications when certain conditions hold. Thus, responses to these queries are refreshed continuously. In these continuous query applications, users are likely to tolerate some inaccuracy in the results. That is, the exact data values at the corresponding data sources need not be reported as long as the query results satisfy user-specified accuracy requirements. For instance, a portfolio tracker may be happy with an accuracy of […].

Data incoherency: Data accuracy can be specified in terms of the incoherency of a data item. Incoherency is defined as the absolute difference between the value of the data item at the data source and the value known to a client of the data. Let vi(t) denote the value of the ith data item at the data source at time t, and let ui(t) be the value of that data item known to the client. Then the data incoherency at the client is given by |vi(t) − ui(t)|. For a data item that needs to be refreshed at an incoherency bound C, a data refresh message is sent to the client as soon as the data incoherency exceeds C, i.e., |vi(t) − ui(t)| > C.

Network of data aggregators: Data refresh from data sources to clients can be done using push-based or pull-based mechanisms. In a push-based mechanism, data sources send update messages to clients on their own. In a pull-based mechanism, data sources send messages to the client only when the client makes a request. We assume the push-based mechanism for data transfer between data sources and clients. For scalable handling of push-based data dissemination, networks of data aggregators have been proposed in the literature […]. In such a network of data aggregators, data refreshes flow from data sources to the clients through one or more data aggregators. In this paper, we assume that each data aggregator maintains its configured incoherency bounds for various data items. From a data dissemination capability point of view, each data aggregator (DA) is characterized by a set of (di, ci) pairs, where di is a data item that the DA can disseminate at an incoherency bound ci. The configured incoherency bound of a data item at a data aggregator can be maintained using either of the following methods:
(a) The data source refreshes the data value at the DA whenever the DA's incoherency bound is about to be violated. This method has scalability problems.
(b) Data aggregator(s) with a tighter incoherency bound help the DA maintain its incoherency bound in a scalable manner, as explained in […].

————————————————

• Rajeev Gupta is with IBM Research, New Delhi. E-mail: grajeev@in.ibm.com.
• Krithi Ramamritham is with the Indian Institute of Technology, Mumbai. E-mail: krithi@cse.iitb.ac.in.
Manuscript received […] May […].



Digital Object Identifier 10.1109/TKDE.2011.12

1041-4347/11/$26.00 © 2011 IEEE


Example 1: In a network of data aggregators managing data items d1–d4, various aggregators can be characterized as a1: {(d[…], […]), (d[…], […])} and a2: {(d[…], […]), (d[…], […]), (d[…], […])}. Aggregator a1 can serve values of d[…] with an incoherency bound greater than or equal to […]. Aggregator a2 can disseminate the same data item only at a looser incoherency bound of […] or more. In such a network of aggregators of multiple data items, all the nodes can be considered peers. A node ai can help another node ak maintain the incoherency bound of a data item d[…], because the incoherency bound of that item at ai is tighter than at ak. At the same time, ai gets values of another data item d[…] from ak.

1.1 Aggregate Queries and their Execution
In this paper, we present a method for executing continuous multi-data aggregation queries using a network of data aggregators. The objective is to minimize the number of refreshes from data aggregators to the client. First, we give two motivating scenarios in which there are various options for executing a multi-data aggregation query, and one must select a particular option to minimize the number of messages.

Scenario 1: Consider a client query Q = […]d1 + […]d2 + […]d3, where d1, d2 and d3 are different stocks in a portfolio, with a required incoherency bound of […]. We want to execute this query over the data aggregators given in Example 1 while minimizing the number of refreshes.

Scenario 2: In a sensor network, consider an AVG query over a target set of sensors (say d1, d2 and d3) injected at a query node. In-network aggregation is used for energy-efficient propagation of aggregates […]. To construct an aggregation tree connecting the target sensors and the query node, each node can select a path to the query node based on a certain preference factor. We want to select the in-network aggregation path such that the aggregation query is executed with the minimum number of messages.

In both cases, a limited number of options are available for executing the aggregation query. In this paper, we use Scenario 1 as the running example, but the results and conclusions apply to both scenarios. Specifically, we answer the following question: given a client query posed over a hypothetical database of multiple data sources, what sub-queries should be posed at various data aggregators so that the number of refreshes from these aggregators to the client is minimized? We use additive aggregation queries to develop our approach in detail. Towards the end of the paper, we describe how MAX/MIN queries can be handled.

For answering the multi-data aggregation query in Scenario 1, there are three options for the client to get the query results.

Firstly, the client may get the data items d1, d2 and d3 separately. The query incoherency bound can then be divided among the data items in various ways, as long as the query incoherency stays below the incoherency bound. In this paper, we show that getting data items independently is a costly option. This strategy ignores two facts: the client is interested only in the aggregated value of the data items, and various aggregators can disseminate more than one data item.

Secondly, if a single DA can disseminate all three data items required to answer the client query, the DA can construct a composite data item corresponding to the client query (dq = […]d1 + […]d2 + […]d3). It then disseminates the result to the client so that the query incoherency bound is not violated. Getting the query result from a single DA minimizes the number of refreshes, since updates to different data items may cancel each other out and keep the query result within the incoherency bound. However, different data aggregators disseminate different subsets of data items, so no single data aggregator may have all the data items required to execute the client query. This is indeed the case in Example 1. Further, even if an aggregator can refresh all the data items, it may not be able to satisfy the query coherency requirement. In such cases, the query has to be executed with data from multiple aggregators.

A third option is to divide the query into a number of sub-queries and get their values from individual DAs. In that case, the client query result is obtained by combining the results of multiple sub-queries. For the DAs given in Example 1, the query Q can be divided in two alternative ways:
Plan 1: the result of the sub-query over d[…] and d[…] is served by a[…], whereas the value of d[…] is served by a[…].
Plan 2: the value of d[…] is served by a[…], whereas the result of the sub-query over d[…] and d[…] is served by a[…].

In both plans, combining the sub-query values at the client gives the query result. However, selecting the optimal plan among the various options is not trivial. Intuitively, we should select the plan with fewer sub-queries, but that plan is not guaranteed to need the fewest messages. Further, we should select the sub-queries such that updates to the data items appearing in a sub-query are more likely to cancel each other, since that reduces the need for refreshes to the client. In the above example, suppose updates to two data items are such that when one increases the other decreases, and vice versa. Then selecting the plan that places both items in the same sub-query may be beneficial. We give a method to select the query plan based on these observations.

While solving the above problem, we ensure that each data item for a client query is disseminated by one and only one data aggregator. A query could be divided so that a single data item is served by multiple DAs, for example by splitting Q into two sub-queries that each include a share of the same data item. Doing so, however, processes the same data item at multiple aggregators and adds unnecessary processing load. Further, with paid data subscriptions it is not prudent to get the same data item from multiple sources. By dividing the client query into disjoint sub-queries, we ensure that each data item update is processed only once for each query.

Sub-query incoherency bounds must be derived from the query incoherency bound so that two conditions hold: the client's coherency requirement is satisfied, and the chosen DA (where the sub-query is to be executed) is capable of satisfying the allocated sub-query incoherency bound. For example, in Plan […] the incoherency bound allocated to the two-item sub-query should be greater than […]. This value is the weighted sum of the bounds at which the aggregator serves the two items, and it is the tightest incoherency bound that aggregator can satisfy for the sub-query. We show that the number of refreshes also depends on how the query incoherency bound is divided among the sub-query incoherency bounds. A similar result was reported for data incoherency bounds in […]. Next, we formally present the problem statement and our contributions.

GUPTA ET AL.: QUERY PLANNING FOR CONTINUOUS QUERIES IN DYNAMIC DATA DISSEMINATION NETWORKS

1.2 Problem Statement and Contributions
The value of a continuous weighted additive aggregation query at time t can be calculated as

$$ V_q(t) = \sum_{i=1}^{n_q} v_{qi}(t) \times w_{qi} \tag{1} $$

where Vq is the value of a client query q involving nq data items, and wqi is the weight of the ith data item (1 ≤ i ≤ nq). Such a query covers the SQL aggregation operators SUM and AVG. It also covers general weighted aggregation queries, such as portfolio queries that aggregate stock prices weighted by the number of shares of each stock in the portfolio.

Suppose the result of the query given by Equation (1) must be continuously provided to a user at the query incoherency bound Cq. Then the dissemination network has to ensure that

$$ \left| \sum_{i=1}^{n_q} \left( v_{qi}(t) - u_{qi}(t) \right) \times w_{qi} \right| \le C_q \tag{2} $$

Whenever data values at sources change such that the query incoherency bound is violated, the updated value should be refreshed to the client. Suppose the network of aggregators can ensure that the ith data item has incoherency bound Cqi. Then the following condition ensures that the query incoherency bound Cq is satisfied:

$$ \sum_{i=1}^{n_q} C_{qi} \times w_{qi} \le C_q \tag{3} $$

The client-specified query incoherency bound needs to be translated into incoherency bounds for individual data items or sub-queries such that Equation (3) is satisfied. Note that, for non-negative weights, Equation (3) is a sufficient condition for satisfying the query incoherency bound, but not a necessary one. This way of translating the query incoherency bound into sub-query incoherency bounds is required if data is transferred between nodes using only a push-based mechanism.

We need a method that (a) optimally divides a client query into sub-queries and (b) assigns incoherency bounds to them, such that (c) the derived sub-queries can be executed at the chosen DAs and (d) the total query execution cost, measured as the number of refreshes to the client, is minimized.

We prove that the problem of choosing sub-queries while minimizing query execution cost is NP-hard. We give efficient algorithms to choose the set of sub-queries and their corresponding incoherency bounds for a given client query. In contrast, all related work in this area […] proposes getting individual data items from the aggregators. As we show in this paper, that approach leads to a large number of refreshes.

To optimally divide the client query into sub-queries, we first need a method to estimate the query execution cost of the various alternative options. A method for estimating the query execution cost is another important contribution of this paper. Because each sub-query is executed at a different aggregator node, the query execution cost (i.e., the number of refreshes) is the sum of the execution costs of its constituent sub-queries. We model the sub-query execution cost as a function of the dissemination costs of the individual data items involved. The data dissemination cost depends on the data dynamics and on the incoherency bound associated with the data. We model the data dynamics using a data dynamics model, and the effect of the incoherency bound using an incoherency bound model. These two models are combined to estimate the data dissemination cost.

To empirically evaluate our approach, we use real-world data from the sensor and stock market domains […]. The sensor network data were temperature and wind sensor data from Georges Bank Cruises Albatross Shipboard […]. Stock traces of […] stocks were obtained by periodically polling http://finance.yahoo.com. We collected […] samples for each data item with a period of […] seconds. Appendix A gives statistical properties of some of these stock traces. In this paper, we present results using stock data only, but similar results were obtained for sensor data as well […]. Our simulation studies show that for continuous aggregation queries:
• Our method of dividing a query into sub-queries and executing them at individual DAs requires less than one third of the number of refreshes required by the existing schemes.
• To reduce the number of refreshes, more dynamic data items should be part of the sub-query involving the larger number of data items.

Our method of executing queries over a network of data aggregators is practical, since it can be implemented using a mechanism similar to URL-rewriting […] in content distribution networks (CDNs). Just as in a CDN, the client sends its query to the central site. To find appropriate aggregators (edge nodes) to answer the client query (web page), the central site first determines which data aggregators have the data items required for the client query. If a single data aggregator cannot answer the client query, the query is divided into sub-queries (fragments), and each sub-query is assigned to a single data aggregator. In a CDN, dividing a web page into fragments is a page design issue. For continuous aggregation queries, this issue has to be handled on a per-query basis by considering the data dissemination capabilities of the data aggregators, as explained in Example 1.

We would like to differentiate the current work from the design of a network of data aggregators for a specific set of client queries. We propose a method to answer a client query using a given network of data aggregators. If the client queries are fixed, one can instead use them to optimally construct a network of data aggregators, as in […]. Our aim of minimizing the number of messages between aggregators and client complements the works of […]. Together they can be used to minimize the total number of messages between data sources and clients.
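Equations (1) to (3) can be made concrete with a short Python sketch. The function names and the three-item portfolio below are hypothetical illustrations, not values from the paper. The portfolio uses weights 10, 20 and 5 and per-item bounds 0.5, 0.2 and 1.0. Weights are assumed to be non-negative, as share counts in a portfolio are; this assumption is what makes Equation (3) a sufficient condition for Equation (2).

```python
from typing import Sequence

def query_value(values: Sequence[float], weights: Sequence[float]) -> float:
    """Equation (1): V_q(t) = sum_i v_qi(t) * w_qi."""
    return sum(v * w for v, w in zip(values, weights))

def query_incoherency(source: Sequence[float], client: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Left-hand side of Equation (2): |sum_i (v_qi(t) - u_qi(t)) * w_qi|."""
    return abs(sum((v - u) * w for v, u, w in zip(source, client, weights)))

def meets_sufficient_condition(item_bounds: Sequence[float],
                               weights: Sequence[float], c_q: float) -> bool:
    """Equation (3): sum_i C_qi * w_qi <= C_q (weights assumed non-negative)."""
    return sum(c * w for c, w in zip(item_bounds, weights)) <= c_q

if __name__ == "__main__":
    weights = [10, 20, 5]        # hypothetical share counts w_qi
    bounds = [0.5, 0.2, 1.0]     # hypothetical per-item bounds C_qi
    client = [1.0, 2.0, 3.0]     # values u_qi(t) last delivered to the client
    worst = [u + c for u, c in zip(client, bounds)]  # every item drifts to its bound
    print(query_value(client, weights))                       # 65.0
    print(meets_sufficient_condition(bounds, weights, 15.0))  # True
    print(query_incoherency(worst, client, weights))          # about 14, within C_q = 15
```

The worst case shown, where every item sits exactly at its own bound in the same direction, is what Equation (3) protects against. Any mix of smaller or opposite-signed deviations yields a smaller query incoherency.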


1.3 Outline of the Paper
The cost model for data dissemination is developed in Section 2. In Section 3, we present the query cost model for additive aggregation queries; it uses the data dissemination model and a measure that captures correlation between data dynamics. Optimal query planning for additive queries is presented in Section 4. Performance evaluation results for the algorithms described in Section 4 are presented in Section 5. Section 6 discusses optimal query planning for MAX queries; most conclusions drawn for this class of queries are similar to those for additive aggregation queries. Related work is presented in Section 7. Section 8 discusses various aspects of our work and presents conclusions and future work. Table 1 summarizes the symbols used in the paper and their descriptions.

Table 1. Important symbols and their meaning

Symbol    Description
A         Set of aggregators in the network.
N         Number of data aggregators (DAs).
D         Set of data items disseminated by the network.
C         Incoherency bounds of data items.
ak        kth data aggregator, 1 ≤ k ≤ N.
Dk        Set of data items disseminated by the kth DA.
dkj       jth data item disseminated by the kth DA.
tkj       Incoherency bound which ak can ensure for dkj.
q         Client query.
Cq        Incoherency bound for q.
nq        Number of data items in q.
dqi       ith data item of the query q.
vqi(t)    Value of the ith data item of the query q at time t.
wqi       Weight of the data item dqi for the query q.
Vq(t)     Value of the query q at time t.
qk        Sub-query of q to be executed at ak.
Cqk       Incoherency bound of qk.
Rq        Sumdiff of the query q.
ρ         Correlation measure between data items.
α         Query satisfiability parameter.

2 DATA DISSEMINATION COST MODEL
In this section, we present the model used to estimate the number of refreshes required to disseminate a data item while maintaining a certain incoherency bound. Two primary factors affect the number of messages needed to maintain the coherency requirement: (a) the coherency requirement itself and (b) the dynamics of the data.

2.1 Incoherency Bound Model
Consider a data item that needs to be disseminated at an incoherency bound C. That is, a new value of the data item will be pushed if the value deviates by more than C from the last pushed value. Thus, the number of dissemination messages will be proportional to the probability that |v(t) − u(t)| is greater than C, where v(t) is the data value at the source/aggregator and u(t) is the value at the client at time t. A data item can be modeled as a discrete-time random process […] in which each step is correlated with its previous step. In push-based dissemination, a data source can follow one of the following schemes:
(a) The data source pushes the data value whenever it differs from the last pushed value by more than C.
(b) The client estimates the data value based on server-specified parameters […]. The source pushes the new data value whenever it differs from the client-estimated value by more than C.

In both cases, the value at the source can be modeled as a random process whose average is the value known at the client. In case (b), the client and the server estimate the data value as the mean of the modeled random process. In case (a), the deviation from the last pushed value can be modeled as a zero-mean process. Chebyshev's inequality […] bounds the probability that a process with variance σ² deviates from its mean by more than C by σ²/C². This gives

$$ P\left( |v(t) - u(t)| > C \right) \propto \frac{1}{C^2} \tag{4} $$

Thus, we hypothesize that the number of data refresh messages is inversely proportional to the square of the incoherency bound. A similar result was reported in […], where data dynamics were modeled as random walks.

Figure 1. Number of pushes vs. incoherency bounds

Validating the analytical model: To corroborate the above analytical result, we simulated data sources by reading values, at periodic instants, from the sensor and stock data traces described in Section 1.2. In these experiments, each data value at the first tick is sent to the client. Data sources maintain the last sent value for each client. The sources read each new value from the trace and send it to their clients if and only if not sending it would violate the client's incoherency bound C. For each data item, the incoherency bound was varied, and the refresh messages needed to maintain that bound were counted. Figure 1 shows the number of push messages for four representative share price data items as their incoherency bounds C are varied.

Besides validating the analytical model, these results provide an important insight into the dissemination mechanism. As the incoherency bound decreases, the number of messages increases as the analytical model predicts. However, there is a saturation effect for very low values of the incoherency bound (i.e., the right part of the curve). This is because the data items have a limited number of discrete changes in value. For example, if the sensitivity of a temperature sensor is one degree, the number of dissemination messages will not increase even if the incoherency bound is decreased below 1°.
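Push scheme (a) and the 1/C² hypothesis can be explored with a small simulation. This is a minimal sketch under stated assumptions. The trace is a synthetic Gaussian random walk (seed 42, unit step size), not the stock or sensor traces used in the paper. The helper names `count_pushes` and `random_walk` are hypothetical. The first value is treated as already delivered, which mirrors the experiment in which the first tick is always sent.

```python
import random

def count_pushes(trace, c):
    """Scheme (a): push whenever the current value differs from the last
    pushed value by more than c. trace[0] is assumed already delivered
    and is not counted."""
    last = trace[0]
    pushes = 0
    for v in trace[1:]:
        if abs(v - last) > c:
            last = v          # the client now knows this value
            pushes += 1
    return pushes

def random_walk(n, step_sd=1.0, start=100.0, seed=42):
    """Synthetic trace of n ticks: a Gaussian random walk (an assumption, not real data)."""
    rng = random.Random(seed)
    value, trace = start, []
    for _ in range(n):
        trace.append(value)
        value += rng.gauss(0.0, step_sd)
    return trace

if __name__ == "__main__":
    trace = random_walk(20_000)
    for c in (0.1, 1.0, 2.0, 4.0, 8.0):
        k = count_pushes(trace, c)
        print(f"C={c:4.1f}  pushes={k:6d}  pushes*C^2={k * c * c:9.1f}")
```

Two behaviors show up in this setting. For C well above the step size, the product pushes × C² tends to level off, as Equation (4) suggests. For C much smaller than a typical step, the count saturates at roughly one push per tick. That saturation is a simulated analogue of the effect the authors attribute to discrete value changes.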


Figure 2. Number of pushes vs. data sumdiff: (a) C = 0.001, (b) C = 0.01, (c) C = 0.1

2.2 Data Dynamics Model
We considered two possible options for modeling data dynamics. As a first option, the data dynamics can be quantified by the standard deviation of the data item values […]. An example shows why the standard deviation is not a good measure of data dynamics in our case. Suppose the data values in […] consecutive instances for a data item d1 are {[…]}, whereas for another data item d2 the values are {[…]}. Suppose both data items are disseminated with an incoherency bound of […]. The number of messages required to maintain the incoherency bound will be […] for d1 and […] for d2, yet both data items have the same standard deviation ([…]). Thus, we need a measure that captures data changes along with their temporal properties. This motivates us to examine the second measure.

As a second option, we considered the Fast Fourier Transform (FFT), which is used in the digital signal processing domain to characterize a digital signal. FFT captures the number of changes in data value, the amount of the changes, and their timings. Thus, FFT can be used to model data dynamics, but it has a problem. To estimate the number of refreshes required to disseminate a data item, we need a function over the FFT coefficients that returns a scalar value, and the number of FFT coefficients can be as high as the number of changes in the data value. Among the FFT coefficients, the 0th order coefficient gives the average value of the data item, whereas higher order coefficients represent transient changes in its value. We hypothesize that the cost of data dissemination for a data item can be approximated by a function of the 1st FFT coefficient. Specifically, the cost of data dissemination for a data item will be proportional to the data sumdiff, defined as

$$ R_s = \sum_{i} \left| s_i - s_{i-1} \right| \tag{5} $$

where si and si−1 are the sampled values of a data item S at the ith and (i−1)th time instances (i.e., consecutive ticks). In practice, the sumdiff value for a data item can be calculated at the data source by taking a running average of the difference between data values at consecutive ticks. In our experiments, we calculated the sumdiff values using an exponential window moving average. Each window had […] samples, and the most recent window was given […] weight.

Validating the hypothesis: We ran simulations with different stocks disseminated at incoherency bound values of 0.001, 0.01 and 0.1. This range is […] to […] times the average standard deviation of the share price values. The number of refresh messages is plotted against data sumdiff (in […]) in Figure 2. The linear relationship appears to hold for all incoherency bound values. To quantify linearity, we used the Pearson product moment correlation coefficient (PPMCC) […], a widely used measure of the degree of linearity between two variables. It sums the products of the deviations of the two variables from their means and normalizes the result by the product of their standard deviations. PPMCC varies between −1 and 1, and higher absolute values indicate that the data points can be considered linear with more confidence. For the three incoherency bounds (0.001, 0.01 and 0.1), the PPMCC values were […], […] and […] respectively. That is, the average deviation from linearity was in the range of […] for low values of C and […] for high values of C. Thus, for lower values of the incoherency bound, a linear relationship between data sumdiff and the number of refresh messages can be assumed with more confidence.

The larger error for larger values of C can be explained as follows. According to the hypothesis, a larger data sumdiff should result in more refreshes. That may not hold in two cases:
(1) There are low-amplitude changes between consecutive data values, smaller than the incoherency bound. These increase the data sumdiff without requiring any messages to be disseminated.
(2) There are high spikes, much larger than the incoherency bound. These cause a more than proportional increase in the data sumdiff.
The first case is more prevalent for high values of the incoherency bound, whereas the second is more pronounced for low values. Thus, the linear relationship between data sumdiff and the number of refreshes has more error for very low and very high incoherency bounds. Because low-amplitude perturbations are more prevalent than high-amplitude spikes in most data items, the linear relationship is more accurate for lower incoherency bounds.

2.3 Combining Data Dissemination Models
The number of refresh messages is proportional to the data sumdiff Rs and inversely proportional to the square of the incoherency bound (C²). Further, no message needs to be disseminated when either the data value is not changing (Rs = 0) or the incoherency bound is unlimited. Thus, for a given data item S disseminated
with an incoherency bound C, the data dissemination cost is proportional to Rs/C². In the next section, we use this data dissemination cost model to develop a cost model for additive aggregation queries.

3 COST MODEL FOR ADDITIVE AGGREGATION QUERIES
Consider an additive query over two data items P and Q with weights wp and wq respectively, whose dissemination cost we want to estimate. If the data items are disseminated separately, the query sumdiff will be

$$ R_{data} = w_p R_p + w_q R_q = w_p \sum_i |p_i - p_{i-1}| + w_q \sum_i |q_i - q_{i-1}| \tag{6} $$

Suppose instead that the aggregator uses the information that the client is interested in a query over P and Q, rather than in their individual values. It then creates and pushes a composite data item (wp·p + wq·q), and the query sumdiff will be

$$ R_{query} = \sum_i \left| w_p (p_i - p_{i-1}) + w_q (q_i - q_{i-1}) \right| \tag{7} $$

By the triangle inequality (for non-negative weights), Rquery is at most Rdata. Thus, we need to estimate the sumdiff of an aggregation query (i.e., Rquery) from the sumdiff values of the individual data items (i.e., Rp and Rq). Only data aggregators are in a position to calculate Rquery, because different data items may be disseminated from different sources. We develop the query cost model in two stages.

3.1 Modeling Correlation between Data Dynamics
Equations (6) and (7) show the effect of correlation between the two data items. If the items are positively correlated, so that when one increases the other also increases, Rquery will be close to Rdata. If the items are inversely correlated, Rquery will be smaller than Rdata. Thus, intuitively, we can express the relationship between Rquery and the sumdiff values of the individual data items using a correlation measure associated with the pair of data items. Specifically, if ρ is the correlation measure, then

$$ R_{query}^2 \propto w_p^2 R_p^2 + w_q^2 R_q^2 + 2\rho\, w_p R_p\, w_q R_q \tag{8} $$

The correlation measure ρ is defined such that −1 ≤ ρ ≤ 1. So Rquery never exceeds |wpRp + wqRq| (as explained earlier) and never falls below |wpRp − wqRq|. The relation is easier to understand through its similarity with the standard deviation of the sum of two random variables […]. For data items P and Q, ρ can be calculated as

$$ \rho = \frac{\sum_i (p_i - p_{i-1})(q_i - q_{i-1})}{\sqrt{\sum_i (p_i - p_{i-1})^2 \, \sum_i (q_i - q_{i-1})^2}} \tag{9} $$

In Appendix B, we discuss a method for efficiently calculating ρ.

3.2 Query based Normalization
Suppose we want to compare the cost of two queries: a SUM query involving two data items and an AVG query involving the same data items. Let the query incoherency bounds for the SUM and AVG queries be 2C and C respectively. From Equation (7), the sumdiff of the SUM query is double that of the AVG query. Hence, the query evaluation cost (as per R/C²) of the SUM query is half that of the AVG query. Intuitively, however, disseminating the AVG of two data items at a given incoherency bound should require the same number of refresh messages as disseminating their SUM at double that bound. Thus, query costs need to be normalized. From a query execution cost point of view, a query with weights wi and incoherency bound C is the same as a query with weights Ωwi and incoherency bound ΩC. So when normalizing, both the query weights and the incoherency bound must be multiplied by the same factor. The normalized query sumdiff is given by

$$ R_{query}^2 = \frac{w_p^2 R_p^2 + w_q^2 R_q^2 + 2\rho\, w_p R_p\, w_q R_q}{w_p^2 + w_q^2 + 2\rho\, w_p w_q} \tag{10} $$

That is, the normalizing factor for Rquery is 1/√(wp² + wq² + 2ρ wp wq), and the incoherency bound has to be adjusted by the same factor. Normalization ensures that queries with arbitrary weights can be compared by their execution cost estimates. Equation (10) can be extended to get the query sumdiff for any general weighted aggregation query given by Equation (1):

$$ R_q^2 = \frac{\sum_{i=1}^{n_q} w_{qi}^2 R_i^2 + \sum_{i=1}^{n_q} \sum_{j=1, j \ne i}^{n_q} \rho_{ij}\, w_{qi} w_{qj} R_i R_j}{\sum_{i=1}^{n_q} w_{qi}^2 + \sum_{i=1}^{n_q} \sum_{j=1, j \ne i}^{n_q} \rho_{ij}\, w_{qi} w_{qj}} \tag{11} $$

3.3 Validating the Query Cost Model
To validate the query cost model, we performed simulations by constructing […] weighted aggregation queries over the stock data. Each query consisted of […] data items, with data weights uniformly distributed between […] and […]. For each query, the number of refreshes was counted for various incoherency bounds, chosen so that their normalized values (using the normalization factor of Section 3.2) lie between […] and […]. Figure 3(a) shows that the number of messages is proportional to the normalized query sumdiff when the normalized incoherency bounds are the same; in this case, the PPMCC value is […]. Similarly, Figure 3(b) shows how the number of refreshes depends on the incoherency bound C. It illustrates that the relationship between them that holds for a single data item also holds for a query with multiple data items. We use this query cost model for query planning, which is presented next.
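The quantities in Sections 2.2 to 3.2 are straightforward to compute from sampled values. The sketch below uses plain Python with hypothetical function names. It implements sumdiff (Equation (5)), ρ (Equation (9)), Rdata and Rquery (Equations (6) and (7)), and the correlation-based estimate of Equations (8) and (10), taking the proportionality constant as 1.

The toy sequences are invented for illustration; they are not the values from the paper's example in Section 2.2, which are not recoverable here. The sequences [1, 2, 1, 2, 1, 2] and [1, 1, 1, 2, 2, 2] have the same standard deviation but sumdiffs of 5 and 1, which is the point of that argument.

```python
import math
import statistics

def sumdiff(samples):
    """Equation (5): R_s = sum_i |s_i - s_(i-1)| over consecutive ticks."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))

def _changes(samples):
    """Per-tick changes s_i - s_(i-1)."""
    return [b - a for a, b in zip(samples, samples[1:])]

def rho(p, q):
    """Equation (9): correlation of the per-tick changes of P and Q."""
    dp, dq = _changes(p), _changes(q)
    num = sum(x * y for x, y in zip(dp, dq))
    den = math.sqrt(sum(x * x for x in dp) * sum(y * y for y in dq))
    return num / den

def r_data(p, q, wp, wq):
    """Equation (6): P and Q disseminated as separate items."""
    return wp * sumdiff(p) + wq * sumdiff(q)

def r_query(p, q, wp, wq):
    """Equation (7): the composite item wp*p + wq*q disseminated as one item."""
    return sumdiff([wp * a + wq * b for a, b in zip(p, q)])

def r_query_estimate(rp, rq, wp, wq, r):
    """Equation (8), taking the proportionality constant as 1."""
    return math.sqrt(wp**2 * rp**2 + wq**2 * rq**2 + 2 * r * wp * rp * wq * rq)

def normalized_r_query(rp, rq, wp, wq, r):
    """Equation (10). The denominator is 0 (ZeroDivisionError) when r == -1 and wp == wq."""
    return r_query_estimate(rp, rq, wp, wq, r) / math.sqrt(wp**2 + wq**2 + 2 * r * wp * wq)

if __name__ == "__main__":
    d1, d2 = [1, 2, 1, 2, 1, 2], [1, 1, 1, 2, 2, 2]
    print(statistics.pstdev(d1), statistics.pstdev(d2))  # same spread
    print(sumdiff(d1), sumdiff(d2))                       # 5 1
    p, q = [10, 11, 10, 11, 10], [20, 19, 20, 19, 20]     # always move in opposite directions
    print(rho(p, q), r_data(p, q, 1, 1), r_query(p, q, 1, 1))
    # SUM (weights 1) and AVG (weights 0.5) of the same items normalize to the same value
    print(normalized_r_query(4, 4, 1, 1, 1.0), normalized_r_query(4, 4, 0.5, 0.5, 1.0))
```

With perfectly anti-correlated changes (ρ = −1) and equal weights, the composite item never changes. Rquery is therefore 0, while Rdata is 8. This is the cancellation effect that motivates grouping such items into one sub-query. The last line checks the Section 3.2 argument: after normalization, the SUM and AVG of the same pair give identical sumdiff estimates.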

Figure 3. Query cost validation with varying (a) sumdiff and (b) incoherency bound

4 QUERY PLANNING FOR WEIGHTED ADDITIVE AGGREGATION QUERIES
To execute an incoherency-bounded continuous query, a query plan is required. The query planning problem can be stated as follows.

Inputs:
1. A network of data aggregators, given as a relation f(A, D, C). It specifies the N data aggregators ak ∈ A (1 ≤ k ≤ N), the set Dk ⊆ D of data items disseminated by each data aggregator ak, and the incoherency bound tkj that ak can ensure for each data item dkj ∈ Dk.
2. The client query q and its incoherency bound Cq. An additive aggregation query q can be represented as Σ wqi dqi, where wqi is the weight of the data item dqi for 1 ≤ i ≤ nq.

Outputs:
1. qk for 1 ≤ k ≤ N, i.e., the sub-query for each data aggregator ak.
2. Cqk for 1 ≤ k ≤ N, i.e., the incoherency bounds for all the sub-queries.

Thus, to get a query plan, we need to perform the following tasks:
1. Determining sub-queries: for the client query q, get the sub-queries qk for each data aggregator.
2. Dividing the incoherency bound: divide the query incoherency bound Cq among the sub-queries to get the Cqk values.

For optimal query planning, these tasks must be performed with the following objective and constraints.

Optimization objective: the number of refresh messages is minimized. In Section 3, we showed that for a sub-query qk the estimated number of refresh messages is κRqk/Cqk². Here Rqk is the sumdiff of the sub-query qk, Cqk is the incoherency bound assigned to it, and κ is a proportionality factor that is the same for all sub-queries of a given query q. Thus, the total number of refresh messages is estimated as

$$ Z_q = \kappa \sum_{k=1}^{N} \frac{R_{qk}}{C_{qk}^2} \tag{12} $$

Hence, Zq needs to be minimized to minimize the number of refreshes.

Constraint 1: qk is executable at ak. Each DA must have the data items required to execute the sub-query allocated to it, i.e., for each data item dqki required by the sub-query qk, dqki ∈ Dk.

Constraint 2: the query incoherency bound is satisfied. The query incoherency should be less than or equal to the query incoherency bound. For additive aggregation queries, the value of the client query is the sum of the sub-query values. Because different sub-queries are disseminated by different data aggregators, we need to ensure that the sum of the sub-query incoherencies is at most the query incoherency bound. Thus,

$$ \sum_{k=1}^{N} C_{qk} \le C_q \tag{13} $$

Constraint 3: the sub-query incoherency bound is satisfied. The data incoherency bounds at ak (tkj for dkj ∈ Dk) must allow the sub-query incoherency bound Cqk to be satisfied at that DA. The tightest incoherency bound Tqk that the data aggregator ak can satisfy for the sub-query qk is

$$ T_{qk} = \sum_{i=1}^{n_{qk}} \left( w_{qi} \times t_{kj} \;\middle|\; d_{qi} \equiv d_{kj} \right) $$

To satisfy this constraint, we ensure that Cqk ≥ Tqk.

The outline of our approach to this constrained optimization problem, detailed in the rest of this section, is as follows. In Section 4.1, we prove that determining sub-queries while minimizing Zq as given by Equation (12) is NP-hard. In Section 4.2, we show that if the set of sub-queries (qk) is already given, the sub-query incoherency bounds (Cqk) can be determined optimally to minimize Zq. Optimally dividing the query into sub-queries is NP-hard, and no approximation algorithm is known. Therefore, in Section 4.3 we present two heuristics for determining sub-queries that satisfy as many constraints as possible (Constraint 1 and Constraint 2, to be precise). We then present a variation of the two heuristics that also ensures the sub-query incoherency bound is satisfied (Constraint 3). To solve the query planning problem, the heuristics of Section 4.3 are used to determine the sub-queries. Then, given the set of sub-queries, the method outlined in Section 4.2 is used to divide the incoherency bound.
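Equation (12) and the three constraints can be checked mechanically for any candidate plan. The sketch below is hypothetical throughout: the `SubQuery` record, the function names, and the aggregator capabilities, weights, bounds and sumdiff values are invented for illustration. It only evaluates a given plan; it does not search for an optimal one, which Section 4.1 shows to be NP-hard. It also checks that each data item is served by exactly one DA, as required in Section 1.1.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SubQuery:
    aggregator: str            # DA a_k chosen to execute this sub-query
    weights: Dict[str, float]  # data item -> weight w_qi taken from the client query
    bound: float               # allocated sub-query incoherency bound C_qk
    sumdiff: float             # estimated (normalized) sumdiff R_qk, assumed given

def tightest_bound(sq: SubQuery, caps: Dict[str, Dict[str, float]]) -> float:
    """T_qk: sum of w_qi * t_kj over the items of the sub-query (Constraint 3)."""
    offered = caps[sq.aggregator]
    return sum(w * offered[d] for d, w in sq.weights.items())

def violations(plan: List[SubQuery], caps: Dict[str, Dict[str, float]],
               c_q: float) -> List[str]:
    """Lists the constraints a plan breaks; an empty list means the plan is feasible."""
    problems: List[str] = []
    seen: set = set()
    for sq in plan:
        shared = seen.intersection(sq.weights)
        if shared:  # each data item must be served by exactly one DA
            problems.append(f"items assigned twice: {sorted(shared)}")
        seen.update(sq.weights)
        missing = [d for d in sq.weights if d not in caps[sq.aggregator]]
        if missing:  # Constraint 1: q_k must be executable at a_k
            problems.append(f"Constraint 1: {sq.aggregator} lacks {missing}")
        elif sq.bound < tightest_bound(sq, caps):  # Constraint 3: C_qk >= T_qk
            problems.append(f"Constraint 3: C_qk={sq.bound} below T_qk at {sq.aggregator}")
    if sum(sq.bound for sq in plan) > c_q:  # Constraint 2, Equation (13)
        problems.append("Constraint 2: sum of C_qk exceeds C_q")
    return problems

def estimated_cost(plan: List[SubQuery]) -> float:
    """Z_q / kappa from Equation (12): sum over sub-queries of R_qk / C_qk**2."""
    return sum(sq.sumdiff / sq.bound ** 2 for sq in plan)

if __name__ == "__main__":
    caps = {"a1": {"d1": 0.4, "d3": 0.3},               # hypothetical t_kj values
            "a2": {"d1": 1.0, "d2": 0.1, "d4": 0.2}}
    plan = [SubQuery("a1", {"d1": 10, "d3": 5}, bound=20.0, sumdiff=8.0),
            SubQuery("a2", {"d2": 20}, bound=10.0, sumdiff=3.0)]
    print(violations(plan, caps, c_q=30.0))  # []
    print(estimated_cost(plan))              # about 0.05
```

Z_q is estimated only up to the common factor κ. The returned value is therefore useful for comparing alternative plans for the same query, not as an absolute message count.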

4.1 Finding Optimal Query Plan is NP-hard
To prove that the problem is NP-hard, we use a reduction from the 3-dimensional matching (3DM) problem […].
3DM Problem: Given three sets X