Reinforcement Learning with Utility-aware Agents for Market-based Resource Allocation

Eduardo Rodrigues Gomes, Ryszard Kowalczyk
Swinburne University of Technology, Faculty of Information and Communication Technology, Hawthorn, 3122 Victoria, Australia
ABSTRACT
In this paper we propose and investigate the use of Reinforcement Learning in a market-based resource allocation mechanism called Iterative Price Adjustment. Under standard assumptions, this mechanism uses demand functions that do not allow the agents to have preferences over the attributes of the allocation, e.g. the price of the resources. To address this limitation, we study the case where the agents' preferences in the resource allocation are described by utility functions and the agents learn the demand functions given their utility functions. The approach has been evaluated with extensive experiments.
Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning; I.2.11 [Distributed Artificial Intelligence]: Multiagent Systems
General Terms
Experimentation, Algorithms

Keywords
Reinforcement Learning, Market-based Resource Allocation
1. INTRODUCTION
Market-based resource allocation mechanisms offer a promising approach for systems that need distributed resource allocation without centralized control […], for example the Grid. One of those mechanisms is the Iterative Price Adjustment (IPA) […]. This mechanism uses the concept of a "price", which is iteratively adjusted to find an equilibrium between a set of demands and a limited supply of resources.

In the IPA, the interests of the agents in the resource allocation are described by means of demand functions. Under standard assumptions, those demand functions specify a relationship between price and demand and, as such, do not allow the agents to have preferences over the attributes of the allocation, for example the price. This makes it difficult to influence and optimize the allocation quality in terms of the utility received by the agents.

One of the alternatives to address this problem is to let the agents describe their interests using utility functions instead of demand functions. In such a scenario, however, the mapping between the utility functions and a demand function is not unique. If "made by hand", the definition of this mapping is very subjective and may turn into a very complex and time-consuming task, especially if one considers agents with multiple utility functions. In addition, the resulting agents might not perform well in the real system. Thus, in this paper we propose and investigate the use of Reinforcement Learning (RL) […] to let the agents learn the best demand functions given their utility functions. The resulting demand functions are applied in an ordinary IPA market for their evaluation.
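The price adjustment idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the market implementation used in the experiments: the function name `iterative_price_adjustment`, the adjustment constant `step`, the tolerance `tol` and the two example demand functions are all hypothetical.

```python
from typing import Callable, List, Tuple

DemandFn = Callable[[float], float]  # maps a price to a requested amount


def iterative_price_adjustment(
    demands: List[DemandFn],
    supply: float,
    price: float,
    step: float = 0.01,     # hypothetical adjustment constant
    tol: float = 1e-3,      # hypothetical clearance tolerance
    max_iters: int = 10_000,
) -> Tuple[float, List[float], bool]:
    """Raise the price under excess demand, lower it under excess supply."""
    requests = [d(price) for d in demands]
    for _ in range(max_iters):
        excess = sum(requests) - supply
        if abs(excess) <= tol:
            return price, requests, True    # market cleared
        price = max(0.0, price + step * excess)
        requests = [d(price) for d in demands]
    return price, requests, False           # no equilibrium found in time


# Two hypothetical linear demand functions, for illustration only.
demand_a = lambda p: max(0.0, 10.0 - p)
demand_b = lambda p: max(0.0, 8.0 - 0.5 * p)
price, requests, cleared = iterative_price_adjustment(
    [demand_a, demand_b], supply=9.0, price=1.0
)
```

With these toy demand functions the loop settles near the price at which total demand equals the supply. A deadlock corresponds to the loop ending with `cleared` set to `False`.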
2. LEARNING THE DEMAND FUNCTIONS
In the experiments we considered a single IPA market with two client agents. The agents have preferences over the price and over the amount of resources. Such preferences are represented by two utility functions: U1(p), for price, and U2(m), for the amount of resource. For the sake of simplicity, but without loss of generality, we use only one type of resource, e.g. memory.

We used the ordinary Q-learning algorithm […] and the ε-greedy action selection mechanism for the learning. The current prices of the resources at each step of the IPA negotiation process are mapped into the environment states, as this is the only information the clients have available in the IPA market. The agents' actions are mapped to the amounts of resource they can request, and the rewards are mapped to a function of the utilities the agent receives at the end of the allocation procedure.

To encourage the agents to improve the social welfare, they are trained jointly and use the same reward functions. The final state of each learning episode is reached when the market is cleared. The agents receive a positive reward only when they reach the final state, i.e. when they act towards the market clearance. This reward is given by the function U(p, m), which is a combination obtained from the product of U1(p) and U2(m). It is used to stress the fact that both criteria are important in resource allocation. In all other states the agent receives a reward equal to zero. The utility functions U1(p) and U2(m) used by the agents are:
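\[
U_1(p) =
\begin{cases}
\ldots, & \text{if } p < \ldots \\
\ldots\, p\, \ldots, & \text{if } \ldots \le p \le \ldots \\
\ldots, & \text{if } p > \ldots
\end{cases}
\]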
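The reward rule described above (the product of the two utilities in the final state, zero everywhere else) can be sketched as below. The function `episode_reward` and the two toy utilities are hypothetical stand-ins introduced only for illustration; they are not the utility functions used in the experiments.

```python
from typing import Callable

Utility = Callable[[float], float]


def episode_reward(price: float, amount: float, cleared: bool,
                   u_price: Utility, u_amount: Utility) -> float:
    """U(p, m) = U1(p) * U2(m) in the final (cleared) state, else 0."""
    if not cleared:
        return 0.0
    return u_price(price) * u_amount(amount)


# Hypothetical stand-ins for U1 and U2, clipped to [0, 1].
def toy_u_price(p: float) -> float:
    return min(1.0, max(0.0, 1.0 - p / 10.0))   # cheaper is better


def toy_u_amount(m: float) -> float:
    return min(1.0, max(0.0, m / 8.0))          # more resource is better
```

Because the reward is a product, an allocation that scores zero on either criterion yields no reward, which is one way to read the statement that both criteria matter.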
AAMAS'07, May 14-18, 2007, Honolulu, Hawai'i, USA. Copyright © 2007 IFAAMAS. 978-81-904262-7-5 (RPS)
The market had a supply of … units of memory in all experiments. This amount does not allow all the agents to be completely satisfied, but it allows us to analyze the behavior of the market and the learning mechanism under a condition of limited supply. The initial prices of the resources were obtained from a random number between … and … units of price.
Figure 1. Agents' demand functions with trend line (panels L1 to L4; axes: demand versus price).

We made experiments using the demand functions resulting from the curve-fitting method in the ordinary IPA market. We evaluated the agents using these demand functions against other agents with predefined "static" demand functions. The demand functions of these agents were defined by hand based on subjective criteria, and they also use the function U(p, m) to evaluate the quality of the allocation. The new agents are S1, S2 and S3. We ran a total of … experiments using the same market configuration as in the learning phase and the same number of agents.

The results of the experiments were assessed from an individual and a social perspective. The individual perspective is important because it shows how the agents performed in terms of their own utilities. However, the social perspective is the most suitable evaluation criterion for this work, as the learner agents were trained jointly and encouraged to improve the social welfare.

Figure 2 shows the individual utilities for experiments of type learner vs. learner and static vs. static. In general, the utilities of the learner agents are better than those of the static ones. There is also some similarity among the utilities received by the learner agents. The situation is a little different for the static agents, where one of them (S…) achieved a very poor individual performance.
In order to avoid any possible deadlock if the learnt demand functions are used directly in an ordinary market running the IPA method, we use the trends of the demand functions to evaluate their performance in the resource allocation. We randomly selected … of our … agents to apply a curve-fitting method. We used the demand functions obtained at step … x …. The rationale for this is that at this step some of the agents have already developed a demand function with a consistent shape, as can be noted in Figure 1.
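The step from a learnt table to a trend can be sketched as follows. The curve-fitting method is not specified in the text, so this sketch assumes a straight-line least-squares fit; `greedy_demand`, `fit_trend_line` and the Q-table layout (a dictionary keyed by `(price, amount)` pairs) are hypothetical names and choices made for illustration.

```python
from typing import Dict, List, Tuple


def greedy_demand(q: Dict[Tuple[float, float], float],
                  states: List[float], actions: List[float]) -> List[float]:
    """Read a demand function off a Q-table: best amount for each price."""
    return [max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states]


def fit_trend_line(prices: List[float], demands: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of demand = slope * price + intercept."""
    n = len(prices)
    if n < 2 or n != len(demands):
        raise ValueError("need at least two (price, demand) pairs")
    mean_p = sum(prices) / n
    mean_d = sum(demands) / n
    sxx = sum((p - mean_p) ** 2 for p in prices)
    if sxx == 0.0:
        raise ValueError("prices must not all be equal")
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in zip(prices, demands))
    slope = sxy / sxx
    return slope, mean_d - slope * mean_p
```

A smooth trend gives the market a single-valued, slowly varying response to price changes, which is the property the deadlock argument above relies on.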
3. USING THE LEARNT DEMAND FUNCTIONS
Even though the agents have not achieved exactly the same demand function, the functions obtained are consistent and presented a very similar overall trend.
Analyzing the evolution of the demand functions, we also note that it is not completely stable. Although the shape is always there and is quite stable between the same values for demand and price, the individual values presented some variability from one episode to another. This behavior can have its origin in several factors, from the application of Q-learning for two simultaneous learners, through an inappropriate decay rule for the learning rate, to the design choices for the learning implementation.
We ran … different experiments, all with the same configuration given above. Each experiment was run for a total of … x … episodes. From these experiments we obtained … agents. The evolution of the demand functions of these agents over the episodes showed a similar behavior. In the first … episodes they are still not very consistent, which can be explained by the fact that the agent has probably not yet visited all the actions in all the states. From this point on they start to evolve towards a well-defined trend, as can be seen in Figure 1, which shows the demand functions of four of our agents. An interesting point is that the demand functions did not present a high demand for lower prices, as might be expected from observing the utility functions. This behavior is most likely generated by the learning format we used, which incentivizes the improvement of the social welfare rather than the individual one, even though the agents are self-interested.
We performed a series of preliminary experiments in order to identify a feasible configuration for the values of the parameters used in the learning algorithm. Based on these experiments we set α = …, γ = … and ε = …. The price of the resources is adjusted by the IPA market using a constant parameter set to …. The continuity of the states and actions is treated by the application of a rounding procedure. Both states and actions are rounded to … decimal place.
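The learning setup of this section (ordinary Q-learning, ε-greedy selection, rounded states and actions) can be sketched as follows. The helper names, the number of decimal places and the action grid are assumptions made for illustration, and any α, γ or ε passed to these functions are placeholders, not the settings used in the experiments.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

QTable = Dict[Tuple[float, float], float]  # (price state, amount action) -> value


def discretize(x: float, places: int = 1) -> float:
    """Round a continuous price or amount; `places` is an assumed value."""
    return round(x, places)


def choose_action(q: QTable, state: float, actions: List[float],
                  epsilon: float, rng: random.Random) -> float:
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q: QTable, state: float, action: float, reward: float,
             next_state: float, actions: List[float],
             alpha: float, gamma: float, terminal: bool) -> None:
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


q: QTable = defaultdict(float)
actions = [discretize(0.1 * i) for i in range(101)]  # hypothetical amounts 0.0 .. 10.0
```

In an episode, the state passed to these functions would be the rounded current price, the action the rounded amount requested, and the reward zero until the market clears.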
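\[
U_2(m) =
\begin{cases}
0, & \text{if } m < \ldots \\
\operatorname{Log} m, & \text{if } \ldots \le m \le a \\
1, & \text{if } m > a
\end{cases}
\]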
Figure 2. Individual utilities in learner vs. learner and static vs. static experiments.

Experiments of type learner vs. static showed that the four learner agents presented similar performances against the same static agents. This was expected, as they learn using the same set of utility functions. These experiments also showed that the performance of the learner agents was not as good as the performance of the static ones. It is important to mention that this does not mean that the learned demand function is meaningless. One has to consider the fact that the learners had not considered the static agents during their learning phase and that they are trained together to achieve equilibrium and obtain a better social welfare. This objective is successfully achieved, as presented next.

Regarding the average individual utilities obtained by the agents over the experiments, the agents using the learned function obtained almost the same utility as the static agents. This is quite interesting if one notes that, in the individual comparisons, the learner agents were beaten in most of the experiments. This equalization is achieved because, when running against themselves, the learner agents achieve a much better solution.

The Sixth Intl. Joint Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS 07)
The experiments have shown the feasibility of the approach and pointed out that further investigations in this direction are worthwhile. An immediate item of future work is to further investigate the problem of co-adaptation, as we use agents being trained jointly. Another is the extension of the scenario to better reflect a real distributed system such as the Grid. This extension is likely to involve the use of agents described by multiple utility functions and participating in more than one market at the same time.
6. REFERENCES
We use the concept of Social Welfare (SW) […] to evaluate how the market performed under the social perspective. We applied three different functions to calculate the SW of our market: the Utilitarian Social Welfare (USW), defined as the sum of individual utilities; the Egalitarian Social Welfare (ESW), given by the utility of the agent which is worst off; and the Nash Product (NP), defined as the product of the individual utilities. These three SW functions showed that the market's performance is improved using the agents with learnt demand functions, as presented in Figure 3 a), b) and c).
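The three social welfare measures defined above translate directly into code. The function names are ours and the utility values in the usage note are made up for illustration.

```python
import math
from typing import Sequence


def utilitarian_sw(utilities: Sequence[float]) -> float:
    """USW: sum of the individual utilities."""
    return sum(utilities)


def egalitarian_sw(utilities: Sequence[float]) -> float:
    """ESW: utility of the agent that is worst off."""
    return min(utilities)


def nash_product(utilities: Sequence[float]) -> float:
    """NP: product of the individual utilities."""
    return math.prod(utilities)
```

For two agents with hypothetical utilities 0.5 and 0.25, USW is 0.75, ESW is 0.25 and NP is 0.125. ESW and NP both penalize allocations that leave one agent with almost nothing, which USW alone does not.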
… where the agents' preferences in the resource allocation are described by utility functions and the agents learn the demand functions given their utility functions. The experiments were divided into two phases. In the first phase we applied the Q-learning algorithm to let the agents learn their demand functions. This phase considered a single market composed of two client agents. They are trained jointly and encouraged to improve the market's social welfare while being self-interested. The second phase was the application of the learnt demand functions in the standard IPA market.

The results of the experiments have shown that, through the application of ordinary Q-learning, the agents were able to learn meaningful demand functions. From an individual perspective, agents using the learnt demand functions performed well in comparison to ones with demand functions defined "by hand". More remarkably, from a social perspective the agents using the learnt demand functions achieved a much better solution, being able to improve the system's social welfare.
Figure 3. a) Average USW; b) Average ESW; c) Average NP (each panel compares SxS, LxS and LxL experiments).
4. RELATED WORKS
As far as we are aware, no work has addressed the problem of learning a demand function based on a set of utility functions. However, the use of reinforcement learning in resource allocation is not new. Galstyan et al. […] used reinforcement learning in a scenario where a large number of users submit their jobs to resources that are scheduled by a local scheduler in a grid environment. Abdallah and Lesser […] proposed a multiagent reinforcement learning algorithm and applied this algorithm in distributed task allocation.
[…] Abdallah, S. and Lesser, V. Learning the task allocation game. In Proceedings of the Fifth international Joint Conference on Autonomous Agents and Multiagent Systems (Hakodate, Japan, May …). AAMAS …. ACM Press, New …
[…] Everett, H. Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resource. Operations Research …
Regarding the application of utility-aware agents, Chunlin and Layuan […] developed an algorithm based on utility functions for resource allocation for the Grid. However, such an algorithm makes no reference to agents having preferences over the price of the resources, as we do in our work.
[…] Galstyan, A., Czajkowski, K., and Lerman, K. Resource Allocation in the Grid Using Reinforcement Learning. In Proceedings of the Third international Joint Conference on Autonomous Agents and Multiagent Systems - Volume …. New …
[…] Wu, T. …