Reinforcement Learning with Utility-aware Agents for Market-based Resource Allocation

Eduardo Rodrigues Gomes, Ryszard Kowalczyk

Swinburne University of Technology, Faculty of Information and Communication Technology, Hawthorn, 3122 Victoria, Australia

ABSTRACT

In this paper we propose and investigate the use of Reinforcement Learning in a market-based resource allocation mechanism called Iterative Price Adjustment. Under standard assumptions, this mechanism uses demand functions that do not allow the agents to have preferences over the attributes of the allocation, e.g. the price of the resources. To address this limitation, we study the case where the agents' preferences in the resource allocation are described by utility functions and they learn the demand functions given their utility functions. The approach has been evaluated with extensive experiments.

Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning; I.2.11 [Distributed Artificial Intelligence]: Multiagent Systems

General Terms
Experimentation, Algorithms

Keywords
Reinforcement Learning, Market-based Resource Allocation

1. INTRODUCTION

Market-based resource allocation mechanisms offer a promising approach for systems that need distributed resource allocation without centralized control [], for example, the Grid. One of those mechanisms is the Iterative Price Adjustment (IPA) []. This mechanism uses the concept of a “price”, which is iteratively adjusted to find an equilibrium between a set of demands and a limited supply of resources.

In the IPA, the interests of the agents in the resource allocation are described by means of demand functions. Under standard assumptions, those demand functions specify a relationship between price and demand and, as such, do not allow the agents to have preferences over the attributes of the allocation, for example, the price. This makes it difficult to influence and optimize the allocation quality in terms of the utility received by the agents.

One of the alternatives to address this problem is to let the agents describe their interests using utility functions instead of demand functions. In such a scenario, however, the mapping between the utility functions and a demand function is not unique. If “made by hand”, the definition of this mapping is very subjective and may turn into a very complex and time-consuming task, especially if one considers agents with multiple utility functions. In addition, the resulting agents might not perform well in the real system. Thus, in this paper we propose and investigate the use of Reinforcement Learning (RL) [] to let the agents learn the best demand functions given their utility functions. The resulting demand functions are applied in an ordinary IPA market for their evaluation.
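The price-adjustment loop described in this section can be sketched as follows. This is only an illustration under stated assumptions: the update rule (price moved in proportion to excess demand), the parameter names `step` and `tolerance`, and the two linear demand functions are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple


def iterative_price_adjustment(
    demand_fns: List[Callable[[float], float]],
    supply: float,
    initial_price: float,
    step: float = 0.01,        # assumed constant adjustment parameter
    tolerance: float = 1e-3,   # assumed threshold for "market cleared"
    max_iters: int = 10_000,
) -> Tuple[float, List[float]]:
    """Adjust the price until total demand is within `tolerance` of supply."""
    price = initial_price
    demands = [fn(price) for fn in demand_fns]
    for _ in range(max_iters):
        excess = sum(demands) - supply
        if abs(excess) <= tolerance:
            break  # the market is considered cleared
        # Raise the price under excess demand, lower it under excess supply.
        price = max(0.0, price + step * excess)
        demands = [fn(price) for fn in demand_fns]
    return price, demands


# Two hypothetical linear demand functions, for illustration only.
agents = [lambda p: max(0.0, 10.0 - p), lambda p: max(0.0, 8.0 - 0.5 * p)]
price, demands = iterative_price_adjustment(agents, supply=9.0, initial_price=1.0)
```

With these illustrative demand functions, total demand equals the supply at a price of 6, so the loop settles close to that value.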

2. LEARNING THE DEMAND FUNCTIONS

In the experiments we considered a single IPA market with two client agents. The agents have preferences over the price and over the amount of resources. Such preferences are represented by two utility functions: U1(p), for price, and U2(m), for the amount of resource. For the sake of simplicity, but without loss of generality, we use only one type of resource, e.g. memory.

We used the ordinary Q-learning algorithm [] and the ε-greedy action selection mechanism for the learning. The current prices of the resources at each step of the IPA negotiation process are mapped into the environment states, as this is the only information the clients have available in the IPA market. The agents' actions are mapped to the amounts of resource they can request. The rewards are mapped to a function of the utilities the agent receives at the end of the allocation procedure.

To encourage the agents to improve the social welfare, they are trained jointly and use the same reward functions. The final state of each learning episode is reached when the market is cleared. The agents receive a positive reward only when they reach the final state, i.e. when they act towards the market clearance. This reward is given by the function U(p, m), which is obtained from the product of U1(p) and U2(m). The product is used to stress the fact that both criteria are important in resource allocation. In all other states, the agent receives a reward equal to zero. The utility functions U1(p) and U2(m) used by the agents are:

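$$U_1(p) = \begin{cases} \ldots & \text{if } p < \ldots \\ \ldots\, p\, \ldots & \text{if } \ldots \le p \le \ldots \\ \ldots & \text{if } p > \ldots \end{cases}$$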
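The reward scheme described in this section (a product of the two utilities, paid only when the market clears) can be sketched as follows. The shapes and breakpoints of `u1_price` and `u2_amount` below are hypothetical stand-ins chosen for illustration; they are not the constants used in the paper.

```python
def u1_price(p: float) -> float:
    """Hypothetical price utility: 1 below p_low, falling linearly to 0 at p_high."""
    p_low, p_high = 2.0, 8.0  # assumed breakpoints, not the paper's values
    if p < p_low:
        return 1.0
    if p > p_high:
        return 0.0
    return (p_high - p) / (p_high - p_low)


def u2_amount(m: float) -> float:
    """Hypothetical amount utility: rises linearly from 0 to 1 at m_full units."""
    m_full = 10.0  # assumed saturation amount, not the paper's value
    return min(1.0, max(0.0, m / m_full))


def reward(p: float, m: float, market_cleared: bool) -> float:
    """Return U(p, m) = U1(p) * U2(m) in the final state, and zero elsewhere."""
    if not market_cleared:
        return 0.0
    return u1_price(p) * u2_amount(m)
```

Because the two utilities are multiplied, an allocation that scores zero on either criterion yields no reward, which is one way to read the statement that both criteria matter.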

AAMAS’07, May 2007, Honolulu, Hawai'i, USA. Copyright 2007 IFAAMAS.

978-81-904262-7-5 (RPS) © 2007 IFAAMAS

The market had a supply of [...] units of memory in all experiments. This amount does not permit all the agents to be completely satisfied, but allows us to analyze the behavior of the market and the learning mechanism under a condition of limited supply. The initial prices of the resources were obtained from a random number between [...] and [...] units of price.

[Figure 1 plot area: demand (0 to 10) against price (0 to 10), one panel per agent; panel label recovered here: L4.]

Figure 1. Agents’ demand functions with trend line.

We ran experiments in the ordinary IPA market using the four demand functions resulting from the curve-fitting method. We evaluated the agents using these demand functions against other agents with predefined (“static”) demand functions. The demand functions of these agents were defined by hand based on subjective criteria, and they also use the function U(p, m) to evaluate the quality of the allocation. The new agents are S1, S2 and S3. We ran a total of [...] experiments using the same market configuration as in the learning phase and the same number of agents.

The results of the experiments were assessed from an individual and a social perspective. The individual perspective is important because it shows how the agents performed in terms of their own utilities. However, the social perspective is the most suitable evaluation criterion for this work, as the learner agents were trained jointly and encouraged to improve the social welfare.

Figure 2 shows the individual utilities for experiments of type learner vs. learner and static vs. static. In general, the utilities of the learner agents are better than those of the static ones. There is also some similarity among the utilities received by the learner agents. The situation is a little different for the static agents, where one of them (S[...]) achieved a very poor individual performance.

[Figure 2 plot area: utility (0 to 0.9) for each pairing of agents, learners L1xL1 through L4xL4 and static agents S1xS1 through S3xS3.]

In order to avoid any possible deadlock if the learnt demand functions are used directly in an ordinary market running the IPA method, we use the trends of the demand functions to evaluate their performance in the resource allocation. We randomly selected four of our [...] agents and applied a curve-fitting method to them. We used the demand functions obtained at step 450 x 10^3. The rationale for this is that at this step some of the agents have already developed a demand function with a consistent shape, as can be seen in Figure 1.
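A minimal sketch of extracting a trend from a learnt demand table is shown below. The paper does not state which curve-fitting method was used, so the ordinary least-squares straight line here is an assumption, and `fit_trend` is a hypothetical helper name.

```python
from typing import Callable, Sequence


def fit_trend(prices: Sequence[float], demands: Sequence[float]) -> Callable[[float], float]:
    """Fit demand = intercept + slope * price by least squares and return it as a function."""
    n = len(prices)
    if n < 2 or n != len(demands):
        raise ValueError("need at least two (price, demand) pairs of equal length")
    mean_p = sum(prices) / n
    mean_d = sum(demands) / n
    sxx = sum((p - mean_p) ** 2 for p in prices)
    if sxx == 0:
        raise ValueError("all prices are identical; the slope is undefined")
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in zip(prices, demands))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_p
    # Demand cannot be negative, so the fitted line is clamped at zero.
    return lambda price: max(0.0, intercept + slope * price)


# Hypothetical learnt (price, demand) pairs, for illustration only.
trend = fit_trend([0.0, 2.0, 4.0, 6.0], [8.0, 6.0, 4.0, 2.0])
```

A fitted trend returns a demand for every price, including prices the learner never visited, which is one plausible reason a smooth trend is easier to run in an ordinary market than the raw learnt table.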


3. USING THE LEARNT DEMAND FUNCTIONS


Even though the agents have not achieved exactly the same demand function, the functions obtained are consistent and present a very similar overall trend.


Analyzing the evolution of the demand functions, we also note that it is not completely stable. Although the shape is always there and is quite stable between the same values for demand and price, the individual values presented some variability from one episode to another. This behavior can have its origin in several factors, ranging from the application of Q-learning to two simultaneous learners, through an inappropriate decay rule for the learning rate, to the design choices for the learning implementation.


We ran [...] different experiments, all with the same configuration given above. Each experiment was run for a total of [...] x [...] episodes. From these experiments we obtained [...] agents. The evolution of the demand functions of these agents over the episodes showed a similar behavior. In the first episodes they are still not very consistent, which can be explained by the fact that the agent has probably not yet visited all the actions in all the states. From this point on, they start to evolve towards a well-defined trend, as can be seen in Figure 1, which shows the demand functions of four of our agents.

An interesting point is that the demand functions did not present a high demand for lower prices, as might be expected from observing the utility functions. This behavior is most likely generated by the learning format we used, which incentivizes the improvement of the social welfare rather than of the individual utility, even though the agents are self-interested.


We performed a series of preliminary experiments in order to identify a feasible configuration for the values of the parameters used in the learning algorithm. Based on these experiments, we set α = [...], γ = [...] and ε = [...]. The price of the resources is adjusted by the IPA market using a constant parameter set to [...]. The continuity of the states and actions is treated by the application of a rounding procedure. Both states and actions are rounded to [...] decimal place.
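The learning setup described in this section (ordinary Q-learning with ε-greedy selection, prices as states, requested amounts as actions, both rounded) can be sketched as follows. The class name `QLearner`, the default values of `alpha`, `gamma` and `epsilon`, and rounding to one decimal place are assumptions made for illustration, since the values used in the paper are not recoverable here.

```python
import random
from collections import defaultdict
from typing import Optional, Sequence


class QLearner:
    """Tabular Q-learning over rounded prices (states) and rounded amounts (actions)."""

    def __init__(self, actions: Sequence[float], alpha: float = 0.1, gamma: float = 0.9,
                 epsilon: float = 0.1, decimals: int = 1, seed: Optional[int] = None):
        # All default parameter values above are assumed, not the paper's.
        self.decimals = decimals
        self.actions = [round(a, decimals) for a in actions]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value, 0 by default
        self.rng = random.Random(seed)

    def state(self, price: float) -> float:
        """Discretize a continuous price by rounding."""
        return round(price, self.decimals)

    def choose(self, price: float) -> float:
        """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
        s = self.state(price)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, price: float, action: float, reward: float,
               next_price: float, terminal: bool) -> None:
        """Standard Q-learning update; no bootstrap term in the final state."""
        s, s_next = self.state(price), self.state(next_price)
        best_next = 0.0 if terminal else max(self.q[(s_next, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(s, action)] += self.alpha * (target - self.q[(s, action)])
```

In a joint-training loop, each of the two agents would hold its own `QLearner`, call `choose` with the current market price, and call `update` once the market has moved to the next price, with `terminal=True` and a non-zero reward only when the market clears.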


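$$U_2(m) = \begin{cases} 0 & \text{if } m < \ldots \\ \log(m\, \ldots)\, \ldots & \text{if } \ldots \le m \le a\, \ldots \\ 1 & \text{if } m > a\, \ldots \end{cases}$$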

Figure 2. Individual utilities in learner vs. learner and static vs. static experiments

Experiments of type learner vs. static showed that the four learner agents presented similar performances against the same static agents. This was expected, as they learn using the same set of utility functions. These experiments also showed that the performance of the learner agents was not as good as that of the static ones. It is important to mention that this does not mean that the learned demand function is meaningless. One has to consider the fact that the learners had not encountered the static agents during their learning phase and that they are trained together to achieve equilibrium and obtain a better social welfare. This objective is successfully achieved, as presented next.

Regarding the average individual utilities obtained by the agents over the experiments, the agents using the learned function obtained almost the same utility as the static agents. This is quite interesting given that, in the individual comparisons, the learner agents were beaten in most of the experiments. This equalization is achieved because, when running against themselves, the learner agents achieve a much better solution.

The Sixth Intl. Joint Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS 07)

[Figure 3 plot area: one panel per social welfare measure, each comparing the pairings SxS, LxS and LxL.]

The experiments have shown the feasibility of the approach and pointed out that further investigations in this direction are worthwhile. An immediate item of future work is to further investigate the problem of co-adaptation, as we use agents being trained jointly. Another is the extension of the scenario to better reflect a real distributed system such as the Grid. This extension is likely to involve the use of agents described by multiple utility functions and participating in more than one market at the same time.

6. REFERENCES


We use the concept of Social Welfare (SW) [] to evaluate how the market performed under the social perspective. We applied three different functions to calculate the SW of our market: the Utilitarian Social Welfare (USW), defined as the sum of individual utilities; the Egalitarian Social Welfare (ESW), given by the utility of the agent which is worst off; and the Nash Product (NP), defined as the product of the individual utilities. These three SW functions showed that the market's performance is improved using the agents with learnt demand functions, as presented in Figure 3 a), b) and c).
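The three social welfare measures defined above translate directly into code. The function names and the example utilities below are hypothetical and chosen only for illustration.

```python
from math import prod
from typing import Sequence


def utilitarian_sw(utilities: Sequence[float]) -> float:
    """USW: the sum of the individual utilities."""
    return sum(utilities)


def egalitarian_sw(utilities: Sequence[float]) -> float:
    """ESW: the utility of the agent that is worst off."""
    return min(utilities)


def nash_product(utilities: Sequence[float]) -> float:
    """NP: the product of the individual utilities."""
    return prod(utilities)


# Hypothetical utilities of two agents after one allocation.
example = [0.6, 0.3]
usw, esw, np_value = utilitarian_sw(example), egalitarian_sw(example), nash_product(example)
```

USW ignores how utility is distributed, ESW looks only at the worst-off agent, and NP rewards allocations that are both large and balanced, so the three measures can rank the same allocation differently.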

[...] where the agents' preferences in the resource allocation are described by utility functions and they learn the demand functions given their utility functions. The experiments were divided into two phases. In the first phase we applied the Q-learning algorithm to let the agents learn their demand functions. This phase considered a single market composed of two client agents. They are trained jointly and encouraged to improve the market's social welfare while being self-interested. The second phase was the application of the learnt demand functions in the standard IPA market.

The results of the experiments have shown that, through the application of ordinary Q-learning, the agents were able to learn meaningful demand functions. Under an individual perspective, agents using the learnt demand functions performed well in comparison to ones with demand functions defined “by hand”. More remarkably, under a social perspective, the agents using the learnt demand functions achieved a much better solution, being able to improve the system's social welfare.


Figure 3. a) Average USW; b) Average ESW; c) Average NP

4. RELATED WORKS

As far as we are aware, no work has addressed the problem of learning a demand function based on a set of utility functions. However, the use of reinforcement learning in resource allocation is not new. Galstyan et al. [] used reinforcement learning in a scenario where a large number of users submit their jobs to resources that are scheduled by a local scheduler in a grid environment. Abdallah and Lesser [] proposed a multiagent reinforcement learning algorithm and applied this algorithm in distributed task allocation.

[] Abdallah, S. and Lesser, V. Learning the task allocation game. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (Hakodate, Japan, May [...]). AAMAS. ACM Press, New [...]

[] Everett, H. Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resource. Operations Research [...]

Regarding the application of utility-aware agents, Chunlin and Layuan [] developed an algorithm based on utility functions for resource allocation in the Grid. However, their algorithm makes no reference to agents having preferences over the price of the resources, as we do in our work.

[] Galstyan, A., Czajkowski, K., and Lerman, K. Resource Allocation in the Grid Using Reinforcement Learning. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, Volume [...]. New [...]

[] Wu, T. [...]