An Automated Negotiation Mechanism Based on Co-Evolution and Game Theory

Jen-Hsiang Chen, DKERG, School of MIS, Coventry University, UK, (44) 24 7688 7790, j.chen@coventry.ac.uk
Kuo-Ming Chao, DKERG, School of MIS, Coventry University, UK, (44) 24 7688 8908, k.chao@coventry.ac.uk
Nick Godwin, DKERG, School of MIS, Coventry University, UK, (44) 24 7688 8908, a.n.godwin@coventry.ac.uk
Colin Reeves, SORG, School of MIS, Coventry University, UK, (44) 24 7688 7790, c.reeves@coventry.ac.uk

Abstract Current automated negotiation approaches have limited feasibility in practical industrial applications. This paper describes a new method that combines a game theory approach and a co-evolutionary approach to support an effective negotiation model for agents to resolve conflicts. Under the proposed method, the agents, without knowing the other agent's strategies and payoffs, produce an optimised resolution that complies with the concepts of Nash equilibrium and Pareto efficiency. We use a finitely repeated prisoner's dilemma game to demonstrate the proposed method.

Keywords Game theory, Genetic Algorithm, Prisoner's Dilemma, No Fear of Deviation.

1. Introduction Automated negotiation is an important mechanism in multi-agent systems. Because agents are autonomous, proactive, and self-interested, they often have conflicts. There is usually no agent sitting above the other agents to resolve the conflicts for them, so agents need to resolve their conflicts on their own through negotiation. Negotiation can be viewed as a distributed search through a space of potential agreements. The structure of the negotiation object determines the dimensionality and topology of the search space [10]. Two components are important when designing an

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Peter Smith, School of Computing and Engineering, Sunderland University, UK, (44) 191 515 2761, peter.smith@sunderland.ac.uk

automated negotiation system: the negotiation protocol and the negotiation strategies. The protocol defines the rules under which the interaction between the agents takes place, such as what deals can be made and what sequences of offers are allowed [4]. A strategy for a player is a complete plan of action: it specifies a feasible action for the player in every contingency in which the player might be called upon to act [2]. However, finding highly effective negotiation strategies and protocols is not a trivial task. A number of researchers [6] have adopted evolutionary approaches such as Genetic Algorithms (GAs) to find effective negotiation strategies through learning. The main barrier to this approach is the difficulty of locating a globally optimised solution, so a sub-optimal deal is often reached. The game theory negotiation approach also provides promising solutions for automated negotiation. The main criticism of game theory negotiation is that complete payoff information, such as the strategies and the values in the payoff matrix, is not available during negotiation in most practical application domains. This hinders the transfer of automated negotiation techniques into practice and hence to agents. The aim of this research is to design a negotiation mechanism that can be used in practical application domains and that improves the effectiveness of negotiation, in order to help multiple agents resolve conflicts in the decision-making process. The proposed method adopts the GA to test, mutate, and optimise the strategies based on their dynamic performance against all the other strategies in the other agent's population. The GA is used to generate a set of highly optimised strategies that consistently achieve high payoffs in the conflicts. The agents use these highly optimised strategies to play against each other to generate the payoff matrix. The game theory approach then reasons on the payoff matrix to find an optimised point that is either the same as or better than the one generated by the co-evolutionary approach. As a result, the system produces the best-fit negotiation strategies, which form a stabilised equilibrium following the concepts of Nash equilibrium and Pareto efficiency.

2. Game Theory and Co-Evolutionary Model

SAC 2002, Madrid, Spain. Copyright 2002 ACM 1-58113-445-2/02/03...$5.00.

The core work on automated negotiation is derived from game theory. Game theory models have provided great insights


limitations. The first is that the models often select outcomes (deals) that are sub-optimal, because they adopt an approximate notion of rationality and do not examine the full space of possible outcomes. The second is that the models need extensive evaluation, typically through simulations and empirical analysis, since it is usually impossible to predict precisely how the system and the constituent agents will behave in a wide variety of circumstances [10]. The third is that applying the co-evolutionary approach to difficult games builds upon a known Nash equilibrium and then refines it into a better one. This is not a practical assumption for most real-world applications.

into competitive decision-making, so the study of bargaining and negotiation has long attracted economists seeking to solve financial and economic problems. Negotiation in deal-making mechanisms, using game theory, is a way to coordinate rational agents when they encounter conflicts. The game can be analysed according to the Nash equilibrium concept, which holds when neither player has any remaining incentive to deviate from its strategy [12]. The Nash equilibrium is therefore the most popular solution to the bargaining problem [11]. Rosenschein & Zlotkin [13] focused on designing negotiation mechanisms for various applications by introducing interaction protocols with certain desirable properties that can be designed for different domains. Kraus [7] adopted non-cooperative models for problems that involve time and resource restrictions in worth-oriented domains.
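To make the Nash equilibrium concept concrete, the following minimal Python sketch enumerates the pure-strategy equilibria of a two-player game given as a payoff table. The function name `pure_nash_equilibria` and the one-shot prisoner's dilemma payoff values (3, 0, 5, 1) are illustrative assumptions, not values taken from this paper.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs[(r, c)] = (row_payoff, col_payoff) for row action r, column action c.
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        row_pay, col_pay = payoffs[(r, c)]
        # The row player cannot gain by switching rows while the column is fixed.
        row_ok = all(payoffs[(r2, c)][0] <= row_pay for r2 in rows)
        # The column player cannot gain by switching columns while the row is fixed.
        col_ok = all(payoffs[(r, c2)][1] <= col_pay for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


# Assumed one-shot prisoner's dilemma: C = co-operate, D = defect.
PD = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

print(pure_nash_equilibria(PD))  # prints [('D', 'D')]
```

With these assumed payoffs, mutual defection is the only cell from which neither player wants to deviate.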

3. Trusted Third Party Mediated Game In this section, we briefly describe a game theory negotiation approach, the Trusted Third Party (TTP) mediated game, which can partially compensate for the inadequacies of the GA. The TTP mediated game extends the traditional game by incorporating a negotiation mechanism. In the game, agents do not only reason on the game matrix but also attempt to find the equilibrium using given negotiation mechanisms [14]. The advantage of this mechanism is that it solves problems in difficult games where one negotiating agent does not know the other agent's payoffs (difficult games are those that have no Nash equilibrium or multiple Nash equilibria).

A growing number of researchers are applying game theory to automated negotiation [13]. However, these game theory models have been realised with little or no consideration of the computational and communication complexities that are important in practical applications. Furthermore, game theory models are not adequate to represent multiple issues [11]. The Nash equilibrium solution for a co-operative or non-co-operative game is not necessarily Pareto efficient, that is, a strategy combination from which the payoff of one agent cannot be increased without decreasing the payoff of another agent. Pareto efficiency is one of the main attributes used to assess a negotiated agreement, and it can serve as a criterion for solutions that increase the sum of system-wide utilities [13]. A game in pure strategies may have no equilibrium or more than one equilibrium. Mixed strategies then have to be used, so the solution may be sub-optimal. As a result, game theory models have generated considerable debate as to their efficiency and usefulness in guiding the design of automated negotiation mechanisms for multi-agent systems [3]. However, the emergence of evolutionary approaches to automated negotiation has provided an alternative way to resolve conflicts in multi-agent systems.
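The gap between Nash equilibrium and Pareto efficiency can be checked mechanically. The sketch below filters out every outcome that is Pareto-dominated by another outcome; the function name `pareto_efficient` and the prisoner's dilemma payoffs are illustrative assumptions rather than values from this paper.

```python
def pareto_efficient(payoffs):
    """Return the outcomes that no other outcome Pareto-dominates."""

    def dominates(a, b):
        # a dominates b: no agent is worse off and at least one is better off.
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return sorted(
        cell for cell, pay in payoffs.items()
        if not any(dominates(other, pay) for other in payoffs.values())
    )


# Assumed one-shot prisoner's dilemma: C = co-operate, D = defect.
PD = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

print(pareto_efficient(PD))  # prints [('C', 'C'), ('C', 'D'), ('D', 'C')]
```

Under these assumed payoffs, mutual defection, which is the only pure-strategy Nash equilibrium of the one-shot game, is the single outcome that is not Pareto efficient, because mutual co-operation improves both payoffs.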

The TTP approach [14] includes two communication actions and the corresponding negotiation protocol. The two communication actions provide a way to trade payoffs and therefore change the game matrix from a difficult game into an acceptable one. The general negotiation protocol is a loop of making or accepting a suggestion or making a counter-suggestion. Selfish and self-interested agents may be inclined to deviate from their commitments in order to maximise their payoffs. A No-Fear-of-Deviation (NFD) equilibrium [14] was proposed so that agents are bound to their commitments and deviation from those commitments is avoided. The Guarantee communication action is a way to prevent an agent from playing a strategy that will lead to a worse result for another agent. The Compensation communication action is used to persuade an agent to play a given strategy that can lead to a desirable state. The TTP is a domain-independent agent that is not involved in the game matrix and that is trusted by both negotiating agents. The function of the TTP is to perform the guarantee and compensation communication actions, to observe the behaviour of the agents, and to detect whether the agents keep to their commitments. The agents use the guarantee and compensation communication actions to ensure that an NFD equilibrium is found. Each agent under the TTP negotiation mechanism is allowed to make suggestions and counter-suggestions to maximise its own payoffs. This process will in the end reach a stabilised state that is the NFD equilibrium and is Pareto-efficient.
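One way to picture how a guarantee can change the game matrix is sketched below. This is an illustrative reading only: the precise definitions of the communication actions and of the NFD equilibrium are those of [14]. The deposit mechanism (an agent that breaks its promise forfeits a deposit, which the trusted third party passes to the other agent), the function names, the deposit of 3, and the payoff values are all assumptions.

```python
def apply_guarantees(payoffs, promised, deposit):
    """Return a new payoff table after both agents guarantee a promised action.

    Assumed mechanism: an agent that does not play its promised action
    forfeits `deposit`, which the trusted third party hands to the other agent.
    """
    adjusted = {}
    for (r, c), (row_pay, col_pay) in payoffs.items():
        if r != promised[0]:  # the row agent broke its promise
            row_pay, col_pay = row_pay - deposit, col_pay + deposit
        if c != promised[1]:  # the column agent broke its promise
            row_pay, col_pay = row_pay + deposit, col_pay - deposit
        adjusted[(r, c)] = (row_pay, col_pay)
    return adjusted


def is_pure_nash(payoffs, cell):
    """True if neither agent gains by deviating unilaterally from `cell`."""
    r, c = cell
    rows = {a for a, _ in payoffs}
    cols = {b for _, b in payoffs}
    return (all(payoffs[(r2, c)][0] <= payoffs[cell][0] for r2 in rows)
            and all(payoffs[(r, c2)][1] <= payoffs[cell][1] for c2 in cols))


# Assumed one-shot prisoner's dilemma: C = co-operate, D = defect.
PD = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

guarded = apply_guarantees(PD, promised=("C", "C"), deposit=3)
print(is_pure_nash(PD, ("C", "C")), is_pure_nash(guarded, ("C", "C")))  # prints False True
```

In this toy setting the guarantees make mutual co-operation stable, because a unilateral defection now yields 2 instead of 5.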

Axelrod [1] pioneered the use of computational, deductive evolutionary approaches, as opposed to game theory, to analyse repeated prisoner's dilemma problems, in order to explore their feasibility in resolving complex conflicts in the economic and management sciences. Evolutionary approaches apply two perspectives on equilibrium to the repeated game. Firstly, the Nash equilibrium concept is used to determine the equilibrium outcome of a strategic interaction in an evolutionary manner. Secondly, the evolutionary process through which strategic players reach a better equilibrium is treated as a learning process. The "learned" equilibria are one attempt to "refine" the possible Nash equilibria that result from the evolutionary approach. A comprehensive summary of the rationale and techniques of the "learning" approach to solving game equilibria is given by Fudenberg and Levine [5]. Furthermore, Oliver [9] introduced the concept of thresholds as evaluation criteria to refine the strategies. Peyman [10] proposed three complex families of tactics, based on time, resource and imitation, to generate effective negotiation strategies through evolutionary processes. The evolutionary approach has been anticipated as a promising negotiation technology for auctions and for the next generation of e-commerce products.

4. A Proposed Automated Negotiation Mechanism The aim of the system is to seek highly effective strategies, which can obtain high payoffs for the participating agents through the acts of negotiation and communication. The proposed system consists of two main components: a Genetic Algorithm and an NFD equilibrium algorithm (see Figure 1). Negotiation actions are carried out based upon a set of action plans defined in the strategy. The strategy in this case is a set of selected tactics and actions for

However, the evolutionary approach is not faultless. The application of GA to automated negotiation still has three

Figure 1. Proposed automated negotiation mechanism.

payoffs. The payoff matrix may have no, one, or multiple Nash equilibria. If there is a unique Nash equilibrium, it may not be a Pareto-efficient point. To overcome this problem, we use the NFD equilibrium algorithm to find a new Nash equilibrium, which is also a Pareto-efficient point based on the payoff matrix.

searching potential agreements. Each agent can have a number of tactics; in the prisoner's dilemma, for example, these are always playing Co-operation, always playing Defect, playing Tit-for-Tat (TT), and playing Reverse Tit-for-Tat (RTT). The proposed negotiation protocol includes a number of rules that the agents must follow. Firstly, the actual deal cannot be made in the co-evolutionary process, because at this stage the system tries to find high payoffs for both parties by exploring the search space. Secondly, the issues under negotiation must remain the same throughout the negotiation process. Thirdly, when one agent makes an offer, the other agent must make a counter-offer, agree, or reject the offer. Fourthly, we also adopt the compensation and guarantee communication actions, which allow agents to re-allocate the payoffs in the matrix in order to find a stabilised equilibrium. Finally, mutually acceptable agreements can only be made after a new equilibrium and its corresponding strategies have been found.
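The four tactics named above can be played against each other in a finitely repeated prisoner's dilemma to fill in a payoff matrix. The following is a minimal sketch under stated assumptions: the per-round payoffs (3, 0, 5, 1), the 10-round horizon, and the opening moves (TT opens with co-operation, RTT opens with defection and then plays the opposite of the opponent's last move) are illustrative choices that are not specified in this paper.

```python
# Assumed per-round payoffs: (row player's payoff, column player's payoff).
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}


def all_c(opp_history):
    return "C"  # always co-operate


def all_d(opp_history):
    return "D"  # always defect


def tit_for_tat(opp_history):
    # Co-operate first, then copy the opponent's previous move.
    return opp_history[-1] if opp_history else "C"


def reverse_tft(opp_history):
    # Defect first, then play the opposite of the opponent's previous move.
    if not opp_history:
        return "D"
    return "D" if opp_history[-1] == "C" else "C"


def play(tactic_a, tactic_b, rounds=10):
    """Play a finitely repeated game and return both total payoffs."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = tactic_a(hist_b)  # each tactic sees only the opponent's past moves
        move_b = tactic_b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        total_a += pay_a
        total_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total_a, total_b


TACTICS = {"AllC": all_c, "AllD": all_d, "TT": tit_for_tat, "RTT": reverse_tft}

# Payoff matrix over tactics: matrix[(a, b)] = total payoffs when a plays b.
matrix = {(a, b): play(fa, fb)
          for a, fa in TACTICS.items() for b, fb in TACTICS.items()}

for (a, b), totals in matrix.items():
    print(f"{a:>4} vs {b:<4} -> {totals}")
```

The resulting `matrix` has the same shape as the payoff tables used in the earlier sections, so equilibrium and efficiency checks can be applied to it directly.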

If the agents agree on the target payoffs produced by the NFD equilibrium algorithm, it is still necessary to determine the actual strategies that achieve these payoffs. To discover these strategies, the GA process used before is reapplied, but in this case the fitness function is based on the NFD equilibrium. Thus, the fitness function is re-defined by comparing the utility associated with the deal with that associated with the NFD point. The greater the positive difference between the agent's payoff and the NFD payoff, the more successful the agent's behaviour. The GA may not be able to find a strategy that produces the exact payoff of the NFD equilibrium, but it can find one that is close. If the NFD equilibrium is not satisfactory for the agents, both return to the co-evolutionary phase, as shown in Figure 1. The GA still uses the result of the NFD algorithm in calculating its fitness function. This iteration will not continue indefinitely, since there will be limits on time and resources. The system stops when one of two conditions is met: an agreement is reached, or either or both agents reach the limit for time or resources. The way an agent represents its resources and specifies the limits is not discussed in this paper.
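The re-defined fitness function and the two stopping conditions can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names, the use of a plain difference as the fitness value, the representation of the resource limit as a maximum number of rounds, and the sample numbers are all assumptions.

```python
def nfd_fitness(payoff, nfd_payoff):
    """Signed difference between an achieved payoff and the NFD target payoff."""
    return payoff - nfd_payoff


def best_strategy(candidate_payoffs, nfd_payoff):
    """Pick the candidate strategy with the greatest fitness relative to the NFD point."""
    return max(candidate_payoffs,
               key=lambda name: nfd_fitness(candidate_payoffs[name], nfd_payoff))


def negotiate(nfd_targets, acceptable, max_rounds):
    """Stop when a target is accepted or when the round limit is exhausted.

    nfd_targets: list of candidate NFD payoff pairs, one per iteration (assumed input).
    acceptable:  hypothetical predicate telling whether both agents accept a target.
    """
    for round_no, target in enumerate(nfd_targets[:max_rounds], start=1):
        if acceptable(target):
            return ("agreement", round_no, target)
    return ("limit reached", min(len(nfd_targets), max_rounds), None)


# Hypothetical candidate strategies with their payoffs, and an NFD target of 30.
print(best_strategy({"A1": 28, "A2": 31, "A3": 25}, nfd_payoff=30))  # prints A2

both_at_least_3 = lambda target: min(target) >= 3
print(negotiate([(2, 2), (3, 3)], both_at_least_3, max_rounds=5))
```

Representing the limit as a round count keeps the sketch short; a real agent could equally track elapsed time or another resource budget.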

The method for selecting parents for the next generation in the GA process is tournament selection. In other words, the weakest strategies are removed from the population and the best strategies have a better chance of remaining in the population for the next generation. After a number of iterations of these evolutionary processes, each agent selects the most effective strategies from its population. At this point, the selected strategies have displayed outstanding average performance in playing against all the strategies of the other agent. This average does not represent performance on a one-to-one basis, so it is necessary to select representative strategies from one agent to play against each strategy of the other in order to form an actual payoff matrix.
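Tournament selection can be sketched in a few lines. The version below is a generic textbook form, assuming a tournament size of 2 and contenders drawn without replacement; the paper does not state these parameters, and the strategy labels and fitness scores are hypothetical.

```python
import random


def tournament_select(population, fitness, k=2, rng=random):
    """Draw k distinct contenders at random and return the fittest of them."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)


def next_parents(population, fitness, n_parents, k=2, seed=0):
    """Select n_parents parents by repeated tournaments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [tournament_select(population, fitness, k, rng) for _ in range(n_parents)]


# Hypothetical strategies with hypothetical average payoffs as fitness scores.
scores = {"A1": 12.0, "A2": 27.5, "A3": 19.0, "A4": 8.5}
population = list(scores)

parents = next_parents(population, scores.get, n_parents=20)
print(sorted(set(parents)))
```

Because each tournament compares two distinct strategies, the weakest strategy (here `A4`) can never win and so drops out of the parent pool, while stronger strategies win more of the tournaments they enter.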

To explore the search space fully, values are not restricted to only increasing or only decreasing. If only one issue is to be changed, the change can normally occur in one direction only; otherwise the agents could accept an offer that is worse than the previous offer, or reject an offer that is better than the previous one. To avoid this problem, the co-evolution stage in the proposed method is treated as an intermediate process, so no agreement or disagreement can be made at this stage. The agents based on the payoffs in the NFD

The selected strategies are numbered uniquely by starting with a letter and then a sequential number such as Ax strategy where O