PLAYING FIRST-PERSON-SHOOTER GAMES WITH MACHINE LEARNING TECHNIQUES AND METHODS
Adil Khan
Game Artificial Intelligence Lab, School of Computer Science and Technology, Harbin Institute of Technology (HIT), Harbin, Heilongjiang 150001, PR China
Higher Education Department, Pakistan
[email protected]
ORCID: 0000-0003-2862-5718
© Adil Khan 2018


AGENDA
• Introduction
• Motivation
• Related Works
• Contributions
• Main Research Contents
• Conclusions


Introduction
• Defining Game-AI
• Types of Computer Video Games
• First-Person-Shooter Games


Defining Game-AI: Expectations and Reality
• AI is an enormous topic, so don't expect to come away from this unit an expert.
• You will, however, learn the skills necessary to create entertaining and challenging AI for the majority of action game genres.
• You will gain a sound understanding of the key areas of game AI, providing a solid base for any further learning you undertake.


Defining Game AI: Academic AI
• Academic research is split into two camps: strong AI and weak AI.
  – Strong AI concerns itself with trying to create systems that mimic human thought processes (the camp this research falls into).
  – Weak AI (more popular nowadays) concerns itself with applying AI technologies to the solution of real-world problems.


Defining Game AI
• Game AI programmers, on the other hand, have to work with limited resources.
• The amount of processor cycles and memory available varies from platform to platform.
• Compromises often have to be made in order to get an acceptable level of performance.
• In addition, successful games (the ones making all the money) do one thing very well: they entertain the player.
• After all, most players will quickly become frustrated and despondent with an AI that always gives them a whipping. To be enjoyable, an AI must put up a good fight but lose more often than it wins. It must make the player feel clever, sly, cunning, and powerful. It must make the player jump from his seat shouting.


Opponents and Allies - AI
• The acronym AI is often used to refer to:
  – Enemy AI: monsters
  – Friendly AI: allies, parties
  – NPCs (Non-Player Characters)
  – Bots


AI for Computer Games
• Script-Based AI
• Finite State Machines
• Search and Pathfinding
• Behavior Trees
• Fuzzy Logic
• Genetic Algorithms
• Neural Networks
• Others


Types of Computer Video Games
• Action Games
• Adventure Games
• Action-Adventure Games
• First-Person-Shooter Games
• Role-Playing Games
• Strategy Games
• Sports Games
• Turn-Based Games


Research Directions in Game AI
• Non-Player Character AI
• Machine Learning
• Decision Making and Control
• AI Elsewhere in Games: adapt AI techniques from other domains and apply them to games
• Interactive Storytelling
• Cooperative Behaviors
• Player Control
• Content Creation


First-Person-Shooter Games
• In general, first-person-shooter games are considered fast-paced games.
• A first-person shooter (FPS) is a genre of action video game that is played from a first-person perspective.
• FPS titles are among the best-selling console games (Halo, Call of Duty, Battlefield, etc.).



Motivation
• Why Game-AI?
• Define the problems
• Why DOOM Game?


The Value of Games? Purpose and Significance
• Games are stimulation for the mind; or rather, games are food for the brain.
• They tax a variety of cognitive skills:
  – Lateral thinking
  – Long-term deliberation
  – Reactions
• Games are a challenge for people and therefore a good task for AI.
• We use games as benchmarks to test our new ideas.
• Different games carry unique properties that make them interesting:
  – Determinism
  – Accessibility of information
  – Agent-driven systems


Why Game AI? Purpose and Significance
• To investigate how to improve existing AI techniques
• To create new types of AI systems within the confines of games
• To try to improve existing game design ideas
• Introducing AI algorithms to 'play' video games
• Often we replace the human player with an AI that is interfaced to the in-game controls
• The system either learns or adapts to the game so it can score maximum points


Why Game AI? Purpose and Significance
• Research on artificial intelligence in video games (Game AI) is becoming increasingly popular as we realize the advantages of using games as simulation test-beds.
• Computer games are an excellent platform upon which to pursue research into human-level AI.
• Game AI is one of the currently active research areas of Google DeepMind.
• 3D realistic environments


Why Game AI? Purpose and Significance
• Increase in computing power (GPUs)
• Advancements in machine learning, specifically visual learning
• Evolution of neural networks
• Helps in training robots how to navigate human environments


Why Game AI? Atari 2600 Games
• Atari 2600 games are widely adopted as a benchmark for visual learning algorithms.
• Atari 2600 games are used with reinforcement learning to obtain human-level controllers from raw pixel data.
• They are a lightweight, easy-to-use platform for testing AI techniques.
• They have been very active testbeds in the recent past for evaluating AI techniques.
• They opened a new research era: from pixels to actions.
• They involve only 2D environments.
• Platforms involving 3D environments are now being introduced gradually.


Define the Problems: Atari 2600 Games
(Figure in the original slides: an Atari arcade / Atari 2600 game, "a classic first-person shooter".)
• The aim is learning AI agents in order to build general-purpose smart machines.
• Despite being a benchmark for visual learning algorithms, Atari 2600 games have several drawbacks from the AI research perspective:
  – They involve only 2D environments.
  – They are fully observable to the agents.
  – The environments hardly resemble the real world we live in.
  – They are third-person-perspective games.
  – This does not match the real-world mobile-robot scenario.


Define the Problems: Atari 2600 Games
• Human players are still ahead of bots trained from scratch, although deep reinforcement learning algorithms are ahead on average.
• More challenging reinforcement learning problems should also involve a first-person perspective and realistic 3D worlds.
(Figure in the original slides: an Atari arcade game, 1983, on the Atari 2600, "a classic first-person shooter".)
• Using RL in Atari games, agents acted upon high-level information and handcrafted features,
  – such as positions of walls and enemies, locations of items, etc.,
  – which are usually inaccessible to human players.


Define the Problems: Why the DOOM (ViZDoom) Game?
• Uses an existing game engine, which saves time and a large amount of programming effort
• Can work as a base for my research
• Portability and the ability to run multiple instances on a single machine (ViZDoom)
• Quick and fast (the game engine is not a learning bottleneck)
• Total control over the game's processing
• Customizable resolution and rendering parameters
• Multiplayer capabilities (agent vs. agent and agent vs. human)
• Easy-to-use tools to create custom scenarios
• Ability to bind different programming languages (preferably written in C++)
• Multi-platform


Define the Problems: Why the DOOM Game?
• Involves a 3D environment, more real-world-like than Atari 2600 games
• The environment resembles the real world we live in
• Not a third-person-perspective game
• Matches a real-world mobile-robot scenario
• Human players are ahead of bots trained from scratch, and deep reinforcement learning algorithms are ahead on average, so there is a need for challenging RL problems involving a first-person perspective and a realistic 3D world
• A unique and dedicated machine learning platform, 'ViZDoom', is based on DOOM
• It enables research from raw visual information


Define the Problems: Why the DOOM Game?
• Allows developing bots that play DOOM using only the screen buffer (see the sketch below)
• Customization capabilities of ViZDoom
• Custom scenarios that differ by maps, environment elements, non-player characters, rewards, goals, and actions available to the agent
• Lightweight: games can be played at 7000 frames per second on modern powerful machines
• Real-time play in Doom runs at 35 fps
• Unreal Tournament, Counter-Strike, and Quake III Arena have already been used in AI research, and now it should be DOOM's turn
• Involves partially observable states
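As a rough illustration of the screen-buffer interface mentioned above, here is a minimal sketch using ViZDoom's Python bindings; the scenario file name and the random policy are placeholders, not the agents evaluated in this work.

```python
import random
from vizdoom import DoomGame

game = DoomGame()
game.load_config("basic.cfg")   # assumed scenario config: map, rewards, available actions
game.init()

# One-hot action vectors: move left, move right, shoot
actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for episode in range(5):
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()
        pixels = state.screen_buffer          # raw pixels: the bot's only input
        game.make_action(random.choice(actions))
    print("Episode reward:", game.get_total_reward())

game.close()
```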


Solution and Applications: First-Person-Shooter Games
• First-person shooters are some of the most popular games on the market today, known for their combative nature and fast-paced action.
• Creating agents with human-like behavior is one way to use video games to research AI techniques.
• We can then transfer these insights to the fields of physical robotics and real-life simulations in areas


Solution and Applications: First-Person-Shooter Games
• such as military training simulations,
• researching the creation of interesting behaviors such as combat strategies, and
• teaching robots how to navigate human environments,
• which may help the video game industry to develop more realistic and entertaining characters to play against.


Solution and Applications: Machine Learning Approaches
• In researching game AI, I decided to use machine learning algorithms to get agents (bots) to learn how to play an FPS.
• We will use reinforcement learning (RL), which allows a bot to learn a problem by interacting with its environment.
• Learning from raw visual information may relieve researchers of the burden of acting upon high-level information and handcrafted features, such as positions of walls and enemies, locations of items, etc.
• The environment provides a reward or penalty based on how the bot is performing.
• These values (reward or penalty) are used to build a map telling the bot which action is good to perform in the current state of the environment (see the sketch below).
• The bot receives a reward if it collects an item or kills an enemy; if the bot dies, it gets a penalty.
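To make the "map from state to action" idea concrete, here is a minimal tabular Q-learning sketch. It is an illustrative stand-in (the actual experiments approximate this table with a deep network), and the hyperparameter values are assumptions.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # assumed learning rate, discount, exploration rate
Q = defaultdict(float)                   # Q[(state, action)] -> expected return

def choose_action(state, actions):
    if random.random() < EPSILON:                       # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])    # otherwise act greedily

def update(state, action, reward, next_state, actions):
    # Move Q(s, a) toward the reward plus the best value of the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```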


Related Works
• Deep Q-Network (DQN)
• Deep Recurrent Q-Network (DRQN)
• A3C Model


DQN (Deep Q-Network)
• DQN is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks.
• DQN overcomes unstable learning mainly through four techniques:
  – Experience Replay
  – Target Network
  – Clipping Rewards
  – Skipping Frames


Experience Replay
• Experience replay was originally proposed in 1993 for reinforcement learning with neural networks for robots.
• A DNN easily overfits to current episodes; once the DNN is overfitted, it is hard to produce varied experiences.
• To solve this problem, experience replay stores experiences, including state transitions, rewards, and actions, which are the data necessary to perform Q-learning, and makes mini-batches from them to update the neural networks.
• This technique has the following merits (a sketch follows this list):
  – It reduces correlation between the experiences used to update the DNN.
  – It increases learning speed with mini-batches.
  – It reuses past transitions to avoid catastrophic forgetting.
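A minimal sketch of such a replay buffer is shown below, assuming transitions of the form (state, action, reward, next state, done); it is illustrative rather than the exact implementation used in the cited work.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):          # capacity value is an assumption
        self.buffer = deque(maxlen=capacity)     # oldest transitions fall off the end

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling breaks the correlation between consecutive frames
        # and lets old transitions be reused in many mini-batches.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```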


Target Network
• In temporal-difference (TD) error calculation, the target function changes frequently along with the DNN.
• An unstable target function makes training difficult.
• The Target Network technique therefore fixes the parameters of the target function and replaces them with the latest network only every few thousand steps.
• The target Q-function (shown in the red rectangle in the original figure) is kept fixed between these updates.
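Written out, the standard DQN loss with a fixed target network takes the following form, where θ are the online-network parameters and θ⁻ the frozen target-network parameters that are copied from θ only every few thousand steps:

```latex
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}
\Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \big)^{2} \Big]
```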


Clipping Rewards
• Each game has a different score scale. For example, in Pong, players get +1 point for winning a rally and -1 point otherwise.
• In Space Invaders, however, players get 10-30 points for defeating invaders.
• This difference would make training unstable, so the Clipping Rewards technique clips scores: all positive rewards are set to +1 and all negative rewards to -1.
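In code, the clipping rule described above amounts to a one-line transformation of the raw game score:

```python
def clip_reward(score: float) -> float:
    # Map any positive score to +1, any negative score to -1, and keep 0 as 0.
    if score > 0:
        return 1.0
    if score < 0:
        return -1.0
    return 0.0
```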


Skipping Frames
• ALE (the Arcade Learning Environment) is capable of rendering 60 frames (images) per second.
• But people do not actually take that many actions in a second, so the AI does not need to calculate Q-values every frame.
• With the Skipping Frames technique, DQN therefore calculates Q-values only every 4 frames and uses the past 4 frames as inputs. This reduces computational cost and gathers more experience (see the sketch below).
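A rough sketch of the frame-skipping loop follows; `env` is a hypothetical environment whose `step(action)` returns a frame, a reward, and a done flag, and the skip value of 4 matches the description above.

```python
def skipped_step(env, action, skip=4):
    """Repeat one action for `skip` frames and sum the rewards."""
    total_reward, done, frames = 0.0, False, []
    for _ in range(skip):
        frame, reward, done = env.step(action)
        total_reward += reward
        frames.append(frame)
        if done:
            break
    return frames, total_reward, done   # the collected frames form the next input stack
```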


Performance
• All of these techniques enable DQN to achieve stable training.
• DQN overwhelms the naive DQN (a linear function approximator).
• The Nature version shows how much Experience Replay and the Target Network contribute to stability.
(Table in the original slides: performance with and without Experience Replay and the Target Network.)
• Experience Replay is very important in DQN; the Target Network also increases its performance.


DQN Continued
• DQN has achieved human-level control in many Atari games with these four techniques.
• However, there are still some games DQN cannot play. Some papers that tackle them are introduced later.


Deep Q-Network (DQN)
(Table in the original slides: list of hyperparameters and their values.)


Nature Paper 2015
(Figure in the original slides: the DQN model architecture for Atari.)



DQN (Deep Q-Network)
(Figure in the original slides: average reward over time for a DQN agent in a limited GridWorld.)


DRQN (Deep Recurrent Q-Network)
(Figure in the original slides: average reward over time for a DRQN agent in a limited GridWorld.)


DRQN (Deep Recurrent Q-Network)
• DQN performs well on Atari games in fully observable environments.
• Partial-observability scenarios, however, provide incomplete and noisy observations.
• Adding an LSTM after the convolutional layers helps the Q-network retain some memory of previous observations, enabling better decision making.
• So the DQN architecture is minimally modified: the first fully connected layer is replaced with a recurrent LSTM layer of the same size, whose parameters are learned from scratch (see the sketch below).
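A minimal PyTorch sketch of that modification is given below: DQN's convolutional stack followed by an LSTM in place of the first fully connected layer. The layer sizes and the 84x84 single-channel input are illustrative assumptions, not the exact values from the DRQN paper.

```python
import torch.nn as nn

class DRQN(nn.Module):
    def __init__(self, n_actions, hidden_size=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # The LSTM replaces DQN's first fully connected layer and carries a
        # hidden state across time steps, which helps under partial observability.
        self.lstm = nn.LSTM(input_size=64 * 7 * 7, hidden_size=hidden_size,
                            batch_first=True)
        self.q_head = nn.Linear(hidden_size, n_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 1, 84, 84) -- a single frame per time step
        b, t = frames.shape[:2]
        feats = self.conv(frames.reshape(b * t, *frames.shape[2:]))
        feats = feats.reshape(b, t, -1)
        out, hidden = self.lstm(feats, hidden)
        return self.q_head(out), hidden       # Q-values for every time step
```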


DRQN (Deep Recurrent Q-Network)
(Figure in the original slides: the architecture of DRQN.)


DRQN (Deep Recurrent Q-Network)
• Updates: backpropagation through a recurrent network requires each backward pass to cover many time steps of game screens and target values.
• DRQN performs well at Atari games with just one input frame per time step.
• Having just one input frame makes it impossible for the convolutional layers to estimate essential quantities such as the velocity of the ball in Pong; the LSTM layer compensates with its hidden state.
• DRQN is more robust against missing information at test time, whether it was trained with the full state or with partial information.


A3C Algorithm
• The A3C algorithm was released by Google's DeepMind, and it has essentially made DQN almost obsolete.
• It is faster, simpler, more robust, and able to achieve much better scores on the standard battery of deep RL tasks.
• On top of all that, it can work in continuous as well as discrete action spaces.
• It has become the go-to deep RL algorithm for new challenging problems with complex state and action spaces.
• I start by unpacking the name, and from there begin to unpack the mechanics of the algorithm itself.


Asynchronous
• Unlike DQN, where a single agent represented by a single neural network interacts with a single environment,
• A3C utilizes multiple incarnations of the above in order to learn more efficiently; in other words, it utilizes multiple agents to collectively improve a policy.
• In A3C there is a global network and multiple worker agents, each with its own set of network parameters.
• Each of these agents interacts with its own copy of the environment at the same time as the other agents are interacting with theirs.
• The reason this works better than having a single agent is that the experience of each agent is independent of the experience of the others.
• In this way the overall experience available for training becomes more diverse.


Diagram of A3C High-Level Architecture
(Figure in the original slides: in A3C there is a global network, with worker agents each interacting with their own copy of the environment.)


Actor-Critic
• Actor-Critic combines the benefits of both approaches: value-iteration methods such as Q-learning and policy-iteration methods such as policy gradient.
• In the case of A3C, the network estimates both a value function V(s) (how good a certain state is to be in) and a policy π(s) (a set of action probability outputs).
• These are implemented as separate fully connected layers sitting at the top of the network (see the sketch below).
• Critically, the agent uses the value estimate (the critic) to update the policy (the actor) more intelligently than traditional policy gradient methods do.
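A minimal PyTorch sketch of those two output heads is shown below; the shared body and layer sizes are illustrative assumptions rather than the exact A3C network.

```python
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_inputs, n_actions, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_inputs, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)   # actor: logits for pi(s)
        self.value_head = nn.Linear(hidden, 1)            # critic: V(s)

    def forward(self, x):
        features = self.body(x)
        logits = self.policy_head(features)   # softmax over logits gives action probabilities
        value = self.value_head(features)     # scalar estimate of how good state s is
        return logits, value
```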


Advantage
• Thinking back to an implementation of policy gradient, the update rule used the discounted returns from a set of experiences to tell the agent which of its actions were "good" and which were "bad".
• The insight of using advantage estimates rather than just discounted returns is to allow the agent to determine not just how good its actions were, but how much better they turned out to be than expected.
• Intuitively, this allows the algorithm to focus on where the network's predictions were lacking.
• The advantage function is: A = Q(s, a) - V(s)
• The discounted return R can be used as an estimate of Q(s, a), which allows an estimate of the advantage to be generated (see the sketch below).
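A minimal sketch of that estimate follows: discounted returns R stand in for Q(s, a), and the critic's value V(s) is subtracted. The discount factor and helper names are assumptions.

```python
def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
    """Compute the return R for each step of a rollout, working backwards from the end."""
    returns, running = [], bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

def advantages(rewards, values, gamma=0.99, bootstrap_value=0.0):
    # A ~= R - V(s): how much better the outcome was than the critic expected.
    returns = discounted_returns(rewards, gamma, bootstrap_value)
    return [r - v for r, v in zip(returns, values)]
```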


A3C Model
(Figure in the original slides: the A3C model.)


Contributions
• Current studies and our experimental results


CONTRIBUTIONS
1. Title: Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
   Journal: International Journal of Advanced Computer Science and Applications (IJACSA), The SAI, UK
   Time & Place: December 2017, UK
   Status: Published
2. Title: State-of-the-Art and Open Challenges in RTS Game-AI and Starcraft
   Journal: International Journal of Advanced Computer Science and Applications (IJACSA), The SAI, UK
   Time & Place: December 2017, UK
   Status: Published
3. Title: A Competitive Combat Strategy and Tactics in RTS Game-AI and Starcraft
   Conference: The 2017 PCM Conference on Multimedia, Springer
   Time & Place: September 2017, UK
   Status: Published
4. Title: A Standard Skipcount Range for Training Agents using Deep Reinforcement Learning and VizDoom
   Journal: IEEE Transactions on Games
   Time & Place: USA, NY, June 2018


1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
• This is one of the emerging research areas in artificial intelligence and is receiving more focus day by day.
• The primary purpose of the experiments is to train a competitive agent, using visual reinforcement learning and 'VizDoom', for first-person-shooter games, particularly 'Doom', to exhibit human-like behaviors and to outperform average human players and existing built-in game agents.


(Cont.) 1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
Scenario
• A rectangular chamber is used as the basic scenario, in which an agent will spawn at the center of the room's long wall. Along the opposite wall, an immobile monster will spawn at arbitrary positions.
• The agent will be able to move left, move right, and shoot.
• A single shot will be sufficient to eradicate the monster.
• The episode will finish once 300 frames are completed or the monster is killed, whichever comes first.
• For killing the monster, the agent will receive 101 points, -5 for missing a shot, and -1 for each individual action. The best policy for the agent to learn is therefore to kill the monster as rapidly as possible, preferably with a single shot.


(Cont.) 1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
Deep Q-Learning
• A Markov Decision Process is used to model the problem, and Q-learning to learn the policy.
• An ε-greedy policy with linear decay will be used for selecting an action (see the sketch below).
• The Q-function will be approximated with a convolutional neural network trained with stochastic gradient descent using experience replay.


(Cont.) 1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
Neural Network Architecture
• The network used in the experiments includes two convolutional layers with 32 square filters, 7 and 4 pixels wide, respectively.
• A max-pooling layer with pooling size 2 and a ReLU (Rectified Linear Unit) activation function follows each convolutional layer.
• The network then contains a fully connected layer with 800 leaky rectified linear units and an output layer with 8 linear units corresponding to the 8 combinations of the 3 offered actions, i.e. right, left, and shoot (a sketch follows).
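A minimal PyTorch sketch of the described network follows; the single-channel input and the flattened feature size (which depends on the chosen screen resolution) are assumptions, while the layer widths match the bullets above.

```python
import torch.nn as nn

class DoomQNet(nn.Module):
    def __init__(self, flattened_size, n_outputs=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),   # 32 filters, 7 px wide
            nn.Conv2d(32, 32, kernel_size=4), nn.MaxPool2d(2), nn.ReLU(),  # 32 filters, 4 px wide
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flattened_size, 800), nn.LeakyReLU(),   # 800 leaky rectified linear units
            nn.Linear(800, n_outputs),                        # 8 linear outputs: one Q-value per action combo
        )

    def forward(self, x):
        return self.head(self.features(x))
```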


(Cont.) 1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
Environment for Experiments
• The experiments are performed in PyCharm Professional 2017,
• using ViZDoom 1.1.1, OpenCV 3.3, CMake 2.8+, Make, GCC 4.6+, Boost libraries 1.54+, and Python 3.5 (64-bit) with NumPy,
• on a computer running Ubuntu 16.04.3 with an Intel® Core™ i7-7700 CPU @ 3.60 GHz x 8 and an NVIDIA GeForce GTX 1080/PCIe/SSE2 GPU for processing the CNNs.
• The whole learning process, along with the testing episodes, will be measured in time.


2. A Standard Skipcounts Range for Training Agents using Deep Reinforcement Learning and VizDoom
Research Question
• What should the optimal skipcount be in order to develop a well-trained and robust agent?
• Learning is slowest when the agent does not skip any frames, and
• learning is faster and smoother when the agent skips more frames;
• but, conversely, too large a skipcount makes the agent graceless due to the lack of balance control, which results in suboptimal final results.
• The primary purpose of the research will be to examine how the number of skipped frames influences the learning process,
• and to find a standard, optimized skipcount range (scale) that can provide a balance or tradeoff between the learning speed and the final performance, using the VizDoom AI platform (a sketch of applying a skipcount follows this list).
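As a rough sketch of how a skipcount is applied in ViZDoom, `make_action(action, tics)` repeats the chosen action for the given number of game tics, so the agent only observes and decides every `skipcount` frames; the policy function and the default value of 4 are placeholders.

```python
from vizdoom import DoomGame

def run_episode(game: DoomGame, policy, actions, skipcount=4):
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()
        choice = policy(state.screen_buffer)            # pick an action index from raw pixels
        game.make_action(actions[choice], skipcount)    # repeat it for `skipcount` tics
    return game.get_total_reward()
```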


(Cont.) 2. A Standard Skipcounts Range for Training Agents using Deep Reinforcement Learning and VizDoom
PROPOSED METHOD
• A rectangular chamber will be considered as the basic scenario,
• where an agent will spawn at the center of the room's long wall.
• A static monster will spawn at arbitrary positions along the opposite wall.
• The agent will be able to move left, move right, and shoot.
• A single shot is sufficient to kill the monster.
• The scenario (episode) will finish either by killing the monster or with the completion of 300 frames.
• The agent will get a score of 101 if it kills the monster; otherwise,
• it will score -5 for a missed shot and -1 for each action.


(Cont.) 2. A Standard Skipcounts Range for Training Agents using Deep Reinforcement Learning and VizDoom
PROPOSED METHOD: Neural Network Architecture
• A convolutional neural network (CNN) architecture of three convolutional layers with 32 square filters, 7, 4, and 2 pixels wide, respectively, will be used.
• Each convolutional layer is followed by a max-pooling layer with pooling size 2 and a ReLU activation function.
• There is a fully connected layer with 800 leaky rectified linear units,
• and an output layer with 8 linear units corresponding to the 8 combinations of the 3 available actions, i.e. left, right, and shoot.


(Cont.) 2. A Standard Skipcounts Range for Training Agents using Deep Reinforcement Learning and VizDoom
Deep Q-Learning
• A deep reinforcement learning method will be used to learn the policy.
• For the experiments, the problem will be modeled as a Markov Decision Process (MDP).
• An ε-greedy policy with a linearly decaying ε will be used to select actions.
• A convolutional neural network trained with stochastic gradient descent will be used to approximate the Q-function.
• In addition, a replay memory will be used to store the game transitions.


(Cont.) 2. A Standard Skipcounts Range for Training Agents using Deep Reinforcement Learning and VizDoom
Environment for Experiments
• All of the experiments will be performed in PyCharm Professional 2017,
• using ViZDoom 1.1.5, OpenCV 3.3, CMake 2.8+, GCC 4.6+, and Python 3.6 (64-bit) with NumPy, on a computer running Ubuntu 16.04.3 with an Intel® Core™ i7-7700 CPU @ 3.60 GHz x 8 and an NVIDIA GeForce GTX 1080/PCIe/SSE2 GPU for processing the CNNs.
• The whole learning and testing process is measured in time.


Expected Innovation
• Research plan and future works


Expected Innovation and Future Works
• New state-of-the-art machine learning techniques, methods, and algorithms will be proposed.
• Bots are currently deaf in some scenarios, so we plan to allow bots to access the sound buffer in almost all scenarios.
• Well-balanced, trained, and robust agents for playing FPS games, particularly DOOM, will be introduced.
• These agents could then be hired and introduced to the market by computer game industries and companies, as a contribution to the game industry.
• Improved game-playing skills would be offered in the market for gamers to play with and against AI agents.
• The same techniques, methods, and algorithms could be used to train agents for other arbitrary games and, one day hopefully, on many valuable real-world control problems.
• We would like to implement a synchronous multiplayer mode, which would be convenient for self-learning in multiplayer settings.
• We are also interested in conducting supervised learning experiments if VizDoom can automatically label objects in the scene.
• Our research work will contribute to better AI practices.


Research Schedule
• Expected schedule


EXPECTED SCHEDULE
2016.9 – 2017.2: Game AI courses
2017.2 – 2017.4: Preparing and planning the research and reading articles
2017.4 – 2017.8: Machine Learning Methods & Techniques in General Computer Games
2017.9 – 2017.11: Machine Learning Methods & Techniques in RTS Games


EXPECTED SCHEDULE (cont.)
2017.12 – 2018.2: Machine Learning Methods & Techniques in RPG Action Games
2018.3 – 2018.5: Machine Learning Methods & Techniques in FPS Games
2018.6 – 2018.8: Proposing methods and algorithms to play FPS games with Deep Reinforcement Learning
• Training an agent for FPS games, e.g. DOOM
2018.9 – 2018.11: Proposing an architecture of neural networks with Q-learning
• A Standard Skipcounts scale for Training Agents using Deep Reinforcement Learning and VizDoom
2018.12 – 2019.3: Finally, publishing


Conclusions
• This concludes the presentation.
• Thanks!


Author Profile
Adil Khan received a C.T. from AIOU Islamabad, a B.Ed. from the University of Peshawar, a BS (Honors) in Computer Science from Edwardes College Peshawar, and an M.S. in Computer Science from City University of Science and Information Technology, Peshawar, Pakistan. From 2014 to 2016, he was a lecturer in the Higher Education Department, KPK, Pakistan. He has published many articles in top-tier academic journals and conferences, including IEEE. Currently, he is serving as a researcher at the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, PR China. He is interested in Game Artificial Intelligence (Game AI).
Twitter: https://twitter.com/AdilAdil25
ResearchGate: https://www.researchgate.net/profile/Adil_Khan13/
Facebook: https://www.facebook.com/groups/wsnscholars/