PLAYING FIRST-PERSON-SHOOTER GAMES WITH MACHINE LEARNING TECHNIQUES AND METHODS
Adil Khan
Game Artificial Intelligence Lab, School of Computer Science and Technology, Harbin Institute of Technology (HIT), Harbin, Heilongjiang 150001, PR China
Higher Education Department, Pakistan
[email protected] ORCD: 0000-0003-2862-5718 © Adil Khan 2018
AGENDA
• Introduction
• Motivation
• Related Works
• Contributions
• Main Research Contents
• Conclusions
Introduction • Define Game-AI. • Types of Computer Video Games • First-Person-Shooter Games
Defining Game-AI: Expectations and Reality
• AI is an enormous topic, so don't expect to come away from this unit as an expert
• But one will have learned the skills necessary to create entertaining and challenging AI for the majority of action game genres
• One will have a sound understanding of the key areas of game AI, providing a solid base for any further learning one undertakes
Defining Game AI: Academic AI
• Academic research is split into two camps: strong AI and weak AI.
  – Strong AI concerns itself with trying to create systems that mimic human thought processes (the camp of the present research)
  – Weak AI (more popular nowadays) concerns itself with applying AI technologies to the solution of real-world problems
Defining Game AI
• Game AI programmers, on the other hand, have to work with limited resources
• The amount of processor cycles and memory available varies from platform to platform
• Compromises often have to be made in order to get an acceptable level of performance
• In addition, successful games, the ones making all the money, do one thing very well: they entertain the player
• After all, most players will quickly become frustrated and despondent with an AI that always gives them a whipping. To be enjoyable, an AI must put up a good fight but lose more often than it wins. It must make the player feel clever, sly, cunning, and powerful. It must make the player jump from his seat shouting.
Opponents and Allies: AI
• The acronym AI is often used to refer to:
  – Enemy AI: monsters
  – Friendly AI: allies, parties
  – NPCs (Non-Player Characters)
  – Bots
AI for Computer Games • Script-Based AI
• Finite State Machines • Search and Pathfinding • Behavior Trees • Fuzzy Logic • Genetic Algorithms
• Neural Networks • Others
Types of Computer Video Games • Action Games • Adventure Games • Action-Adventure Games • First-Person-Shooter Games • Role Playing Games • Strategy Games
• Sports Games • Turn-based Games
Research Directions in Game AI
• Non-Player Character AI
• Machine Learning
• Decision Making and Control
• AI Elsewhere in Games: adapting AI techniques from other domains and applying them to games
• Interactive Storytelling
• Cooperative Behaviors
• Player control
• Content creation
First-Person-Shooter Games
• In general, first-person-shooter games are considered fast-paced games
• A first-person shooter (FPS) is a genre of action video game played from a first-person perspective
• Best-selling games for consoles (Halo, Call of Duty, Battlefield, etc.)
First-Person-Shooter Games
Motivation • Why Game-AI? • Define the problems. • Why DOOM Game?
The Value of Games? Purpose and Significance
• Games are stimulation for the mind
• Or rather, games are food for the brain
• They task a variety of cognitive skills:
  – Lateral thinking
  – Long-term deliberation
  – Reactions
• Games are a challenge for people and therefore a good task for AI
• We use games as benchmarks to test our new ideas
• Different games carry unique properties that make them interesting:
  – Determinism
  – Accessibility of information
  – Agent-driven systems
Why Game AI? Purpose and Significance
• To investigate how to improve existing AI techniques
• To create new types of AI systems within the confines of games
• To try to improve existing game design ideas
• Introducing AI algorithms to 'play' video games
• Often we replace the human player with an AI that is interfaced to the in-game controls
• The system either learns or adapts to the game so it can score maximum points
Why Game AI? Purpose and Significance
• Research in artificial intelligence in video games is becoming increasingly popular as we realize the advantages of using games as simulation test-beds
• Computer games are the perfect platform upon which to pursue research into human-level AI
• Game AI is one of the currently active research areas at Google DeepMind
• 3D realistic environments
Why Game AI? Purpose and Significance
• Increase in computing power (GPUs)
• Advancements in machine learning, specifically, visual learning • Evolution of neural networks • Helps in training robots how to navigate human environments
Why Game AI? Atari 2600 games
• Atari 2600 games are widely adopted as a benchmark for visual learning algorithms
• Atari 2600 games are employed with reinforcement learning to obtain human-level controllers from raw pixel data
• A lightweight and easy-to-use platform for testing AI techniques
• Very active testbeds in the recent past for evaluating AI techniques
• A new research era: from pixels to actions
• Involve only 2D environments
• 3D environments are now gradually being introduced
Define the Problems: Atari 2600 games
(Atari arcade, 2600; a classic first-person shooter)
• Learning AI agents to build general-purpose smart machines
• Despite being a benchmark for visual learning algorithms, Atari games have several drawbacks from the AI research perspective:
  – They involve only 2D environments
  – They are fully observable to the agents
  – The environments hardly resemble the real world we live in
  – They are third-person-perspective games
  – That does not match the real-world mobile-robot scenario
Define the Problems: Atari 2600 games
• Human players are still ahead of bots trained from scratch
• Deep reinforcement learning algorithms are ahead on average
• More challenging reinforcement learning problems should involve first-person perspective and realistic 3D worlds as well
(Atari arcade, 1983; Atari 2600, a classic first-person shooter)
• Using RL in Atari games, agents acted upon high-level information and handcrafted features
• like the position of walls, enemies, locations of items, etc.
• that are usually inaccessible to human players
Define the Problems: Why the DOOM (ViZDoom) Game?
• Uses an existing game engine, which saves time and a large amount of programming effort
• Can work as a base for my research
• Portability and the ability to run multiple instances on a single machine (ViZDoom)
• Quick and fast (the game engine is not a learning bottleneck)
• Total control over the game's processing
• Customizable resolution and rendering parameters
• Multiplayer game capabilities (agent vs. agent and agent vs. human)
• Easy-to-use tools to create custom scenarios
• Ability to bind different programming languages (preferably written in C++)
• Multi-platform
Define the Problems: Why the DOOM Game?
• Involves a 3D environment, more real-world-like than Atari 2600 games
• The environment resembles the real world we live in
• Not a third-person-perspective game
• Matches a real-world mobile-robot scenario
• Human players are ahead of bots trained from scratch, and deep reinforcement learning algorithms are ahead on average, so:
  – There is a need for challenging RL problems involving first-person perspective and realistic 3D worlds
• A unique and dedicated machine learning platform, 'ViZDoom', based on DOOM
• For research from raw visual information
Define the Problems: Why the DOOM Game?
• Allows developing bots that play DOOM using only the screen buffer
• Customization capabilities of ViZDoom
• Custom scenarios that differ by maps, environment elements, non-player characters, rewards, goals, and actions available to the agent
• Lightweight: games can be played at 7000 frames per second on modern powerful machines
• Real-time play in Doom runs at 35 fps
• Unreal Tournament, Counter-Strike, and Quake III Arena have already been used in AI research, and now it should be DOOM
• Involves partially observable states
Solution and Applications First-Person-Shooter Games
• First-person shooters are some of the most popular games on the market today
• Known for their combative nature and fast-paced action
• Creating agents with human-like behavior is one way to use video games to research AI techniques
• We can then transfer these insights to the fields of physical robotics and real-life simulations in areas
Solution and Applications First-Person-Shooter Games
• such as military training simulations • Researching the creation of interesting behaviors such as combat strategies
• Teaching robots how to navigate human environments • Which may help the videogames industry to develop more realistic and entertaining characters to play against
Solution and Applications Machine Learning Approaches
• In researching Game AI, I decided to use machine-learning algorithms to get agents (bots) to learn how to play an FPS.
• We will use reinforcement learning (RL), which allows a bot to learn a problem by interacting with its environment
• Raw visual information may relieve researchers of the burden of acting upon high-level information and handcrafted features
• like the position of walls, enemies, locations of items, etc.
• The environment provides a reward or penalty based on how the bot is performing.
• These values (reward or penalty) are used to build a map telling the bot which action is good to perform in the current state of the environment.
• The bot receives a reward if it collects an item or kills an enemy. If the bot dies, it gets a penalty.
Related Works • Deep Q-Network.
• Deep Recurrent Q-Network • A3C Model
DQN (Deep Q-Network)
• DQN combines reinforcement learning with a class of artificial neural networks known as deep neural networks
• DQN overcomes unstable learning mainly through 4 techniques:
  – Experience Replay
  – Target Network
  – Clipping Rewards
  – Skipping Frames
EXPERIENCE REPLAY
• Experience Replay was originally proposed in 1993, for reinforcement learning on robots using neural networks
• A DNN easily overfits to the current episodes; once the DNN is overfitted, it is hard to produce varied experiences. To solve this problem,
• Experience Replay stores experiences, including state transitions, rewards, and actions, which are the data necessary to perform Q-learning, and makes mini-batches to update the neural networks
• This technique offers the following merits:
  – reduces correlation between experiences when updating the DNN
  – increases learning speed with mini-batches
  – reuses past transitions to avoid catastrophic forgetting
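The mechanism above can be sketched as a minimal replay buffer (an illustrative sketch only; the capacity and the transition layout are assumptions, not the exact DQN implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) transitions and
    samples uncorrelated mini-batches for Q-learning updates."""

    def __init__(self, capacity):
        # Oldest transitions are evicted first once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive experiences.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Each Q-learning update then draws a mini-batch from the buffer instead of learning only from the most recent transition, which also lets old transitions be reused.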
Target Network
• In Temporal Difference (TD) error calculation, the target function changes frequently along with the DNN
• An unstable target function makes training difficult
• The Target Network technique therefore fixes the parameters of the target function and replaces them with the latest network only every few thousand steps
• (In the figure, the target Q-function in the red rectangle is fixed)
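A toy illustration of this fix-and-periodically-replace schedule (the sync interval and the stand-in "gradient update" are assumed values for illustration, not DQN's actual hyperparameters):

```python
def sync_target(online_params, target_params):
    """Copy the online network's parameters into the target network."""
    target_params.update(online_params)

SYNC_EVERY = 1_000  # steps between target updates (assumed value)

online = {"w": 0.0}
target = dict(online)  # frozen copy used to compute TD targets

for step in range(1, 3_001):
    online["w"] += 0.001  # stand-in for one gradient update
    if step % SYNC_EVERY == 0:
        # Between syncs the target stays fixed, giving a stable TD target.
        sync_target(online, target)
```

In a real implementation the dicts would be network weight tensors, but the schedule is the same: the target lags the online network and only catches up at each sync.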
Clipping Rewards
• Each game has different score scales. For example, in Pong, players get 1 point when winning a rally
• Otherwise, players get -1 point
• However, in Space Invaders, players get 10~30 points when defeating invaders
• This difference would make training unstable. The Clipping Rewards technique therefore clips scores: all positive rewards are set to +1 and all negative rewards to -1
Skipping Frames
• ALE is capable of rendering 60 frames per second
• But people don't actually take that many actions in a second, and the AI doesn't need to calculate Q-values every frame
• So, in the Skipping Frames technique, DQN calculates Q-values only every 4 frames and uses the past 4 frames as inputs. This reduces computational cost and gathers more experience
Performance
• All of these techniques enable DQN to achieve stable training
• DQN overwhelms the naive version without them
• The Nature version shows how much Experience Replay and Target Network contribute to stability (baseline: a linear function approximator)
• Performance with and without Experience Replay and Target Network
• Experience Replay is very important in DQN; Target Network also increases its performance
DQN Continued
• DQN has achieved human-level control in many Atari games with these 4 techniques
• However, there are still some games DQN cannot play. I will later introduce some papers that tackle them
Deep Q-Network (DQN)
List of hyperparameters and their values
Nature Paper 2015
DQN in Atari, Model Architecture
Nature Paper 2015
DQN (Deep Q-Network) Average reward over time for DQN agent in limited GridWorld
DRQN (Deep Recurrent Q-Network) Average reward over time for DRQN agent in limited GridWorld.
DRQN (Deep Recurrent Q-Network)
• DQN performs well on Atari games in fully observable environments
• Partial-observability scenarios, however, have incomplete and noisy observations
• Adding an LSTM after the convolutional layers helps the Q-network retain some memory of previous observations, enabling better decision making
• So, the DQN architecture is minimally modified: the first fully connected layer is replaced with a recurrent LSTM layer of the same size, whose parameters are learned from scratch
DRQN (Deep Recurrent Q-Network)
• Architecture of DRQN
DRQN (Deep Recurrent Q-Network)
• Updates: backpropagation through a recurrent network requires each backward pass to span many timesteps of game screens and target values
• DRQN performs well at Atari games with just one input frame per time step
• Having just one input frame makes it impossible for the convolutional layers to estimate essential quantities such as the velocity of the ball in Pong; the LSTM layer compensates with its hidden state
• DRQN is more robust against missing information at test time, whether it was trained with full state or partial information
A3C Algorithm
• The A3C algorithm was released by Google's DeepMind, and it essentially obsoleted DQN
• It is faster, simpler, more robust, and able to achieve much better scores on the standard battery of deep RL tasks
• On top of all that, it works in continuous as well as discrete action spaces
• It has become the go-to deep RL algorithm for new challenging problems with complex state and action spaces
• I will start by unpacking the name, and from there begin to unpack the mechanics of the algorithm itself
Asynchronous
• Unlike DQN, where a single agent represented by a single neural network interacts with a single environment,
• A3C utilizes multiple incarnations of the above in order to learn more efficiently; in other words, it utilizes multiple agents to collectively improve a policy
• In A3C there is a global network, and multiple worker agents which each have their own set of network parameters
• Each of these agents interacts with its own copy of the environment at the same time as the other agents are interacting with theirs
• The reason this works better than having a single agent is that the experience of each agent is independent of the experience of the others
• In this way the overall experience available for training becomes more diverse
Diagram of A3C high-level architecture
• In A3C there is a global network
Actor-Critic
• Actor-Critic combines the benefits of both approaches: value-iteration methods such as Q-learning and policy-iteration methods such as Policy Gradient
• In the case of A3C, the network estimates both a value function V(s) (how good a certain state is to be in) and a policy π(s) (a set of action probability outputs)
• These are each separate fully connected layers sitting at the top of the network
• Critically, the agent uses the value estimate (the critic) to update the policy (the actor) more intelligently than traditional policy gradient methods
Advantage
• If we think back to our implementation of Policy Gradient, the update rule used the discounted returns from a set of experiences to tell the agent which of its actions were "good" and which were "bad"
• The insight of using advantage estimates rather than just discounted returns is to allow the agent to determine not just how good its actions were,
• but how much better they turned out to be than expected. Intuitively, this allows the algorithm to focus on where the network's predictions were lacking
• The advantage function is as follows: A = Q(s, a) − V(s)
• The discounted return R can be used as an estimate of Q(s, a), which in turn lets us generate an estimate of the advantage
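The estimate described above can be sketched as follows (illustrative only; A3C additionally bootstraps from V at episode cut-offs, which this toy version omits):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1} for each step of an episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

def advantages(rewards, values, gamma=0.99):
    """A_t = R_t - V(s_t): how much better the outcome was than expected,
    using the discounted return R_t as the estimate of Q(s_t, a_t)."""
    return [R - v for R, v in zip(discounted_returns(rewards, gamma), values)]
```

A positive advantage means the action turned out better than the critic predicted, so the policy update pushes its probability up; a negative advantage pushes it down.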
A3C Model
Contributions • Current studies and our experimental results
CONTRIBUTIONS
1. Title: Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
   Journal: International Journal of Advanced Computer Science and Applications (IJACSA), The SAI, UK
   Time & Place: December 2017, UK. Status: Published
2. Title: State-of-the-Art and Open Challenges in RTS Game-AI and Starcraft
   Journal: International Journal of Advanced Computer Science and Applications (IJACSA), The SAI, UK
   Time & Place: December 2017, UK. Status: Published
3. Title: A Competitive Combat Strategy and Tactics in RTS Game-AI and Starcraft
   Conference: The 2017 PCM Conference on Multimedia, Springer
   Time & Place: September 2017, UK. Status: Published
4. Title: A Standard Skipcount Range for Training Agents using Deep Reinforcement Learning and VizDoom
   Journal: IEEE Transactions on Games, IEEE 2018
   Time & Place: USA, NY, June 2018
1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
• One of the emerging research areas in Artificial Intelligence, gaining more focus day by day
• The primary purpose of the experiments is to train a competitive agent
• using visual reinforcement learning and 'VizDoom' for first-person shooter games,
• particularly 'Doom', to exhibit human-like behavior and to outperform average human players and existing built-in game agents
(Cont.)
1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
Scenario
• A rectangular chamber is used as the basic scenario
• The agent spawns at the center of the room's long wall; along the opposite wall, an immobile monster spawns at arbitrary positions
• The agent can move left, move right, and shoot
• A single shot is sufficient to eradicate the monster
• The episode finishes once 300 frames are completed or the monster is killed, whichever comes first
• For killing the monster the agent receives 101 points, -5 for missing a shot, and -1 for each individual action. The best policy the agent can learn is to kill the monster as rapidly as possible, preferably with a single shot
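One plausible reading of this reward scheme as code (whether the per-action -1 penalty also applies on the killing action itself is an assumption of this sketch, not stated in the scenario):

```python
def scenario_reward(monster_killed, shot_fired, hit):
    """Per-action reward for the basic scenario:
    +101 for killing the monster, -5 for a missed shot,
    and a living penalty of -1 for every action taken (assumed to stack)."""
    reward = -1.0                  # each action costs 1 point
    if shot_fired and not hit:
        reward -= 5.0              # missed-shot penalty
    if monster_killed:
        reward += 101.0            # kill reward
    return reward
```

Under this reading, the living penalty makes slow play costly, which is exactly what pushes the agent toward a quick single-shot kill.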
(Cont.)
1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
Deep Q-Learning
• A Markov Decision Process is used to model the problem
• and Q-learning to learn the policy
• An ε-greedy policy with linear decay will be used for selecting an action
• The Q-function will be approximated with a convolutional neural network, trained with Stochastic Gradient Descent using experience replay
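A sketch of the ε-greedy selection with linear decay mentioned above (the start/end values and decay horizon are assumed hyperparameters for illustration, not those used in the experiments):

```python
import random

def epsilon(step, eps_start=1.0, eps_end=0.1, decay_steps=10_000):
    """Linearly decay the exploration rate from eps_start to eps_end,
    then hold it constant at eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, step):
    """Epsilon-greedy: a random action with probability eps,
    otherwise the action with the highest estimated Q-value."""
    if random.random() < epsilon(step):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Early in training the agent explores almost uniformly; as the decay completes it mostly exploits the learned Q-estimates while keeping a small residual exploration rate.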
(Cont.)
1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
Neural Network Architecture
• The network used in the experiments includes two convolutional layers with 32 square filters, 7 and 4 pixels wide, respectively
• Each convolutional layer is followed by a max-pooling layer of size 2 with ReLU (Rectified Linear Unit) activation
• The network contains a fully connected layer with 800 leaky rectified linear units and an output layer with 8 linear units corresponding to the 8 combinations of the 3 available actions, i.e. right, left, and shoot
(Cont.)
1. Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom
Environment for Experiments
• The experiments are performed in PyCharm Professional 2017
• using ViZDoom 1.1.1, OpenCV 3.3, CMake 2.8+, Make, GCC 4.6+, Boost libraries 1.54+, and Python 3.5 (64-bit) with NumPy
• on a computer running Ubuntu 16.04.3 with an Intel® Core™ i7-7700 CPU @ 3.60 GHz x 8 and an NVIDIA GeForce GTX 1080/PCIe/SSE2 GPU
• for processing the CNNs; the whole learning process, along with the testing episodes, will be measured in time
2. A Standard Skipcount Range for Training Agents using Deep Reinforcement Learning and VizDoom
Research Question
• What should the optimal skipcount be in order to develop a well-trained and robust agent?
• Learning is slowest when the agent does not skip any frames, and
• learning is faster and smoother when the agent skips more frames
• The primary purpose of the research is to examine how the number of skipped frames influences the learning process
• and to find a standard, optimized skipcount range (scale) that provides a balance or trade-off between
• the learning speed and the final performance, using the ViZDoom AI platform
• But conversely, too large a skipcount makes the agent graceless due to the lack of balance control, which results in suboptimal final results
(Cont.)
2. A Standard Skipcount Range for Training Agents using Deep Reinforcement Learning and VizDoom
PROPOSED METHOD
• A rectangular chamber will be considered as the basic scenario
• where an agent will spawn at the center of the room's long wall
• A static monster will spawn at arbitrary positions along the opposite wall
• The agent can move left, move right, and shoot
• A single shot is sufficient to kill the monster
• The scenario or episode will finish either by killing the monster or with the completion of 300 frames
• The agent will score 101 if it kills the monster; otherwise,
• it will score -5 for a missed shot and -1 for each action
(Cont.)
2. A Standard Skipcount Range for Training Agents using Deep Reinforcement Learning and VizDoom
PROPOSED METHOD
Neural Network Architecture
• A convolutional neural network (CNN) architecture of three convolutional layers with 32 square filters, 7, 4, and 2 pixels wide, respectively, will be used
• Each convolutional layer is followed by a max-pooling layer of size 2 with ReLU activation
• There is a fully connected layer with 800 leaky rectified linear units and an output layer with 8 linear units corresponding to the 8 combinations of the 3 available actions, i.e. left, right, and shoot
(Cont.)
2. A Standard Skipcount Range for Training Agents using Deep Reinforcement Learning and VizDoom
Deep Q-Learning
• A deep reinforcement learning method will be used to learn the policy
• For the experiments, the problem will be modeled as a Markov Decision Process (MDP)
• An 𝜖-greedy policy with linearly decaying 𝜖 will be used to select actions
• A convolutional neural network, trained with Stochastic Gradient Descent, will be used to approximate the Q-function
• In addition, a replay memory will be used to store the game transitions
(Cont.)
2. A Standard Skipcount Range for Training Agents using Deep Reinforcement Learning and VizDoom
Environment for Experiments
• All of the experiments will be performed in PyCharm Professional 2017
• using ViZDoom 1.1.5, OpenCV 3.3, CMake 2.8+, GCC 4.6+, and Python 3.6 (64-bit) with NumPy, on a computer running Ubuntu 16.04.3 with an Intel® Core™ i7-7700 CPU @ 3.60 GHz x 8 and an NVIDIA GeForce GTX 1080/PCIe/SSE2 GPU for processing the CNNs
• The whole learning and testing process is measured in time
Expected Innovation • Research plan and future works.
Expected Innovation and Future Works
• New state-of-the-art machine learning techniques, methods, and algorithms will be proposed
• Bots are currently deaf in some scenarios, so we plan to allow bots to access the sound buffer in almost all scenarios
• Well-balanced, trained, and robust agents for playing FPS games, particularly DOOM, will be introduced or created
• These agents could then be adopted by computer game industries and companies in the market: a contribution to the game industry
• Improved game-playing skills would be offered in the market for gamers to play with and against AI agents
• The same techniques, methods, and algorithms could be used to train agents for other arbitrary games, and one day hopefully on many valuable real-world control problems
• We would like to implement a synchronous multiplayer mode, which would be convenient for self-learning in multiplayer settings
• We are interested in conducting supervised learning experiments as well, if ViZDoom automatically labeled objects in the scene
• Our research work will contribute to better AI practices
Research Schedule • Expected schedule.
EXPECTED SCHEDULE
2016.9 – 2017.2: Game AI courses 2017.2 – 2017.4: Preparing and planning the research and reading articles 2017.4 – 2017.8: Machine Learning Methods & Techniques in General Computer Games.
2017.9 – 2017.11: Machine Learning Methods & Techniques in RTS Games.
EXPECTED SCHEDULE (cont.)
2017.12 – 2018.2: Machine Learning Methods & Techniques in RPG Action Games.
2018.3 – 2018.5: Machine Learning Methods & Techniques in FPS Games.
2018.6 – 2018.8: Proposing methods and algorithms to play FPS games with Deep Reinforcement Learning.
• Training an agent for FPS games, e.g. DOOM
2018.9 – 2018.11: Proposing an architecture of Neural Networks with Q-learning.
• A Standard Skipcount Range for Training Agents using Deep Reinforcement Learning and VizDoom
2018.12 – 2019.3: Finally, publishing.
Conclusions • Concludes this presentation • Thanks
Author Profile
Adil Khan received a C.T. from AIOU Islamabad, a B.Ed. from the University of Peshawar, a B.S. (Honors) in Computer Science from Edwards College Peshawar, and an M.S. in Computer Science from City University of Science and Information Technology, Peshawar, Pakistan. In 2014-2016, he was a lecturer in the Higher Education Department, KPK, Pakistan. He has published many articles in top-tier academic journals and conferences, including IEEE. Currently, he is serving as a researcher at the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, PR China. He is interested in Game Artificial Intelligence (Game AI).
Twitter: https://twitter.com/AdilAdil25
ResearchGate: https://www.researchgate.net/profile/Adil_Khan13/
LinkedIn:
Facebook: https://www.facebook.com/groups/wsnscholars/