Controlling Non-player characters using Support Vector Machines

Posted by Pedro Melendez on January 30, 2015

Hello, this is the very first post on this blog [1] (with the exception of the hello world! post), and it gives an overview of my planned dissertation work. This blog is intended to document the progress of that work.

The main idea of my thesis is to control non-player characters (NPCs) using the machine learning technique known as Support Vector Machines. Support Vector Machines (SVMs) are a family of statistical learning methods with properties that make them well suited for use as NPC controllers. Unlike artificial neural networks, an SVM will not produce several possible solutions for a given training set but a single optimal one: the solution that maximizes the distance between classes, known as the margin.

The plan is to have a set of basic behaviours that are triggered by a trained SVM. In other words, the SVM only decides when a specific behaviour has to be triggered. The training set would consist of a set of “rules” coded in a numerical format so they can be processed properly by the classifier. For example, say we want to control an AI agent in a third-person shooter and we want to establish rules like “if you have plenty of health and bullets and you are facing an enemy, then attack”.

To address that, we could define three variables — health, ammo and enemies near — plus a target action that represents an index into a set of predefined behaviours such as attack. This is shown in the following table:

Health    Ammo    Enemies Near    Behaviour
------    ----    ------------    -----------
100       20      1               1 (Attack)
100       20      0               2 (Explore)
20        2       1               3 (Run Away)
80        10      1               1 (Attack)
80        5       2               3 (Run Away)
50        2       0               2 (Explore)
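As a rough illustration of how such a table could be turned into a behaviour classifier, here is a minimal sketch using scikit-learn's SVC class (the choice of scikit-learn, the linear kernel, and the query state are my assumptions for the example, not part of the original plan):

```python
# Sketch: train an SVM behaviour classifier on the table above.
# Feature vectors are [health, ammo, enemies_near]; labels are the
# behaviour indices (1 = Attack, 2 = Explore, 3 = Run Away).
from sklearn.svm import SVC

# Training set taken directly from the table.
X = [
    [100, 20, 1],
    [100, 20, 0],
    [20,  2,  1],
    [80,  10, 1],
    [80,  5,  2],
    [50,  2,  0],
]
y = [1, 2, 3, 1, 3, 2]  # behaviour index for each row

clf = SVC(kernel="linear")  # linear kernel chosen for illustration
clf.fit(X, y)

# Query the trained classifier with a new game state:
# healthy, well-armed, one enemy nearby.
behaviour = int(clf.predict([[90, 15, 1]])[0])
print(behaviour)
```

At runtime the NPC would build its current state vector each decision tick, ask the classifier for a behaviour index, and dispatch to the corresponding hand-written behaviour routine.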

A second component of my thesis plan is to achieve learning after purchase. The main goal is to detect decisions that compromise the proper behaviour of the NPC, create a corrective “rule” (really, a new row in the training set), and retrain the SVM. Since an SVM produces a single solution for a given set of parameters and a training set, we have “some” guarantee that the NPC will not exhibit strange behaviours caused by an inappropriate solution for the training set; the remaining challenge is to guarantee that the new training example does not compromise the quality of the result.
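The retraining loop described above can be sketched as follows, again assuming scikit-learn's SVC; the particular failure state and corrective label are made up for illustration:

```python
# Sketch of "learning after purchase": when a bad decision is
# detected at runtime, encode the state as a new labelled row,
# append it to the training set, and refit the SVM from scratch.
from sklearn.svm import SVC

# Original training set (same rows as the table earlier in the post).
X = [[100, 20, 1], [100, 20, 0], [20, 2, 1],
     [80, 10, 1], [80, 5, 2], [50, 2, 0]]
y = [1, 2, 3, 1, 3, 2]  # 1 = Attack, 2 = Explore, 3 = Run Away

clf = SVC(kernel="linear").fit(X, y)

# Hypothetical failure: the NPC attacked with low health and no ammo.
# Record that state with the behaviour it should have chosen instead.
X.append([30, 0, 1])   # low health, no ammo, one enemy nearby
y.append(3)            # corrective label: Run Away

# Refit on the extended training set. Because the SVM optimisation
# has a single optimum for a given training set, the resulting
# controller is reproducible rather than dependent on random
# initialisation, as a neural network's would be.
clf = SVC(kernel="linear").fit(X, y)
corrected = int(clf.predict([[30, 0, 1]])[0])
print(corrected)
```

A full refit on every new rule is affordable only while the training set stays small; the open question noted above is verifying that each appended row improves rather than degrades the classifier.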

This post is intended as an overview of the work I will be doing over the next four months [1]. I hope to publish details and some results as soon as I have them. Please feel free to leave comments or give feedback on my work.


[1] Keep in mind that this was written in late 2009