[Originally posted on January 28th, 2010]
The final title of my dissertation was Decision-Making in game characters using Support Vector Machines. The main reason for changing the title was that this one better describes what the dissertation is about. It examines the feasibility of using support vector machines as a decision-making algorithm for game characters. To that end, a software implementation was developed to show how support vector machines can be used for this purpose. As a proof of concept, an approach to achieving continuous learning is evaluated as well. The idea was to develop a label feedback mechanism that adds new labelled points to the training set and retrains the algorithm whenever an opportunity for improvement is detected.
The figure above shows the general diagram of the solution developed for the dissertation. From an abstract perspective, the proposed approach is similar to a traditional model such as an FSM. The character processes some input, often called internal or external knowledge depending on whether the information was taken from the character itself (internal) or from the environment (external). The input is presented in a numerical format to a previously trained SVM, which returns the index of the action to be taken. While the action is being performed, a label feedback mechanism, referred to in this dissertation as a quality function (a role very similar to that of a fitness function in a GA), evaluates whether the input is suitable for improving the training set.
When the quality function detects that an improvement can be made, it adds a new, properly labelled row to the training set, and the SVM is then retrained.
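The decision loop with label feedback could be sketched as follows. This is a minimal illustration assuming scikit-learn's `SVC` as the classifier; the dissertation's actual implementation, class names, and feature encoding may well differ.

```python
# Sketch of an SVM-driven character brain with a label-feedback step.
# Feature vectors and action indices here are invented for illustration.
from sklearn.svm import SVC

class SVMCharacterBrain:
    def __init__(self, X, y):
        # Keep a copy of the training set so new labelled rows can be added.
        self.X, self.y = list(X), list(y)
        self.svm = SVC(kernel="rbf")
        self.svm.fit(self.X, self.y)

    def decide(self, features):
        # Returns the index of the action the character should perform.
        return int(self.svm.predict([features])[0])

    def feedback(self, features, correct_label):
        # Called by the quality function when it detects an opportunity
        # for improvement: append the newly labelled row and retrain.
        self.X.append(list(features))
        self.y.append(correct_label)
        self.svm.fit(self.X, self.y)

# Hypothetical usage: features are [previous_action, target].
brain = SVMCharacterBrain([[1, 1], [1, 0], [2, 0], [2, 1]], [2, 1, 1, 2])
action = brain.decide([1, 1])
```

Retraining from scratch on every feedback call is the simplest possible scheme; with a larger training set, an incremental or batched retraining policy would be more practical.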
###Input Data
In order to be processed, the input data has to be prepared in a numerical format. The training set was designed to encourage decisions similar to those of the FSM solution used in the comparison. The figure below shows how the training set was built. Each row in the training set corresponds to a transition in the FSM. For example, if the FSM has an Action 1 with a transition to Action 2 when a variable target equals 1, the training set would contain a row with the previous action equal to 1, the target value equal to 1 and the resulting label equal to 2 (mapping in this way to Action 2).
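The mapping from FSM transitions to training rows can be sketched like this. The tuple layout and action indices are illustrative assumptions, not the dissertation's actual data format.

```python
# Flatten FSM transitions into an SVM training set.
# Each transition is (previous_action, condition_value, next_action):
# the feature vector is [previous_action, condition_value] and the
# label is the index of the resulting action.

def fsm_to_training_set(transitions):
    X, y = [], []
    for prev_action, target, next_action in transitions:
        X.append([prev_action, target])
        y.append(next_action)
    return X, y

# Example: Action 1 moves to Action 2 when target == 1.
X, y = fsm_to_training_set([(1, 1, 2), (1, 0, 1), (2, 0, 1)])
```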
###Testbed Game: Happmann
Happmann is a third person shooter with deathmatch-style gameplay intended to be played on a local network. The camera system was designed to let players see most of the battlefield, so all of the action is visible. The game was developed as a course project for the module CS1170 Game Design and Development, part of the MSc Computer Games Technology offered by the University of Abertay Dundee.
Since the original game lacked a single-player option, and its source code was available and familiar, it was considered a suitable test-bed game for conducting the experiments.
#Learning Mechanism
###Offline Learning
Although my dissertation uses an online learning mechanism to tune the solution at runtime, some offline learning tasks were required, such as the initial training phase. The input data was transformed and scaled, and the SVM was then trained using an RBF kernel.
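The offline phase (scale the data, then train with an RBF kernel) could be reproduced as below. This assumes scikit-learn purely for illustration; the dissertation used its own tooling, and the feature layout shown here is hypothetical.

```python
# Offline training sketch: standardise the features, then fit an
# RBF-kernel SVM. The toy data encodes [previous_action, target] rows
# labelled with the resulting action index.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = [[1, 1], [1, 0], [2, 0], [2, 1]]   # [previous_action, target]
y = [2, 1, 1, 2]                        # resulting action index

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, y)
```

Scaling before training matters for RBF kernels, since the kernel's distance computation is sensitive to features with very different ranges.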
###Online Learning
Online learning is the ability of a system to learn from its own predictions or results. Note that this learning does not necessarily happen at runtime, but it has to be performed without human supervision. In the case of this dissertation, the input data was simple enough to allow the learning process to run at runtime, but this would probably not be the case with more complex data.
Since an online learning system tunes its knowledge without supervision, a label feedback mechanism is necessary to add new input data and retrain the algorithm. The mechanism proposed is a function referred to in this dissertation as a Quality Function.
The main goal of this quality function is to check for guidelines that could be violated during the execution of the game. As a proof of concept, the quality function in this dissertation checks for oscillations and repeated actions, and uses the character returning as a stopping criterion.
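A quality function of this kind could look like the sketch below, which flags oscillations (A-B-A-B patterns) and long runs of a repeated action in the recent action history. The detection rules and thresholds are invented for illustration and are not the dissertation's actual checks.

```python
# Hypothetical quality function over the character's recent action history.

def quality_check(history, max_repeats=3):
    """Return a reason string if a guideline is violated, else None."""
    # Oscillation: the last four actions alternate between two values.
    if (len(history) >= 4
            and history[-1] == history[-3]
            and history[-2] == history[-4]
            and history[-1] != history[-2]):
        return "oscillation"
    # Repeated action: the last max_repeats actions are all identical.
    if (len(history) >= max_repeats
            and len(set(history[-max_repeats:])) == 1):
        return "repeated action"
    return None
```

When a check fires, the caller would label the offending input with a corrected action and feed it back into the training set for retraining.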
Although simplistic, the quality function presented above shows how general guidelines can be defined to correct misbehaviours while the game is running. It could be written in a more complex way to encode strategic guidelines or to adapt difficulty parameters. However, the goal of this dissertation was to determine whether the proposed method is viable for achieving online learning, not to explore every possible way the learning mechanism could be defined.