Work Related to HQ

RoboCup

RoboCup, the Robot World Cup Iniative, is an international effort to advance AI and robotics using the domain of soccer. Although the emphasis is on building physical robotic soccer players, there is a Soccer Simulator available. The simulator allows client programs to control simulated robotic players, and simulates the information such players would collect to send to the client programs. The simulator is written in C++ and clients communicate using UDP socket connections.

RoboCup is similar to HQ in that it is a two-player game where computer programs control multiple pieces (soccer players). It is also an imperfect information game, since each piece has a restricted view of the game.

In RoboCup, it is difficult to evaluate the strength of higher-level control algorithms since success depends heavily on how well pieces dribble, kick, intercept, etc. HQ was designed to minimize the effects of such low-level abilities to concentrate on high-level control issues.

The robotics focus of RoboCup places severe time restrictions on the contestant programs. Although both RoboCup and HQ have the same discrete model of time, HQ allows contestants minutes versus seconds in which to make a move. This allows more sophisticated strategies to by investigated.

The RoboCup simulator moves all the pieces and allows all the pieces to change their motion at each time step. HQ moves all the pieces at each time step, but allows only one piece to make a motion change per time step. The RoboCup approach is necessary in soccer due to the autonomous decision-making of soccer players. However, the HQ approach is more appropriate for the isolated study of control strategies.

Writing a successful RoboCup client program requires a significant investment of time attending to low-level details. This impedes progress on the high-level group control issues tested by HQ.

Finally, RoboCup does not give a learning program time to learn about its contestants before it is rated in a tournament. HQ allows programs to learn from practice games with contestant programs before they are rated.

RARS

RARS is a Robot Auto Racing Simulator written in C++. It is an environment for testing robot control strategies using the domain of auto racing. Like Robocup , computer programs control multiple pieces (race cars) in an imperfect information game. Pieces have a restricted view of the race track and other pieces.

Like Robocup , there is limited communication between pieces. In fact, it is discouraged. There are no time restrictions on the contestant programs.

Of particular interest is the allowance for learning contestants. Each contestant can read and write to a data file. Practice runs on a track can be made. However, the emphasis of the game is on the static performance of a contestant.

The UMASS Machine Learning Lab

The Machine Learning Lab at the University of Massachusetts at Amherst has written some C code that allows computer programs to play each other on the net in the games of Othello, Checkers, or Hearts. The code will run on UNIX machines and allows a program to be available to any contestant using a daemon process that listens to a port for other players. The protocol for communicating moves is defined. This allows tournaments between programs to be held at any time. This approach is described in the following techreport .

Of particular interest is the algorithm used to assign rankings to the different programs. A tournament is held once a day. It basically consists of a single bubble sort pass through a sorted list of the programs. Two programs are compared by playing a match. The order of the programs on the list is swapped depending on the results of the match. After this pass of matches, the ranking of each program is adjusted by
for (i=0; i less than n_players; i++)
    player[i].rank = player[i].rank * DECAY + (1.0 - DECAY)*(i+1);
where DECAY = 0.5.

JavaBots

The JavaBots system was used to develop JavaSoccer, a Java version of the RoboCup system. In JavaSoccer the individual soccer players can learn different behaviors. Each piece learns independently of the other pieces. The focus of the research is on the coordinated strategy that emerges (some pieces behave as offensive players and others as defensive). The focus of HQ is in learning effective coordinated strategies directly.

Robot Warfare 1

RW1 is related to HQ in that users submit contestant programs, and contestants have incomplete information about the state of the game. It differs substantially from HQ. RW1 programs are written in a restricted language that does not support learning. Contestants are single pieces, limiting the range of interesting possible strategies. The board is only a discrete 10x20 grid. The tournament rules do not favor programs that learn.

C++Robots

An improvement over Robot Warfare 1 in that contestant programs are written in C++ and the board is continuous. However, contestants are still single pieces, limiting the complexity of possible strategies. The tournament rules do not favor programs that learn.

C-Robots

Similar to C++Robots except that the game is real-time, whereas C++Robots is turn based.

RoShamBo

A tournament for computer programs to compete in the game of Rock-Paper-Scissors. Contestants are C code functions that simply return an integer value: 0 = rock, 1 = paper, or 2 = scissors. Contestants have access to two history arrays for a match that record the history of their choices and the history of the other contestant choices. A match consists of from 100 to 10,000 turns. A simple count is kept for each turn to determine the winner of the match. A tournament pairs all contestants. Contestants are emailed to the tournament organizers and are compiled locally.

This tournament is of interest because it is an incomplete information game and allows learning. It would be more interesting if contestants had access to the history of every match played, not just the current match. Also, the speed of a contestant is important, which hinders some learning methods.

The Glicko Rating System

The Glicko rating system extends the Elo rating system by incorporating a measure of uncertainty of a player's rating. The Glicko system is now used on the Free Internet Chess Server . It accounts for the increasing uncertainty of a player's rating over time between rated matches. Since HQ allows a program to choose its competitor and whether the resulting match is rated, it is possible that some time will pass between rated matches.