HYDRA: A PEER-TO-PEER ARCHITECTURE FOR MASSIVELY-MULTIPLAYER ONLINE GAMES
Massively multiplayer online games have been a huge commercial success in recent years. Existing deployments of such games have been built on a server-client architecture, even though some have claimed that such centralized architectures are inherently unscalable. Blizzard’s World of Warcraft has shown this claim to be untrue: as of March 2007, it had some 8.5 million players globally, approximately 500,000 of them online at any one time, and servers supporting several thousand players simultaneously.
Nevertheless, we believe that it is still worthwhile to develop a peer-to-peer architecture for such games: by exploiting the bandwidth and computational capabilities of the client hosts, such an architecture can significantly reduce deployment costs and, in some cases, improve performance by reducing latency.
Our key insight is that the server-client model is well understood and works well. Therefore, instead of forcing game developers to think differently when developing games for a peer-to-peer environment, our approach is to support the server-client model transparently, thereby insulating game developers from the complexities of network churn in such an environment.
In this light, we adopt a new approach in Hydra, our network architecture for peer-to-peer massively multiplayer games. Instead of addressing issues of efficient event delivery and multicast overlays, Hydra provides the game developer with a simple, augmented server-client programming model and implements a set of protocols to support the required interface. We hide the complexities of recovering from node failures (i) by imposing some conditions on how the game application processes incoming messages, (ii) by having the game application provide an interface to the network layer for checkpointing and restoring the game state, and (iii) by providing basic guarantees on consistency in message delivery without using locks or concurrency control.
We understand that it is probably not feasible to deploy commercial games on a purely peer-to-peer architecture because, for all practical purposes, such games require support for billing and persistent storage. We believe, however, that this is not a concern: such functionality can readily be implemented in a separate centralized system, with which the peer-to-peer game integrates to form a hybrid architecture.
Existing networked games tend to be locality-based, which means that the virtual game world can be divided naturally into regions. Accordingly, Hydra assumes that the game world is divided into disjoint regions, each managed by a single server. A client connects to the server that manages the region of the virtual world in which the player’s avatar currently resides, and it can only interact with other clients connected to the same server. For games that require a smooth transition between two regions, it is the responsibility of the game application to manage the transition.
Hydra only delivers messages to the servers on behalf of the clients; it does not manage the connections, which are all managed at the application layer by the game servers. When a player’s avatar moves from the region managed by one server to the region managed by another, the client must either transfer its connection from one server to the other or establish simultaneous connections to both servers.
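To make the region model concrete, the following Python sketch maps an avatar's position to the disjoint region containing it and detects when a move crosses into a region managed by a different server. The square grid, REGION_SIZE, the RegionId type and the region-<col>-<row> naming scheme are all hypothetical; the section only requires that regions be disjoint and that servers be named by identifiers. As the text notes, the actual connection hand-off is left to the game application.

```python
from dataclasses import dataclass

# Hypothetical layout: regions are square cells of a fixed-size grid.
REGION_SIZE = 100.0

@dataclass(frozen=True)
class RegionId:
    col: int
    row: int

def region_of(x: float, y: float) -> RegionId:
    """Map an avatar position to the disjoint region that contains it."""
    return RegionId(int(x // REGION_SIZE), int(y // REGION_SIZE))

def server_for(region: RegionId) -> str:
    """Hypothetical naming scheme: one server identifier (not an IP) per region."""
    return f"region-{region.col}-{region.row}"

class Client:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y
        self.server_id = server_for(region_of(x, y))

    def move_to(self, x: float, y: float) -> bool:
        """Move the avatar; return True if the client had to switch servers."""
        self.x, self.y = x, y
        target = server_for(region_of(x, y))
        if target != self.server_id:
            # The game application, not Hydra, transfers the connection here.
            self.server_id = target
            return True
        return False
```

For example, an avatar at (50, 50) starts on region-0-0, stays there when it moves to (60, 70), and must switch to region-1-0 when it moves to (150, 70).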
The network interface for the game client is relatively straightforward: the client can send either reliable or unreliable messages over UDP to a game server, which is identified by a unique identifier rather than an IP address. Unreliable messages are each sent once on a best-effort basis, while reliable messages are retransmitted until they are successfully delivered. Hydra provides no ordering guarantees for reliable messages, but it does preserve the ordering of unreliable messages: an unreliable message that reaches the server after later messages have already been executed is simply discarded. We are considering adding support for a blocking RPC-like interface for the client.
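The delivery semantics above can be sketched as follows. ReceiveFilter illustrates one way a server could preserve the order of unreliable messages by dropping stale ones, and send_reliable retransmits over a lossy channel until delivery. The per-client sequence numbers, the channel callable and the retry limit are assumptions made for illustration; they are not Hydra's actual wire format or API.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    seq: int          # hypothetical per-client sequence number
    reliable: bool
    payload: str

class ReceiveFilter:
    """Server-side sketch: keep unreliable messages in order by discarding
    any that arrive after a later message from the same client has already
    been executed. Reliable messages are always accepted, in any order."""

    def __init__(self):
        self.latest_executed = {}   # sender -> highest executed seq

    def accept(self, msg: Message) -> bool:
        last = self.latest_executed.get(msg.sender, -1)
        if not msg.reliable and msg.seq < last:
            return False            # stale unreliable message: drop it
        # In this sketch an accepted message is treated as executed at once.
        self.latest_executed[msg.sender] = max(last, msg.seq)
        return True

def send_reliable(msg: Message, channel, max_attempts: int = 100) -> int:
    """Retransmit until channel(msg) reports delivery; return the attempt count.
    channel is a hypothetical callable standing in for one UDP send plus ack."""
    for attempt in range(1, max_attempts + 1):
        if channel(msg):
            return attempt
    raise TimeoutError("message not delivered")
```

Note the asymmetry: a late reliable message is still executed, which is why reliable messages carry no ordering guarantee, whereas a late unreliable message can simply be dropped.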
The game server is implemented on the assumption that it is solely responsible for a region of the virtual world. Hydra delivers incoming messages from the clients to the server through a priority queue that imposes a partial ordering on them: messages are sorted in ascending order of a discrete timestamp, called a tick, that Hydra assigns, and messages with the same tick are not ordered relative to one another. The server pops messages off the queue and processes them, and the manner in which it processes them must satisfy three conditions for Hydra to guarantee that the game application remains consistent after the recovery of a failed node.
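A minimal sketch of such a queue, using Python's heapq module: messages are ordered by the tick Hydra assigns, and a monotonically increasing counter is added only so that heapq never has to compare payloads. The TickQueue class and its method names are hypothetical. Because the ordering is partial, the game should not depend on the order in which messages sharing a tick are popped.

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class QueuedMessage:
    tick: int
    seq: int                               # tie-breaker only; not a game-level order
    payload: object = field(compare=False)

class TickQueue:
    """Min-heap of incoming messages keyed by tick."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, tick, payload):
        heapq.heappush(self._heap, QueuedMessage(tick, next(self._counter), payload))

    def peek_tick(self):
        """Tick of the message at the top of the queue, or None if empty."""
        return self._heap[0].tick if self._heap else None

    def pop(self):
        msg = heapq.heappop(self._heap)
        return msg.tick, msg.payload

    def __len__(self):
        return len(self._heap)
```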
I. Simulation Pause. The Hydra system maintains a current tick count that the game application may access with getTick(). The server should pause its simulation whenever the tick of the message at the top of the queue is larger than the current tick count; that is, its simulation loop should only pop a message off the queue once that message's tick is no larger than the value returned by getTick().
In other words, the simulation must not process incoming messages in the main queue ahead of the current tick. If the current tick does not advance, the game simulation will be paused indefinitely.
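The pause condition can be sketched as a single step of the server's simulation loop. Here get_tick is a caller-supplied stand-in for Hydra's getTick(), and the GameServer class, deliver and process are hypothetical names; process is where real game logic would go. Each step drains every queued message whose tick has been reached and leaves later ones untouched.

```python
import heapq

class GameServer:
    """Sketch of condition I: never process a message whose tick is ahead
    of the current tick reported by get_tick (a stand-in for getTick())."""

    def __init__(self, get_tick):
        self.get_tick = get_tick
        self.queue = []            # heap of (tick, arrival_seq, payload)
        self._seq = 0
        self.processed = []

    def deliver(self, tick, payload):
        """Hypothetical hook through which the network layer enqueues a message."""
        heapq.heappush(self.queue, (tick, self._seq, payload))
        self._seq += 1

    def process(self, tick, payload):
        self.processed.append((tick, payload))   # game logic would go here

    def step(self):
        """One pass of the simulation loop; returns False if nothing was processed."""
        now = self.get_tick()
        if not self.queue or self.queue[0][0] > now:
            return False           # paused: queue empty or top message is in the future
        while self.queue and self.queue[0][0] <= now:
            tick, _, payload = heapq.heappop(self.queue)
            self.process(tick, payload)
        return True
```

If the clock behind get_tick never advances, step keeps returning False for any queued future message, which is exactly the indefinite pause described above.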
II. Simulation Determinism. The network interface for the game server is similar to that for the game client: a server may also send either reliable or unreliable messages to its clients over UDP. However, the messages that a server sends to its clients must also be synchronized with the simulation at the server. The assumption is that the server processes incoming messages in batches according to their ticks and sends messages only at the boundary between ticks. That is, if the server sends a message after all the messages with tick t have been popped off the queue but while all the messages with tick t + 1 are still on the queue (and hence not yet processed), then the outgoing message must correspond to the simulation state after all the incoming messages with tick t have been processed.
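The following sketch shows the batching rule: messages, already popped from the queue in tick order, are applied one by one, and an outgoing update is emitted only when the tick changes, so each update reflects exactly the state after one complete batch. The run_ticks function and its apply and snapshot callbacks are hypothetical, and the final flush assumes the input contains every message for its last tick.

```python
def run_ticks(messages, state, apply, snapshot):
    """Apply tick-ordered (tick, payload) pairs and emit updates at tick boundaries.

    apply(state, payload) mutates the state for one incoming message;
    snapshot(state) builds the outgoing update sent to clients.
    Returns (tick, update) pairs, where each update reflects the state after
    all messages with that tick, and none with a later tick, were processed.
    """
    outgoing = []
    current = None
    for tick, payload in messages:
        if current is not None and tick != current:
            # Boundary between ticks: every message of `current` is applied.
            outgoing.append((current, snapshot(state)))
        apply(state, payload)
        current = tick
    if current is not None:
        # Assumes the last tick's batch is complete in this input.
        outgoing.append((current, snapshot(state)))
    return outgoing
```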
The simulation must also be deterministic: if the virtual game world is in state S before processing the batch of messages with tick t, then the new state of the game world S′ and all the messages sent after processing that batch must be determined entirely by S and the contents of the messages with tick t. While most games require some form of randomization, this requirement for simulation determinism can easily be met by using pseudo-random number generators with predetermined seeds.
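A minimal sketch of a deterministic simulation, assuming the seeded pseudo-random generator is treated as part of the world state. World, apply_batch and the join/attack actions are hypothetical. Because the messages within a tick are only partially ordered, the sketch also sorts each batch into a canonical order before applying it; this is an assumption of the sketch, not something the section specifies. Two replicas that start from the same seed and receive the same batches, even in a different order, end in the same state and produce the same outgoing messages.

```python
import random

class World:
    """Deterministic simulation sketch: the PRNG is seeded once and its state
    is part of the world, so S' and the outgoing messages depend only on S
    and the batch being processed."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.hp = {}                          # player -> hit points

    def apply_batch(self, tick, batch):
        """Process all (player, action) messages with one tick; return outgoing messages."""
        out = []
        # Assumption: impose a canonical order on the unordered same-tick messages.
        for player, action in sorted(batch):
            if action == "join":
                self.hp[player] = 100
            elif action == "attack":
                dmg = self.rng.randint(1, 10)  # reproducible given the seed
                for other in sorted(self.hp):
                    if other != player:
                        self.hp[other] -= dmg
                out.append((tick, player, dmg))
        return out
```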
III. Load/Save Interface. The game server must also support a method to checkpoint and save its internal state to an output stream and a corresponding method to initialize its state from an input stream. While this requirement may seem a little out of place for a persistent game world, many existing single-player games support some form of load/save functionality, so the requirement is not unreasonable. As with outgoing messages, a checkpoint must be taken at a boundary point, after all the messages with tick t have been processed and before any with tick t + 1 are processed.
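The load/save interface might look like the sketch below, which serializes the server's state to a binary stream with Python's pickle module. The Simulation class, its fields and the clone helper are hypothetical; the point is that everything the simulation depends on, including the pseudo-random generator's internal state, must be captured so that a restored server behaves identically. pickle should only be used between trusted replicas, since loading untrusted pickle data can execute arbitrary code.

```python
import io
import pickle
import random

class Simulation:
    """Hypothetical game server state with condition III's save/load pair.
    save() and load() are only called at tick boundaries."""

    def __init__(self, seed=0):
        self.last_tick = 0        # last tick whose batch is fully processed
        self.positions = {}       # player -> (x, y)
        self.rng = random.Random(seed)

    def save(self, out_stream):
        """Checkpoint the complete internal state to a binary output stream."""
        pickle.dump({
            "last_tick": self.last_tick,
            "positions": self.positions,
            "rng_state": self.rng.getstate(),
        }, out_stream)

    def load(self, in_stream):
        """Reinitialize the state from a stream written by save()."""
        data = pickle.load(in_stream)
        self.last_tick = data["last_tick"]
        self.positions = data["positions"]
        self.rng.setstate(data["rng_state"])

def clone(sim):
    """Round-trip a checkpoint through an in-memory stream."""
    buf = io.BytesIO()
    sim.save(buf)
    buf.seek(0)
    restored = Simulation()
    restored.load(buf)
    return restored
```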
If the game server fails to adhere to these conditions, Hydra cannot guarantee that the state of the game will be consistent after a failover. Our experience in developing games under the Hydra framework has convinced us that the three conditions described above are unobtrusive and can easily be satisfied by a game developer.
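To see why the three conditions suffice, consider one plausible recovery scheme, which this section does not itself spell out: a replacement node restores the most recent checkpoint and replays every logged batch with a later tick. The sketch below assumes such a message log exists; simulate, take_checkpoint, recover and add_score are hypothetical. Because checkpoints are taken only at tick boundaries and the simulation is deterministic, the replayed state matches the state the failed server had reached.

```python
import copy

def add_score(state, msg):
    """Example game logic: each message adds points to a running total."""
    state["total"] += msg

def simulate(state, batches, apply):
    """Apply (tick, batch) pairs in tick order; each batch is sorted so the
    result does not depend on arrival order within a tick."""
    for tick, batch in batches:
        for msg in sorted(batch):
            apply(state, msg)
        state["tick"] = tick          # tick boundary reached
    return state

def take_checkpoint(state):
    """Condition III: snapshot only between ticks."""
    return copy.deepcopy(state)

def recover(checkpoint, log, apply):
    """Hypothetical failover: restore the checkpoint, then replay every
    logged batch whose tick is later than the checkpoint's tick."""
    state = copy.deepcopy(checkpoint)
    pending = [(tick, batch) for tick, batch in log if tick > checkpoint["tick"]]
    return simulate(state, pending, apply)
```

For example, with a log of four ticks and a checkpoint taken after tick 2, recover rebuilds exactly the state the primary reached at tick 4, without modifying the checkpoint itself.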
$Date: 2008/01/01 06:35:12 $