Multi-Threaded Game Server Design

March 28, 2021

Speed is key! Right?

When you think of a game server, you might be surprised to find that many are single-threaded. Open source private server projects sometimes have a single thread for processing packets from all players. Some even go as far as to develop on a Single-Threaded Apartment (STA) model. Understandably, this design leads to performance bottlenecks. It makes a lot more sense to use multiple threads so packets from players are processed in parallel. So speed is key, right?

Well, if you’re familiar with Comet, then you might already know where this is going. Comet is a base game server project which provides new developers with a documented, multi-threaded game server skeleton. The project has been forked a few times now, but most attempts to build upon it have been unsuccessful due to the complexities of multi-threaded design. It turns out that stability is key, and performance is very close behind. Let’s talk about that.

The problems with multi-threaded server design

The main problem most developers run into when working with a multi-threaded game server is the server’s access to shared resources. Players often play together in an MMORPG, which can mean sharing maps, items, quests, and so on. Not only that, the server must also control monsters that attack players and drop items. That’s a lot of state and behavior to work with, and manipulating that state across multiple threads can lead to a slew of problems.

The first problem developers run into involves shared collections. To oversimplify: collections are groups of objects, such as monsters on a map or items on the ground. For some game events, a collection might need to be scanned (for example, a monster pathfinding to the closest player in its field of view). The problem is that players can leave the monster’s field of view while that collection is being scanned by a separate thread. Depending on the type of collection (think of a linked list), the pointer to the next player could be deleted while it is being accessed, which could cause a segmentation fault. In native code, a segfault can crash the server; in modern managed languages, the same mistake usually surfaces as a thrown exception instead.

Another common problem with sharing state between two threads is that the result can depend on which thread gets to modify that state first. The undesirable behaviors this causes are called race conditions. In the example of a game server, a monster and a player might attack each other at the same time. Pause for a moment and think critically about why that might be a problem.

One example of a race condition you might have thought of is when checking health. When one thread checks the player’s health and status, it might determine that the player can be attacked and killed by the monster. While that thread is processing the attack, a second thread processes an attack from the player. That second thread can also determine that the monster can be attacked and killed. By the time the first thread finishes the monster’s attack, the second thread is already processing the attack from a now-dead player. This results in both the player and the monster dying at the same time. See the code example below.

 
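public bool Attack(IMapEntity observer)
{
    // Not thread-safe: two threads can both pass this check at the same
    // time, before either of them has applied its damage.
    if (this.Health > 0 && observer.Health > 0)
    {
        // int rather than uint: Random.Next returns an int, and unsigned
        // subtraction would wrap around instead of going below zero.
        // (Assumes Health is a signed int.)
        int damage = Random.Next(0, 10);
        observer.Health = Math.Max(0, observer.Health - damage);
        return true;
    }

    return false;
}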

What we learned from this is that both access to a shared collection and a check-then-modify section of code are critical sections: regions that only one thread should execute at a time. We can solve those problems with a pretty simple but dangerous technique.

A dangerous solution to dangerous problems

One solution for preventing segfaults and race conditions is to stop multiple threads from accessing a critical section at the same time. Locks are a form of mutual exclusion that does exactly that. As one thread enters the critical section, it acquires the lock, and any other thread that tries to enter must wait until the lock is released.
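As a rough illustration of a lock guarding a critical section, here is a minimal sketch of the attack example with the health check and the health update placed under one lock. It is written in Go (the language Chimera, discussed later in this post, is written in) rather than C#. The `Entity` and `GameMap` types and the single map-wide mutex are assumptions made for this example, not Comet’s actual design.

```go
package main

import (
	"fmt"
	"sync"
)

// Entity is a hypothetical stand-in for a player or monster.
type Entity struct {
	Name   string
	Health int
}

// GameMap is a hypothetical map that owns one lock for all combat on it.
type GameMap struct {
	mu sync.Mutex
}

// Attack applies damage only if both entities are still alive. The health
// check and the health update happen under the same lock, so no other
// attack on this map can run in between them.
func (m *GameMap) Attack(attacker, target *Entity, damage int) bool {
	m.mu.Lock()
	defer m.mu.Unlock()

	if attacker.Health <= 0 || target.Health <= 0 {
		return false
	}
	target.Health -= damage
	if target.Health < 0 {
		target.Health = 0
	}
	return true
}

func main() {
	m := &GameMap{}
	player := &Entity{Name: "player", Health: 5}
	monster := &Entity{Name: "monster", Health: 5}

	// Both attacks are lethal and are issued from separate goroutines.
	results := make([]bool, 2)
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); results[0] = m.Attack(player, monster, 10) }()
	go func() { defer wg.Done(); results[1] = m.Attack(monster, player, 10) }()
	wg.Wait()

	// Whichever attack takes the lock first wins; the other sees a dead attacker.
	fmt.Println(results[0] != results[1], player.Health+monster.Health)
}
```

With one lock per map there is nothing to nest, so this particular sketch cannot deadlock, but every attack on the map now waits in line behind every other one.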

This might seem pretty safe at first. After all, if you can prevent two threads from accessing critical sections at the same time, then the crashes and race conditions stop. Unfortunately, nesting multiple locks or acquiring them in inconsistent orders can cause threads to wait on each other indefinitely. This is called a deadlock. Watch the video below, which outlines a famous example of deadlock: Dijkstra’s Dining Philosophers problem.

In the video, Gary illustrates a modern solution to the problem using a scheduler (the waiter); however, the original solution to the problem is to impose an order on the locks (which is less balanced than the priority queue design mentioned in the video). Let’s make this problem a lot simpler and put it in the context of Comet.

Let’s say you have two players who want to gift gold to each other at the same time. Both gifts acquire locks so that the players’ gold amounts can be checked and modified safely, which prevents gold duplication. If the actor gifting the gold always locks itself first, then it’s possible for both actors to lock themselves at the same time and then wait on each other’s locks. See the code example below.

 
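public bool GiftGold(Character observer, int amount)
{
    // Deadlock-prone: each actor locks itself first. If two players gift
    // each other at the same time, both can take their own lock here...
    lock (this)
    // ...and then wait forever for the other player's lock here.
    lock (observer)
    {
        // The giver must have enough gold, and the receiver must stay
        // within the gold limit.
        if (this.Gold >= amount &&
            observer.Gold + amount <= Kernel.Constants.GoldLimit)
        {
            this.RemoveGold(amount);
            observer.AddGold(amount);
            return true;
        }
    }

    return false;
}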

Rather than having each actor lock itself first, a simple solution is to order the locks by a unique identifier. For example, the character with the lower ID is always locked first, no matter who is gifting. This means the character with the lower ID always gets priority; however, that doesn’t matter in this scenario, where we only care about at most two units of work.
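The ordering idea can be sketched as follows, in Go (the language Chimera, discussed later in this post, is written in). The `Character` type, its `ID` and `Gold` fields, and the `GoldLimit` value are all assumptions for illustration, and the sketch relies on character IDs being unique.

```go
package main

import (
	"fmt"
	"sync"
)

// GoldLimit is an assumed cap, not Comet's real constant.
const GoldLimit = 1000000000

// Character is a hypothetical, trimmed-down player with its own lock.
type Character struct {
	ID   uint32
	mu   sync.Mutex
	Gold int
}

// GiftGold moves gold from c to other. The character with the lower ID is
// always locked first, regardless of who is giving, so two opposite gifts
// can never each hold one lock while waiting for the other.
func (c *Character) GiftGold(other *Character, amount int) bool {
	if c == other || amount <= 0 {
		return false
	}
	first, second := c, other
	if other.ID < c.ID {
		first, second = other, c
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()

	if c.Gold < amount || other.Gold+amount > GoldLimit {
		return false
	}
	c.Gold -= amount
	other.Gold += amount
	return true
}

func main() {
	alice := &Character{ID: 1, Gold: 100}
	bob := &Character{ID: 2, Gold: 100}

	// Many opposite gifts at once: the pattern that deadlocks when each
	// actor locks itself first.
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); alice.GiftGold(bob, 1) }()
		go func() { defer wg.Done(); bob.GiftGold(alice, 1) }()
	}
	wg.Wait()

	// Gold is only ever moved, never created or destroyed.
	fmt.Println(alice.Gold + bob.Gold) // prints 200
}
```

The only change from the deadlock-prone version is which lock is taken first; the balance checks themselves are the same.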

Lockless designs inspired by tabletop games

Given the ideas and strategies above, you can absolutely make a multi-threaded game server. I’ve seen compromises in multi-threaded design across a variety of different projects. Most use excessive locking, at the cost of a higher chance of deadlock and reduced performance. On occasion, I find servers that queue packets by type to reduce the chance of race conditions. These are all fair solutions; however, we can make this a lot simpler.

In Chimera, my personal Conquer Online private server project, I use channels to synchronize threads and reduce race conditions without locking. Channels are typed conduits through which you can send and receive values between two or more threads. They act like message queues, allowing messages to be sent in order between threads.

In Golang, the programming language Chimera was written in, channels are a language primitive. In C#, the programming language Comet was written in, channels are not part of the language itself but are available as an optional library that can be included. Comet already uses channels in its packet processor.

Using channels allows us to limit the use of locks for thread synchronization. Returning to the example of combat between players and monsters on a shared map, channels can be used to create a completely lockless map and combat system.

Conquer Online is a 2D isometric game. This means that it operates on a tile system, used to map world coordinates to the 2D plane. Conquer Online additionally partitions its world into multiple boards/maps, similar to dungeons and the overworld in Diablo or the Dungeons & Dragons (D&D) tabletop game.

We can treat Conquer Online like D&D. If a channel is assigned to each map, then all actions can be queued just like the in-combat mechanics of D&D. For example, equipping armor, attacking a monster, drinking a potion, and using a scroll would all be queued on the map’s channel. Even the monster AI threads can post to this channel and have their actions queued alongside the players’, given a degree of fault tolerance. This allows all actions on a map to be processed in sequence by a single thread, while still allowing all maps to be processed in parallel with respect to one another.
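A minimal sketch of the per-map queue in Go might look like the following. It shows one goroutine per map draining a channel of actions in order, so the combat code itself needs no locks. The `GameMap`, `Action`, `Entity`, and `attack` names are hypothetical and are not taken from Chimera or Comet.

```go
package main

import (
	"fmt"
	"sync"
)

// Entity is a hypothetical stand-in for a player or monster.
type Entity struct {
	Name   string
	Health int
}

// Action is any unit of work queued on a map: an attack, a potion, and so on.
type Action func()

// GameMap is a hypothetical map that owns a channel of pending actions.
type GameMap struct {
	actions chan Action
	done    chan struct{}
}

// NewGameMap starts the single goroutine that runs this map's actions in
// the order they were received.
func NewGameMap(queueSize int) *GameMap {
	m := &GameMap{
		actions: make(chan Action, queueSize),
		done:    make(chan struct{}),
	}
	go func() {
		defer close(m.done)
		for act := range m.actions {
			act()
		}
	}()
	return m
}

// Post queues an action. It may be called from any goroutine, but not
// after Close.
func (m *GameMap) Post(a Action) { m.actions <- a }

// Close stops accepting actions and waits for the queue to drain.
func (m *GameMap) Close() {
	close(m.actions)
	<-m.done
}

// attack takes no locks because it only ever runs on the map's goroutine.
func attack(attacker, target *Entity, damage int) {
	if attacker.Health > 0 && target.Health > 0 {
		target.Health -= damage
		if target.Health < 0 {
			target.Health = 0
		}
	}
}

func main() {
	m := NewGameMap(16)
	player := &Entity{Name: "player", Health: 5}
	monster := &Entity{Name: "monster", Health: 5}

	// A player connection and a monster AI goroutine post at the same time.
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); m.Post(func() { attack(player, monster, 10) }) }()
	go func() { defer wg.Done(); m.Post(func() { attack(monster, player, 10) }) }()
	wg.Wait()
	m.Close()

	// Only the first queued attack lands; the second comes from a dead attacker.
	fmt.Println(player.Health + monster.Health) // prints 5
}
```

Each map gets its own goroutine, so separate maps still run in parallel; only actions on the same map are serialized.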

Channels can also be used for a variety of game systems outside of combat. For example, a trade channel for serially processing trade requests (think of the gold gifting example from before).
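As one possible shape for such a trade channel, the Go sketch below sends each gift as a request to a single goroutine and waits for a reply. The `GiftRequest`, `runTrades`, and `Gift` names and the `GoldLimit` value are assumptions for illustration. The sketch also assumes gold is only ever changed through this channel while the server is running.

```go
package main

import "fmt"

// GoldLimit is an assumed cap, not Comet's real constant.
const GoldLimit = 1000000000

// Character is a hypothetical, trimmed-down player. It has no lock.
type Character struct {
	Name string
	Gold int
}

// GiftRequest asks the trade goroutine to move gold. The outcome is sent
// back on Reply.
type GiftRequest struct {
	From, To *Character
	Amount   int
	Reply    chan bool
}

// runTrades handles gift requests one at a time. It is the only goroutine
// that changes Gold, so the balance checks cannot race with each other.
func runTrades(requests <-chan GiftRequest) {
	for r := range requests {
		ok := r.Amount > 0 && r.From != r.To &&
			r.From.Gold >= r.Amount &&
			r.To.Gold+r.Amount <= GoldLimit
		if ok {
			r.From.Gold -= r.Amount
			r.To.Gold += r.Amount
		}
		r.Reply <- ok
	}
}

// Gift queues a request and blocks until the trade goroutine answers.
func Gift(requests chan<- GiftRequest, from, to *Character, amount int) bool {
	reply := make(chan bool, 1)
	requests <- GiftRequest{From: from, To: to, Amount: amount, Reply: reply}
	return <-reply
}

func main() {
	requests := make(chan GiftRequest)
	go runTrades(requests)

	alice := &Character{Name: "alice", Gold: 100}
	bob := &Character{Name: "bob", Gold: 100}

	// Two players gift each other at the same time. There are no locks to
	// order, so there is nothing to deadlock on.
	results := make(chan bool, 2)
	go func() { results <- Gift(requests, alice, bob, 50) }()
	go func() { results <- Gift(requests, bob, alice, 50) }()
	<-results
	<-results
	close(requests)

	fmt.Println(alice.Gold + bob.Gold) // prints 200
}
```

The trade-off is throughput: every gift on the server passes through one goroutine, which is usually acceptable for something as infrequent as trading.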

Conclusion

Although Chimera operates a bit differently given its microservice architecture, Comet as a single executable can still benefit from channels in a similar manner. Channels are by no means a silver bullet for every complexity in multi-threaded server design. However, they do present us with yet another powerful tool for developing robust servers for our players. Have fun, and feel free to visit the forums for additional questions and discussions.