Custom Load Balancing

As briefly discussed in the clustering introduction and architecture documents, Load Balancing (LB for short) is an essential component of the Overcast Cluster.

The main purpose of a Load Balancing algorithm is to distribute clients among the available Game Nodes: every time a user wants to jump into a game, the Load Balancer is responsible for finding a suitable Game Node and running the match-making query on that machine.

In the Overcast Cluster, Load Balancing is fully customizable, and in this article we discuss how to create your own custom Load Balancer. To get started, we'll look at the class hierarchy used by the LB and the default implementation provided in the Overcast Cluster.

Class hierarchy

We provide a top-level interface called ILoadBalancer which should be used to implement a custom LB. The interface looks like this:

         public interface ILoadBalancer
         {
            public void init();
            public SFSGameNode process(Object params);
         }

The init() method can be used to initialize the LB state, while the process() method is where the LB logic takes place. Here we return a Game Node (represented by the SFSGameNode class); if no eligible Game Node is found, the method can return null, which in turn triggers a Load Balancer error.

On top of this interface we provide a base class called BaseLoadBalancer which adds a number of helpful objects and methods:

         public abstract class BaseLoadBalancer implements ILoadBalancer
         {
            protected final Logger log = ... ;
            protected final Properties props = ... ;
            
            protected Collection<SFSGameNode> getGameNodes() { ... }
            protected String getProperty(String key){ ... }
         }

  • log: lets you write output to the log files.
  • props: contains any number of custom settings for the LB loaded at boot time. These properties are populated via the AdminTool's Cluster Configurator module and we'll describe them in more detail later in this article.
  • getGameNodes(): returns a collection of all active and healthy Game Nodes.
  • getProperty(): returns the value of a specific property configured via the AdminTool's Cluster Configurator module.

NOTE: we recommend always extending the BaseLoadBalancer class when creating your custom LB implementation.
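To make the roles of init() and process() more concrete, here is a minimal sketch of a custom LB that skips any node at or above a configurable user cap and picks the least loaded of the rest. It is not taken from the Overcast Cluster code: GameNode, NodeState and LoadBalancerBase are simplified stand-ins for SFSGameNode, its state object and BaseLoadBalancer (the real classes ship in sfs2x-cluster.jar), and the MaxUsersPerNode property name and its fallback of 1000 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

// Stand-ins so the sketch compiles on its own. Their shapes are assumptions
// modeled on the signatures shown in this article, not the real classes.
class NodeState {
    private final int userCount;
    NodeState(int userCount) { this.userCount = userCount; }
    int getUserCount() { return userCount; }
}

class GameNode { // hypothetical stand-in for SFSGameNode
    private final String id;
    private final NodeState state;
    GameNode(String id, int userCount) { this.id = id; this.state = new NodeState(userCount); }
    String getId() { return id; }
    NodeState getState() { return state; }
}

abstract class LoadBalancerBase { // hypothetical stand-in for BaseLoadBalancer
    protected final Properties props = new Properties();
    protected final List<GameNode> nodes = new ArrayList<>();
    protected Collection<GameNode> getGameNodes() { return nodes; }
    protected String getProperty(String key) { return props.getProperty(key); }
    public void init() { }
    public abstract GameNode process(Object params);
}

class CappedLoadBalancer extends LoadBalancerBase {
    private int maxUsersPerNode;

    @Override
    public void init() {
        // "MaxUsersPerNode" is an invented property name; 1000 is an arbitrary fallback
        String raw = getProperty("MaxUsersPerNode");
        maxUsersPerNode = (raw == null) ? 1000 : Integer.parseInt(raw.trim());
    }

    @Override
    public GameNode process(Object params) {
        GameNode selected = null;
        for (GameNode node : getGameNodes()) {
            int users = node.getState().getUserCount();
            if (users >= maxUsersPerNode)
                continue; // node is full, not eligible
            if (selected == null || users < selected.getState().getUserCount())
                selected = node;
        }
        return selected; // null when there are no nodes or all of them are full
    }
}

class CappedLoadBalancerDemo {
    // Builds an LB with the given cap and node user counts, returns the chosen node id
    static String pick(int cap, int... userCounts) {
        CappedLoadBalancer lb = new CappedLoadBalancer();
        lb.props.setProperty("MaxUsersPerNode", String.valueOf(cap));
        for (int i = 0; i < userCounts.length; i++)
            lb.nodes.add(new GameNode("node-" + i, userCounts[i]));
        lb.init();
        GameNode node = lb.process(null);
        return node == null ? null : node.getId();
    }

    public static void main(String[] args) {
        System.out.println(pick(100, 80, 40, 120)); // prints node-1
        System.out.println(pick(100, 100, 120));    // prints null
    }
}
```

Because maxUsersPerNode is written once in init() and only read afterwards, process() in this sketch does not mutate any shared state of its own.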

Least Connection LB

The default implementation uses a simple algorithm called Least Connection which looks for the least loaded server and sends the player to that node.

         public class LeastConnectionsLoadBalancer extends BaseLoadBalancer
         {
            @Override
            public SFSGameNode process(Object params)
            {
               Collection<SFSGameNode> nodes = getGameNodes();
               SFSGameNode selectedNode = null;
               
               /*
                * If there are no Game nodes return null
                */
               if (nodes.isEmpty())
                  return null;
               
               for (SFSGameNode node : nodes)
               {
                  // First element
                  if (selectedNode == null)
                  {
                     selectedNode = node;
                     continue;
                  }
               
                  // Keep searching the lowest UserCount
                  else
                  {
                     if (selectedNode.getState().getUserCount() > node.getState().getUserCount())
                        selectedNode = node;
                  }
               }
               
               return selectedNode;
            }
         }

First of all, notice that we didn't override the init() method, since this Load Balancing algorithm is stateless. This also means there is no need to worry about concurrency: since no shared state is involved, the class can be invoked concurrently by multiple threads without side effects.

In the process() method we obtain the collection of all active Game Nodes by calling getGameNodes(). Next we iterate over the collection and find the node with the fewest connections (the lowest user count), which we return.

In case there are no Game Nodes available in the cluster we simply return null, which in turn triggers a Load Balancing error on both the client and server sides.
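By contrast, an algorithm that keeps state between calls does need to consider concurrency. As a rough illustration, the sketch below shows the core of a round-robin selection whose only shared state is an atomic counter. RoundRobinPicker is a hypothetical helper, not part of the cluster API; in a custom LB, process() would pass it the result of getGameNodes(). Plain strings stand in for Game Nodes here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: the stateful core of a round-robin Load Balancer.
class RoundRobinPicker {
    // The only shared state; AtomicInteger makes the increment safe across threads
    private final AtomicInteger counter = new AtomicInteger();

    <T> T next(Collection<T> nodes) {
        if (nodes.isEmpty())
            return null; // same convention as process(): null means no eligible node

        // Copy to a list so the index is computed against a fixed size
        List<T> snapshot = new ArrayList<>(nodes);

        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), snapshot.size());
        return snapshot.get(index);
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        RoundRobinPicker picker = new RoundRobinPicker();
        List<String> nodes = Arrays.asList("node-a", "node-b", "node-c");

        // Prints node-a, node-b, node-c, node-a (one per line)
        for (int i = 0; i < 4; i++)
            System.out.println(picker.next(nodes));
    }
}
```

Using a plain int field with counter++ instead would let two threads read the same value and send both players to the same node, which is the kind of side effect the stateless default implementation avoids by design.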

Development

To develop a custom Load Balancer you typically only need sfs2x-cluster.jar as a dependency in your Java project. If you also need to reference other classes from the SFS2X API, you will have to add the relevant dependencies as well (sfs2x.jar, sfs2x-core.jar).

Deployment

Once you have created your own LB implementation you can proceed with the deployment following these steps:

  • Pack the class(es) in a jar file.
  • Deploy the obtained jar file to the extensions/__lib__/ folder of your Lobby Node via the AdminTool's Extension Deployer module as discussed here.
  • Specify the fully qualified name of your LB class in the AdminTool's Cluster Configurator module, under the Load Balancer tab.

Custom LB settings

Earlier we mentioned that custom runtime settings can be loaded by the LB class via the getProperty() method. These properties can be configured via the AdminTool's Cluster Configurator module, under the Load Balancer tab.

These values are loaded by the Lobby Node at boot time and can be accessed in the LB class as Strings:

         String valueA = getProperty("ValueA");
         int valueB = Integer.parseInt(getProperty("ValueB"));
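Since every value arrives as a String, a missing or mistyped setting would make Integer.parseInt() throw a NumberFormatException. One possible way to guard against that is a small parsing helper with a fallback value. The sketch below is only a suggestion: LbSettings and intOrDefault are hypothetical names, not part of the cluster API, and a plain java.util.Properties object stands in for the LB's props.

```java
import java.util.Properties;

// Hypothetical helper, not part of the cluster API
final class LbSettings {
    private LbSettings() { }

    // Parses raw as an int; returns fallback when raw is null, blank or not a number
    static int intOrDefault(String raw, int fallback) {
        if (raw == null || raw.trim().isEmpty())
            return fallback;
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}

class LbSettingsDemo {
    public static void main(String[] args) {
        // Stand-in for the settings configured in the AdminTool
        Properties props = new Properties();
        props.setProperty("ValueB", " 42 ");
        props.setProperty("Broken", "abc");

        System.out.println(LbSettings.intOrDefault(props.getProperty("ValueB"), 10));  // prints 42
        System.out.println(LbSettings.intOrDefault(props.getProperty("Missing"), 10)); // prints 10
        System.out.println(LbSettings.intOrDefault(props.getProperty("Broken"), 10));  // prints 10
    }
}
```

Inside an LB class the call would look like LbSettings.intOrDefault(getProperty("ValueB"), 10). Whether a silent fallback or a logged error is the better choice depends on how critical the setting is.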

Health Checks

An additional activity of the LB system is to perform regular health checks on every Game Node to make sure they are responsive and ready for work. If any node becomes unresponsive or too slow for a certain period of time, the system will exclude that node from the LB pool to avoid sending players to a problematic server.

The following parameters can be tweaked from the same Load Balancer tab of the AdminTool's Cluster Configurator module:

  • Health Check Interval: the number of seconds between every health check.
  • Health Check Max Time: the maximum time (in milliseconds) allowed for a health check response from the Game Node.
  • Health Success Count: the number of consecutive successful responses required to consider the Game Node as 'healthy'.
  • Health Fail Count: the number of consecutive failed responses required to consider the Game Node as 'unhealthy'.
  • Deactivation Threshold: the number of seconds a server spends in the 'unhealthy' state before it gets deactivated.
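To illustrate how these parameters could interact, here is a simplified model of a per-node health tracker. It is only our reading of the settings listed above, not the cluster's actual implementation: the HealthTracker class, its Status values and the exact transition rules are assumptions. Each call to record() represents one health check, where ok would mean that the node answered within Health Check Max Time.

```java
// Illustrative model only; names and transition rules are assumptions
class HealthTracker {
    enum Status { HEALTHY, UNHEALTHY, DEACTIVATED }

    private final int successCount;        // Health Success Count
    private final int failCount;           // Health Fail Count
    private final long deactivationMillis; // Deactivation Threshold, converted to ms

    private Status status = Status.HEALTHY;
    private int successes;                 // consecutive successful checks
    private int failures;                  // consecutive failed checks
    private long unhealthySince;

    HealthTracker(int successCount, int failCount, int deactivationSeconds) {
        this.successCount = successCount;
        this.failCount = failCount;
        this.deactivationMillis = deactivationSeconds * 1000L;
    }

    // Records one health check result and returns the node's resulting status
    Status record(boolean ok, long nowMillis) {
        if (status == Status.DEACTIVATED)
            return status; // in this model a deactivated node never comes back

        // Only consecutive results count, so each outcome resets the opposite streak
        if (ok) { successes++; failures = 0; }
        else    { failures++; successes = 0; }

        if (status == Status.HEALTHY && failures >= failCount) {
            status = Status.UNHEALTHY;
            unhealthySince = nowMillis;
        } else if (status == Status.UNHEALTHY) {
            if (successes >= successCount)
                status = Status.HEALTHY;
            else if (nowMillis - unhealthySince >= deactivationMillis)
                status = Status.DEACTIVATED;
        }
        return status;
    }
}

class HealthTrackerDemo {
    public static void main(String[] args) {
        // 2 successes to recover, 3 failures to become unhealthy, 10 s threshold
        HealthTracker tracker = new HealthTracker(2, 3, 10);
        System.out.println(tracker.record(false, 0));    // prints HEALTHY
        System.out.println(tracker.record(false, 1000)); // prints HEALTHY
        System.out.println(tracker.record(false, 2000)); // prints UNHEALTHY
        System.out.println(tracker.record(true, 3000));  // prints UNHEALTHY
        System.out.println(tracker.record(true, 4000));  // prints HEALTHY
    }
}
```

In this model a single slow response does not change a node's status; only a run of consecutive failures or successes does, which is what the two count settings are for.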

When a Game Node is deactivated, nothing bad happens to the games that are still running on the server. Players will be able to continue their games, but the node won't receive any new users after the deactivation. Over time the CCU count will decrease and eventually the server will be empty.

Since the deactivated server is marked as 'unhealthy', this Game Node will not be able to re-join the cluster in case of a ScaleUp Event and it will be removed from the cluster when all players have left.