Performance Testing, Part 3

In part one of this series we discussed how to implement a simple stress test application in Java, targeting the native SFS2X socket protocol.
In part two we ported the same application to JavaScript over WebSocket, targeting the Firefox browser for performance reasons.
Finally, in this last chapter, we are going to discuss testing tips, best practices and how to avoid potential pitfalls.

Performance notes (Java)

When replicating hundreds or thousands of clients we should keep in mind that every new instance of the SmartFox class (the main API class) uses a certain amount of resources, namely RAM and threads.

For simple test clients each instance takes roughly 1MB of heap memory, which means we can expect 1000 clients to take approximately 1GB of RAM. In this case you will probably need to raise the JVM's maximum heap size by adding the -Xmx switch (e.g. -Xmx2g) to the startup script.
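The arithmetic above can be turned into a quick pre-flight check before launching a test. This is a minimal sketch that assumes the ~1MB-per-client figure quoted above plus a hypothetical 1.5x headroom factor for garbage collection and test logic; both numbers are assumptions to be replaced with measurements of your own client. It compares the estimate with Runtime.maxMemory(), which reflects the -Xmx setting.

```java
// Rough heap budget for a stress-test JVM.
// MB_PER_CLIENT comes from the ~1MB estimate above; HEADROOM is a hypothetical
// safety margin. Measure your own clients and adjust both values.
final class HeapBudget {
    static final double MB_PER_CLIENT = 1.0;
    static final double HEADROOM = 1.5;

    // Estimated heap (in MB) needed to run the given number of clients.
    static long requiredHeapMb(int clients) {
        return (long) Math.ceil(clients * MB_PER_CLIENT * HEADROOM);
    }

    // Maximum heap this JVM is allowed to use (the effect of -Xmx), in MB.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        int clients = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        long needed = requiredHeapMb(clients);
        long available = maxHeapMb();
        System.out.printf("Clients: %d, estimated heap: %d MB, max heap: %d MB%n",
                clients, needed, available);
        if (needed > available) {
            System.out.printf("Consider restarting with -Xmx%dm or higher%n", needed);
        }
    }
}
```

For 1000 clients this sketch suggests about 1500MB, so a setting such as -Xmx2g would leave some margin.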

Similarly, the number of threads in the JVM increases by 2 for each new client. So, for example, a test with 1000 clients will end up using 2000 threads, which is already a pretty high value.
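To verify the "2 threads per client" rule of thumb on your own setup, the standard java.lang.management API exposes the live and peak thread counts of the JVM. The sketch below is an illustration only: the point where test clients are started is left as a hypothetical placeholder, and the baseline is sampled before any client exists.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Compares the JVM's thread count with the expected
// "baseline + 2 threads per client" figure quoted above.
final class ThreadWatch {
    static final int THREADS_PER_CLIENT = 2;

    // Threads we expect once 'clients' instances have been created.
    static int expectedThreads(int baseline, int clients) {
        return baseline + clients * THREADS_PER_CLIENT;
    }

    static int liveThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.getThreadCount();
    }

    static int peakThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.getPeakThreadCount();
    }

    public static void main(String[] args) {
        int baseline = liveThreads(); // sample before creating any client
        int clients = 1000;
        // ... start the test clients here (hypothetical placeholder) ...
        System.out.printf("Live: %d, peak: %d, expected with %d clients: ~%d%n",
                liveThreads(), peakThreads(), clients, expectedThreads(baseline, clients));
    }
}
```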

Any relatively modern machine (e.g. 4-6 cores, 8GB RAM) should be able to run at least 1000-2000 clients, although more complex client logic or a higher rate of network messages might reduce this value.

Performance notes (JavaScript)

In part two of this series we already discussed some of the performance aspects of testing in the browser. In particular, we found significant limitations on the number of concurrent connections that a single web application can open, and we found Mozilla Firefox to be the most flexible browser for this sort of test.

Essential tips for testing

If you can choose between a WiFi and a cabled connection for the client machine, always choose the cabled one.
Don't overload the generation phase: in other words, don't use overly aggressive timings when generating the test clients. Typically an interval of 40-50ms between one client and the next is the recommended minimum. Going below these values will put excessive stress on both sides, and it's an unrealistic test scenario anyway.
Whenever possible make sure not to deliver the full list of Rooms to all clients. This can be a major RAM eater if the test involves hundreds or thousands of Rooms. To avoid it, simply remove all Group references from the “Default groups” setting in your test Zone. With no Groups joined by default there will be no Room list.
Make sure you don't push the client machine to its limits. When you reach the CPU capacity all clients will start to lag behind and force the server to slow down as well, by queueing messages or dropping packets. When you hit 85-90% of the CPU capacity you should stop adding more clients.
When you have hit the ceiling of a client machine (the aforementioned 85-90% of its CPU capacity) and you still want to test more clients, it's time to add more machines.
Keep in mind what your client-side bandwidth capacity is. During the test keep an eye on the bandwidth consumption, especially on the upload side. Since most DSL/fiber connections are not symmetrical, you need to make sure that neither your upload nor your download bandwidth gets saturated. If either does, the test will likely behave erratically, with slowdowns, sudden disconnections etc.
In order to check the above (CPU / network usage) always keep the SFS2X AdminTool open during the test, as well as your OS performance monitor (e.g. Task Manager on Windows, Activity Monitor on macOS etc.)
Last but not least: start the test with a reasonable number of clients and increase it slowly over multiple iterations. In your first round of tests don't immediately shoot for thousands of CCUs. Since you have no idea how many resources the test will take, start at a safe CCU value of, say, 100. See how the test goes, how much bandwidth and CPU you're using on both sides, and make sure everything is running smoothly. Then you can increase the number of CCUs and keep testing until you reach the desired outcome or a bottleneck.
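The pacing and ramp-up advice above can be sketched with a ScheduledExecutorService that starts one client per tick instead of all at once. This is a minimal illustration, not code from the earlier parts of the series: startClient is a hypothetical callback that you would implement to create and connect one test client, and the 40ms floor simply mirrors the recommended minimum interval.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

// Starts 'count' clients at a fixed interval. 'startClient' is a hypothetical
// callback (e.g. one that creates and connects a new SmartFox instance).
final class PacedLauncher {
    static final long MIN_INTERVAL_MS = 40; // recommended minimum from the tips above

    static void launch(int count, long intervalMs, IntConsumer startClient) {
        if (intervalMs < MIN_INTERVAL_MS) {
            throw new IllegalArgumentException("Interval below " + MIN_INTERVAL_MS + "ms");
        }
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger next = new AtomicInteger(0);
        CountDownLatch done = new CountDownLatch(count);
        scheduler.scheduleAtFixedRate(() -> {
            int i = next.getAndIncrement();
            if (i < count) {
                try {
                    startClient.accept(i + 1); // 1-based index, matches User-N style names
                } catch (RuntimeException e) {
                    // An uncaught exception would cancel the periodic task, so log and continue
                    System.err.println("Client " + (i + 1) + " failed: " + e);
                } finally {
                    done.countDown();
                }
            }
        }, 0, intervalMs, TimeUnit.MILLISECONDS);
        try {
            done.await(); // block until every client has been started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow(); // stop the periodic task
        }
    }

    public static void main(String[] args) {
        // First round: a conservative 100 CCU, one client every 50ms (about 5 seconds).
        launch(100, 50, i -> System.out.println("Starting client " + i));
    }
}
```

Each subsequent round would call launch again with a larger count, while you watch CPU and bandwidth on both sides.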

Advanced testing

1) Login: in our simple example we used an anonymous login request and didn’t employ a server-side Extension to check the user credentials. Chances are your system uses a database for authentication, and you might want to test how the DB performs under high traffic.

A simple solution is to pre-populate the users database with index-based names such as User-1, User-2… User-N. This way you can build simple client-side logic that generates these names with an auto-increment counter and performs the login. Passwords can be handled similarly using the same formula, e.g. Password-1, Password-2… Password-N.
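The naming scheme above can be generated with a thread-safe counter, so that clients started from different threads never end up sharing an account. This sketch only produces the strings; passing them to the actual login request is left to your client code, and the class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Produces the index-based credentials described above.
// The "User-N" / "Password-N" pattern must match what was pre-populated in the DB.
final class TestCredentials {
    private static final AtomicInteger COUNTER = new AtomicInteger(0);

    static String username(int index) {
        return "User-" + index;
    }

    static String password(int index) {
        return "Password-" + index;
    }

    // Auto-increment counter: each call returns the next index (1, 2, 3...).
    static int nextIndex() {
        return COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        for (int n = 0; n < 3; n++) {
            int i = nextIndex();
            // Pass these values to your client's login request
            System.out.println(username(i) + " / " + password(i));
        }
    }
}
```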

TIP: When testing a system with an integrated database always monitor the Queue status under the AdminTool > Dashboard. Slowness with DB operations will likely show up in those queues, as threads become less efficient in dealing with the requests.

2) Joining Rooms: another problem for automated tests is how to distribute clients across multiple Rooms. Suppose we have a 4-player game and we want to distribute 1000 clients into game Rooms of 4 players each. A simple solution is to implement this logic on the server side.

The Extension will take a generic “join” request (via ExtensionRequest) and perform a bit of custom logic:

search for a game Room with free slots:
- if found it will join the user there
- otherwise it will create a new game Room and join the user
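The search-or-create steps above can be modeled in plain Java to check the distribution logic in isolation. GameRoom and Matchmaker are hypothetical classes used only for illustration: they are not the SFS2X server-side API, and a real Extension would use that API to look up, create and join Rooms. Users leaving Rooms is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of "join a Room with free slots, otherwise create one".
final class Matchmaker {
    static final class GameRoom {
        final String name;
        final int capacity;
        final List<String> players = new ArrayList<>();

        GameRoom(String name, int capacity) {
            this.name = name;
            this.capacity = capacity;
        }

        boolean hasFreeSlot() {
            return players.size() < capacity;
        }
    }

    private final int capacity;
    private final List<GameRoom> rooms = new ArrayList<>();

    Matchmaker(int capacity) {
        this.capacity = capacity;
    }

    // synchronized so two concurrent join requests can't both take the last slot
    synchronized GameRoom join(String user) {
        GameRoom target = null;
        for (GameRoom r : rooms) {
            if (r.hasFreeSlot()) {
                target = r;
                break;
            }
        }
        if (target == null) {
            target = new GameRoom("Game-" + (rooms.size() + 1), capacity);
            rooms.add(target);
        }
        target.players.add(user);
        return target;
    }

    synchronized int roomCount() {
        return rooms.size();
    }

    public static void main(String[] args) {
        Matchmaker mm = new Matchmaker(4);
        for (int i = 1; i <= 1000; i++) {
            mm.join("User-" + i);
        }
        System.out.println("Rooms created: " + mm.roomCount()); // 1000 / 4 = 250
    }
}
```

The linear scan is fine for a few hundred Rooms; with many more, keeping a separate queue of non-full Rooms would avoid scanning full ones on every join.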

A similar example has been discussed in detail in this post in our support forum.

Additional resources

There are two more parts to this article series:

Performance Testing, part I: where we discuss how to build a stress test application in Java.
Performance Testing, part II: where we discuss how to build a stress test application in JavaScript over WebSocket.