Automated Testing for League of Legends

Hi, I’m Jim ‘Anodoin’ Merrill, and I work on test automation efforts for League of Legends, focused specifically on the in-game experience. I currently serve as the tech captain to the Build Verification System Development (BVS-Dev) team. In large part, our team builds tools for automated testing and helps teams write better tests.

For the past couple of years, we’ve been working on getting our test system and infrastructure up to snuff in order to increase developer efficiency and reduce the number of bugs we ship. We now run approximately 100,000 test cases a day, and automated testing at that volume helps get content to players sooner and with fewer bugs. I’d like to share a little bit of what we’ve done, and hopefully start a conversation about automated testing in the game space.

Why Do We Care?

League changes really, really quickly. On average, we see well over 100 code and content changes checked into source control every day, and providing adequate coverage for all of those changes is a challenge. With a new patch every two weeks, it’s critical that we detect defects quickly. Bugs discovered late in the release process can cause delays, lead to redeploys, or require temporary champ disables—all bad experiences for players. Automation frees up our quality analysts to focus on more creative testing and upstream defect prevention, where they can provide more value.  

Automation also provides faster turnaround for test results. It isn’t feasible for humans to run a full test sweep on every new code or content submission, and, even if it were, it would require an army of testers to return results sufficiently quickly.

Our test system runs on continuous integration (CI) and reports back within about an hour of check-in. That means that developers receive results in a reasonable timeframe, which helps reduce context switching; in fact, bugs discovered in automation get resolved eight times faster than the average bug. Better yet, if we need to increase our throughput for tests, we can simply add a few more executors to our test farm.

The Build Verification System

The imaginatively-named Build Verification System (BVS) is our test framework for the game client and server. It's responsible for acquiring artifacts to test, deploying them onto a test machine, starting and managing the systems under test, executing tests, and reporting on their results. The tests and harness are written in Python, and we wrote most of the BVS code to insulate test-writers from the complexities of gathering the required resources. As a result, a few arguments in a test class can specify what map to run, how many clients to include, and what champions should be in the game.
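The real base classes are internal, but conceptually the declaration side looks something like this sketch (all class and attribute names here are hypothetical):

```python
# Hypothetical sketch: a test declares its game configuration as a few class
# attributes, and the harness handles artifact acquisition, deployment, and
# game startup. These are illustrative names, not the real BVS interface.
class GameTest:
    mapName = "SummonersRift"   # what map to run
    numClients = 1              # how many clients to include
    champions = ["KogMaw"]      # what champions should be in the game

class KogMawAbilityTest(GameTest):
    # Subclasses override only what differs from the base configuration.
    champions = ["KogMaw", "Annie"]
```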

Tests make use of remote procedure call (RPC) endpoints exposed on the client and the server in order to issue commands and monitor game state. For the most part, tests consist of a fairly linear set of instructions and queries—existing tests cover everything from champion abilities to vision rules to the expected rewards for a minion kill. Some of our earlier tests were significantly less linear, but that made working with the system much harder for less-technical developers.

Since all the work of configuring a test workspace is done separately, the tests themselves should look the same whether running in a local workspace or in our test farm. This makes it easy to run tests locally while making changes to the game.

For example, our test for the damage dealt by Kog’Maw’s new W looks like this:

"""
Name: BioArcaneBarrage_DamageDealt
Description: Verifies the damage modifications from Bio-Arcane Barrage
    - KogMaw deals less damage to non-lane minions
    - KogMaw deals percentile magic damage
    - KogMaw deals normal damage to lane minions
"""

from KogMawAbilityTest import KogMawAbilityTest
from Drivers.LOLGame.LOLGameUtils import Enumerations
import KogMawStats

class BioArcaneBarrage_DamageDealt(KogMawAbilityTest):
    def __init__(self, championAbilities):
        super(BioArcaneBarrage_DamageDealt, self).__init__(championAbilities)
        self.ability = 'Bio-Arcane Barrage'
        self.slot = KogMawStats.W_SLOT
        self.details = 'Kog\'Maw deals reduced base-damage to non-minions with additional percentile damage'

        self.playerLocation = Enumerations.SRULocations.MID_LANE
        self.enemyAnnieLocation = Enumerations.SRULocations.MID_LANE.angularOffsetDegrees(45, 200)
        self.enemyMinionLocation = Enumerations.SRULocations.MID_LANE.angularOffsetDegrees(45, 400)

    def setup(self):
        super(BioArcaneBarrage_DamageDealt, self).setup()
        self.enemyAnnie = self.spawnEnemyAnnie(self.enemyAnnieLocation)
        self.enemyMinion = self.spawnEnemyMinion(self.enemyMinionLocation)
        self.teleport(self.player, self.playerLocation)

    def execute(self):

        self.castSpellOnTarget(self.player, self.slot, self.player)
        self.champAttackOnce(self.player, self.enemyAnnie)
        self.takeRecentDeathRecapSnap(self.enemyAnnie, "annieRecap")

        self.castSpellOnTarget(self.player, self.slot, self.player)
        self.champAttackOnce(self.player, self.enemyMinion)
        # Snapshot consumed by the minion damage check in verify().
        self.takeRecentDeathRecapSnap(self.enemyMinion, "minionRecap")

        self.teleport(self.player, Enumerations.SRULocations.ORDER_FOUNTAIN)

    def verify(self):
        # Verify that enemy Annie is taking the correct amount of damage.
        annieAutoDamageEvents = self.getDeathRecapEvents(self.player, "Attack", "annieRecap")
        annieAutoDamage = 0
        for event in annieAutoDamageEvents:
            annieAutoDamage += event.PhysicalDamage

        annieSpellDamageEvents = self.getDeathRecapEvents(self.player, "Spell", "annieRecap", scriptName=KogMawStats.W_MAGIC_DAMAGE_SCRIPT_NAME)

        annieSpellDamage = 0
        for event in annieSpellDamageEvents:
            annieSpellDamage += event.MagicDamage

        AD = self.getStat(self.player, "AttackDamageItem")
        expectedPercentile = (KogMawStats.W_AD_DAMAGE_RATIO * AD)/100
        annieTotalHealth = self.getStat(self.enemyAnnie, "MaxHealth")
        expectedPercentileDamage = self.asPostResistDamage(self.enemyAnnie, expectedPercentile * annieTotalHealth, 'MagicResist', snapshot='preCast')

        self.assertInRange(annieSpellDamage, expectedPercentileDamage, expectedPercentileDamage * .1, "{} magic damage dealt. Expected ~{}".format(annieSpellDamage, expectedPercentileDamage))

        expectedPhysicalDamage = self.asPostResistDamage(self.enemyAnnie, KogMawStats.W_NON_MINION_DAMAGE_RATIO * AD, 'Armor', snapshot='preCast')

        self.assertInRange(annieAutoDamage, expectedPhysicalDamage, expectedPhysicalDamage * .1, "{} physical damage dealt. Expected ~{}".format(annieAutoDamage, expectedPhysicalDamage))

        # Verify that enemy minion is taking the correct amount of damage.
        AD = self.getStat(self.player, "AttackDamageItem")
        minionExpectedPhysicalDamage = self.asPostResistDamage(self.enemyMinion, AD, 'Armor', snapshot='preCast')

        expectedPercentile = (KogMawStats.W_AD_DAMAGE_RATIO * AD)/100
        minionTotalHealth = self.getStat(self.enemyMinion, "MaxHealth")
        minionExpectedMagicDamage = self.asPostResistDamage(self.enemyMinion, expectedPercentile * minionTotalHealth, 'MagicResist', snapshot='preCast')

        expectedDamage = minionExpectedMagicDamage + minionExpectedPhysicalDamage
        actualDamage = self.getDamageTaken(self.enemyMinion, 'preCast', 'minionRecap')

        self.assertInRange(actualDamage, expectedDamage, 1, "{} total physical and magic damage dealt. Expected ~{}".format(actualDamage, expectedDamage))

    def teardown(self):
        super(BioArcaneBarrage_DamageDealt, self).teardown()

The first part of Kog’Maw’s suite of tests, including the Bio-Arcane Barrage damage test, looks like this:

When a test finishes a run, it provides the results to a separate reporting service, which stores run-data going back approximately six months. Depending on the source of given test data, this service takes different actions. A local run of a test opens a webpage on the executing machine that details the passing and failing cases. A run in the test farm, however, will create new bug tickets for any discovered issues, tag artifacts according to results, and send an email to committers if there are any failing cases. Test data is also aggregated and tracked via the reporting service, allowing us to see when test failures have occurred, how often they occur, and how long it has been since a passing build.

In Wood 5 we don't use wards anyway, so I see no problem with this critical failure
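The routing behavior of the reporting service can be sketched roughly like this (the function and action names are hypothetical; the real service is a separate web application):

```python
# Hypothetical sketch of how the reporting service routes results depending on
# where a run came from. Action names here are illustrative only.
def routeTestResults(source, failures, committers):
    """Return the actions the reporting service takes for a finished run."""
    actions = []
    if source == "local":
        # A local run just opens a results page on the executing machine.
        actions.append("open_results_webpage")
    elif source == "farm":
        if failures:
            actions.append("file_bug_tickets")
            actions.append("email:" + ",".join(committers))
        actions.append("tag_artifacts")
    actions.append("store_run_data")  # run data is retained ~six months
    return actions
```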

To guard against flaky or unreliable tests, each new test must pass through a standard vetting process before it is trusted. After a test has been code-reviewed and submitted, it enters a set of tests called BVSStaging. There, tests must demonstrate stability for at least one week before being promoted. If a test in staging fails, only test developers are notified, which prevents confusion for the wider team.

Once a test has demonstrated its reliability, it is promoted into one of two sets. The first set, BVSBlocker, contains tests that indicate whether a build is even worth further testing. A build which fails Blocker isn’t deployed to a testing environment, because either games aren’t starting or there are multiple severe crash bugs affecting the game. Its counterpart, BVSCore, is our core set of functional tests, including tests for every champion ability.
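The promotion rule itself is simple enough to express directly. A minimal sketch, assuming the only criterion is a clean week in staging (the function name and exact criteria are illustrative):

```python
import datetime

# Hypothetical sketch of the BVSStaging promotion rule: a test is promotable
# once it has run clean for at least the staging period.
STAGING_PERIOD = datetime.timedelta(days=7)

def isPromotable(enteredStaging, lastFailure, now):
    """A test is promotable once stable for the full staging period."""
    stableSince = max(enteredStaging, lastFailure) if lastFailure else enteredStaging
    return now - stableSince >= STAGING_PERIOD
```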

Framework Deep Dive

The BVS is implemented in three layers: the executor, the drivers, and the scripts. The executor implements a generic API for functional testing, while drivers implement the specific steps of configuring and executing a test. Finally, the scripts implement the specific logic for test cases. Currently, we only have one driver in use (LOLGame), but the executor-driver separation means that future projects can use the BVS by implementing their own driver and leaning on the shared utilities we built alongside the LOLGame driver.

For some reason, I don't get asked for flowcharts a lot...


Individual components register their required and optional arguments as part of their declaration. When arguments are provided at the command line, they are stored as a dictionary that components consume as part of their initialization. Earlier versions of the BVS made use of Python's standard argparse library, but we moved away from it for two reasons: first, the number of potential argument inputs was becoming large, and therefore very hard to trace through the system; and, second, drivers needed to accept driver-specific arguments, which meant declaring a single parser at startup was not viable.

class TestFactory(API.TestFactoryAPI):
    requiredArgs = [ArgsObject('driver', 'Driver you wish to use'),
                    ArgsObject('name', 'Name of the test to run')]
    optionalArgs = [ArgsObject('overrideConfig', 'Use a non-standard game.cfg', None),
                    ArgsObject('gameMetadataConfiguration', 'A string identifying which game metadata to use', None),
                    ArgsObject('listener', 'Log listener to use', None),
                    ArgsObject('mutator', 'A string name for mutator to apply to test object', None),
                    ArgsObject('testInfoID', 'Test and metadata this test run is related to', None),
                    ArgsObject('testSubsetNumber', 'The number out of total if test is subsectable', None),
                    ArgsObject('totalSubsetNumber', 'The total numbers of subsets test is split into', None)]
Example arguments for a driver object
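The consumption side of this pattern might look like the following sketch, where ArgsObject and Component are minimal stand-ins for the real declaration types:

```python
# Hypothetical sketch of the registration/consumption pattern: components
# declare their arguments, then pull them out of a plain dictionary at init.
class ArgsObject:
    def __init__(self, name, description, default=None):
        self.name = name
        self.description = description
        self.default = default

class Component:
    requiredArgs = [ArgsObject('driver', 'Driver you wish to use')]
    optionalArgs = [ArgsObject('listener', 'Log listener to use', None)]

    def __init__(self, args):
        # Required args must be present; optional args fall back to defaults.
        for spec in self.requiredArgs:
            if spec.name not in args:
                raise ValueError("Missing required argument: " + spec.name)
            setattr(self, spec.name, args[spec.name])
        for spec in self.optionalArgs:
            setattr(self, spec.name, args.get(spec.name, spec.default))
```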


There are three levels of relevant granularity: test sets, tests, and test cases.

  • Test Sets are a group of tests that run together. For example, the BVSBlocker test set mentioned earlier is a set of smoke tests that run on CI. Test sets are currently described to the BVS via JSON files that can be created either in VCS or on the fly.

  • Tests are individual classes implementing a set of similar test cases that use the same basic game configuration. For example, the LoadChampsAndSkins test runs through test cases consisting of loading the assets for each champion and skin and confirming that the load occurs properly.

  • Test cases are single units of expected functionality within a test. For example, the function loadChampionAndSkin in the LoadChampsAndSkins test is a single test case that gets executed hundreds of times to cover each combination of champion and skin. The entire Kog’Maw test case above is executed by a higher level test, which allows more complicated test cases to have a bit more structure than a function.
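The three levels above nest in the obvious way. A rough sketch, using hypothetical stand-in classes rather than the real BVS types:

```python
# Hypothetical sketch of the set -> test -> case hierarchy described above.
class TestCase:
    def __init__(self, name):
        self.name = name

class Test:
    def __init__(self, name, cases):
        self.name = name
        self.cases = cases  # cases sharing the same basic game configuration

class TestSet:
    def __init__(self, name, tests):
        self.name = name
        self.tests = tests  # tests that run together

loadChampsAndSkins = Test("LoadChampsAndSkins",
                          [TestCase("loadChampionAndSkin:Annie_Base"),
                           TestCase("loadChampionAndSkin:Annie_Skin01")])
core = TestSet("BVSCore", [loadChampsAndSkins])
```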

Parallelization in the BVS is generally done at the test set level, but can also happen at the test level. Because the BVS stores and reads test sets as JSON, we create sub-lists within that JSON that can either be executed serially by a single executor or in parallel by our test farm. In the early days of the BVS, this allowed us to balance by hand, which was more efficient than automated parallelization for the small list of tests. As the major test sets in use have grown, we’ve switched over to an automated load-balancer that generates the same JSON files, but now uses the average run time for each test component over the last 10 runs.
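A minimal sketch of that load-balancing step, assuming a simple greedy strategy over the recorded average runtimes (the article doesn't specify the real balancer's algorithm):

```python
# Hypothetical sketch of an automated balancer: greedily assign each test to
# the currently least-loaded subset, using its average runtime as the weight.
def balanceTestSet(avgRuntimes, numSubsets):
    """avgRuntimes: {testName: seconds}. Returns numSubsets lists of names."""
    subsets = [[] for _ in range(numSubsets)]
    loads = [0.0] * numSubsets
    # Placing the longest tests first keeps subset runtimes close together.
    for name, runtime in sorted(avgRuntimes.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))
        subsets[i].append(name)
        loads[i] += runtime
    return subsets
```

The resulting sub-lists map directly onto the serial/parallel JSON structure: each subset runs serially on one executor, and the subsets run in parallel across the farm.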

Most users of the BVS really only interact with tests themselves, since we go out of our way to ensure they don’t have to think about any of the details that the driver handles. Along the same lines, we expose a fairly large standard library wrapping the RPC endpoints used to talk to the game. Part of the reason we do this is just to ensure that tests aren’t closely coupled to the RPC interface, but the major reason we do it is to provide a standard set of behaviors that prevent sloppy test-writing and ensure consistency between tests.

In particular, we expose no form of pure sleep in the standard test library of the BVS. Early test-writers made heavy use of sleeps, which led to a number of fragile tests that performed differently based on the hardware they were running on. All waits in the standard library are conditional waits that poll the game on a regular cadence, waiting for a condition to be met.

    @annotate("Wait until a unit drops the specified buff.",
              arguments=[argument("unitNameOrID", "Unit name (or unique integer unit ID).", (str, int)),
                         argument("buff", "Buff you want to drop.", str),
                         argument("timeout", "How long to wait.", float, default=STANDARD_TIMEOUT),
                         argument("interval", "How often to check for a change.", float, default=SERVER_TICK),
                         argument("speedUp", "Whether to speed the game up.", bool, default=False)],
              tags=["wait", "buff", "change"])
    def waitForBuffLost(self, unitNameOrID, buff, timeout=STANDARD_TIMEOUT, interval=SERVER_TICK, speedUp=False):
        conditionFunction = lambda: not self.hasBuff(unitNameOrID, buff)
        return self.__waitForCondition(conditionFunction, timeout=timeout, interval=interval, speedUp=speedUp)
Example conditional wait
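Internally, a wait like this reduces to a poll loop against the game. A minimal sketch of the idea, with illustrative timing constants (the real `__waitForCondition` also handles game speed-up):

```python
import time

# Hypothetical sketch of the poll loop behind conditional waits: check the
# condition on a regular cadence until it holds or the timeout expires.
def waitForCondition(conditionFunction, timeout=30.0, interval=0.033):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if conditionFunction():
            return True
        time.sleep(interval)
    return conditionFunction()  # one final check at the deadline
```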


One of the other major adaptations we have made to the BVS since its earlier days is separating out logic regarding everything except running a test. In the past, the BVS did the work of figuring out which artifacts it should use, tagging builds as passed or failed, and composing its own test reports. In order to preserve a clear separation of responsibilities, we have a separate service that does all work not pertaining to directly running tests. The service is a Django application using Django REST Framework to provide an API that the BVS and other services hit for current BVS state.

Run and post-run flow

Overall Performance

Overall, the BVS runs ~5500 test cases in approximately 18 minutes for every new build of League of Legends. In total, that’s somewhere around 100,000 test cases a day. The average time from defect submission to first report of failure for the BVS is between one and two hours. 50 percent of all critical or blocker level bugs are discovered by the BVS, with the remainder being discovered during internal QA or on PBE. Issues not caught by the BVS generally slip by due to a lack of test coverage rather than bad tests.

While most bugs we discover fall in the realm of game crashes or missing functionality, occasionally we will be first-finder on some truly excellent bugs. My personal favorite was a defect where all the towers in the game slowly slid into the top-right corner of the map, resulting in an epic tower traffic-jam in purple side’s base. Our finds also include things non-automated tests wouldn’t necessarily have caught, like an issue where skillshots would pass through an enemy if a champ hit that enemy at exactly point-blank range.

As a whole, automated testing hasn’t replaced manual testing, but it has sped up the development feedback loop and freed up more of our manual testers to focus on destructive testing. As more content is added to League of Legends, we continue to add more coverage, which should increase our hit-rate for defects and improve our confidence in build health.

Thanks for taking the time to read this. If you have any questions, feel free to drop a comment below. In our next article on automation, we’ll tackle the issue of test throughput and speed of return.

Posted by Jim Merrill