Self-Healing Automation: A Remedy for Failing Automated Tests

In CI/CD environments, automated tests are the linchpin of a successful build and deployment. When tests fail, they cause broken builds, investigation, and triage, and, of course, if there is a real error, a fix and a rerun. Often, however, what happens is that a test or its underlying environment changes slightly in a way that causes the test to fail at certain times but not others. These situations are frustrating, and I often mutter to myself, “Can’t the tests figure these things out on their own?” Today, the answer is “Yes, they can!” Or at least they can, up to a point.

Often, when we think about test automation, we are looking at tools like WebDriver or its variations, which let the user select actions and perform operations on specific elements of a web page or app using locators. If those locators are consistent and do not change, the test will likely run without issue: the element it is looking for, or the assert it needs to determine a pass or fail condition, is present. A problem arises when we cannot guarantee those values, or when there is a chance they might change. Even if the elements on the page do not change, if the contents of the page are dynamic or sufficiently complex in how they are rendered, the time it takes to render the page can prove to be an issue. I am familiar with this and have tried a number of methods over the years to deal with it, ranging from implicit waits to loops that confirm an element is present before moving forward. Both approaches have their issues.
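To make the retry-loop idea concrete, here is a minimal pure-Python sketch of the polling approach. The `find_element` callback stands in for a real WebDriver lookup; the function name, timing values, and the fake page below are my own illustration, not any framework's API:

```python
import time

def wait_for_element(find_element, locator, timeout=10.0, poll_interval=0.5):
    """Poll until the element is found or the timeout expires.

    `find_element` stands in for a WebDriver lookup; it should return
    the element or None. Raises TimeoutError if it never appears.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        element = find_element(locator)
        if element is not None:
            return element
        time.sleep(poll_interval)
    raise TimeoutError(f"Element {locator!r} not found within {timeout}s")

# Simulate a page that finishes rendering on the third poll.
attempts = {"count": 0}

def fake_find(locator):
    attempts["count"] += 1
    return "<input>" if attempts["count"] >= 3 else None

found = wait_for_element(fake_find, "#username", timeout=5.0, poll_interval=0.01)
```

The downside, as noted above, is that the script simply burns time until the one locator it knows about shows up; if the locator itself has changed, no amount of waiting will help.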

Can AI and Machine Learning Help Us Heal Our Tests?

Currently, there are efforts underway and tools available that leverage both AI and Machine Learning to make scripts adaptable to change and our tests more robust. Getting into the details of AI and Machine Learning could take a book or two; but at heart, these tools operate on the principle of making small tests and comparisons. These comparisons create weighted values that we refer to as “agents.” When an agent is created, it is given a value, and over time that value is adjusted up or down based on interactions as the program runs.

As an example, if a program has an assert statement and it finds the correct value, it increments that agent's weight (or leaves it alone) based on the success. However, there may be situations where the value changes or the assert fails. In that case, the agent's weight would be decremented and another value would be tried in its place. If that value succeeds, the second agent's weight would be incremented. This is a basic description, but the idea is that the script can adapt and try different things based on how the page or application responds. The level of complexity can and likely will grow, but the overall approach is the same. By creating these agents and employing principles of AI and Machine Learning, I can give my test scripts the ability to adapt to an environment that changes, or that has multiple options in how a page could be rendered or in what could be loaded on the screen and interacted with.
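Here is a hedged sketch of that reward/penalty cycle in plain Python. The class and function names are my own illustration, not any vendor's API; each agent wraps a candidate locator and an integer weight that goes up on success and down on failure:

```python
class LocatorAgent:
    """A candidate locator whose weight tracks its success history."""

    def __init__(self, locator, weight=0):
        self.locator = locator
        self.weight = weight

    def reward(self):
        self.weight += 1

    def penalize(self):
        self.weight = max(0, self.weight - 1)

def find_with_agents(agents, try_locator):
    """Try agents in descending weight order, adjusting weights by outcome."""
    for agent in sorted(agents, key=lambda a: a.weight, reverse=True):
        element = try_locator(agent.locator)
        if element is not None:
            agent.reward()
            return element
        agent.penalize()
    raise LookupError("no agent located the element")

# Simulated page: the element's ID changed from "username" to "initialUsername".
page = {"initialUsername": "<input>"}

def try_locator(locator):
    # Locators here are simple "strategy=value" strings for illustration.
    return page.get(locator.split("=", 1)[1])

agents = [LocatorAgent("id=username", 5), LocatorAgent("id=initialUsername", 1)]
element = find_with_agents(agents, try_locator)
```

After one run, the stale locator's weight has dropped and the working fallback's has risen; run it enough times and the fallback becomes the preferred agent, which is the self-healing behavior in miniature.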

Making Tests More Inclusive

One of the greatest challenges I face when I create test cases is understanding how I can interact with a system. In most cases, I rely on a set of locators (be they IDs, CSS classes, or XPath) to give me access to a particular element on a page: to interact with a form field and press a button, click a specific link, advance to a particular page, or read a value needed to determine whether a test was successful. If I have only one way to do this (say, I can only access a username field by the ID “username”), I can run my tests without issue. Now let’s say the developer chooses to make that ID more specific, such as “initialUsername.” At this point, my test is likely to fail because the value “username” I am looking for no longer exists. Under normal circumstances, I would go and change my scripts so that any mention of the element “username” uses “initialUsername” instead. Successful? Probably. Time-consuming? Definitely.

Why is this an issue? We are relying on one specific attribute of an element to remain accurate. If the value of that attribute changes, we run into this brittle situation. The way around it is a strategy that can capture or understand multiple aspects of a web element and how to interact with it. To use Chrome as an example, if we select an element on a web page and click “Inspect,” we can see that any element we want to interact with doesn’t exist in isolation. Numerous other attributes define it. It may have a CSS class. It may have a name. It may have a preceding text label. It may be nested within a tag hierarchy.

The point is, a locator is not typically defined by just one attribute; we can use a variety of methods to get to an element and interact with it. A stable, well-defined ID is of course beneficial, but what if the page is still loading and we don’t have access to that ID yet? Should my script wait? Should I pause until I can confirm the ID? I could, or I could make my tests able to interact with other aspects of the element. Again, this goes back to the idea of agents that are “standing by.” I have my preferred agent, of course, but if my preferred agent can’t act, I can call upon another one to try a different approach. If that approach succeeds, the weight of that agent increases, and over time my test may come to rely on the alternate agent to identify where and when it should come into play. With that change, we give our tests more opportunities to be successful.
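The multiple-attribute idea can be sketched as an ordered list of fallback strategies for the same element. In this plain-Python illustration, `element_lookup` stands in for a WebDriver find call, and the strategy names and page contents are hypothetical:

```python
def locate(element_lookup, strategies):
    """Return the first element found, plus which strategy worked.

    `strategies` is an ordered list of (name, locator) pairs, each
    describing a different attribute of the same target element.
    `element_lookup` returns the element or None.
    """
    for name, locator in strategies:
        element = element_lookup(name, locator)
        if element is not None:
            return element, name
    raise LookupError("element not found by any strategy")

# Hypothetical page state: the ID hasn't rendered yet, but other
# attributes of the element are already available.
rendered = {("css", ".login-user"): "<input>", ("name", "user"): "<input>"}

strategies = [
    ("id", "username"),      # preferred locator, not present yet
    ("css", ".login-user"),  # fallback: CSS class
    ("name", "user"),        # fallback: name attribute
]

element, used = locate(lambda n, l: rendered.get((n, l)), strategies)
```

Instead of blocking until the preferred ID appears, the test proceeds through whichever attribute of the element is available, which is exactly the “standing by” behavior described above.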

A Simple Example using TestIM

While looking for example AI automation tools, I found three main ones: Mabl, TestIM, and Testcraft. Mabl and Testcraft are either pay-to-play or require chatting with a sales representative. TestIM didn’t require talking with a sales rep, though it should be noted that full “self-healing” is actually a future offering and not 100% available yet.

As an example test, I am going to use TestIM (which has a free version you can try out) and do a basic product search. In this case, we will look for and confirm we can find a particular USB audio interface. 

For our search, we will use “Focusrite Scarlett 8i6” as our search term, and we will compare a number of values to determine if the test passes or fails.

The benefit to TestIM is that anyone can step in and record basic tests to start with, structure them in either the visual block editor that they provide, or export the code to load into an IDE for further modifications. 

Below is an example of creating the test visually:

A visual example depicting the tests created for the blog post's example.
Figure 1 – A visual depiction of the example test.

If you would like to set up a similar test, here are the steps:

  1. Download the TestIM Editor (most readily available as a Chrome extension).
  2. Click on record new test.
    • Go to
    • In the Search Text box, type “Focusrite Scarlett 8i6”
    • Click on the Magnifying Glass, or press “Tab” and then press “Enter” (I prefer the latter approach, but both will work)
    • Scroll down to the filter criteria items and select the check box for “Focusrite”
    • You should see a search listing that references a number of items.
    • Scroll to the “Amazon Choice” icon and click on it.
    • Validate that the text “Focusrite Scarlett 8i6 (3rd Gen) USB Audio Interface with Pro Tools | First” is present.
  3. When finished, click Stop to complete the recording.
  4. Click on “Save” to save your test. Give it a meaningful name (I used “Search for Scarlett 8i6”).

With these steps, you can now run your test. As a word of caution, running the test locally can produce a variety of unusual errors that are difficult to debug. To avoid the majority of these issues, I encourage you to run the test in the “TestIM-Grid” configuration.

Once you have confirmed that the test runs successfully, you can examine the items in the test and see how the system uses AI to classify and rank the elements.

A list of the items in the test and their correlating star ranking, which determines the amount of value placed on the item.
Figure 2 – The number of stars represents the weight placed on each value.

The stars highlight which values the system will weight most heavily and which it will weight less. As tests run, if there are delays or errors, the listed values are updated: preference shifts toward agent values as they accumulate successful hits, and away from values with fewer hits. If there are values that fall outside the desired parameters (say, two stars or less), we can deselect those options and they will not be considered in future test runs. The more values selected, the more likely the tests will pass, but also the longer they will run, since the AI puts each of them into contention for testing and ranking.
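As a rough illustration of that deselection step (the attribute names and star counts here are invented for the example, not taken from TestIM's internals), filtering candidate attributes by a star threshold might look like:

```python
def active_attributes(ranked, min_stars=3):
    """Keep only the attributes whose star rank meets the threshold.

    `ranked` maps attribute name -> star count (1-5), mimicking the
    per-element ranking shown in the tool's UI.
    """
    return {name: stars for name, stars in ranked.items() if stars >= min_stars}

# Hypothetical rankings for one element on the search results page.
ranked = {"id": 5, "css class": 4, "text label": 3, "tag hierarchy": 2, "position": 1}
kept = active_attributes(ranked)
```

Dropping the two-star-and-below attributes keeps low-signal candidates out of contention, trading a little resilience for a faster, more predictable run.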

(There is also the ability to export tests and run them in your local IDE but that is beyond the scope of this article.)

How Can I Benefit from This?

The principles behind agents and what makes AI and Machine Learning work are basic; but, as in all things, the devil is in the details. While it may sound simple, making your test scripts run with these agents and this self-healing approach might prove a major undertaking for a single software tester. Fortunately, many of the tools currently available leverage AI and Machine Learning to do this as part of their framework. Another thing to be aware of: when self-healing is turned on or enabled, it can add time to test runs, since a test now needs to consider multiple ways to interact with an element where before there was a single path. Depending on the number of test cases run, that additional time could be significant. Still, even if it does add time, I would argue that a few seconds or minutes of increased run time outweighs being notified of a failing test that may or may not be a legitimate failure. More to the point, the time it takes to fix and resubmit that test would almost certainly be longer than it would take to run most, if not all, of the additional paths needed to give the process or procedure being examined multiple opportunities to succeed.


Self-healing automation sounds like magic, but in reality it is giving your tests multiple opportunities to be successful. Whether I approach this with readily available tools, work with my development team to have a variety of agents at the ready when creating tests, or simply create multiple paths to identify and interact with an element, self-healing automation is a technique that can be readily used and need not be scary or intimidating. While I cannot promise that all of your tests will pass or that flaky or brittle tests will no longer be a reality, the odds of them appearing are greatly reduced when a self-healing strategy is deployed.

*Disclaimer: This article is not endorsed by, directly affiliated with, maintained, authorized, or sponsored by any of the companies mentioned in this blog (Mabl, TestIM, Testcraft). All product and company names are the registered trademarks of their original owners. The use of any trade name or trademark is for identification and reference purposes only and does not imply any association with the trademark holder or their product brand.

Michael Larsen
Michael Larsen is a Senior Automation Engineer with LTG/PeopleFluent. Over the past three decades, he has been involved in software testing for a range of products and industries, including network routers & switches, virtual machines, capacitance touch devices, video games, and client/server, distributed database & web applications.

Michael is a Black Belt in the Miagi-Do School of Software Testing, helped start and facilitate the Americas chapter of Weekend Testing, is a former Chair of the Education Special Interest Group with the Association for Software Testing (AST), a lead instructor of the Black Box Software Testing courses through AST, and former Board Member and President of AST. Michael writes the TESTHEAD blog and can be found on Twitter at @mkltesthead. A list of books, articles, papers, and presentations can be seen at