The right testing strategies for AI/ML applications

Adoption of Artificial Intelligence (AI) and Machine Learning (ML) based systems has grown exponentially in the past few years and will continue to do so. MarketsandMarkets forecasts the global AI market to grow from USD 58.3 billion in 2021 to USD 309.6 billion by 2026, at a Compound Annual Growth Rate (CAGR) of 39.7% during the forecast period. In Algorithmia's 2020 survey, 71% of respondents reported increased budgets for AI/ML initiatives, with some organizations even looking to double their investments in these areas. With such exponential growth in these applications, QA practices and approaches for ML models also need to keep pace.

Nuances of QA in ML

Traditional QA approaches require a subject matter expert who understands the functionality and behavior of the application under test. Such behavior is documented across applications and modules in the real world, making test case creation straightforward. In an ML world, the focus shifts to the decision made by the model and to understanding the various data and scenarios that could have led to that decision. This calls for an in-depth understanding of the possible outcomes that lead to a decision, and it requires knowledge of data science.

Secondly, the data available for creating an ML model is a small subset of real-world data. Hence, there is a strong need for the model to be re-engineered continuously with real data. What's needed is rigorous manual follow-up once the model is deployed, so that the prediction capability of the model is enhanced continuously. This also helps overcome trust issues with the model, since in real life the decision would otherwise have been taken through human intervention.


Finally, in a traditional QA approach, business acceptance testing involves creating an executable module and putting it into production to be tested. This is predictable: as long as no additions are made to the application, the same set of scenarios keeps getting tested. ML engines, however, tend to deteriorate over time in the absence of real-world data, since the ability of the model to re-engineer itself depends primarily on the availability of newer scenarios. Business acceptance testing in such cases is long drawn out and requires constant monitoring of external parameters and of the availability of quality data.
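As a minimal sketch of the constant monitoring this implies, the helper below flags deterioration when a model's recent accuracy slips below its accepted baseline. The tolerance and the window of recent accuracies are illustrative assumptions, not values from the article.

```python
def detect_deterioration(recent_accuracies, baseline_accuracy, tolerance=0.05):
    """Flag deterioration when the average accuracy over a recent window
    falls more than `tolerance` below the accepted baseline.
    Both the window and the tolerance are illustrative assumptions."""
    recent = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent) > tolerance

# A model accepted at 90% accuracy that has slipped to ~80% on fresh data:
drifted = detect_deterioration([0.82, 0.79, 0.80], baseline_accuracy=0.90)
```

In practice such a check would run on a schedule against freshly labelled production samples, triggering re-training when it fires.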

Phases of QA

For every ML model, QA needs to be applied across three phases: the data pipeline, which assures the availability of datasets for creating the model; model building, which governs the creation of the ML engine; and deployment, the phase of launching the engine into a real-life environment.

  • Data pipeline: The quality of the input datasets plays a key role in governing the predictive ability of an ML system. Testing data pipelines to ensure the availability of clean and accurate data, through big data and analytics techniques, is key to the success of the ML model.
  • Model building: Of the datasets available, typically 90% is used to build the model and the remaining 10% to validate it. If that 10% all belongs to the same category, that category is underrepresented in training; a model pushed into production will fail for that particular category and may look like an application with no ability in that area. There is a strong need to do a deep dive into the data segments as well, to ensure an even distribution.
  • Deployment: Since all-around coverage of scenarios determines the accuracy of an ML model, and the ability to achieve that in real life is limited, the system cannot be expected to be performance-ready in one go. The concept of a sweat drift becomes relevant here, whereby we arrive at a measure of the time by when the model starts behaving reliably. A host of tests, such as candidate testing and A/B testing, need to be run on the system to ensure it works correctly and can ease into a real-life environment.
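To make the data-pipeline point concrete, here is a minimal sketch of a quality gate that partitions incoming records into clean and rejected sets before they reach model training. The field names and value ranges are illustrative assumptions.

```python
def validate_records(records, required_fields, numeric_ranges):
    """Partition records into (clean, rejected) based on simple checks:
    required fields must be present and numeric fields must fall in range.
    Field names and ranges below are illustrative assumptions."""
    clean, rejected = [], []
    for rec in records:
        ok = all(rec.get(f) is not None for f in required_fields)
        for field, (lo, hi) in numeric_ranges.items():
            value = rec.get(field)
            ok = ok and value is not None and lo <= value <= hi
        (clean if ok else rejected).append(rec)
    return clean, rejected

records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},   # missing value -> rejected
    {"age": 210, "income": 61000},    # out of range  -> rejected
]
clean, rejected = validate_records(
    records,
    required_fields=["age", "income"],
    numeric_ranges={"age": (0, 120)},
)
```

Real pipelines would typically use a dedicated validation framework rather than hand-rolled checks, but the gate-and-reject pattern is the same.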
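The model-building point about even distribution can be sketched as a stratified 90/10 split, which holds out roughly 10% of every category rather than risking a validation set drawn from a single one. The helper below is an assumed illustration using only the standard library.

```python
import random
from collections import defaultdict

def stratified_split(samples, label_of, validation_fraction=0.1, seed=42):
    """Split samples ~90/10 while keeping every label represented
    in both the training and the validation partitions."""
    by_label = defaultdict(list)
    for s in samples:
        by_label[label_of(s)].append(s)
    rng = random.Random(seed)
    train, validation = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        k = max(1, round(len(group) * validation_fraction))
        validation.extend(group[:k])
        train.extend(group[k:])
    return train, validation

samples = [("cat", i) for i in range(50)] + [("dog", i) for i in range(50)]
train, val = stratified_split(samples, label_of=lambda s: s[0])
```

With 50 samples per category, the validation set ends up with 5 of each, so neither category is blind-spotted during validation.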
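For the deployment phase, candidate and A/B testing can be as simple as comparing a challenger model's accuracy against the current champion on the same labelled traffic. The promotion rule and margin below are illustrative assumptions.

```python
def compare_candidates(champion_preds, challenger_preds, labels, min_gain=0.02):
    """Return (promote, champion_accuracy, challenger_accuracy).
    The challenger is promoted only if it beats the champion by more
    than `min_gain`; the margin here is an illustrative assumption."""
    def accuracy(preds):
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)
    champ, chall = accuracy(champion_preds), accuracy(challenger_preds)
    return chall - champ > min_gain, champ, chall

labels     = [1, 0, 1, 1, 0, 1, 0, 1]
champion   = [1, 0, 0, 1, 0, 1, 1, 1]  # 6/8 correct
challenger = [1, 0, 1, 1, 0, 1, 0, 1]  # 8/8 correct
promote, champ_acc, chall_acc = compare_candidates(champion, challenger, labels)
```

A production A/B test would add statistical significance checks over live traffic splits, but the champion/challenger comparison is the core of candidate testing.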


In conclusion, assuring the quality of AI/ML-based models and engines needs an approach that is fundamentally different from traditional testing. It must change continuously, with a focus on the data being fed into the system and on the predictive outcomes made from it. Continuous testing, with a focus on the quality of data, on the ability to affect the predictive outcome, and on removing biases in prediction, is the answer.


About the Author


Manages testing services, project delivery, planning, scheduling, and reviews; automation framework development and enhancements, CI/CD, new tools, and process improvements.
Find out more about @raghukiranb
