Test Data in Test Automation – The Crucial Point in Test Automation

Over the past few years, my fellow test consultants and I have worked on several projects, writing automated tests for both traditional clients and websites, using different automation tools depending on the project. At some point in every project, we faced the question: “How should we design our test data for test automation?” Here, the question is less about which data is relevant and more about this: “How can we extract a well-defined data dump from the database and reimport it into a freshly set-up database before the test runs, so that no test data has to be created during the test runs?” If such questions arise in your project, you probably already have some automated tests in place and may already have run into problems. If you are just starting out, it makes sense to think about test data early on. Below are some reasons why.


Problems you might face without a proper test data approach

  • Test cases will be written with dependencies on each other and on other data, for example to ensure that the tests can be executed in a regression run
    • In order to perform the “delete customer” test case, the “create customer” test case must have run successfully.
    • In order to perform the “edit product” test case, the “create product” test case must have run successfully.
  • You introduce test code that is only needed to create and manage test data (for example, setting up the data via additional test steps that use the application just before the actual test)
    • In the best case, this increases the test run duration
    • In the worst case, the whole test suite fails because the test data couldn’t be created as intended
    • In both cases, the test results are harder to analyze, because the prerequisite steps may be just as long as the actual test steps
  • Your test code will get messy and hard to maintain over time
  • You will probably waste time working on workarounds instead of on test cases

Experience shows that in many projects and situations it is uneconomical or very difficult to get proper access to databases or build servers. But if you have the prerequisites for good test data management, you should put it in place.


Strategies for working with test data in test automation

There are various strategies for working with test data in test automation. Not all of them will work in every project context and at every project stage, but they are worth looking at and possibly adapting to your situation. All of the following approaches require an independent database or database schema for testing purposes. Otherwise, constant and unwanted data changes in a shared database could slow down your test data management efforts.
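One way to make sure tests never run against the shared database is to resolve the test database from configuration and fail fast when it does not look like a test schema. The sketch below assumes hypothetical environment variables (`TEST_DB_URL`, `TEST_DB_USER`, `TEST_DB_PASSWORD`) and a hypothetical naming convention in which every test schema name contains `_test`; adapt both to your project.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical configuration holder: resolves the dedicated test database
 * from environment variables and refuses anything that is not a test schema.
 */
final class TestDatabaseConfig {
    // Assumed convention: every test schema name contains this marker.
    static final String TEST_SCHEMA_MARKER = "_test";

    final String jdbcUrl;
    final String user;
    final String password;

    private TestDatabaseConfig(String jdbcUrl, String user, String password) {
        this.jdbcUrl = jdbcUrl;
        this.user = user;
        this.password = password;
    }

    /** Reads the hypothetical TEST_DB_* variables from the given map. */
    static TestDatabaseConfig fromEnvironment(Map<String, String> env) {
        String url = env.get("TEST_DB_URL");
        if (url == null || url.trim().isEmpty()) {
            throw new IllegalStateException("TEST_DB_URL is not set");
        }
        // Guard against accidentally running tests on the shared database.
        if (!url.toLowerCase(Locale.ROOT).contains(TEST_SCHEMA_MARKER)) {
            throw new IllegalStateException("Refusing to use non-test database: " + url);
        }
        return new TestDatabaseConfig(url,
                env.getOrDefault("TEST_DB_USER", ""),
                env.getOrDefault("TEST_DB_PASSWORD", ""));
    }
}
```

In a test setup, it might be called as `TestDatabaseConfig.fromEnvironment(System.getenv())`; passing the map in explicitly keeps the guard itself easy to test.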

Test data generation with Java code

If your application’s data model changes frequently, it might not be possible to work with database exports or suitable SQL scripts. In such cases, consider writing a “DataHelper” class: it calls your application’s functions, creates proper datasets with correct parameters and writes them into your database, e.g. via JDBC (Java Database Connectivity). With some programming effort, this lets you create large amounts of data for your tests. The downside is that if you need a lot of data and run large loops, the execution time of these scripts increases.
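A DataHelper along these lines might look like the following minimal sketch. Unlike the approach described above, it does not go through the application’s own functions: it generates customers with predictable values in code and writes them via JDBC batch inserts. The `customer` table and its columns, the `Customer` class and the per-test-case ID prefix are all assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Minimal customer data holder used only for this sketch. */
final class Customer {
    final String id;
    final String name;
    final String email;

    Customer(String id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }
}

/** Hypothetical DataHelper: generates datasets and writes them to the test database. */
final class DataHelper {
    private DataHelper() {}

    /** Builds `count` customers whose IDs start with the given test-case prefix. */
    static List<Customer> customers(String testCasePrefix, int count) {
        List<Customer> result = new ArrayList<>(count);
        for (int i = 1; i <= count; i++) {
            String id = String.format(Locale.ROOT, "%s_CUST_%05d", testCasePrefix, i);
            String email = id.toLowerCase(Locale.ROOT) + "@example.test";
            result.add(new Customer(id, "Customer " + i, email));
        }
        return result;
    }

    /** Inserts all customers in one JDBC batch; returns the number of batched statements. */
    static int insert(Connection conn, List<Customer> customers) throws SQLException {
        String sql = "INSERT INTO customer (id, name, email) VALUES (?, ?, ?)"; // assumed schema
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Customer c : customers) {
                ps.setString(1, c.id);
                ps.setString(2, c.name);
                ps.setString(3, c.email);
                ps.addBatch();
            }
            int[] updateCounts = ps.executeBatch();
            return updateCounts.length;
        }
    }
}
```

For example, `DataHelper.insert(conn, DataHelper.customers("TC042", 1000))` would insert 1,000 customers whose IDs start with `TC042_`. Batching usually reduces the number of round-trips to the database, which may soften the execution-time downside, although the actual effect depends on the JDBC driver.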

Manual test data creation with dumping and reinserting into your database

In a more mature project, frequent data model changes are less likely. This lets you implement an automated process for inserting and exporting/importing the required test data. To start, you’ll need to define the test data your tests require.

  1. Before you start adding data to the database, reset it to the newest well-defined data version. This is an important step to keep control over the data and to avoid changing any data by accident. It is also important to only add or change data that is relevant for your tests, and to define namespaces and conventions for the test data so that everyone knows which test data belongs to which test case.
  2. Then create the data manually using the application, to ensure that the data you need is present in the database. This is also a good first check of whether your application works as expected or whether there are issues that need to be addressed.
  3. Once the data is entered, export the database as plain SQL, as dump files, or in whatever format suits your testing project. When you need the test data, clear the database and then reinsert the exported data. See figure 1 for the complete process.
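The import step (clear, then reinsert) could be automated roughly as follows. This minimal sketch assumes that the export is a plain-SQL file containing only `INSERT` statements, that the list of tables is maintained by hand in child-first order so that deleting rows does not violate foreign keys, and that your database can run these statements in a single transaction. `TestDataImporter` is a hypothetical name, and its statement splitter is deliberately naive: it respects single-quoted strings but not comments or stored procedures.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical importer: clears the test tables and reinserts a plain-SQL export. */
final class TestDataImporter {
    private TestDataImporter() {}

    /** Splits a script on semicolons, ignoring semicolons inside single-quoted literals. */
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;
        for (int i = 0; i < script.length(); i++) {
            char ch = script.charAt(i);
            if (ch == '\'') {
                inString = !inString; // an escaped '' toggles twice, so it stays balanced
            }
            if (ch == ';' && !inString) {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> statements, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            statements.add(s);
        }
    }

    /** Deletes all rows (child tables first), then runs the export, all in one transaction. */
    static void resetAndImport(Connection conn, List<String> tablesChildFirst, String exportScript)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            for (String table : tablesChildFirst) {
                // Table names come from trusted test configuration, not user input.
                st.executeUpdate("DELETE FROM " + table);
            }
            for (String sql : splitStatements(exportScript)) {
                st.execute(sql);
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // leave the database in its previous state
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Running the whole reset in one transaction means a broken export leaves the previous data in place instead of a half-filled database.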



Figure 1: Workflow for creating test data

There are several points in time at which you can reset the test data. Each option has its pros and cons, for example regarding execution time. The options you could consider are:

  • Resetting the test data only before the execution of the whole test suite.
    This is the “normal” approach to providing clean test data for the next test run.
  • Resetting the test data before and after the execution of the whole test suite.
    This is suitable if the import is not time-consuming. Some projects might do this as a failsafe in case the database is left in an unusable state after test execution. But there is one disadvantage to keep in mind: in some situations you need the data in its post-execution state to reproduce errors and to compare the actual state against the expectations. An automated reset after the run would erase that information.
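The two timeslots can be expressed as a small policy around the suite run. In this hypothetical sketch, `resetTestData` stands for whatever import routine your project uses and `suite` stands for the test run itself; in a real project, this logic would typically be wired into your test framework’s suite-level hooks.

```java
/** The two reset timeslots described above. */
enum ResetPolicy { BEFORE_SUITE, BEFORE_AND_AFTER_SUITE }

/** Hypothetical wrapper that runs a suite and resets test data according to a policy. */
final class SuiteRunner {
    private SuiteRunner() {}

    static void run(ResetPolicy policy, Runnable resetTestData, Runnable suite) {
        resetTestData.run(); // both policies start from clean, well-defined data
        try {
            suite.run();
        } finally {
            if (policy == ResetPolicy.BEFORE_AND_AFTER_SUITE) {
                // This discards the post-execution state that could help reproduce failures.
                resetTestData.run();
            }
        }
    }
}
```

Because the after-run reset sits in a `finally` block, it also runs when the suite fails, which is exactly when the post-execution state would be most useful. If that matters in your project, one variation is to skip the second reset when the suite throws.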


Coping with data model changes

Depending on the project, you will need to adapt and extend your test data regularly. As the application changes, you will face data model changes that you need to deal with for your current and future tests. You can either apply the required changes to structure and data by hand in your test data database, or you can automate this. Manual changes can be suitable for some projects, but most projects will prefer to add a process step to test data creation and export. For this, you need DB change scripts, which contain all the logic and SQL statements needed to bring the data model up to the newest state.

The usual process for working with test data and data model changes is shown in figure 2.


Figure 2: Creating test data with data model changes
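The change-script step in this process can be sketched as a small runner that applies versioned scripts in ascending order, starting after the version your test data currently has. This assumes a hypothetical convention in which each script is a single SQL statement that migrates the data model from version n-1 to version n; where the current version is stored (for example, in a version table) is left open. Established migration tools such as Flyway or Liquibase implement the same idea far more completely and may be worth evaluating before writing your own.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical runner for versioned DB change scripts (one statement per version). */
final class ChangeScriptRunner {
    private ChangeScriptRunner() {}

    /** Returns the scripts newer than currentVersion, in ascending version order. */
    static List<String> pending(SortedMap<Integer, String> scriptsByVersion, int currentVersion) {
        return new ArrayList<>(scriptsByVersion.tailMap(currentVersion + 1).values());
    }

    /** Applies all pending scripts in order and returns the resulting data model version. */
    static int migrate(Connection conn, SortedMap<Integer, String> scriptsByVersion,
                       int currentVersion) throws SQLException {
        int version = currentVersion;
        try (Statement st = conn.createStatement()) {
            for (Map.Entry<Integer, String> e
                    : scriptsByVersion.tailMap(currentVersion + 1).entrySet()) {
                st.execute(e.getValue());
                version = e.getKey(); // only advance after the script succeeded
            }
        }
        return version;
    }
}
```

Running the pending scripts against the reset test database and then exporting again keeps the stored test data in step with the current data model.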


Take action with your test data

The strategies mentioned above give an overview of a fairly complex and context-dependent topic. Nevertheless, they are good entry points for discussing test data management with your team and for introducing and adapting your own process.

Regardless of your current testing status, I would advise you to take action towards better test data management as early as possible in a project’s lifecycle. If you haven’t thought about it yet, take the examples above into consideration. Implementing a process can certainly be a hurdle at the beginning, but it will save you a great deal of pain when maintaining your tests in the long term. And don’t forget to encourage all team members, including developers, to share their knowledge and to take part in implementing and continuously improving your new test data approach. After all, efficient testing benefits the whole team.

Now it’s your turn. What are your experiences with (automated) test data management? Do you have good practices that fellow testers could learn from? If so, don’t hesitate to share them!

Thanks to Mario Kühne for his input and comments in the planning stages of this article.


About the Author


Find out more about @thomas-garus