For most organizations, Test Data Management (TDM) starts and ends with copying, masking, and possibly subsetting a database. Such a logistical approach, however, focuses only on moving data around, and carries over many of the issues inherent in using production data itself.
Masking and subsetting should by no means be abandoned. They are good practices, and in some instances are mandatory. Further, they go some of the way to ensuring compliance, and can reduce the infrastructure costs of storing data. But there are several reasons why masking and subsetting alone cannot support the successful implementation of a Continuous Delivery framework.
1. Quality
Before masking or subsetting a database, you must ask: “Do we have the data needed to build a new subsystem?” With production data, the answer will be no for 90% of possible test cases. Production data is sanitized by its very nature, in that it only covers scenarios that have occurred before. Because of this, it typically provides only 10-20% coverage, and is weighted heavily towards “happy paths”.
However, it is the negative paths and unexpected results that most often cause a system to collapse, and negative testing should in fact constitute 80% of the total testing effort. If testing relies on production data alone, defects are likely to reach production, where they can cost up to 1000 times more to fix. In fact, 40-50% of project costs are currently expended on rework.[1]
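To make the coverage gap concrete, here is a minimal sketch that counts how many of the possible attribute combinations a data set actually contains. The field names, value domains and sample rows are assumptions invented for illustration; they are not taken from any real system.

```python
from itertools import product

# Hypothetical value domains for a payment record (illustrative assumptions).
DOMAINS = {
    "card_type": ["visa", "mastercard", "amex"],
    "amount": ["negative", "zero", "typical", "over_limit"],
    "account_status": ["active", "suspended", "closed"],
}


def coverage(rows, domains):
    """Return (fraction of combinations present in rows, sorted missing combinations)."""
    keys = list(domains)
    all_combos = set(product(*(domains[k] for k in keys)))
    seen = {tuple(row[k] for k in keys) for row in rows} & all_combos
    missing = sorted(all_combos - seen)
    return len(seen) / len(all_combos), missing


# A production-like sample: almost every row is a "happy path".
production_like = [
    {"card_type": "visa", "amount": "typical", "account_status": "active"},
    {"card_type": "mastercard", "amount": "typical", "account_status": "active"},
    {"card_type": "amex", "amount": "typical", "account_status": "active"},
    {"card_type": "visa", "amount": "over_limit", "account_status": "active"},
]

ratio, missing = coverage(production_like, DOMAINS)
print(f"Combinations covered: {ratio:.0%}")
print(f"Combinations missing: {len(missing)}")
```

The missing combinations, such as a negative amount on a closed account, are the negative paths that would have to be created rather than copied from production.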
2. Time
When using production data, teams have to find the data to subset and mask manually. In most instances, however, data is stored inconsistently, often in uncontrolled spreadsheets. Testers can therefore spend up to half their time finding data, while as much as 20% of the Software Development Lifecycle (SDLC) is spent investigating, finding and manipulating data. When this happens, project delays quickly mount, and budgets soon overrun.
3. Dependency Constraints
As with the time lost in reason number two, the poor provisioning of data should be a major concern for organizations wishing to implement a continuous delivery framework. Teams often find themselves waiting for data to become available from upstream teams, or unable to work with data that is being used by another team. In some instances, teams might wait weeks for data to become available, leaving them unable to respond quickly to changing business requirements and leading to costly delays.
4. Compliance
This is a big one. Even masked data does not necessarily ensure compliance with regulation. There are some instances where using production data of any kind is forbidden; this might be the policy at a government department, for example. The new EU Data Directive, due for implementation in 2016, is also set to further restrict the use of personally identifiable information in non-production environments.
What’s more, 58% of data breaches are caused by something that masking alone cannot necessarily prevent: human error (Infosec, 2014). Further, referential integrity has to be maintained when masking data, as otherwise testing might break down. But the more complex the data is, the more easily information can be correlated and cracked, even after it has been masked, and organizations then risk an average breach cost of $3.5 million (Ponemon Institute, Cost of Data Breach, 2014).
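As an illustration of what maintaining referential integrity during masking involves, the sketch below replaces a key with a keyed hash, so that the same original value always maps to the same masked value in every table. This is one common approach, not necessarily the one any particular masking tool uses. The table layouts, the `CUST-` prefix and the `SECRET_KEY` value are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets manager, not source code.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask_id(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically replace an identifier with a keyed-hash pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "CUST-" + digest[:12]


# Two hypothetical tables that share customer_id as a foreign key.
customers = [
    {"customer_id": "10045", "name": "Alice Smith"},
    {"customer_id": "10046", "name": "Bob Jones"},
]
orders = [
    {"order_id": "A1", "customer_id": "10045"},
    {"order_id": "A2", "customer_id": "10046"},
    {"order_id": "A3", "customer_id": "10045"},
]

masked_customers = [
    {"customer_id": mask_id(c["customer_id"]), "name": "REDACTED"} for c in customers
]
masked_orders = [{**o, "customer_id": mask_id(o["customer_id"])} for o in orders]

# Every masked order still points at an existing masked customer.
known_ids = {c["customer_id"] for c in masked_customers}
joins_intact = all(o["customer_id"] in known_ids for o in masked_orders)
print("Referential integrity preserved:", joins_intact)
```

Note the trade-off: the consistency that keeps the join intact is also what lets masked records be linked across tables, which is why complex data can still be correlated after masking.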
5. Cost
In addition to the costs of non-compliance and late delivery, the infrastructure required to copy, store and maintain production data can be prohibitively expensive. As applications become ever more complex, expenditure on hardware, licenses and support will increase, and yet some organizations maintain as many as 20 copies of a single database (Bloor, 2013).
Organizations wishing to continually deliver quality software, on time and within budget, can no longer overlook Test Data Management. They need to consider how they store and provision data, and how best to ensure compliance. The quality of the data used should also be a serious concern for organizations wishing to prevent costly defects from reaching production, while a good TDM policy can help lift the dependency constraints that prevent parallel development.
References:
[1] http://www.critical-logic.com/cause-effect-modeling-whitepaper/