Big Data is here. And it’s only going to get bigger. It’s happening so fast that most organisations do not yet fully understand what Big Data is, let alone have a plan for managing it. What are the main challenges of big data testing, and at what threshold do existing tools and resources stop being able to manage large quantities of test data? This article discusses these questions, explains what big data testing is, and points to some additional resources.
The widely accepted definition of Big Data talks about exploiting “data sets whose size is beyond the ability of commonly used tools to process it within acceptable time”. Big Data is one of today’s most discussed topics, and the volume of data is growing exponentially with the popularity of social media and the ever-growing Internet of Things: the mass of data produced by smart electric grids, intelligent traffic systems, and the like. According to IBM, “Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone.”
Gartner’s widely cited model of big data describes three Vs: data Volume, Velocity, and Variety. All three are growing dramatically: we’re being asked to process data ever more quickly, and that data is arriving from an ever wider array of channels, sensors, and formats.
Big Data Testing
What are today’s testers armed with to face this onslaught? According to Naysawn Naderi, a program manager at Microsoft who played an integral role in the R&D process of defining and developing Microsoft® Test Manager 2010™, Microsoft Excel® continues to be the most popular testing tool in the industry for manual testing. Said Naderi, “It amazes me that testers live with it, as it is roughly equivalent to having a developer code using Microsoft Word.”
How do we meet the challenges in Big Data Testing?
Software engineering organizations need to adopt new technologies, as well as learn new skill sets to work with data on this scale. New specialty roles such as “Data Scientist” are emerging to meet the challenges of Big Data.
A tools landscape should meet the needs of manual testers who have to test against large amounts of data. To begin, testers need a tool that simply allows them to test manually: performing exploratory tests, writing manual test cases, running them while focusing on the application under test, fast-forwarding through the uninteresting parts, reporting defects, sharing testing results with their team, and validating that bugs are fixed, all without having to worry about managing large volumes of test data in the various formats provided.
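One practical way to keep manual testing workable against very large datasets is to draw a small, uniformly random subset of records to test against rather than the whole corpus. The sketch below is illustrative only (the data stream and record format are invented for the example); it uses reservoir sampling, which picks k records from a stream of unknown size while holding only k records in memory.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Pick k records uniformly at random from a stream of unknown
    size, holding only k records in memory (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace an existing record with probability k/(i+1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir

# Draw 5 representative records from a large generated stream
# without ever materialising the full dataset in memory.
sample = reservoir_sample((f"record-{i}" for i in range(1_000_000)), k=5, seed=1)
```

Because the stream is consumed lazily, the same approach works for files or database cursors far too large to load at once.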
There will then need to be a clear path to test automation, with the ability to apply it where it makes sense: leveraging best-of-breed tools for automating unit, performance, and UI testing, keeping manual tests where they are most effective, and ensuring a clear visualization of the aggregate results of all testing activity in a single place, easily accessible to all stakeholders. When failures occur, the process of tasking and notifying engineers must be automated, preferably in an intelligent way that can roll up multiple related failures into one issue for the engineer.
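Rolling up related failures typically means normalizing each failure message into a stable “signature” and grouping on it. The sketch below assumes nothing about any particular tool; the `signature` helper and the sample failure messages are invented for illustration.

```python
import re
from collections import defaultdict

def signature(message: str) -> str:
    """Normalize a failure message into a stable signature by
    stripping volatile details such as hex addresses and numbers."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<addr>", message)
    sig = re.sub(r"\d+", "<n>", sig)
    return sig.strip()

def roll_up(failures):
    """Group raw (test name, message) failures into one issue
    per normalized signature."""
    issues = defaultdict(list)
    for test_name, message in failures:
        issues[signature(message)].append(test_name)
    return issues

failures = [
    ("test_login_42", "Timeout after 30s connecting to host 10.0.0.7"),
    ("test_login_43", "Timeout after 31s connecting to host 10.0.0.9"),
    ("test_export", "NullReferenceException at 0x7ffd1234"),
]
issues = roll_up(failures)
# Two issues result: one grouping both timeouts, one for the exception.
```

Real tools refine this with stack-trace similarity rather than plain text, but the principle is the same: one notification per root cause, not one per failing test.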
Clearly, the ubiquitous spreadsheet must give way to a true Test Management solution, and sooner is likely better than later, given the growth we’re seeing. Such a solution should enable testing teams to leverage existing test cases and suites by importing them from Excel. Round-trip collaboration via Excel is desirable to support the transition period, and may be useful if some manual testing is outsourced. The ability to scoop up results from external specialty tools and present a clear picture of the state and results of all testing, with automatic traceability back to the original requirements, is another critical must-have.
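The import side of that transition is usually straightforward once the spreadsheet is exported to CSV. The sketch below is a minimal illustration, assuming a hypothetical sheet layout with ID, Title, Steps, and Expected columns; real spreadsheets will need their own column mapping.

```python
import csv
import io

# Hypothetical CSV export of an Excel test-case sheet.
csv_text = """ID,Title,Steps,Expected
TC-1,Login succeeds,Enter valid credentials then click Sign in,Dashboard shown
TC-2,Login fails,Enter bad password then click Sign in,Error message shown
"""

def import_test_cases(fileobj):
    """Parse each row of the exported sheet into a test-case dict,
    keyed by the column headers in the first row."""
    return [dict(row) for row in csv.DictReader(fileobj)]

cases = import_test_cases(io.StringIO(csv_text))
# cases[0]["ID"] == "TC-1"
```

Round-tripping back out for outsourced manual testers is the same operation in reverse, via `csv.DictWriter`.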
If you would like a bit more depth on some of the points I’ve touched on here on big data testing, and/or some additional pointers on exploring a Test Management solution, let me recommend this PDF resource.