In this article, testing experts Eva Holmquist and Rik Marselis explain why AI is often biased and what can be done to avoid this.
People think AI makes unbiased decisions
People make biased decisions because of their experiences and their beliefs. There is a common conviction that AI makes unbiased decisions. In reality, the kind of narrow artificial intelligence that exists today is far from unbiased. There are many evident examples of bias in systems with learning capabilities, such as racist Twitter bots, recruitment systems that only choose male applicants, and systems that predict a higher risk of recidivism for defendants of color than they actually have. We wonder how many biased systems there are that we haven’t discovered…
Good reads on this topic are Richard Fall’s articles ‘When AI goes bad’ and ‘Algorithms and bias in the criminal justice system’, and the great article by Rahul Bhargava, ‘The Algorithms aren’t biased, we are’, which discusses the need to shift focus from the learning to the teaching aspect of machine learning. Because that’s the thing: no machine learns in a vacuum. Machines learn in our world, which is full of bias. It is also humans who choose which training data to use and which criteria to use for decisions.
When we test systems with learning capabilities, we must test for these aspects as well. Is the system behaving with unacceptable prejudice? And because it continues to learn, we also need to monitor it during live operation to catch newly acquired biased behavior.
In this article, we discuss why bias occurs and how we can avoid it.
Bias in a system due to biased training data
Bias in artificial intelligence often occurs because there is bias in the training data. It can also be introduced during operation. In our blog post about the basics of machine learning, we give examples of inherent problems with training data. Knowledge of these potential problems is important for avoiding bias. Below, we go through each of them and explain how it can be a source of bias.
Data elements that shouldn’t influence the outcome
In machine learning, we can’t control which of the supplied data elements the algorithm uses as the basis for learning. As humans, we constantly filter out information that shouldn’t be used for our decisions, but artificial intelligence doesn’t come equipped with such a filter. We need to manage the filtering of the data. A typical example is that information about gender, sexual orientation, religion, and place of birth shouldn’t influence a decision. If we don’t remove or filter this information, we are likely to end up with a biased system.
Another example is an image recognition system that should not take the background of the image into account when determining the object. If, for example, an AI must distinguish cups from glasses and all training images of cups have a black background while all images of glasses have a white background, the machine-learning algorithm will easily conclude that the background color determines whether it’s a cup or a glass.
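The filtering of sensitive data elements described in this section can be sketched in a few lines. This is a minimal illustration, not a complete solution: the field names and the applicant records are hypothetical.

```python
# Hypothetical field names; adapt them to the attributes that must not
# influence decisions in your own domain.
SENSITIVE_FIELDS = {"gender", "sexual_orientation", "religion", "place_of_birth"}


def strip_sensitive_fields(records, sensitive=SENSITIVE_FIELDS):
    """Return copies of the records without the sensitive fields."""
    return [
        {key: value for key, value in record.items() if key not in sensitive}
        for record in records
    ]


# Made-up applicant records, for illustration only.
applicants = [
    {"years_experience": 7, "gender": "F", "religion": "none", "degree": "MSc"},
    {"years_experience": 3, "gender": "M", "religion": "none", "degree": "BSc"},
]

cleaned = strip_sensitive_fields(applicants)
print(cleaned)
```

Note that removing a field does not remove other fields that correlate with it (a first name can hint at gender, for example), so a filter like this is a starting point rather than a guarantee.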
Outliers in the dataset that shouldn’t be a basis for learning
Outliers, i.e. abnormal data points, can be a problem in many systems, including systems with machine learning capabilities. They can occur both in the training data and during operation. They can come from people with malicious intent but can also occur in other circumstances. How much influence the outliers have depends on their proportion of the total data set.
This is a phenomenon that testers especially should be aware of. Testers tend to use edge cases and rare examples to test the behavior of a system. A machine-learning algorithm may treat these test cases as normal input and thus learn that outliers are normal, which it should not. Testers can prevent this by using a lot of normal data as well or, better, by taking a branch from a continuously learning algorithm and using only that branch for testing, without influencing the original algorithm.
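The idea of testing on a branch can be illustrated with a toy example. The sketch below assumes the learning state can be copied in memory; `RunningMeanModel` is a hypothetical stand-in for a real continuously learning system, which would typically be branched by cloning a stored model version instead.

```python
import copy


class RunningMeanModel:
    """Toy continuously learning model that tracks the mean of its input."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def learn(self, value):
        # Incremental update of the mean with each new observation.
        self.count += 1
        self.mean += (value - self.mean) / self.count


production_model = RunningMeanModel()
for value in [20.0, 21.0, 19.0, 20.0]:
    production_model.learn(value)

# Branch the model: the copy can learn from edge cases freely,
# while the original keeps the state it had before testing.
test_branch = copy.deepcopy(production_model)
for edge_case in [-1000.0, 0.0, 99999.0]:
    test_branch.learn(edge_case)

print("production mean:", production_model.mean)
print("test branch mean:", test_branch.mean)
```

After the test run the branch is simply discarded, so the extreme test values never reach the model that serves real users.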
The data used for training is often gathered from another system. In these instances, there may have been bugs in that system that have resulted in some incorrect information. There may also be mistakes in the preprocessing of the training data or when giving feedback on the results. Incorrect information can also be given during operation. If the amount of incorrect information is high, we will end up with problems in the decisions taken by the artificial intelligence.
Outdated information, for example, information that resulted from decisions based on changed regulations, is a common source of bias in training data. When training an artificial intelligence, we need a lot of data. Shortly after a change, there may not yet be much information based on the new regulations. This means that often old data is still used. This will result in decision rules with a bias for the old regulations in our new systems.
This lack of up-to-date training data can be solved by creating synthetic training data. Synthetic data can be created using traditional data management software but also with the help of artificial intelligence algorithms in which the new rules have been applied.
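Before deciding whether synthetic data is needed, it helps to know how much of the training data predates the change. The following is a small sketch under stated assumptions: the cutoff date, the `decided_on` field, and the records are all hypothetical.

```python
from datetime import date

# Hypothetical date on which the new regulation took effect.
REGULATION_CHANGE = date(2020, 1, 1)


def split_by_regulation(records, cutoff=REGULATION_CHANGE):
    """Separate records decided under the old rules from those under the new."""
    old = [r for r in records if r["decided_on"] < cutoff]
    new = [r for r in records if r["decided_on"] >= cutoff]
    return old, new


# Made-up decision records, for illustration only.
records = [
    {"decided_on": date(2018, 5, 1), "approved": True},
    {"decided_on": date(2019, 11, 12), "approved": False},
    {"decided_on": date(2020, 3, 3), "approved": True},
]

old, new = split_by_regulation(records)
share_old = len(old) / len(records)
print(f"{share_old:.0%} of the training records predate the change")
```

A high share of old records is a signal that the trained decision rules may still lean toward the old regulations.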
Data that is not diverse enough
It’s easy to end up with training data that isn’t diverse enough. This can happen when the data comes from a subset of the population and therefore isn’t representative of the whole population. One example is image recognition systems that identify white males more accurately than black women. For more information about this real-life problem, see Joy Buolamwini’s video-poem. This is a clear example of a problem caused by training data that isn’t diverse enough.
The problem with biased decisions
Now that more and more systems include machine learning capabilities, the problem of biased decisions also grows. Sometimes we don’t even know that a system is biased. All of these systems are used either to make decisions or to guide them. A lot of people believe that the systems are more impartial than humans, so they gladly base their decisions on the advice of artificial intelligence. This means there is a growing risk of biased decisions that will influence everyday lives. Biased systems may decide you’re not going to be hired, won’t get an apartment, won’t get a mortgage, won’t get into the school you’ve applied for, or will get a harsher sentence in court.
Unfortunately, even when we know all of the above-mentioned decisions may be biased, it’s often impossible to prove this, because with artificial intelligence we often don’t know how the algorithm comes to its decision.
The need for decision transparency
Many decisions may change our lives, and if we don’t agree with one, we can ask for the basis of that decision and appeal against it. With artificial intelligence, it may not be possible to know the basis of the decision because the algorithms are too complex and don’t disclose their reasoning. This doesn’t remove the need for decision transparency, though. In the book “Testing in the digital age”, the authors identify new quality characteristics that are needed for assuring the quality of artificial intelligence. Under the new quality characteristic “Intelligent behavior”, we find the subcharacteristic “Transparency of choices”. This is important so that we know how and why a decision was made and can appeal against it. It is paramount for us to be able to trust artificial intelligence. One way of achieving this is by means of “explainable AI”.
Explainable Artificial Intelligence
Explainable AI (XAI) aims to solve the transparency-of-choices problem by showing how a specific decision is made; see the Forbes article ‘Understanding Explainable AI’ for more details. One way is, of course, to use simpler algorithms such as decision trees and Bayesian classifiers. But there are ways to solve it even if we’re using more complex algorithms such as deep learning neural networks.
There are three parts to AI explainability:
- Prediction accuracy, which means models will explain how conclusions are reached to improve future decision making
- Decision understanding and trust from human users and operators
- Inspection and traceability of actions undertaken by the AI systems
There are more and more techniques developed that can be used to provide explainable AI. They can be divided between those that can predict the how beforehand and those that can do so after the fact. If you’re interested, more information can be found in https://labs.sogeti.com/explainable-ai/ and https://labs.sogeti.com/make-ai-trustworthy-with-explainable-ai/.
How to avoid bias in Artificial Intelligence
The risk of biased artificial intelligence is a very real threat. Fortunately, there are ways of dealing with these risks. The following are some examples and shouldn’t be viewed as a complete list.
Analyze the training data to discover bias
Biased artificial intelligence mainly occurs due to the usage of biased training data. When testing artificial intelligence, it’s not enough to just test the system. We also need to analyze the training data to see if it’s appropriate for the system’s purpose. When analyzing training data, we need to look for both intentional and unintentional bias.
Analyze the training data to see if it’s diverse enough
Another common reason for artificial intelligence to become biased is when training data isn’t diverse enough. Therefore, we need to check this. Does the data represent the population? For example, if there is a disproportionate number of people from a specific country, the data does not represent the world population.
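A check like this can be automated by comparing the share of each group in the training data with the share expected in the target population. The sketch below is one possible approach; the attribute name, the groups, and the reference shares are hypothetical.

```python
from collections import Counter


def representation_gaps(samples, attribute, reference_shares):
    """Compare each group's share in the data with its expected share.

    Returns {group: observed_share - expected_share}; a negative value
    means the group is under-represented in the training data.
    """
    counts = Counter(sample[attribute] for sample in samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples to analyze")
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical training set and reference population shares.
training = [{"group": "A"}] * 8 + [{"group": "B"}] * 2
expected = {"A": 0.5, "B": 0.5}

gaps = representation_gaps(training, "group", expected)
print(gaps)
```

In this made-up data set, group B makes up 20% of the samples against an expected 50%, so its gap is negative. What size of gap is acceptable depends on the system’s purpose and has to be decided per project.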
Pre-process the training data with bias in mind
Training data needs to be pre-processed to remove bias from the dataset. This can, for instance, mean removing outliers, removing data elements that shouldn’t be the basis for decisions, and adjusting or adding data to get the right level of diversity.
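As an illustration of one of these pre-processing steps, the sketch below removes outliers from a single numeric field using the interquartile range (Tukey’s rule). This is one common rule of thumb, not the only option; the threshold `k=1.5` and the sample values are assumptions.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up values with one obviously abnormal data point.
incomes = [31, 33, 35, 36, 38, 40, 41, 43, 45, 900]
filtered = remove_outliers_iqr(incomes)
print(filtered)
```

Whether a removed value is truly an error or a legitimate rare case is a judgment call, so the removed points deserve a manual review before they are discarded for good.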
Use explainable AI to identify biased decisions
Explainable AI can be used to find out how a specific decision was made by the artificial intelligence. This means we can use it in testing to discover biased decisions. It can also be used during monitoring to discover an emerging bias.
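For a simple linear scoring model, an explanation can be as basic as listing how much each input contributed to the score. The sketch below assumes such a model; the weights, the intercept, and the feature names are hypothetical, and more complex models need dedicated explanation techniques.

```python
# Hypothetical weights of an already trained linear scoring model.
WEIGHTS = {"years_experience": 0.6, "has_degree": 1.0, "is_male": 2.5}
INTERCEPT = -3.0


def explain(features):
    """Return the score and each feature's contribution to it."""
    contributions = {
        name: weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    }
    score = INTERCEPT + sum(contributions.values())
    return score, contributions


def dominant_feature(contributions):
    """Name of the feature with the largest absolute contribution."""
    return max(contributions, key=lambda name: abs(contributions[name]))


score, contributions = explain(
    {"years_experience": 2, "has_degree": 1, "is_male": 1}
)
print(score, contributions)
print("dominant feature:", dominant_feature(contributions))
```

Here the explanation shows that the applicant’s gender contributes more to the score than experience and education combined, which is exactly the kind of finding a tester would report as a biased decision.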
Use filters to exclude operational data from learning
Today, we have a lot of problems with hacking. In the future, we will have a problem with information hacking, where individuals or groups of malicious people destroy artificial intelligence systems by feeding them faulty or biased data. To reduce this risk, we need to implement filters that exclude specific operational data from the learning process. These filters, of course, need to be tested as well.
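What such a filter looks like depends heavily on the system. As a minimal sketch, the function below rejects operational samples with implausible values and limits how many samples a single source may contribute. The field names, the valid range, and the per-source limit are all hypothetical.

```python
def accept_for_learning(sample, seen_sources, max_per_source=3,
                        valid_range=(0.0, 100.0)):
    """Decide whether an operational sample may be fed back into training.

    Rejects values outside the plausible range and sources that already
    contributed max_per_source samples (a crude guard against flooding).
    """
    low, high = valid_range
    if not low <= sample["value"] <= high:
        return False
    if seen_sources.get(sample["source"], 0) >= max_per_source:
        return False
    seen_sources[sample["source"]] = seen_sources.get(sample["source"], 0) + 1
    return True


# Made-up operational data, for illustration only.
stream = [
    {"source": "user-1", "value": 42.0},
    {"source": "bot-7", "value": 5000.0},  # implausible value
    {"source": "bot-7", "value": 10.0},
    {"source": "bot-7", "value": 11.0},
    {"source": "bot-7", "value": 12.0},
    {"source": "bot-7", "value": 13.0},  # exceeds the per-source limit
]

seen = {}
learning_queue = [s for s in stream if accept_for_learning(s, seen)]
print(len(learning_queue), "of", len(stream), "samples accepted for learning")
```

Because the filter itself decides what the system learns from, its rules and thresholds are test objects in their own right.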
Test for bias
When testing artificial intelligence systems, it’s imperative to test the system for bias. One way of doing this is to change a part of the input that should not be relevant and see if you still get the same decision. For instance, change the gender in an applicant’s curriculum vitae or change the background of an image, and check that the outcome doesn’t change.
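This kind of test can be automated: run the same input several times, varying only the attribute that should not matter, and report every case where the decision changes. The sketch below uses two hypothetical stand-ins for the system under test, one deliberately biased and one not.

```python
def biased_screening(applicant):
    """Stand-in for the system under test; deliberately biased."""
    score = applicant["years_experience"]
    if applicant["gender"] == "M":
        score += 2
    return score >= 5


def fair_screening(applicant):
    """Stand-in that ignores gender."""
    return applicant["years_experience"] >= 5


def counterfactual_failures(decide, applicants, attribute, values):
    """Return applicants whose decision changes when only `attribute` changes."""
    failures = []
    for applicant in applicants:
        outcomes = {decide({**applicant, attribute: value}) for value in values}
        if len(outcomes) > 1:
            failures.append(applicant)
    return failures


# Made-up applicants, for illustration only.
applicants = [
    {"name": "A", "years_experience": 4, "gender": "F"},
    {"name": "B", "years_experience": 8, "gender": "M"},
]

biased_result = counterfactual_failures(biased_screening, applicants, "gender", ["F", "M"])
fair_result = counterfactual_failures(fair_screening, applicants, "gender", ["F", "M"])
print("biased system:", biased_result)
print("fair system:", fair_result)
```

The biased stand-in fails for applicant A, whose outcome flips when only the gender is changed, while the fair stand-in produces no failures. The same pattern applies to images: swap the background and compare the classification.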
As testers, we’re responsible for making sure everybody is aware of possible quality risks. This includes the risk of a biased system. As more systems include machine learning capabilities, we need to be ready to address this risk. So, let’s make sure to decrease the risk of bias in artificial intelligence and include testing for possible bias when testing these machine learning systems. This way, we testers will contribute to a fair and equal world.
This article was written by Eva Holmquist (Sogeti Sweden) and Rik Marselis (Sogeti Netherlands).
Eva Holmquist is a senior test specialist at Sogeti. She has worked with activities from test planning to execution of tests. She has also worked with Test Process Improvement and Test Education. She is the author of the book “Praktisk mjukvarutestning” which is a book on Software Testing in Practice.
Rik Marselis is a test expert at Sogeti. He has worked with many organizations and people to improve their testing practices and skills. Rik has contributed to 19 books on quality and testing. His latest book is “Testing in the digital age; AI makes the difference” about testing OF and testing WITH intelligent machines.