LLMs: Testing the Unknowns that Seem to Know It All
Vimmi Walia and Manisha Mittal
Nagarro, India
As the Emerging QA practice lead at Nagarro, Vimmi is obliged to stay ahead of the curve and keep developing her testing capabilities. She recently tested six applications based on Large Language Models (LLMs) over a couple of months.
A good LLM test strategy requires the tester to open the black box and look inside. We must understand new technical concepts like embeddings and mappings, token limits, and temperature. We need to learn the functions in the product, and sometimes we must go into the code to identify potential issues and vulnerabilities. Learning to code helps too.
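To make those concepts concrete, here is a minimal sketch of how temperature and a token limit show up as request parameters, assuming an OpenAI-style chat completions client; the model name and prompt are illustrative only.

```python
# Minimal sketch: how temperature and token limits surface in a typical LLM
# API call (assumes the OpenAI Python SDK; model and prompt are examples).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Summarise our refund policy."}],
    temperature=0.0,   # low temperature -> more deterministic, easier to test
    max_tokens=200,    # caps the length of the generated answer
)
print(response.choices[0].message.content)
```

Testing with temperature pinned low makes responses more repeatable, which is often the first step before probing the riskier behaviours below.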
An excellent test strategy focuses on what can go wrong. Through experience, we’ve noticed problems and risks common to LLM-based applications: hallucinations, deception, security breaches, biases, explainability issues — and test coverage that’s insufficient to reveal them.
In this talk, we’ll share our experiences and a few practices we’ve found helpful, including:
- Checking for biases like race, gender, and ethnicity.
- Testing model robustness by varying query syntax, semantics, and response length; adding distractors; and asking malicious questions (see the sketch after this list).
- Validating responses with sources of reliable information.
- Security testing for prompt injections, insecure output handling, poisoned training data, permission issues, data leakage, and more.
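As a taste of the robustness checks above, here is a minimal sketch of a consistency probe. The `ask_llm` helper is hypothetical and stands in for whatever client your application exposes; the question variants and expected fact are examples.

```python
# Sketch of a simple robustness probe: rephrase the same question, degrade its
# syntax, and add a distractor, then check that every answer still contains
# the expected fact. `ask_llm` is a hypothetical helper around your LLM endpoint.

EXPECTED = "30 days"  # the fact every correct answer should contain (example)

variants = [
    "How long is the return window for online orders?",
    "Within how many days can I return an item bought online?",
    "I love your store! By the way, what's the return period for web purchases?",  # distractor
    "return window online orders how long",  # degraded syntax
]

def robustness_report(ask_llm):
    """Return the (question, answer) pairs that miss the expected fact."""
    failures = []
    for question in variants:
        answer = ask_llm(question, temperature=0.0)
        if EXPECTED.lower() not in answer.lower():
            failures.append((question, answer))
    return failures

# Example usage:
# for question, answer in robustness_report(ask_llm):
#     print(f"Inconsistent answer for {question!r}: {answer!r}")
```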
Using a sample LLM-based application, we will show how these models can be tested. We’ll share our experiences, challenges, learnings, practices, and quick bites on how to test LLM-based applications effectively. We’ll help you develop a basic understanding of generative AI and LLMs through play and example scenarios, and show a live demo of search and conversations. You may see a bit of Python code, too. Then we’ll talk about the risks associated with LLMs and how you can address them in your test strategy.
About Us!
Vimmi Walia
Principal Consultant – Nagarro, India
A seasoned technologist and continuous learner, Vimmi has more than 16 years of experience in software engineering. She leads the Centre of Quality Excellence and heads the Emerging QA and Performance Engineering practice at Nagarro.
A passionate QA professional, blogger, trainer, and speaker, Vimmi is a dynamic, self-motivated leader who believes in delivering value-driven software quality. She has deep knowledge and experience in AI testing, performance testing, test automation, and DevOps, along with proven QA delivery experience across SDLC models such as Agile, V-model, and waterfall. She is passionate about community-driven learning, especially in the area of technology. She leads the local chapter of the Ministry of Testing, serves on the advisory board for CPSAT, and is an active speaker in renowned QA communities such as EuroSTAR, STeP-IN, QAI STC, and AgileTestingAlliance.
Manisha Mittal
Principal Consultant – Nagarro, India
In the wild world of QA engineering, my job description reads something like: embrace your inner creator, hunt for flaws, and break the system. For the past 17 years, I’ve been bending, twisting, and stretching systems through various types of testing to ensure they last longer and stay user-friendly. It’s like giving our digital devices a daily workout, but without the sweatbands.
My journey can be summed up as a quest for excellence: I’ve worked across various domains and built a strong track record in leadership and management. At Nagarro, I lead and manage diverse groups. From juggling code monkeys to taming QA lions, my leadership skills are tested every day. I’m also on a quest for personal growth that’s more persistent than a raccoon raiding a trash can at midnight. Always hungry for knowledge, and like a tech-hungry squirrel, I never miss a chance to stay ahead of the curve in the ever-changing world of new technologies!