Top three talks from TestBash 2025
My personal notes from the top three talks of TestBash Brighton 2025
I came back from TestBash Brighton 2025 with pages of notes and plenty to think about. Rather than trying to cover everything, I wanted to share my notes from the three talks that stuck with me the most: the ones that challenged how I think about testing, leadership, and quality.
Before you dive in, I’d love your input. I’ve been experimenting with different ways to share what I learn from conferences, from full notes to short summaries and even live Q&As.
What would you like to see more of?
Here are the three talks I’ve shared notes from:
A Tester’s Role in Evaluating and Observing AI Systems by Carlos Kidman
Why Automation Is Holding Back Continuous Quality: Finding Balance in Modern Testing by Philippa Jennings
GenAI Wars Episode 3: Return of the Explorer by Martin Hynie
If you want the full event write-up, you can read it here: TestBash Brighton 2025: Reflections on Two Days of Quality.
Tip: For the best experience viewing my notes posts, use a desktop browser. This allows you to easily navigate between chapter headings using the contents panel on the left side of the page.
A Tester’s Role in Evaluating and Observing AI Systems
By Carlos Kidman
Summary
Really interesting talk on how testers can systematically test AI systems. It walked us through different evaluation tools, showing how they help us understand where uncertainty lies in AI and what quality attributes we should be assessing. The tools themselves weren’t out of reach for testers: you don’t need to be an AI specialist to use them, just the willingness to get involved and get stuck in.
One powerful point that stood out for me was that teams often don’t do this kind of testing simply because they’re not used to thinking in that way. That’s exactly where the real value of testers and quality engineers comes in: bringing a quality mindset to engineering teams, helping them identify the attributes that matter, and guiding how to assess them in a systematic way.
Another brilliant session, well worth looking into if you’re trying to understand how to evaluate AI systems or are now being asked to.
Key takeaways
Testers can apply systematic evaluation techniques to AI systems without needing to be AI experts.
Existing testing skills (designing experiments, defining metrics, building test datasets) transfer directly to AI evaluation.
Benchmarking and evaluators (like annotations, custom code, or LLM-as-judge) make AI performance measurable; a minimal sketch of the LLM-as-judge idea follows this list.
Quality engineers play a vital role in helping teams identify which quality attributes matter and how to assess them.
Tools such as LangSmith build observability into AI systems and make testing more transparent.
Evaluating AI is about managing uncertainty: making it visible, measurable, and actionable.
The real value of testers is in shifting how teams think about quality and helping them test AI more systematically.
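To make the LLM-as-judge idea from the takeaways concrete, here is a minimal sketch of what such an evaluator can look like. It isn’t from Carlos’s talk: the `call_llm` placeholder stands in for whichever model client your team uses, and the rubric and pass threshold are purely illustrative.

```python
# Minimal LLM-as-judge sketch (assumptions: `call_llm` is a placeholder for
# your team's model client; the rubric and 1-5 scale are illustrative).

from dataclasses import dataclass
from typing import Callable

JUDGE_PROMPT = """You are evaluating an AI assistant's answer.
Question: {question}
Answer: {answer}
Rate the answer's factual accuracy from 1 (wrong) to 5 (fully correct).
Reply with only the number."""


@dataclass
class EvalCase:
    question: str
    answer: str          # the AI system's output under test
    min_score: int = 4   # threshold for treating the case as a pass


def judge(case: EvalCase, call_llm: Callable[[str], str]) -> bool:
    """Ask a judge model to score one answer and compare against the threshold."""
    prompt = JUDGE_PROMPT.format(question=case.question, answer=case.answer)
    score = int(call_llm(prompt).strip())
    return score >= case.min_score


if __name__ == "__main__":
    # Stub judge so the sketch runs end to end; swap in a real model call.
    fake_judge = lambda prompt: "5"
    case = EvalCase(question="What is 2 + 2?", answer="4")
    print("pass" if judge(case, fake_judge) else "fail")
```

In practice you would run an evaluator like this over a whole dataset of cases and track the pass rate over time, which is where observability tools such as LangSmith come in.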
My notes