Linky #10 - Tools, Tests, and Thinking Clearly
Why the Marines might be better at AI than tech, not all UI tests are equal, and quality starts with how you frame the work
Welcome to Linky #10. In each edition, I share articles, essays, or books I've recently enjoyed, along with some thoughts on why they stood out and why I think they’re worth your time.
This week's linky includes a range of reflections, from the latest AI practices to testing strategies and quality culture. There is a strong thread running through many of the pieces this time: the importance of distinguishing between tools, practices, and outcomes, and not confusing one for the other. From how the Marines are structuring AI adoption, to how we talk about guardrails, testing, or even collaborating with AI, the lesson seems to be: what works well is often not the tool itself, but how intentionally it's applied.
Latest post from the Quality Engineering Newsletter
This is a write-up of my latest talks, including a recording of the talk that lays out a framework to help engineering teams build quality in. A key part of the approach is identifying what quality means for your stakeholders.
To help with that, I followed up with:
A How-to guide for running a quality mapping session to surface which quality attributes matter most
A facilitation guide packed with hints and tips for running the session successfully
I’ve been indirectly saying QE is about quality, not testing, for some time now, but never said it this directly. This post explains why I believe that distinction matters and how we can practically move towards it.
Marines adopting AI better than the tech industry
**AI Is a Transformation Strategy, Not a Tech Feature**
Marine Insight: AI is embedded within digital transformation - not treated as a bolt-on tool.
Lesson: Don't just hire an "AI lead" or spin up PoCs. Rethink your workflows, decision models, and structures around data. AI should reshape how work happens.
**Create Embedded Transformation Teams**
Marine Insight: They deploy Digital Transformation Teams (DXTs) into commands to drive implementation and report insights.
Lesson: Form cross-functional squads that embed into business units. Skip the innovation theatre - solve real problems in real contexts.
**Data Infrastructure Is the Real Foundation**
Marine Insight: The focus is on data lifecycle, governance, architecture, and accessibility - before AI is even discussed.
Lesson: AI lives downstream of your data. If it's fragmented or inaccessible, your AI won't scale. Fix the plumbing first.
**Train for AI Fluency at All Levels**
Marine Insight: They categorise the workforce into users, builders, and decision-makers - with tailored training for each.
Lesson: AI literacy is for everyone. Execs, compliance officers, and operators all need the right level of fluency to lead and adapt safely.
**Governance That Enables, Not Blocks**
Marine Insight: Legacy risk frameworks were flagged as blockers - and they propose streamlining them to accelerate deployment.
Lesson: Ditch the red tape disguised as risk management. Build guardrails that allow for fast, safe experimentation.
**Use Cases Drive Strategy - Not the Other Way Around**
Marine Insight: They gather and prioritise use cases directly from the field to guide investment and capability building.
Lesson: Let operational pain - not hype - steer your roadmap. Start where the ROI is clear and measurable.
**Measure What Matters**
Marine Insight: They track adoption, blocker resolution, manual work reduction, and time-to-value - not just model count.
Lesson: Success isn't about how many models you build. It's what they improve, who uses them, and what outcomes they drive.
Many of these AI adoption takeaways apply just as well to automation within teams. There is a strong focus on outcomes over outputs, and on enabling teams rather than doing "AI for AI's sake." Via The best enterprise AI playbook comes from the US Marines | Stuart Winter-Tear
Guardrails are not for steering
That's what railways do: it's not the flange that keeps the axle on the track, but the conical shape of the wheels. The flange is just for tight turns or emergencies. So, when your platform team talks too much about #GuardRails, remind them that guardrails are not a steering mechanism, but an emergency device. No one follows the road by scraping along the guardrail.
Such a great point to keep in mind. Guardrails are there to stop things spilling into areas that could be risky, not to guide the way we work. Via Gregor Hohpe: Platforms steer with low friction
New AI development practice: Specification Driven Development
The talk, "The new code: specs write once, run everywhere", is chock-full of insights (it's a must watch!). Here's a brief summary:
- Prompting is sorta dead; at the next level, you'll be writing specifications
- Prompts are ephemeral, the specs will persist
- Specifications are a form of structured communication between humans and machines. Specs > Code: so much nuance and knowledge is lost just in the code
- Writing them down makes specifications useful for LLMs, but it also helps the humans involved align and agree on them.
- Tests are what makes specifications executable and verifiable.
Using a spec file seems like one of the best ways to prompt AI in a consistent way. You’ll probably never get the exact same outcome every time, but it keeps things more stable. Another tip I picked up: get the AI to refer to you by a specific name. If it stops doing so, that’s a good signal that its context window has likely filled up. Don’t know what a context window is? Then check out the key AI concepts QE should get to grips with. Via 7 learnings from the AI Engineering SF World Fair 2025 | AI Native Dev
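To make that concrete, here is a minimal sketch of spec-file prompting, assuming the OpenAI Python SDK; the spec path, model name, and task are placeholders of mine, not something from the talk:

```python
# Minimal sketch of spec-driven prompting, assuming the OpenAI Python SDK.
# The spec path, model name, and task below are illustrative placeholders.
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The spec lives in version control and persists; individual prompts stay ephemeral.
spec = Path("specs/checkout_service.md").read_text()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Sending the same spec with every request keeps outputs anchored to
        # behaviour the team has agreed on, rather than to ad-hoc prompts.
        {"role": "system", "content": spec},
        {"role": "user", "content": "Implement the discount rules described in the spec."},
    ],
)
print(response.choices[0].message.content)
```

Because the spec is a file in the repo, it gets reviewed, versioned, and reused just like code, which is what makes it more stable than one-off prompts.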
AI is a tool, not a person
With LLMs and chatbots, the anthropomorphizing really took off. Models learned; they researched; they understood; they perceived; they were unsupervised. Soon, everyone realized models “hallucinated.” We used to say “the computer was wrong” when it offered a bogus spelling suggestion or a search produced a weird result, but suddenly, using AI meant the result was equivalent to “perceiving something not present” but it’s software; it didn’t perceive anything. It just returned a bogus result.
It’s a great point. AI is a tool, and the terminology we use around it matters. It’s probably too late to fix how we talk about this stuff, but as quality engineers, we need to be more mindful of the language we use and the frames it sets, for ourselves and others. Via “The Illusion of Thinking” | Hardcore Software
TDD is an AI superpower
2. Test driven development (TDD) is a “superpower” when working with AI agents. AI agents can (and do!) introduce regressions. An easy way to ensure this does not happen is to have unit tests for the codebase.
Kent Beck is one of the biggest proponents of TDD, so it’s no surprise he is using this approach when coding with these agents as well. What is surprising is how he’s having trouble stopping AI agents from deleting tests in order to make them “pass!”
I’ve seen this myself when playing around with LLMs. You tell it not to do something - it does it anyway - and then “remembers” in subsequent calls. Building testability from the beginning is key to catching this. Teams with testability already baked into their process are going to adapt to AI the fastest and most safely. Also see: Linky #9 and Rob Bowley making the same point. From Kent Beck’s interview with Pragmatic Engineer
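One cheap defence I can imagine here (my sketch, not Kent's practice): have the pipeline compare the number of collected tests against a checked-in baseline, so a deleted test fails the build instead of quietly helping it "pass". The file paths and pytest invocation are assumptions.

```python
# Sketch of a CI guard: fail the build if the collected test count drops
# below a checked-in baseline. Paths and invocation are assumptions.
import json
import subprocess
import sys
from pathlib import Path

BASELINE = Path("tests/.test_count_baseline.json")

def count_tests() -> int:
    # `pytest --collect-only -q` prints one line per collected test,
    # each containing `::` (e.g. tests/test_cart.py::test_total).
    result = subprocess.run(
        ["pytest", "--collect-only", "-q"], capture_output=True, text=True
    )
    return sum(1 for line in result.stdout.splitlines() if "::" in line)

def main() -> None:
    current = count_tests()
    if BASELINE.exists():
        previous = json.loads(BASELINE.read_text())["count"]
        if current < previous:
            sys.exit(
                f"Collected tests fell from {previous} to {current}. "
                "Did an agent delete tests to make the build 'pass'?"
            )
    BASELINE.write_text(json.dumps({"count": current}))
    print(f"{current} tests collected")

if __name__ == "__main__":
    main()
```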
Facebook wrote no unit tests in 2011 🤯
3. Facebook wrote no unit tests in 2011, and this stunned Kent back in the day. Kent joined Facebook in 2011 and was taken aback by the lack of automated testing and by how everyone pushed code to production without it.
What he came to realize – and appreciate! – was how Facebook had several things balancing this out:
- Devs took responsibility for their code very seriously
- Nothing at Facebook was "someone else's problem": devs would fix bugs when they saw them, regardless of whose commit caused them
- Feature flags were heavily used for risky code
- Facebook did staged rollouts to smaller markets like New Zealand
This blew my mind. Facebook was already huge in 2011, and this was just before the mobile rocket ship really took off. I can’t help wondering: would a no-unit-test approach have worked as well once they started shipping regularly to mobile? Via Kent Beck - Pragmatic Engineer
Great example of collaborating with AI
These are the tasks I complete to plan my lesson:
- Engaging hook
- Direct instruction content
- Reading/video on the topic
- Differentiate the reading
- Retrieval practice questions
- Sentence starters for writing
- Closing discussion questions
When I use AI to plan my lessons, I go one task at a time.
This is the first prompt I use with ChatGPT:
(Role) You are an expert [INSERT SUBJECT] teacher.
(Task) Help me create a lesson plan step-by-step. I will ask for various resources. Every time I ask you for something, give me 3 versions of that thing. I will choose the best one and we will continue to plan from there. Start by giving me 3 lesson hooks for a lesson on [INSERT TOPIC].
(Format) Give me the resources I ask for at [INSERT READING LEVEL].
I start by choosing the best hook of the 3 provided.
Then I go step-by-step and build the resources I need.
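For anyone driving this workflow through an API rather than the ChatGPT UI, here is a minimal sketch of the pick-one-of-three loop, assuming the OpenAI Python SDK; the subject, topic, and task list are placeholders:

```python
# Minimal sketch of the "three versions, pick one" loop, assuming the
# OpenAI Python SDK. Subject, topic, and task list are placeholders.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are an expert history teacher."}]

tasks = ["an engaging hook", "retrieval practice questions", "sentence starters"]

for task in tasks:
    history.append({
        "role": "user",
        "content": f"Give me 3 versions of {task} for a lesson on the Cold War.",
    })
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    options = reply.choices[0].message.content
    print(options)

    # The human stays in the loop: expertise picks the best of the three.
    choice = input(f"Which version of {task} should we build on (1-3)? ")
    history.append({"role": "assistant", "content": options})
    history.append({"role": "user", "content": f"Let's continue with version {choice}."})
```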
I really like how he always asks for three versions so he can pick and choose. It’s a great way to avoid simply accepting the first thing the LLM suggests.
What he doesn’t mention, though, is assessing whether what the LLM offered is any good. That’s where his expertise comes in, that tacit knowledge, and that’s the part we often take for granted with these tools. You need to already be highly proficient in the task to assess what the AI is suggesting. Via Paul Matthews on AI and lesson planning.
The five phases of quality
🔹 Phase 1: Manual testing + JIRA (insert your TCM) chaos
🔹 Phase 2: Automation heroes write all the tests
🔹 Phase 3: Flaky pipelines, frustrated devs, nobody trusts test results
🔹 Phase 4: Centralized QE strategy, traceability, dashboards, shift-left mindset
🔹 Phase 5: Quality becomes a product conversation — not just a test conversation
From the Director of QE at Snap eHealth. I think the phases are a bit broad, which means most companies will fall somewhere between 1 and 3, with many aspiring to get to 4 and 5.
As they say, all models are wrong, but some are useful. This one is useful in that it gives you a rough idea of where you are and where you’re heading. But that shift from 3 to 4 is less of a step and more of a leap, and very few (if any?) make it. Via The QE Roadmap I Wish I Had 10 Years Ago | Thomas Howard
Breaking down layers of testing
Great post by Richard Bradshaw on different types of automated UI testing:
- Visual testing - comparing screens to a base image
- End-to-end testing - testing the system as a whole via the UI
- UI testing - stubbing out dependencies and only rendering the UI for testing in isolation
- (UI) Component testing - testing small UI components in isolation
- Cross-X testing - on-device, browser, and platform testing
I really appreciate the distinction between visual testing, end-to-end testing, and UI testing. I’ve often seen automation test suites lump all of these together, but in reality, they serve different purposes and help validate different use cases. Recognising these differences adds valuable nuance, especially in large automated end-to-end test suites.
By separating them, you not only gain the flexibility to run them at different stages, but you also help teams understand the unique value each type of test provides. For example, UI tests might be critical enough to block the pipeline, while visual or end-to-end tests could run in parallel or even overnight, especially if they’re large or flaky.
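As a sketch of what that separation might look like in a Python suite (the marker names and tests are mine, not Richard's), pytest markers let each layer run at its own stage:

```python
# Sketch: tagging UI test layers with pytest markers (names assumed) so each
# layer can run at its own pipeline stage. Register the markers in pytest.ini
# to silence unknown-marker warnings.
import pytest

@pytest.mark.ui  # dependencies stubbed, fast - could block the pipeline
def test_login_form_flags_empty_password():
    ...  # render the form in isolation and assert on the validation message

@pytest.mark.visual  # screenshot vs. base image - run in parallel or overnight
def test_dashboard_matches_baseline():
    ...  # capture the dashboard and diff against the stored baseline

@pytest.mark.e2e  # whole system via the UI - slowest, run nightly
def test_checkout_happy_path():
    ...  # drive a real purchase end to end

# Select a layer per stage, e.g.:
#   pytest -m ui                 (merge gate)
#   pytest -m "visual or e2e"    (overnight run)
```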
I can definitely see myself using these terms more intentionally going forward. Via Exploring the different types of automated UI testing | QT | Richard Bradshaw.
Let me know which links resonated or if you’ve read anything lately that should make the next Linky. Always up for a good share.
Past Linky Editions
Linky #9 - Prompts, Priorities, and the People Part of Quality