Quality Engineering - The study of how quality is created, maintained and lost
This week's post examines how quality is created, maintained, and lost, using a team's attempt to improve both speed and quality as a worked example.
In What is Quality Engineering?, I described a core part of quality engineering as:
The study of quality and how it is created, maintained and lost[...]
In this week's post, I'd like to take you through an example of how Quality Engineers study quality, using a talk I recently delivered at TestBash UK called Speed Vs. Quality: Can you have both? In this talk, I describe a scenario where two hypothetical teams fall into the loops of working harder and working smarter.
Working harder and Working smarter
The working harder team stays late, takes shortcuts and focuses on doing the work. At first, this gives them a speed boost, and everything seems great. But the technical debt they've been accruing eventually catches up with them, and their quality drops to unacceptable levels. To recover, they have to do more testing and ship bug fixes, which slows them down: they end up trading speed for quality.
The working smarter team, meanwhile, focuses on improving their capability to do the work, running experiments on how they can do things differently to improve both speed and quality. They continuously address their technical debt and never let it overwhelm them. In the long run, this leaves them more time to ship value, and to ship it more frequently, giving them both speed and quality.
The working harder and working smarter loops are taken from the 2001 California Management Review article "Nobody Ever Gets Credit for Fixing Problems that Never Happened: Creating and Sustaining Process Improvement" by Nelson Repenning and John Sterman, which is well worth your time to read.
Unfortunately, speed and quality are ambiguous, leaving them open to interpretation. Hence, we need a way to describe both that minimises the ambiguity. This is where the DORA key metrics come in.
DORA uses throughput of code as a proxy measure for speed and stability of the system as a proxy measure for quality. Throughput is measured by deployment frequency and change lead time, while stability is measured by change failure rate and mean time to restore service (MTTR). See Four key metrics to learn more about the core metrics, how they can be measured and the research behind it all.
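To make the four metrics concrete, here's a minimal Python sketch that computes all of them from a list of deployment records. The records, field names and reporting period are all hypothetical - real teams would pull this data from their CI/CD pipeline and incident tracker.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: when the change was committed, when it was
# deployed, whether it caused a failure in production, and when service was restored.
deployments = [
    {"committed": datetime(2024, 1, 1, 9),  "deployed": datetime(2024, 1, 2, 9),  "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 3, 9),  "deployed": datetime(2024, 1, 5, 9),  "failed": True,  "restored": datetime(2024, 1, 5, 12)},
    {"committed": datetime(2024, 1, 8, 9),  "deployed": datetime(2024, 1, 9, 9),  "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 10, 9), "deployed": datetime(2024, 1, 12, 9), "failed": True,  "restored": datetime(2024, 1, 12, 10)},
]

period_days = 14  # reporting window for this sample

# Throughput: deployment frequency and mean change lead time
deployment_frequency = len(deployments) / period_days  # deploys per day
lead_times = [d["deployed"] - d["committed"] for d in deployments]
mean_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Stability: change failure rate and mean time to restore service (MTTR)
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)
mttr = sum((d["restored"] - d["deployed"] for d in failures), timedelta()) / len(failures)

print(f"Deployment frequency: {deployment_frequency:.2f}/day")
print(f"Mean lead time: {mean_lead_time}")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"MTTR: {mttr}")
```

With this sample data, the team deploys roughly every three days with a 1.5-day lead time, but half their changes fail - exactly the kind of throughput-high, stability-low picture the working harder loop produces.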
Working smarter to improve your speed and quality
In my talk, I suggest that teams use the DORA key metrics to measure the speed and quality of their delivery process. This gives the team a consistent way to measure both: throughput (a proxy measure for speed) and stability (a proxy measure for quality), metrics the DORA research has shown distinguish high-performing teams.
While those metrics help you consistently measure your delivery process, they don't tell you how to improve them.
Theory of Constraints (ToC)
This is where the five focusing steps from the Theory of Constraints come in. They give you a systematic way to use your identified metrics (see above) to improve your process:
Identify the constraint: What in your development process constrains your speed or quality?
Exploit the constraint: Find ways to tweak the process to help work flow through the constraint more easily.
Subordinate the constraint: Review what feeds the constraint and alter those processes to help work flow through it more freely.
Elevate the constraint: If steps 2 and 3 have not improved the constraint sufficiently, experiment to find new ways to improve the flow through the constraint. Do this until you've sufficiently resolved or improved the constraint for your process.
Do not allow inertia to cause a system constraint: Go back to step 1 and check whether the constraint has moved to a new part of the process. If it has, start the process over at the new constraint.
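The five focusing steps form a loop, and a toy model shows why the loop never ends: treat the delivery process as stages with a weekly capacity, where the lowest-capacity stage is the constraint. All stage names and numbers below are hypothetical, and steps 2-4 are collapsed into a single capacity improvement for brevity.

```python
# Toy model of a delivery pipeline: each stage has a capacity in items/week.
# Overall throughput can never exceed the lowest-capacity stage (the constraint).
pipeline = {"analysis": 10, "development": 6, "testing": 4, "deployment": 8}

for iteration in range(3):
    # Step 1: identify the constraint - the stage with the lowest capacity
    constraint = min(pipeline, key=pipeline.get)
    print(f"Constraint: {constraint} ({pipeline[constraint]} items/week)")

    # Steps 2-4: exploit, subordinate, elevate - simplified here to a
    # single capacity improvement at the constrained stage
    pipeline[constraint] += 3

    # Step 5: loop back to step 1 rather than stopping (avoiding inertia)
```

Running this prints testing as the constraint first, then development, then testing again: improving one stage simply moves the constraint elsewhere, which is exactly why step 5 sends you back to step 1.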
Applying the Theory of Constraints (ToC) to your development process
We now have two pieces of the puzzle to improve your speed and quality: DORA metrics to measure your development process, and ToC to systematically improve it. But we still need a way to identify what is constraining our speed and quality.
Step 1: Identify the constraint
Modelling your deployment process
One of the best ways I've found to model your development process - one that doesn't involve a lot of documentation and is enjoyable - is a Toast workshop, based on Tom Wujec's TED talk on how to make toast (there's also a written description of the session on his website). The power of this session is in its simplicity. You get your team together, hand them sticky notes and ask them to draw their development process end-to-end, which lets the team model their entire development process in about 2-3 hours.

Drawing without words is critical: it forces the team to think about what they do and how to describe it in pictures, which naturally breaks the process down into logical parts that writing it out could miss. It also allows the team to work collaboratively - I've run sessions with up to 15 people without too much trouble, which you couldn't do if it were written out.

These sessions work best in person rather than remotely. When I tried running one remotely, it took nearly three times as long to complete (approximately 9 hours as opposed to 3). You can read about my findings in 13 things I learned from running remote workshops.
The resulting model is a systems model of your development process. You can use it alongside your metrics on throughput and stability to identify where the process is being constrained. You want to do this whole exercise as a team so you 1) help everyone improve their understanding of the development process - you'd be surprised how often people say "I didn't know we did that!" - and 2) build buy-in for improving the process from the start.
Now you've identified your key constraint to throughput or stability, it's time to exploit it!
Step 2: Exploit the constraint
This is all about quick wins. What can you do right now to improve the constraint without radically changing the whole process? The temptation to skip to step 4 will be strong, but resist it and see what you can do right now. You never know - a quick win may be all you need. It also helps you better understand how the constraint works and why it is the way it is. Remember, we want to improve our understanding of the system, and what better way to do that than tweaking it and seeing what happens? This will be valuable information for what comes later.
Step 3: Subordinate the constraint
Once you've exploited the constraint, you can look to subordinate it. You'll need your systems model again for this step. Look at the processes that feed into and out of the constrained area. What alterations can you make here to improve the constraint even further? It may be to slow things down, add a gated process, or manually push only as much work as the constraint can handle. Whatever it is, tweak the inputs and outputs of the constraint. Again, don't skip this step: you'll learn valuable information about the process, and there is every chance these tweaks will sufficiently improve the constraint.
Step 4: Elevate the constraint
Suppose steps 2 and 3 have not sufficiently improved the constraint. In that case, you can start to experiment with your process. At this stage, you're looking for quantity, not quality, so design exercises like Crazy 8s (or any of the other ideas from Mural) work well for generating new ideas. Again, you want to involve the whole team - hopefully, they have been through all the steps so far. This is where all the understanding of the constraint you've developed in steps 1, 2 and 3 becomes very valuable when coming up with new ideas to try.
Then, once you've identified your experiment, run it as a team and see if it helps improve the constraint.
Typically, the experimentation process is left up to teams to figure out, but we should be more deliberate about our behaviours. Teamworking behaviours in most teams are not the best, with people usually muddling along and hoping for the best. This is where the Five Behaviours of Teamwork can help improve both the experimentation process and how the team works together in general.
Five Behaviours of Teamwork
Going into detail about these behaviours is beyond the scope of this post - perhaps one to follow up on in the future, or I could make my Speed vs Quality talk available. Let me know in the comments if either of these options interests you.
The five behaviours are:
Positively framing the work: Helping people understand who the work is for, why it matters, and what they will do.
Showing fallibility: This helps people understand that things will go wrong and that it's part of the process of learning and experimenting.
Creating psychological safety: This is all about helping people take interpersonal risks: admitting mistakes, saying when they don't understand something, pointing out issues and disagreeing with others.
Working across discipline boundaries: This is about recognising that we need to work with others, sometimes doing work that may not be our speciality but that helps the team achieve its objective.
Learning from failure: This is about taking all those misunderstandings, mistakes and other things that don't quite go to plan and learning from them to get better.
Together, these behaviours help people understand the purpose of their work (framing) and allow them to make and share mistakes without fear of embarrassment or judgement (fallibility and psychological safety). This speeds teams up, as they can identify and correct problems collaboratively (learning from failure and working across discipline boundaries).
Step 5: Do not allow inertia to cause a system constraint
Assuming you've improved your constraint in steps 2, 3 or 4, you need to start the process over and see where the constraint has moved to. What usually happens is that once teams have improved their process, they go back to business as usual. But when you improve the process at one point, other parts of the development process begin to reveal their constraints. Teams should therefore capitalise on their new capability to improve the work and start immediately on the next constraint.
Studying how quality is created, maintained and lost
The working harder and working smarter loops describe how quality can be lost by working harder, and how quality can be maintained and created by working smarter. I frame this as Speed Vs. Quality, where you see them as trade-offs, or Speed and Quality, where you see them as two sides of the same coin - that coin being value.
The DORA key metrics help teams objectively measure their quality via the stability of their system, and their speed via the throughput of code.
The Theory of Constraints helps teams identify how quality is being lost or constrained via step 1, using the Toast workshop to understand how the team currently creates quality, and how they can increase quality via steps 2, 3 and 4.
As for the Five Behaviours of Teamwork, these are about how you create quality in your people. Again, in Why Quality Engineering, I said:
Quality engineering is more than testing earlier in the software life cycle. It's about looking at all the facets of software engineering, from delivering the product to the processes we use to build it and the people involved. It's about taking a holistic approach to quality and understanding how quality is created, maintained and lost throughout the software life cycle. Then, using this insight, we build quality at the source.
This will be the focus of the next post. I'd like to use a concrete example of "how we build quality at the source" by focusing on three distinct areas: "products, processes and people", or the 3Ps.