DevOps: Small Deployment Size and Low Work-in-Progress Drive Success
By Michael Ryan, Head of Architecture and Managing Principal Consultant
Fast and Frequent: Is There Really a Need For Speed?
Most DevOps transformations focus on speed. The primary goals are often to:
- Shorten the time between code being committed and code running in production (Lead Time for Changes).
- Deploy code to production more frequently (Deployment Frequency).
The 2021 DORA (DevOps Research and Assessment) report shows Elite DevOps teams are fast. They deploy on demand, often multiple times per day.
On the other hand, Low performing teams are slow. They deploy less than once per 6 months.
Elite DevOps teams quickly push code through DevOps pipelines and into production. Their Lead Time for Changes is less than one hour.
Low performing teams take more than 6 months to get code from commit to production.
From the DORA 2021 Report:
Much of the tooling and practice built around DevOps pipelines focuses on quickly validating code and deploying to production. Practices like “shifting left” on testing (automatically validating code on developers’ laptops before it is even committed) detect and fix defects before code even gets into a DevOps pipeline. This reduces Lead Time for Changes and increases Deployment Frequency.
Overall, the industry emphasis is on being fast and frequent: quickly getting code from engineer laptops to production and doing this dozens, even hundreds, of times per day.
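As a rough illustration of how these two metrics can be derived, the sketch below computes them from a list of commit and deploy timestamps. The record layout, the function names and the sample data are assumptions made for this example; they do not come from the DORA report or from any particular tool.

```python
from datetime import datetime
from statistics import median

# Hypothetical sample data: (committed_at, deployed_at) for each change.
changes = [
    (datetime(2021, 9, 1, 9, 0), datetime(2021, 9, 1, 9, 40)),
    (datetime(2021, 9, 1, 13, 0), datetime(2021, 9, 1, 13, 30)),
    (datetime(2021, 9, 2, 10, 0), datetime(2021, 9, 2, 10, 50)),
]


def median_lead_time_minutes(changes):
    """Median Lead Time for Changes: deploy time minus commit time, in minutes."""
    minutes = [
        (deployed - committed).total_seconds() / 60
        for committed, deployed in changes
    ]
    return median(minutes)


def deployments_per_day(changes):
    """Deployment Frequency: distinct deployments per calendar day observed."""
    deploy_times = {deployed for _, deployed in changes}
    if not deploy_times:
        return 0.0
    first, last = min(deploy_times), max(deploy_times)
    days = (last.date() - first.date()).days + 1
    return len(deploy_times) / days


if __name__ == "__main__":
    print("Median lead time (minutes):", median_lead_time_minutes(changes))
    print("Deployments per day:", deployments_per_day(changes))
```

In practice the timestamps would come from the source repository and the deployment tooling; the arithmetic stays the same.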
But what is the point of deploying fast and frequently? High Deployment Frequency and low Lead Time for Changes are very important but do not provide value in themselves. Instead, increases in team productivity and software quality are driven by the combination of small Deployment Size and low work-in-progress (WIP).
Work-in-progress (WIP) is a concept that comes from the Lean manufacturing movement. WIP is considered a form of waste: work that has been done and paid for but is providing no benefit to the customer or the company.
In a software development context, WIP is written code that is not yet in production. It can be on an engineer’s laptop but not committed to a repository, committed to a repository but not yet in a DevOps pipeline, or code in a DevOps pipeline waiting to be deployed. Essentially, WIP represents completed work that is already paid for by the business but is not providing value for a customer.
More important than financial considerations, WIP requires cognitive bandwidth to manage. In manufacturing, partially finished items need to be moved and stored until they can be completed. In software, partially finished items often end up in long running code repository branches. A team needs to keep track of what features are in what branch and be aware of any interactions between branches. The cognitive load this imposes on a team takes away from providing value to the customer and company.
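One way to make WIP visible is simply to count the changes sitting in each pre-production state. The sketch below assumes a hypothetical list of changes, each tagged with one of three stage names chosen for this example to mirror the states described above.

```python
from collections import Counter

# Stages in which written code counts as WIP (names are assumptions).
WIP_STAGES = ("uncommitted", "committed", "in_pipeline")

# Hypothetical snapshot of a team's changes and where each one sits.
changes = [
    ("login-fix", "in_production"),
    ("search-filter", "in_pipeline"),
    ("new-report", "committed"),
    ("refactor-auth", "uncommitted"),
    ("csv-export", "in_pipeline"),
]


def wip_by_stage(changes):
    """Count changes in each WIP stage; code already in production is not WIP."""
    counts = Counter(stage for _, stage in changes if stage in WIP_STAGES)
    return {stage: counts.get(stage, 0) for stage in WIP_STAGES}


def total_wip(changes):
    """Total number of changes written but not yet in production."""
    return sum(wip_by_stage(changes).values())


if __name__ == "__main__":
    print(wip_by_stage(changes))
    print("Total WIP:", total_wip(changes))
```

Counting changes is a crude proxy. A team could weight each change by lines of code or story points if that better reflects the load it carries.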
Deployment Size and WIP are What Matters
Deploying Small and Single Piece Flow
The benefits of DevOps are not derived from going fast, but from continuously pushing small amounts of code to production in a process similar to single-piece flow in manufacturing. Within limits, the smaller the amount of code pushed through a DevOps pipeline into production the better.
Small deployments are better than large deployments. Why?
- Small amounts of code are easier to develop and test than large amounts of code. Testing small amounts of code is simpler and less error prone than testing large amounts of code.
- Small chunks of code are easier to build and deploy. The time to deploy is less because the complexity is less.
- Monitoring and defect detection are easier. In the event a defect is released to production it is much easier to pinpoint the problem when only a small amount of code has been released. This reduces Mean Time To Resolution, which DORA reports as Time to Restore Service.
- It is easier to roll back small amounts of code if there is a production defect. In practice, changes in a small deployment model are often so incremental that the customer impact of any defect is negligible. A “fix forward” approach is often used instead of a rollback.
- It is easier and quicker to get customer and stakeholder feedback on the deployment of a small feature, and easier to implement future changes based on that feedback.
- Constant small deliveries of value to customers increase both customer satisfaction and the developer experience of engineers.
How small a deployment is small enough? One quick test is if a single person fully understands all the code being deployed. It is nearly impossible for one person to fully understand everything in a deployment that combines code from multiple teams each contributing several features. If a team’s deployments are not passing this test it is a warning the deployments are too large.
Work to deploy smaller — there are tips to do so throughout this article.
Many of these benefits disappear with larger deployments. Small Deployment Size drives value, but…
Deploying Small is Not Enough: How High WIP Decreases Team Effectiveness
The benefits of small deployments are greatly reduced if Lead Time for Changes gets so large that multiple deployments are stacked up in a DevOps pipeline. Stacked deployments mean WIP increases, and when WIP increases the cognitive load on a team increases and software quality tends to suffer.
Large deployments and high WIP increase cognitive load on a team while also increasing the complexity of the tasks the team must handle. They kill productivity and harm the overall developer experience. This is a form of Extraneous Cognitive Load (described below) and can cause the performance and morale of a team to deteriorate badly.
On the other hand, low WIP reduces cognitive load on a team. Limiting WIP shortens the lead time for new features, reduces context switching, and allows engineers to concentrate resources and attention on the tasks that are most important. When team members focus on only one thing at a time they are more productive and generally more satisfied with their work. Developer experience is enhanced.
Low WIP is correlated with, but does not automatically come with, small Deployment Size. Small Deployment Size is critical — but the amount of un-deployed code a team is responsible for also needs to be minimized to get the full benefit. An engineering group could deploy 10 times per day but if those deployments each take a week to get through testing and another day waiting for manual approval there will still be a high level of WIP. The team will suffer under the increased cognitive load and become frustrated.
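The example above can be quantified with Little's Law from queueing theory, which says that in a steady state the average number of items in a system equals the arrival rate multiplied by the average time each item spends in the system. The sketch below applies it to the figures in the paragraph, assuming a seven-day week so that one week of testing plus one day of approval is eight days. The function name is an assumption for illustration.

```python
def average_wip(deployments_per_day, lead_time_days):
    """Little's Law: average items in flight = arrival rate x time in system."""
    return deployments_per_day * lead_time_days


# Ten deployments started per day, each spending 8 days in the pipeline
# (7 days of testing + 1 day waiting for manual approval).
slow_pipeline_wip = average_wip(10, 7 + 1)

# The same ten deployments per day with a one-hour lead time.
fast_pipeline_wip = average_wip(10, 1 / 24)

if __name__ == "__main__":
    print("Deployments in flight, slow pipeline:", slow_pipeline_wip)
    print("Deployments in flight, fast pipeline:", round(fast_pipeline_wip, 2))
```

Under these assumptions the slow pipeline holds about 80 deployments in flight at any time, while the fast one holds less than one, even though both teams deploy ten times per day.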
Of course, WIP can be reduced by having engineers stop writing code, and this approach is used more often than realized. The better solution is to automate and optimize the team’s DevOps pipeline so code gets through as rapidly as possible. When code flows rapidly through a DevOps pipeline, there is very little code (WIP) in the pipeline itself at any point in time, and each deployment stays small. This is why the DORA Lead Time for Changes metric is important but not itself a driver of software quality.
In the end it is coupling small Deployment Sizes with low WIP that reduces team cognitive load, improves software quality and enables a team to reach elite status.
Cognitive load is a concept introduced in 1988 by psychologist John Sweller and recently made more popular by the book Team Topologies: Organizing Business and Technology Teams for Fast Flow by Matthew Skelton and Manuel Pais.
There are three different kinds of cognitive load:
Extraneous Cognitive Load — items that demand attention but do not add much value. An example would be typing a lengthy and arcane set of commands to deploy an application to a test environment. This sort of load should be minimized through automation or better tooling. A high level of extraneous cognitive load can be very demotivating for experienced developers. They want to work on tasks that provide high value not waste time on manual work or bureaucratic tasks.
Intrinsic Cognitive Load — mental capacity related to foundational elements of the current task. In software engineering these are items an engineer should “just know”. Examples would be debugging with an IDE, knowledge of a particular language, commonly used libraries or other specialized tools. Teams should try to minimize this kind of cognitive load through training, hiring talented staff, proper tooling, etc.
Germane Cognitive Load — this is where most of an engineer’s cognitive capacity should be spent. Examples are higher level “value add” thinking on architectural concerns or code-level tactical decisions like API design. This is where deep understanding of a code base takes place.
A team’s capacity to handle cognitive load is very limited and must be consciously managed. Try to eliminate Extraneous Cognitive Load, minimize Intrinsic Cognitive Load and maximize Germane Cognitive Load.
Know Where The Constraints Are
Now we understand why rapidly moving code through a pipeline (Lead Time for Changes) is so important — it reduces cognitive load by reducing WIP and also results in small deployments.
In most real-life scenarios there will be a number of pipeline improvements to choose from. How does a team know which change will most reduce WIP or Deployment Size? This is the crucial question. “Improvements” that do not reduce WIP, do not reduce Deployment Size, do not reduce the cognitive load on a team or provide other benefits are a form of waste. It is very easy to spend much time and money improving the wrong thing.
This is where the Theory of Constraints can play a role. Focusing on constraints is the fastest and most effective way to improve performance of a pipeline or any other process.
Theory of Constraints states that in any process there is usually one most important limiting factor (the constraint). Only improvements to this constraint will translate into maximal improvement of the whole process. Improvements to factors that are not constraints provide less benefit if any at all.
When one constraint is removed then another step becomes the constraint — the new most important limiting factor. Focusing on removing constraints ensures improvements are always made on the most important factor — and what is the most important factor often changes as improvements are made to other steps in the process.
A high level of WIP often means there are significant constraints in the system. Early in a DevOps transformation these constraints are often manual steps.
For example, waiting one week for manual approval of a release may be the constraint in one deployment system. It would probably not be a wise investment if a team were to spend two weeks improving a different part of the system so work got to the manual approver in three hours instead of four. The constraint in this system is manual approval of deployments — working to eliminate or greatly optimize that step will provide the most value to the team.
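The trade-off in this example can be checked with simple arithmetic. The sketch below models the pipeline as two sequential stages using the durations from the paragraph (four hours to reach the approver, one week of waiting for approval) and compares how much total lead time each improvement would remove. The stage names and the one-day approval target are assumptions added for illustration.

```python
# Stage durations in hours; stages are assumed to run one after another.
stage_hours = {
    "build_and_test": 4,          # time for work to reach the approver
    "manual_approval": 7 * 24,    # one week waiting for approval
}


def find_constraint(stage_hours):
    """Return the stage that contributes the most time to the pipeline."""
    return max(stage_hours, key=stage_hours.get)


def lead_time_saved(stage_hours, stage, new_hours):
    """Fraction of total lead time removed by shortening one stage."""
    total = sum(stage_hours.values())
    return (stage_hours[stage] - new_hours) / total


if __name__ == "__main__":
    print("Constraint:", find_constraint(stage_hours))
    # Option A: speed up the upstream work from four hours to three.
    print("Option A saves:", lead_time_saved(stage_hours, "build_and_test", 3))
    # Option B (hypothetical target): cut approval from one week to one day.
    print("Option B saves:", lead_time_saved(stage_hours, "manual_approval", 24))
```

With these numbers, shaving an hour off the upstream work removes well under one percent of the total lead time, while shortening the approval step removes most of it.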
In some processes there is a single constraint, which is the case when multiple processes are run in parallel. In that case only improvement in the longest process will result in time savings.
More rarely processes may have more than one constraint, such as when tasks are arranged linearly with no buffer time between them. In this case time savings on any task results in overall time savings. The choice of which constraint to optimize may come down to the cost of improvement.
In most real-world scenarios Value-Stream Mapping can help identify these bottlenecks. Value-Stream Mapping originated in the Lean manufacturing movement. It is a technique that creates a visualization of all the components and steps necessary to deliver a product or service. Once the visualization is created the goal is to analyze and optimize the entire process.
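A value-stream map can be reduced to a small table of steps, each with the time spent actively working and the time spent waiting. The sketch below, using made-up step names and durations, computes flow efficiency (active work as a share of total elapsed time) and points at the step with the longest wait, which is often a good first candidate for the constraint.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float  # time someone or something is actively working
    wait_hours: float  # time the change sits idle before the step starts


# Hypothetical value stream for a single change.
steps = [
    Step("code review", work_hours=1, wait_hours=8),
    Step("automated tests", work_hours=0.5, wait_hours=0.5),
    Step("manual QA", work_hours=8, wait_hours=40),
    Step("release approval", work_hours=0.5, wait_hours=120),
]


def flow_efficiency(steps):
    """Share of total elapsed time that is active work rather than waiting."""
    work = sum(s.work_hours for s in steps)
    total = work + sum(s.wait_hours for s in steps)
    return work / total if total else 0.0


def longest_wait(steps):
    """The step where work sits idle the longest."""
    return max(steps, key=lambda s: s.wait_hours)


if __name__ == "__main__":
    print("Flow efficiency:", round(flow_efficiency(steps), 3))
    print("Longest wait:", longest_wait(steps).name)
```

A low flow efficiency means most of the lead time is waiting, which is WIP sitting idle rather than work being done.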
The Wrong Branching Model Can Increase Deployment Size and WIP
Gitflow vs Trunk-based Development
Time and again teams use finite cognitive bandwidth to coordinate multiple “release” code branches, each scheduled for deployment on a different date. By definition each release branch is WIP. Engineers have to keep track of what features are in each release branch while also developing new features. This sort of structure creates distraction and requires frequent context switching. It can be very error-prone.
Using the right branching strategy is critical to deploying small. Some branching models like Gitflow optimize for protecting a release from engineers the team doesn’t know or trust. This is often the case in a large open-source project with thousands of committers.
With Gitflow the code base is split into several long running branches, each designated for a future release. The cognitive load on a team using Gitflow tends to be high. They have to remember what each branch is for, what features are included in each branch, and how code in each branch interacts with other branches. There are many checks as code winds its way to production.
Gitflow is the right tool for certain use cases, but the price of all the protection it offers is often very large, infrequent releases. This is the opposite of what will drive value for most DevOps teams.
Instead, look at Trunk-based development. Trunk-based development is a practice where developers merge small, frequent updates to a core main branch. Trunk-based development has strong support in the DevOps community as it streamlines merging and reduces cognitive load. It is designed for speed and frequent small deployments.
The Deployment Death Spiral
Deployment Frequency is Still Very Important
To review, deployments of small amounts of code drive improvement in software quality, developer experience and customer satisfaction. A team has to deploy frequently if they want to continually deploy small amounts of code while maintaining good productivity. Frequent deployments enable small deployments which improve software quality.
The reverse is also true: when a team deploys less frequently their Deployment Size tends to increase, developer experience worsens and software quality goes down. A decrease in software quality leads to more of a team’s capacity spent fixing defects instead of building features that provide value to customers. Things can unravel quickly from here.
In practice this anti-pattern often unfolds as follows: A team deploys on a set schedule (weekly, monthly or quarterly). In one cycle a defect is found in testing just before the deployment date and the deployment must be delayed. As the diagram below illustrates, something as simple as delaying a deployment can sometimes create a downward spiral that is hard to escape from.
Decreased Deployment Frequency. When defects are discovered prior to a deployment the application needs to be re-tested. This takes time even if all testing is automated. If testing is manual it can take weeks to get even small changes or bug fixes through validation. Much of this work may have to be repeated if the defect is not completely fixed on the first try. As a result, production Deployment Frequency slows.
Increased Deployment Size. Let’s say there are 10 engineers working on a software system. Most of those engineers are evaluated on producing features or fixing defects — evaluated at least in part on writing code.
This team of engineers continues to churn out code and features whether they are deployed quickly or not. All this code is WIP. Deployment delays often result in more WIP building up. The size of the next deployment increases.
Increased Production Defect Rate. As the size of deployments increases, the likelihood that defects are released to production also grows. Each team has an error rate which could be expressed as defects per line of code or defects per story point. Either way, the more lines of code or story points deployed, the higher the odds the deployment contains a defect. Some defects will be trivial; others will be more serious and require immediate re-deployments.
Larger deployments increase the percentage of production deployments that result in customer complaints and emergency hot-fixes.
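The relationship between deployment size and defect odds can be sketched with basic probability. The example below assumes, purely for illustration, that every story point carries the same independent chance of introducing a defect; real defect rates are neither uniform nor independent, so treat the numbers as directional only.

```python
def probability_of_defect(defect_rate_per_point, story_points):
    """Chance that a deployment contains at least one defect.

    Assumes each story point independently introduces a defect with the
    given probability, so P(no defect) = (1 - rate) ** points.
    """
    return 1 - (1 - defect_rate_per_point) ** story_points


if __name__ == "__main__":
    rate = 0.02  # hypothetical: 2% chance of a defect per story point
    for points in (1, 5, 20, 50):
        odds = probability_of_defect(rate, points)
        print(f"{points:>3} story points -> {odds:.0%} chance of a defect")
```

Under this assumed 2% rate, a one-point deployment has a 2% chance of containing a defect, while a fifty-point deployment has a chance of roughly 64%.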
Increased Time to Fix Production Defects. A larger deployment makes fixing production defects more difficult. There is more code to inspect and more possible interactions between features being deployed. It is simply harder to find and fix defects when a large amount of code is deployed. This increases the Mean Time to Resolution metric used by DORA.
Increased Testing Time. When more defects are released to production the natural impulse is to fix the problem by testing more. If this involves manual testing, even manual spot checking “just to be sure” the automated tests are working correctly, then Deployment Frequency is slowed and Deployment Size and WIP increase. Paradoxically, this extra testing effort can increase the odds a production defect is released and increase the time to find and fix that defect.
The Spiral Worsens. With more time spent on testing, Deployment Frequency drops again. This causes another corresponding increase in the size of each deployment and increases the odds of a defect being released to production. The time to fix any defects also increases, more restrictions and checks are made before each deployment (perhaps a second level of approval by rarely available senior management) and the cycle repeats itself.
All the while the original 10 engineers are working earnestly and continue to churn out more features and bug fixes for the next release. Deployments grow bigger and more complex each cycle.
This can become a downward spiral. Engineers become frustrated and look for other work, taking their domain knowledge with them. Dissatisfied customers don’t understand why the software can’t be made to work properly. They are aggravated by the overall poor quality and are likely to switch to other providers if given the chance.
Sure, a team may not fall into the death spiral on their first large deployment. Many teams skate by for some time and suffer no ill effects. But sooner or later serious defects emerge, fixing them takes longer, the amount of WIP grows, deployments get larger and the Deployment Death Spiral takes hold.
Four Steps to Escape the Deployment Death Spiral
Being trapped in the Deployment Death Spiral is painful. There are no cookie-cutter solutions to this problem. Each engineering group and company are different, as are the demands placed upon them. Political considerations are different everywhere.
A common counterproductive approach is an attempt to “catch up” by adding new features on top of any rework from the previous deployment. The appeal is that with a little extra work over the next few releases the team can get back on schedule. In fact, this approach only perpetuates the Deployment Death Spiral by simultaneously increasing the size of the next deployment and increasing WIP. It often fails.
Instead, take care that each step to escape the Deployment Death Spiral either reduces Deployment Size, lowers WIP, or both.
Below are some concrete techniques to pull out of the death spiral. They have worked for other successful teams and may work for the reader’s team as well.
Step 1: Do not make matters worse by continuing to merge new code into the pipeline.
- Merging new code into the DevOps pipeline just increases WIP and increases the size of the next deployment. This is exactly what a team wants to avoid. Only merge code for absolutely “must have” features or bug-fixes. This reduces WIP in the pipeline and the size of the next deployments.
- WIP still exists in the form of features waiting to be merged into the DevOps pipeline, but for now this form of WIP is easier to manage than WIP in the pipeline itself.
- This step reduces Deployment Size and lowers WIP.
Step 2: Assign a portion of the team to automating testing and optimizing the DevOps pipeline.
- This step can sometimes be done concurrently with Step 1.
- This step can be politically challenging to implement. However, if 100% of an engineering team is always putting out fires there will never be any capacity to improve.
- Moving engineers from writing application code to work on DevOps pipelines or automated testing reduces WIP (by not creating it in the first place) and reduces the size of the next deployment.
- Automating the DevOps pipeline makes it run faster. This reduces WIP and reduces Deployment Size by getting code to production faster. Value-Stream Mapping can identify constraints to be removed.
- Automating testing reduces overall WIP by getting code through the DevOps pipeline and into production faster. It can take some time to automate testing in a large application, so one good first step is to require any new feature or defect fix be tested entirely through automation. Just this one practice can make a profound difference in Deployment Frequency, Deployment Size, Change Failure Rate and WIP.
- This step lowers WIP and reduces Deployment Size.
Step 3: Commit to reducing Deployment Size by increasing Deployment Frequency
- Remember that high Deployment Frequency leads to low Deployment Size. Low Deployment Frequency leads to high Deployment Size.
- Use increasing Deployment Frequency as a forcing function. If a team is deploying quarterly, challenge them to deploy monthly. If they deploy monthly, challenge them to deploy every two weeks. The team provides a list of steps to meet the Deployment Frequency goal and then implements those steps. This can be very effective at identifying waste in the entire code/build/deploy process.
- Working on increasing Deployment Frequency also keeps engineers from adding code to the DevOps Pipeline, which reduces WIP and the size of the next deployment.
- Keep challenging the team to reduce Deployment Size. The optimal way to do this is to deploy very small chunks of code virtually on demand. However, a team does not need to deploy on demand to see substantial improvements in application quality and overall work life.
- Important Note: this isn’t about forcing engineers to work harder or produce more features. The amount of work produced by the team should be constant. Instead, the team is simply reducing Deployment Size by deploying the same amount of work to production more frequently.
- This step reduces Deployment Size and lowers WIP.
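The point of Step 3 can be shown with a few lines of arithmetic: if the team's output stays constant, the size of each deployment is just that output multiplied by the time between deployments. The weekly output figure and the cadence lengths below are hypothetical.

```python
def deployment_size(points_per_week, weeks_between_deployments):
    """Work shipped per deployment when team output is held constant."""
    return points_per_week * weeks_between_deployments


# Hypothetical team output: 20 story points per week, unchanged throughout.
POINTS_PER_WEEK = 20

# Approximate cadence lengths in weeks.
cadences = {"quarterly": 13, "monthly": 4, "every two weeks": 2, "weekly": 1}

sizes = {
    name: deployment_size(POINTS_PER_WEEK, weeks)
    for name, weeks in cadences.items()
}

if __name__ == "__main__":
    for name, size in sizes.items():
        print(f"{name:>16}: {size} story points per deployment")
```

The team does the same amount of work in every row; only the batch that reaches production at one time changes.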
Step 4: Pay Attention to Conway’s Law
- Use cross-functional teams to design and construct a DevOps pipeline. This team’s objective is reducing WIP and reducing Deployment Size by rapidly moving code through the pipeline.
- Fight against any manual checks or approvals. If a human can test quality, so can a machine. But a machine can do it over and over again without getting bored, distracted or tired.
- Reduce hand-offs to a bare minimum, even in an automated process. Hand-offs are weak points in any process.
Conway’s Law was formulated by Melvin Conway in 1967. It states that “organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.” In software architecture, this means any architectural design tends to mimic the structure of the organization designing it.
Many companies manually pass each deployment through multiple teams: Dev hands-off the deployment to QA, QA hands-off to Performance, Performance hands-off to Security and so on. The work is done sequentially and involves manual hand-offs. Any additional steps are simply added to the chain.
When it comes to automating this process the hand-off pattern often repeats itself. Work is not done in parallel because each team is accountable for their own results and therefore insists on having their own isolated step of the process. The hand-offs between teams tend to become manual. This slows the entire process. It is better than when all work was manual, but still not very effective.
The problem is not each team zealously making sure their responsibilities are met. The problem is how the pipeline team itself is constructed. A cross-functional team with representatives from each group (Dev/QA/Performance/Security) charged with maintaining high quality while reducing WIP will likely come up with a better pipeline design.
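One design such a cross-functional team might arrive at is a single automated gate that runs every group's checks in parallel against the same build, instead of passing the build from team to team. The sketch below is only a shape for that idea: the three check functions are placeholders that always pass, and all names are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor


# Placeholder checks. In a real pipeline each would call that group's tooling.
def functional_tests(build_id):
    return ("functional", True)


def performance_tests(build_id):
    return ("performance", True)


def security_scan(build_id):
    return ("security", True)


CHECKS = [functional_tests, performance_tests, security_scan]


def run_checks_in_parallel(build_id, checks=CHECKS):
    """Run every group's checks at the same time against one build."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = [pool.submit(check, build_id) for check in checks]
        return dict(future.result() for future in futures)


def ready_to_deploy(results):
    """The build may deploy only if every check passed."""
    return all(results.values())


if __name__ == "__main__":
    results = run_checks_in_parallel("build-123")
    print(results, "->", "deploy" if ready_to_deploy(results) else "stop")
```

Each group still owns its check and its pass or fail verdict, but the elapsed time is set by the slowest check rather than by the sum of all of them plus the hand-offs in between.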
A Virtuous Upward Spiral Begins
Implementing these steps can produce a virtuous upward spiral, essentially the reverse of the Deployment Death Spiral. The team’s capacity to produce quality software is increased at every step.
This pattern often unfolds as follows:
- The faster a team deploys the smaller their deployments become.
- Smaller more frequent deployments reduce WIP.
- WIP is further reduced by automating DevOps pipelines and testing.
- Small deployments and lower WIP lower Change Failure Rates (odds of a defect released to production). The team spends less time fixing defects after every production release and more time providing value to customers.
- Even if a defect is released to production, it is easier to find and fix the defect because less code was released. This lowers Mean Time To Resolution.
- An organization’s confidence in automated DevOps pipelines and automated testing increases as the quality of deployments increases. More and more testing is automated, which reduces WIP and reduces Change Failure Rate.
- The cognitive load on the team continues to decline. They have more time and capacity to further decrease Deployment Size and reduce WIP. Software quality increases.
- Production defects become very rare. Instead of spending engineering capacity putting out production fires, engineers are continually seeking new ways to improve their processes and delight customers.
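To see whether the upward spiral is actually taking hold, a team can track Change Failure Rate and Mean Time To Resolution over time. The sketch below computes both from a hypothetical list of deployment records; the field names and sample values are assumptions for illustration.

```python
# Hypothetical deployment history. "hours_to_restore" only applies to failures.
deployments = [
    {"id": 1, "failed": False, "hours_to_restore": 0},
    {"id": 2, "failed": True, "hours_to_restore": 2},
    {"id": 3, "failed": False, "hours_to_restore": 0},
    {"id": 4, "failed": False, "hours_to_restore": 0},
]


def change_failure_rate(deployments):
    """Share of production deployments that caused a failure."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d["failed"])
    return failed / len(deployments)


def mean_time_to_resolution_hours(deployments):
    """Average hours taken to resolve the deployments that failed."""
    durations = [d["hours_to_restore"] for d in deployments if d["failed"]]
    return sum(durations) / len(durations) if durations else 0.0


if __name__ == "__main__":
    print("Change Failure Rate:", change_failure_rate(deployments))
    print("Mean Time To Resolution (h):", mean_time_to_resolution_hours(deployments))
```

Both numbers should trend downward as deployments get smaller and WIP falls.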
Why does this even matter? In most cases if a team deploys small it has to deploy fast (increase Deployment Frequency). Increasing Deployment Frequency tends to make deployments smaller and also reduce WIP. Deploying small and deploying fast seem like two sides of the same coin. Does keeping the team’s focus on small deployments and low WIP really matter?
It does matter. Here’s why:
Benefits are Easy to Explain to Stakeholders. Stakeholders can understand the benefits of small deployments. A team deploys small because small deployments are easy to understand, easy to test, easy to roll back, and make it easier to find and fix defects. High Deployment Frequency is one of the things that enable small deployments, but high Deployment Frequency is not the goal in itself. It’s important that stakeholders understand this.
Ease Fears of Recklessness. Very frequent deployments worry stakeholders. Speed seems reckless. If some stakeholders learn the team wants to deploy multiple times a day they will worry the team won’t be testing carefully.
Stakeholders aren’t irrational. Sometimes they don’t understand how software quality is improved because no one has properly explained it to them.
Demonstrate to these stakeholders how large deployments are inherently risky. Describe the Deployment Death Spiral to them. Let them know the team is proactively reducing risk by breaking up large deployments into smaller and more manageable chunks that are easier to test. Hopefully they better understand the benefits small deployments and low WIP provide.
Think of it as the difference between someone memorizing and then perfectly reciting a hundred lines of poetry all at once versus memorizing and reciting three or four lines of poetry at a time until the poem is completed. Which approach is going to be less error-prone and more likely to succeed?
Measure what matters. If small Deployment Size and low WIP drive quality, shouldn’t the team measure them along with the other DORA metrics?
Admittedly, measuring Deployment Size and WIP is hard and might require tracking lines of code deployed or something more indirect like story points or function point analysis. Even if Deployment Size and WIP are hard to measure, we recommend teams consider tracking them along with the other DORA metrics.
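As one possible starting point for the lines-of-code approach, the sketch below totals the lines added and deleted in the output of `git diff --numstat` between the last release and the release candidate. The function name and the sample text are assumptions for illustration; the parsing relies on numstat's tab-separated "added, deleted, path" layout, in which binary files show "-" in place of counts.

```python
def lines_changed(numstat_output):
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files have no line counts
        total += int(added) + int(deleted)
    return total


# Hypothetical numstat output for one deployment.
sample = (
    "12\t3\tsrc/app.py\n"
    "-\t-\tdocs/logo.png\n"
    "40\t0\ttests/test_app.py\n"
)

if __name__ == "__main__":
    print("Lines changed in this deployment:", lines_changed(sample))
```

Lines changed is an imperfect measure, since a one-line change can be riskier than a hundred-line one, but a rising trend is still a useful early warning that deployments are growing.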
When things go sideways. Even with great engineers, superior automated processes and the most agile of practices a team will run into stretches where a few non-trivial defects are released to production. Perhaps Deployment Size has inched up and now the team is paying the price.
Stakeholders might again think the team is not being careful enough. These stakeholders might be senior people in the company who strongly suggest the team reduce its Deployment Frequency and be more careful. They want to personally approve each release. In their minds frequent deployments are reckless because in many areas of life speed IS reckless. It is the team’s job to explain why frequent, small deployments reduce risk.
By arguing the focus is on continuously deploying small chunks of code and reducing WIP the team might keep from falling into the Deployment Death Spiral.
In Kenzan’s experience engineering organizations can be broken down into three groups:
Base-level engineering organizations employ many manual processes. Performance declines rapidly under a limited amount of change or load. Their capacity for improvement is small and they can only recover to baseline performance with extreme difficulty.
Effective engineering organizations produce good quality software while enduring a moderate amount of change or load. If either exceeds a certain threshold performance rapidly declines. These organizations use some automated processes while still depending on manual work for important tasks. Their capacity for improvement is modest and they can recover to baseline performance with moderate difficulty.
Elite engineering organizations grow stronger the more change or load is applied to them, in the same way the human body grows stronger through rigorous exercise. They actively seek to learn from mistakes and incorporate those learnings into future work, often embracing cloud infrastructure and cloud-native architectures. These organizations depend heavily on automation to increase quality while reducing cognitive load on teams and individuals. Their capacity for improvement is great and they can recover to baseline performance with relative ease.
Getting DevOps right is critical to becoming an elite engineering organization. If you’d like to discuss these concepts further please reach out to the author at firstname.lastname@example.org. Michael Ryan is Head of Architecture and Managing Principal Consultant at Kenzan.