2023 Year in Review

My team has seen a lot of changes in the last year. These are things that we didn’t really have in 2022 but became a part of our day-to-day in 2023.

Feature flags

We started to introduce the concept of flags in late 2022 but didn’t adopt them until 2023. We’ve rewritten the framework a few times. The team has created guidelines for flag creation, management, and removal. We’ve introduced over 200 flags in 2023. The adoption of our feature flag process has led to…
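As a sketch of the pattern, a flag check tends to look something like this (the interface and flag name here are illustrative, not our actual framework):

```csharp
// A minimal sketch of a feature flag check. The interface, service, and
// flag name are hypothetical; our real framework adds targeting, rollout
// percentages, and lifecycle tooling on top of this shape.
public interface IFeatureFlags
{
    bool IsEnabled(string flagName);
}

public class CheckoutService
{
    private readonly IFeatureFlags _flags;
    public CheckoutService(IFeatureFlags flags) => _flags = flags;

    public void Checkout()
    {
        if (_flags.IsEnabled("new-checkout-flow"))
        {
            // New code path, dark-launched behind the flag.
        }
        else
        {
            // Existing behavior until the flag is fully rolled out and removed.
        }
    }
}
```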

Deploying multiple times per day

In May of 2023, we moved to hourly deploys. We had previously been on a structured 2-week deployment cadence, which came with some specific challenges: maintaining the “release branch,” being beholden to the release schedule and whatever work was or wasn’t done in time, deploying a bundle of 2 weeks of work at once, and hotfixes bypassing all the process. We now deploy on the hour and will be moving to full continuous deployment in January. Production incident remediation times are now tracked in minutes, not hours.

DDoS protection

In 2023, we moved our WAF to Cloudflare. This has given us DDoS protection and a CDN. The DDoS mitigation has proved extremely valuable, as our system has been able to withstand attacks of over 10M requests per minute.

WASM

We’ve introduced Blazor to our stack to add frontend code quickly and reliably. We’re using Blazor WASM, which is C# and HTML compiled to WebAssembly. This allows us to use our C# knowledge and best practices (including automated testing) for browser code.
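For illustration, the stock counter component from the Blazor template shows the shape of this code, with C# logic and markup side by side (this is the canonical example, not our production code):

```razor
@* Counter.razor: a trivial Blazor component. The markup and the C# logic
   live in one file, and the whole thing compiles to WebAssembly. *@
<h1>Counter</h1>

<p>Current count: @currentCount</p>

<button @onclick="IncrementCount">Click me</button>

@code {
    private int currentCount = 0;

    private void IncrementCount() => currentCount++;
}
```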

Running on Linux in prod

In the first half of 2023, we migrated our production servers to Linux. In the second half of the year, we migrated our remaining dev and staging servers to Linux. We’ve also migrated our build servers to Linux. These migrations saved costs on the computing side, allowing us to scale up our data side without any overall cost increase.

Latest .NET

Staying on the latest version of the framework is uncommon in most .NET shops. In 2022, we migrated to .NET 6. In 2023, we’ve done it again and migrated to .NET 7. In early 2024, we’ll move to the newly released .NET 8.

Increased automated testing

In August, we increased our expectations around automated testing. We’re now near 40% for total line coverage for all codebases. We’ve adopted behavioral testing across all of the backend code. We’ve introduced Playwright, which allows us to test our frontend code in a more automated fashion.
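As a sketch of what that frontend testing looks like, here is a minimal Playwright test in C# (the page URL, selectors, and credentials are hypothetical):

```csharp
using Microsoft.Playwright;

// A minimal Playwright sketch using the C# bindings. The site, selectors,
// and credentials below are placeholders; real suites are larger, but the
// shape is the same.
using var playwright = await Playwright.CreateAsync();
await using var browser = await playwright.Chromium.LaunchAsync();
var page = await browser.NewPageAsync();

await page.GotoAsync("https://staging.example.com/login");
await page.FillAsync("#username", "test-user");
await page.FillAsync("#password", "test-password");
await page.ClickAsync("button[type=submit]");

// Verify we landed on the dashboard after logging in.
await page.WaitForURLAsync("**/dashboard");
```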

Codified SDLC

In 2022, our SDLC was very loose and ad-hoc. In 2023, we’ve codified our SDLC. Our SDLC is meant to be flexible while maintaining consistency across the department. Our SDLC guidelines represent sensible defaults, and we hope they will continue to evolve to best serve the teams leveraging them.

Structured teams

At the end of 2023, we had one team of 12, one team of 5, and one team of 2, with QA floating across teams. We’ve since restructured into three evenly sized, evenly staffed teams.

Job descriptions

I know the engineering team had been working on some job descriptions/matrices, but they never quite made it to fruition. This year, Engineering leadership created measurable job expectations for software engineering levels 1-4. We’ve published these to our team and are using them in our 1:1s and reviews. This gives clarity to both our team members and managers. We’ll be creating similar documents for our managers, as well as our QA and DevOps teams, in 2024.

Consistent meeting schedule

In addition to the meeting guidelines of our SDLC, we’ve also established a monthly department-wide meeting. This meeting is an opportunity to showcase the great work done each month, share department-level information, and keep each other accountable for our organizational goals.

Company-wide bug reporting

Open bug reporting is a sign of engineering team maturity, and in May of 2023, we opened up our bug reporting process to the whole company. We previously had two competing processes. Not only did this reduce transparency and create confusion, but issues reported in the support team’s system had to be verified and triaged before being added to the engineering backlog. This dual process limited visibility into the bug backlog and also skewed reporting.

This has been one of the most remarkable years of my career. Teams rarely see this much evolution in such a short time. I can’t wait to see what interesting enhancements 2024 delivers.


Improving Software Team Metrics

A healthy engineering organization (or any healthy team, for that matter) should be tracking itself across a variety of metrics. This is not covered by the standard CS curriculum but is readily encountered in the real world. Once someone is paying for software, there will invariably be questions about how that money is being spent. The most common metrics are bug count and velocity, followed by automated code coverage. These are common because they’re the cheapest to produce. Bugs are, unfortunately, the most visible part of engineering output. Counting them is the start of reducing them. Code coverage is freely available in every modern build pipeline, although not always enabled. And velocity is the treasured metric of any young engineering leader, the end-all answer to the question “How much work are we getting done!?”

However, once you start looking, there is so much more insight you can gain and so many more things to track and compare. And, eventually, when you’re answering to very clever investors, you’ll need to provide the metrics that they care about. One of those, which I have come to appreciate, is the sprint completion percentage. This expounds on velocity by comparing the actual value to the estimated or planned value. A high velocity is excellent, but accurate forecasting is even better for the overall business. This metric is easy enough to retrieve: Azure DevOps (ADO) has it baked into its velocity dashboards. The granularity is obviously at the sprint level.

With a little API magic, we can easily get:

| Team | Iteration Path | StartDate | EndDate | Planned | Completed | Completed Late | Incomplete | Total |
|------|----------------|-----------|---------|---------|-----------|----------------|------------|-------|
| Avengers | 21 | 2023-10-10 | 2023-10-23 | 87 | 58 | 0 | 0 | 58 |
| Avengers | 20 | 2023-09-26 | 2023-10-09 | 46 | 38 | 0 | 0 | 38 |
| Avengers | 19 | 2023-09-12 | 2023-09-25 | 51 | 50 | 0 | 0 | 50 |
| X-Men | 21 | 2023-10-10 | 2023-10-23 | 51 | 41 | 0 | 0 | 41 |
| X-Men | 20 | 2023-09-26 | 2023-10-09 | 66 | 79 | 0 | 3 | 79 |
| X-Men | 19 | 2023-09-12 | 2023-09-25 | 18 | 30 | 0 | 0 | 30 |
| Justice League | 21 | 2023-10-10 | 2023-10-23 | 90 | 75 | 0 | 0 | 75 |
| Justice League | 20 | 2023-09-26 | 2023-10-09 | 120 | 121 | 8 | 0 | 129 |
| Justice League | 19 | 2023-09-12 | 2023-09-25 | 108 | 77 | 0 | 0 | 77 |

The definitions for these states can be found in the Azure DevOps documentation.
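A minimal sketch of that “API magic,” pulling a team’s iterations from the ADO REST API (the organization, project, and team names are placeholders; the planned/completed numbers come from additional per-iteration calls):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Rough sketch: list a team's iterations via the ADO REST API.
// "my-org", "my-project", and "Avengers" are placeholders.
var pat = Environment.GetEnvironmentVariable("ADO_PAT");
using var client = new HttpClient();

// ADO personal access tokens use Basic auth with an empty username.
client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes($":{pat}")));

var url = "https://dev.azure.com/my-org/my-project/Avengers/" +
          "_apis/work/teamsettings/iterations?api-version=7.0";
var json = await client.GetStringAsync(url);
Console.WriteLine(json); // iteration ids, names, and start/end dates
```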

We need to do a little more math, though, for this to become a valuable reporting metric. Unfortunately, the rest of the business and the investors don’t care about your sprints; they care about monthly and quarterly aggregates.

So, let’s start there with the math that rolls up sprints to a monthly value. It’s pretty fun. We need to determine which month a sprint falls into. My calculation chooses the month that contains more days of the sprint; if they’re equal, it chooses the month in which the sprint starts.
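A sketch of that assignment, assuming inclusive start and end dates (the helper is mine, not something ADO provides):

```csharp
using System;

static (int Month, int Year) MonthForSprint(DateTime start, DateTime end)
{
    // First day of the month after the sprint starts.
    var boundary = new DateTime(start.Year, start.Month, 1).AddMonths(1);
    if (end < boundary)
        return (start.Month, start.Year); // sprint fits in a single month

    int daysInStartMonth = (boundary - start).Days;  // start..end of month
    int daysInEndMonth = (end - boundary).Days + 1;  // the 1st..end, inclusive

    // Ties go to the month in which the sprint starts.
    // e.g. 2023-09-26..2023-10-09: 5 days in September vs. 9 in October -> October.
    return daysInStartMonth >= daysInEndMonth
        ? (start.Month, start.Year)
        : (end.Month, end.Year);
}
```

Applied to the sprints above, that yields: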

| Team | Iteration Path | StartDate | EndDate | Planned | Completed | Completed Late | Incomplete | Total | Completion % | Month | Year |
|------|----------------|-----------|---------|---------|-----------|----------------|------------|-------|--------------|-------|------|
| Avengers | 21 | 2023-10-10 | 2023-10-23 | 87 | 58 | 0 | 0 | 58 | 67% | 10 | 2023 |
| Avengers | 20 | 2023-09-26 | 2023-10-09 | 46 | 38 | 0 | 0 | 38 | 83% | 10 | 2023 |
| Avengers | 19 | 2023-09-12 | 2023-09-25 | 51 | 50 | 0 | 0 | 50 | 98% | 9 | 2023 |
| X-Men | 21 | 2023-10-10 | 2023-10-23 | 51 | 41 | 0 | 0 | 41 | 80% | 10 | 2023 |
| X-Men | 20 | 2023-09-26 | 2023-10-09 | 66 | 79 | 0 | 3 | 79 | 120% | 10 | 2023 |
| X-Men | 19 | 2023-09-12 | 2023-09-25 | 18 | 30 | 0 | 0 | 30 | 167% | 9 | 2023 |
| Justice League | 21 | 2023-10-10 | 2023-10-23 | 90 | 75 | 0 | 0 | 75 | 83% | 10 | 2023 |
| Justice League | 20 | 2023-09-26 | 2023-10-09 | 120 | 121 | 8 | 0 | 129 | 108% | 10 | 2023 |
| Justice League | 19 | 2023-09-12 | 2023-09-25 | 108 | 77 | 0 | 0 | 77 | 71% | 9 | 2023 |

Aggregating these values can be done in a few different ways. We’re combining teams and sprints to get a monthly representation for the group as a whole. I’ve found four reasonable ways to calculate this value across teams and sprints:

  • Basic Average
  • Unweighted Average
  • Weighted Average
  • “Inverted”

Basic Average

This is the most basic average: the average of all the values in the Completion % column for a given month and year. While this is a straightforward value to calculate, I’ve found it gives too much weight to individual sprints. For example, one lousy sprint, even with a minimal planned value, can drastically change this calculation.

Unweighted

This is the sum of the Total column divided by the sum of the Planned column for a given month and year. This assigns too little weight to individual sprints and doesn’t address the discrepancies in point values across teams.

Weighted

This has been my go-to calculation for years. This is a two-phased calculation. First, we roll up the value for the individual teams. We do this with the unweighted model but filter by Team in addition to month and year. Then, we average those values. This handles a team having a lousy sprint but recovering in the next, as well as the differences in point values.

But what about the team that didn’t get all that work done? The numbers don’t feel like they represent reality when the work left undone was high-value or high-visibility. The first phase of the weighted model allows for a disappointing sprint, and if the team is working ahead or catching up, we’re sweeping that bad sprint under the rug. While this hadn’t always directly worried me, colleagues who had been expecting certain deliverables and not seeing them, despite the 100%+ completion rates, were getting a little frustrated.

So I’ve come up with a new number to properly represent just that: how much work we aren’t getting done every month.

“Inverted”

“Inverted” may be more representative of the commitment to the business. It shows if we did what we committed to but discounts the value of above and beyond work. This calculation has a maximum of 100%. The calculation is multi-phased. The first phase is the same as weighted. Then, we “invert” the monthly team values. If the number is less than 100%, we report the difference; otherwise, we report 0. Next, we average those shortfall percentages. And finally, we subtract that value from 100%.

The inverted value is more representative of our accountability to the business. It should be noted that this value doesn’t entirely neglect above and beyond work but severely discounts it. Namely, when the X-Men go above and beyond, it won’t outweigh the shortcomings of the Avengers that month.
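To make the differences concrete, here is a minimal C# sketch of all four roll-ups. The Sprint record and method names are mine; each method assumes the input has already been filtered to a single month and year, and returns a fraction rather than a percentage:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative types only, not the actual reporting code.
record Sprint(string Team, int Planned, int Total, int Month, int Year);

static class CompletionRollups
{
    // Basic Average: mean of each sprint's own completion percentage.
    public static double BasicAverage(IEnumerable<Sprint> sprints) =>
        sprints.Average(s => (double)s.Total / s.Planned);

    // Unweighted: sum of Total divided by sum of Planned across all sprints.
    public static double Unweighted(IEnumerable<Sprint> sprints) =>
        (double)sprints.Sum(s => s.Total) / sprints.Sum(s => s.Planned);

    // Weighted: unweighted roll-up per team first, then averaged across teams.
    public static double Weighted(IEnumerable<Sprint> sprints) =>
        sprints.GroupBy(s => s.Team)
               .Select(team => Unweighted(team))
               .Average();

    // "Inverted": per-team weighted values, shortfalls below 100% averaged,
    // then subtracted from 100%. Over-delivery counts as zero shortfall,
    // so one team's heroics can't mask another team's miss.
    public static double Inverted(IEnumerable<Sprint> sprints)
    {
        var shortfalls = sprints.GroupBy(s => s.Team)
                                .Select(team => Math.Max(0.0, 1.0 - Unweighted(team)));
        return 1.0 - shortfalls.Average();
    }
}
```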

Conclusion

Tracking software team metrics is an essential aspect of maintaining a healthy engineering organization. While common metrics such as bug count and velocity provide a basic understanding of team performance, they often fall short in providing a comprehensive view of the team’s efficiency and productivity. This article has explored the concept of sprint completion percentage as a more insightful metric, offering a comparison of actual work done against planned work.

In essence, the choice of metric and calculation method should align with the team’s objectives and the expectations of stakeholders. By adopting a more nuanced approach to tracking software team metrics, organizations can gain deeper insights into team performance, improve forecasting accuracy, and ultimately drive better business outcomes.


What Even Is Innovation?

I was once asked about the most inventive or innovative thing I’d done. Where to start? I’m a middling engineer at best. I fully subscribe to my own pitch as a leader that engineers should prioritize simplicity and obviousness over performance and cleverness.

That said, I have an obvious answer to “What is the most interesting problem you ever solved?” And just to be transparent and fair, I didn’t solve this in a vacuum. I worked with a great team and would not have succeeded without their help.

The innovation I’m proud of is a little embarrassing due to the underlying technology. While I was at Mindbody, we uncovered an impactful limitation of scaling Classic ASP web applications. That’s right, Mindbody was still very much reliant on Classic ASP, which had been deprecated with the arrival of .NET. The solution to this scaling problem wasn’t particularly complex, but the novelty and impact qualify as innovative. In the end, we were able to proactively identify, remediate, and prevent future consequences of the limitation.

In late 2017, our VP of Engineering asked me to investigate an issue plaguing another team in his org. I was a Senior Manager overseeing other teams, technically in a different department, but I and some of my group had historical experience with the code in question. The nominal problem: a deployed bundle of changes resulted in a 10% increase in CPU usage in production. Rolling the deployment back brought the usage back down, and vice versa. Additionally, the CPU increase was not detectable outside of the production environment. ☹️

I started by enlisting one of the senior engineers on my team, and we began reviewing the changes in the associated deployment. Nothing initially jumped out at us, but on the third pass, I began to suspect that the problem could be related to a change to an #include reference file. Please see my earlier post about conditional include references to understand why this is already a potential issue. (And begin to understand my absolute hatred of the continued use of VBScript.)

Side note: VBScript was awesome circa 1997. But, like everything else in the universe, we evolved, and the evolution of VBScript on the server was .NET. Now, if you want to complain about people choosing to use VBScript after 2001, I’d be happy to drink my sorrows away beside you. Rant over, for now.

This reference file had itself added another reference, which is typical. But in this case, the outer file was almost ubiquitously referenced in every top-level file. Specifically, the heavy usage of the modified file meant that this small change was probably causing a wider-than-obvious impact.

To test the hypothesis that this one-line change was the culprit, we removed that commit from the bundle and redeployed it without issue. The CPU usage increase disappeared! While the immediate problem was solved, I still wanted to know the root cause and prevention methods.

I then endeavored to prove this issue was detectable via static code analysis. My second hypothesis was that this was related to the server doing more work interpreting more lines per request. The structure of Classic ASP requires that every single line be interpreted when served. Therefore, I suspected that more lines interpreted meant more work being done per request and, in turn, higher CPU usage.

We created a NodeJS command-line tool to analyze the codebase and model this. We used NodeJS because it truly is the best way to share multi-platform CLIs. And thank you, TJ, for commander.js! The references in the include files created an easily traversed tree. The tree was then flattened and converted to a total number of interpreted lines for any given top-level file.
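For a sense of the core idea, here is a rough sketch in C# (the real tool was NodeJS; this simplified version resolves includes relative to the including file, skips `virtual` path resolution against the web root, and doesn’t guard against malformed references):

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class IncludeCounter
{
    // Classic ASP server-side include syntax: <!--#include file="x.asp"-->
    static readonly Regex IncludePattern = new(
        @"<!--\s*#include\s+(?:file|virtual)\s*=\s*""(?<path>[^""]+)""\s*-->",
        RegexOptions.IgnoreCase);

    // Total interpreted lines for a top-level file: its own lines plus the
    // recursively expanded lines of every file it includes.
    public static long CountInterpretedLines(string path)
    {
        var lines = File.ReadAllLines(path);
        long total = lines.Length;
        foreach (var line in lines)
        {
            var match = IncludePattern.Match(line);
            if (match.Success)
            {
                var includePath = Path.Combine(
                    Path.GetDirectoryName(path)!, match.Groups["path"].Value);
                total += CountInterpretedLines(includePath);
            }
        }
        return total;
    }
}
```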

We enhanced the tool to provide additional insights, such as the theoretical minimum total lines (fully optimized but impractical to maintain) and the specific references to any included file, as well as a bloat factor, which represented how far the structure of a file was from the optimal. The results were output as one CSV file and a collection of JSON files.

The results were astounding! The original (problematic) one-line change increased the total number of interpreted lines from 26 million to 52 million. On the other side of the spectrum, the theoretical optimal number of lines was just over 12 million.

From the insights gleaned from the analysis, we could then restructure the file references to a more optimal state. Finally, we submitted pull requests to the owning team and reduced the total interpreted lines to 19 million.

Lastly, I saw that this specific issue could be prevented with these new insights. So, we created a step in the build process to run the analysis and fail if the total interpreted lines exceeded a configurable maximum value.

Over the years, other engineers extended the tool to support visualizations of the reference tree and various library upgrades and bug fixes. It was still a critical build step at the time of my departure.

While none of the technology is particularly glamorous, I am proud of this innovation. Over a few weeks, existing concepts and platforms were reorganized to create something novel and beneficial. We didn’t patent anything. We didn’t write a new language. Heck, we couldn’t even really talk about it for two main reasons: 1. The org didn’t want to admit to using outdated technology, 2. Who else was using that tech and would be interested in listening?

So, as I said at the beginning, I subscribe to my own pitch of simplicity. We used basic tools and concepts and put them together in a new way.

P.S. I’m not sure how much we saved the company, but it has to be substantial. At least 10 teams were blocked for 3 weeks from deploying to production. I think they would’ve continued to run into this issue, even if they found it in this instance, and probably would’ve resorted to massively overscaling production infrastructure. Yikes!

P.P.S. Let’s take a minute to discuss what was probably happening here. I say probably because I don’t know the exact underlying issue for sure, and even if I did, there really isn’t any fixing it for this ecosystem.

VBScript works by retrieving the requested page/resource (something.asp) and then processing the contents based on the context/request and rendering the output. Again, top-notch for 1997.

VBScript is a v1 product. It isn’t optimized beyond what the engineers fathomed at the time of writing. So, VBScript pulls the initial ASP file from disk and processes it line-by-line. If there is an #include, it retrieves that file and also processes it line-by-line. Why does it process every line? Because it’s a scripting language at heart, and those lines can modify global state outside of a method body (again, see my post on VBScript conditional includes). So, it is doing a lot of work for each page request. The engineers knew about this, so they created a cache of page contents to avoid going to disk every time.

In our case, though, these two concepts collide and clobber each other. The need to process each request creates a ton of work, and the page sizes themselves become massive due to the (substantial but not infinite) recursive nature of the includes. The server is doing more work per request, the cache can’t keep up, and so much of that work is done in vain. Brutal.

In the end, they did improve Classic ASP/VBScript … they created .NET.


When to Microservice

I’m enjoying Microsoft Build 2022. Developer experience (especially in the face of common and complicated IaaS and PaaS scenarios) was my favorite topic of the day 1 keynotes.

Later, watching the keynote after hours, I stumbled on a gem of a conversation between Scott Hanselman and Scott Guthrie.

Lots of classic “it depends,” which is totally true. For me, it depends on at least one of three macro factors being present:

  1. Teams/people need to develop and deploy at different paces.
  2. Parts of the system need to scale independently.
  3. Parts of the system need to be segmented for security purposes. Ex: only engineers from the payments team can make changes to payments systems.

You can have any decomposition you like, but in that video Scott Guthrie alludes to the challenges you can face on either end of the spectrum (1 engineer with 100 micro-services or 100 engineers with one service).

One last note: I may start saying containers instead of micro-services going forward. I usually try to say that I prefer macro-services, but then we have to have a whole discussion about the difference. Maybe the term container will become the de facto descriptor of services and their boundaries.


Software Leadership - Fostering the Team

Introduction

I’ve been meaning to write my thoughts on software leadership for a while. I’ll try to do that here with a series of posts on the topic. Let’s start with fostering a strong team.

First of all, most teams are already great; they may just not know it or show it. The talent is usually already in the building. I’m confident in the abilities and potential of most engineering team members to succeed with any project we throw at them.

But all teams can do better. We can do better as organizational leaders by setting a good example. And we can start by setting an example in 3 key ways: Transparency, Respect, and Accountability.

Transparency

One thing I am not confident most teams can do is communicate the company vision. I think very few developers could tell you how what they’re currently working on lines up with company goals, or what priority it is.

We can start to change this with transparency, which happens to also be a core tenet of Scrum. By being transparent, as leaders, as an organization, we can empower our product development teams to make good, informed decisions about their projects. And we can show them the visible importance of their work on our backlog.

In setting this example, we can expect the same from them. It is as simple as professional courtesy. We can show our steps in decision making, and they can show theirs. The same goes for progress. This expected transparency will increase mutual understanding between product development and all other departments and department heads.

By increasing this communication and buy-in, we will provide our teams with intrinsic motivation. We will all be purpose-driven, and we will see better results in quality and productivity.

Respect

Respect may be the hardest thing to foster in our product development team. From my perspective, mutual respect is often missing from many of the relationships on a team. Sometimes it is simply person to person, but other times it is person to department. A lot of the time it is person to organization. And again, we can fix this by setting an example. We can lean on our teams harder than we have to date. We can expect great things, and we can communicate our expectations. And not just verbally. We really need to trust our teams to solve problems, not simply follow instructions.

I often see team members hesitating to make suggestions or question a project’s goals. This is a bad thing. We should be fostering these conversations, because they lead to innovation and creative solutions. Let’s encourage questions, and expect great solutions from our development teams.

Accountability

With our freshly conveyed respect, we gain accountability.

Most notions of accountability come from negative perspectives. We only need to foster accountability because we need someone to be accountable for something that has gone wrong. It’s true, and from time to time we need that in our organizations. It’s also a key part of coaching. And sometimes, it’s a good motivator to know that if a project fails, some person or group can be held accountable.

The concept I think everyone misses with accountability, though, is that it’s a two way street. Early in my career, I worked on a team for 18 months before I knew about my accountability for someone’s positive experience with our software. That’s right, for a year and a half, the only post-release feedback I received for my code was the negative type. Obviously my team appreciated my work, but all I really ever saw beyond that were the bugs.

The feedback loop needs to continue regardless of the feedback.

We’ve all been to plenty of postmortems after something our team worked on broke. We rarely go to a meeting to explain why or how something our team worked on was produced so well. As engineering leaders, we need to be asking a lot more questions of the successful teams than the unsuccessful ones. We should maximize our success instead of minimizing our failure.

Conclusion

It’s amusing how little software leadership has to do with the actual software sometimes. I promise I will have some more technical leadership topics in the future, but many will be like this. I hope some of this will help you with your teams. As always, feel free to send any comments to me on Twitter at @clintcparker.