Measuring Software Development Productivity

Measuring software development productivity is hard because none of the objective measures available to us are actually useful. Traditional approaches that strive for objectivity, like counting lines of code, story points, or function points, fall short in one way or another. In this post, I’ll review traditional approaches to measuring software development productivity and discuss their shortcomings. I’ll discuss alternatives to these traditional methods in a future post.

Motivation

If you can’t measure it, you can’t improve it.

Peter Drucker

If we can measure software development productivity, we will know whether changes we make to the people (e.g., individuals, roles, responsibilities), processes (e.g., scrum, kanban), or technology (e.g., language, IDE) are improving productivity or not. It would also be helpful to be able to compare the productivity of different software development teams and objectively measure the benefit (if any) of outsourcing.

Software Development Productivity Definition

What is software development productivity? In the business world, productivity is defined as the amount of useful output of a production process per unit of input (e.g., cost). Thus, software development productivity could be defined as the amount of quality software created divided by the resources expended to produce it. The denominator is fairly straightforward and could be measured as the cost of the software development effort (e.g., wages, benefits, tools, office and equipment costs). The numerator is the challenge.

Software development productivity is the amount of quality software created divided by the resources expended to produce it.

Note that quality must be taken into consideration when measuring the amount of software produced or the productivity measure will not be useful.

Quick Note on The Observer Effect

The Observer Effect is the idea that merely observing a phenomenon can change that phenomenon. Training the organization on productivity metrics, and then measuring and reporting on them, adds overhead, so establishing a software development productivity measurement process could actually lower productivity. Measuring the productivity impact of a productivity measurement process is an interesting topic. Intuitively, the benefits of measuring productivity should outweigh the cost of the measurement process, but I think it is a good idea to be aware of the added cost.

What We Want in a Productivity Measure

Ideally, our productivity metric would have the following properties:

  • Development team can’t game it to look more productive
  • Objective rather than subjective for consistency
  • Reasonably easy to measure (not a lot of overhead)
  • An absolute measure that can be used to compare productivity of different teams/organizations

Hours Worked

It may be hard to believe that hours worked would be a measure of software development productivity, but it is frequently used.

Hours Based Productivity = (total hours the team works) / (cost of the team)

If you compare the productivity of two software development teams using this measure, and they work the same number of hours, you will conclude that the less expensive team is more productive (i.e., that you will get more useful software produced per dollar of investment). This is often used as justification for moving software development offshore where labor rates are cheaper or the driver for a policy to hire the cheapest developers in a local market.
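To make the comparison above concrete, here is a minimal Python sketch of the hours-based formula. The two teams, their hours, and their costs are made-up numbers for illustration only, and the function name is my own.

```python
def hours_based_productivity(total_hours: float, team_cost: float) -> float:
    """Hours worked per dollar spent, following the hours-based formula."""
    if team_cost <= 0:
        raise ValueError("team_cost must be positive")
    return total_hours / team_cost


# Hypothetical teams that work the same number of hours at different cost.
team_a = {"hours": 1600, "cost": 160_000}  # works out to $100 per hour
team_b = {"hours": 1600, "cost": 80_000}   # works out to $50 per hour

prod_a = hours_based_productivity(team_a["hours"], team_a["cost"])
prod_b = hours_based_productivity(team_b["hours"], team_b["cost"])

# The metric only sees hours and dollars, so the cheaper team always wins.
print(f"Team A: {prod_a:.4f} hours per dollar")
print(f"Team B: {prod_b:.4f} hours per dollar")
```

Nothing in the calculation knows how much working software either team actually delivered, so the cheaper team looks twice as productive by construction.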

This is also used in some organizations as justification for encouraging software developers to work more hours per week. Managers who use this productivity metric are focused on increasing the numerator (hours worked) and decreasing the denominator (cost).

The problem with this metric is that it assumes that every software developer produces the same amount of quality code per hour. This is far from the truth. Studies have found that there can be an order of magnitude (10x) difference in productivity between programmers and between teams. Alan Eustace (Google SVP) argued that a top-notch developer is worth three hundred average ones. Bill Gates said that a great writer of code is worth 10,000 times the price of an average writer. Robert C. Martin said that ninety percent of code is written by ten percent of the programmers.

“A great lathe operator commands several times the wage of an average lathe operator, but a great writer of software code is worth 10,000 times the price of an average software writer.”

Bill Gates

“90% of the code is written by 10% of the programmers.”

Robert C. Martin

There is also a myth that the more time a developer spends in her seat, the more productive she will be (the greater the hours-worked numerator will be and the more quality code she will produce). As Tom DeMarco and Timothy Lister pointed out in their book Peopleware, overtime will often lead employees to take compensatory undertime whenever possible, whether through suboptimal work, leaving early, getting sick, and so on. In essence, overtime is like sprinting: it’s great for the final stretch, but if you sprint for too long, you won’t be able to finish the race. It gives you the illusion of higher productivity.

Source Lines of Code (SLOC) Completed

Another measure of software development productivity that is used is counting the amount of code that has been written and dividing that by the cost to write it.

SLOC Productivity = (number of lines of code) / (cost to write the code)

There are different ways to count lines of code and a variety of code counting tools available. The term “source code” refers to the human-readable source code before it is compiled. There are several problems with using this as a measure of software development productivity.

The first issue is that not all SLOC takes the same amount of effort. For example, complex scientific code takes a lot longer to write than text boxes and radio buttons on a user interface. This has been addressed in software estimation tools like SEER-SEM and COCOMO by assigning a complexity factor to software products (e.g., an e-commerce website is less complex than an image processing system). But software cost estimation is not the same as measuring the productivity of a software development team. It is not practical to ask developers to assign a complexity measure to every software component they develop, and it would be difficult to normalize this between developers. But there is another, more serious problem with using SLOC as a productivity measure.

“Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs.”

Bill Gates

The issue is that we strongly prefer software solutions with fewer lines of code to those with more. One software developer’s implementation may be 50 lines of code and another’s might be 500 lines of code for the same functionality. All else being equal, the shorter the software solution, the easier it is to maintain, and most of the cost of software is in maintenance. The shorter code may also be more performant (e.g., requiring less computing resources, providing lower latency, more throughput). If we were to use quantity of SLOC as a productivity measure, we would conclude that a good programmer who creates efficient code is less productive than a bad programmer who produces more verbose code, which is exactly the wrong conclusion.
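A quick sketch of the 50-line versus 500-line scenario shows how the SLOC formula rewards verbosity. The cost figure and the function name are assumptions I picked for illustration; only the line counts come from the example above.

```python
def sloc_productivity(lines_of_code: int, cost: float) -> float:
    """Lines of code per dollar, following the SLOC productivity formula."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return lines_of_code / cost


# Two implementations of the same functionality at the same (hypothetical) cost.
COST_PER_IMPLEMENTATION = 1_000.0

concise = sloc_productivity(50, COST_PER_IMPLEMENTATION)
verbose = sloc_productivity(500, COST_PER_IMPLEMENTATION)

print(f"Concise implementation: {concise:.2f} lines per dollar")
print(f"Verbose implementation: {verbose:.2f} lines per dollar")
print(f"The verbose one scores {verbose / concise:.0f}x higher")
```

Both developers delivered identical functionality, yet the metric scores the verbose implementation ten times higher.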

Function Points

The basic idea of Function Points (FP) is to quantify the business functionality provided by software. There are formal methods for FP estimation and even ISO standards that govern the methodology. Function points are somewhat obscure in that they are rarely used at commercial companies in the US (they seem more popular in other countries). Allan Albrecht, the inventor of function points, observed in his research that FPs were highly correlated with lines of code, and thus they share the same issues when used as a software development productivity measure. A major criticism of FPs is that, like SLOC counting, they don’t take into account the complexity of the software being written. They work better for business software than for software that has more algorithmic complexity (e.g., a data science application).

FP Productivity = (function points completed) / (cost to write the code)

The bottom line is that if a software development team completes 1000 function points one month and 1100 function points the next, you can’t conclude that their productivity increased 10%. That is because function points don’t take into account complexity (i.e., the software they developed in the second month might have been a lot easier than the first month). There is also significant overhead to assigning function points and it would be difficult to find staff with function point experience in the US.
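Here is a small sketch of why the 1000 versus 1100 comparison is misleading. The monthly cost and the relative difficulty weights are purely hypothetical numbers I chose to illustrate the argument; function point counting itself provides no such weights.

```python
def fp_productivity(function_points: float, cost: float) -> float:
    """Function points per dollar, following the FP productivity formula."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return function_points / cost


MONTHLY_COST = 100_000.0  # hypothetical, identical in both months

month_1 = fp_productivity(1000, MONTHLY_COST)
month_2 = fp_productivity(1100, MONTHLY_COST)
naive_change = (month_2 - month_1) / month_1
print(f"Naive productivity change: {naive_change:+.1%}")

# Hypothetical relative difficulty per function point in each month.
# These weights are an assumption for illustration, not part of FP counting.
difficulty_month_1 = 1.3
difficulty_month_2 = 1.0

adjusted_1 = 1000 * difficulty_month_1
adjusted_2 = 1100 * difficulty_month_2
adjusted_change = (adjusted_2 - adjusted_1) / adjusted_1
print(f"Difficulty-adjusted change: {adjusted_change:+.1%}")
```

Under these assumed weights, the apparent 10% gain turns into a drop, and since nobody can measure the weights objectively, the raw FP comparison tells us very little.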

User Story Points

Story Points are used by agile development teams to estimate the amount of effort it will take to complete a user story. They are typically used to determine how many user stories can be planned into a sprint (a time-boxed development cycle). They are a very subjective measure that only has value within a team. Comparison to other teams, departments, and organizations is not possible. A software development team can track its velocity (the number of story points completed each sprint) with a goal of improving it, but velocity can easily be gamed by the team. Since story points are completely subjective, the team can just estimate them to be higher and velocity will appear to increase. They are useful within the team to improve its own performance, but not as an external productivity measure.

SP Productivity = (story points completed) / (cost to write the code)
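The gaming problem is easy to see in a few lines of Python. The stories and their estimates below are invented for illustration, and the function name is my own.

```python
def velocity(story_estimates: list[int]) -> int:
    """Velocity is the sum of story points completed in a sprint."""
    return sum(story_estimates)


# The same four hypothetical stories, estimated two different ways.
honest_estimates = [3, 5, 2, 8]
inflated_estimates = [points * 2 for points in honest_estimates]

honest_velocity = velocity(honest_estimates)
inflated_velocity = velocity(inflated_estimates)

print(f"Velocity with honest estimates:   {honest_velocity}")
print(f"Velocity with inflated estimates: {inflated_velocity}")
```

The team completed exactly the same work in both cases, but the reported velocity doubled simply because the estimates did.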

Use Case Points

Use Case Points (UCP) rely on the requirements for the system being written as use cases, which are part of the UML set of modeling techniques. The software size is calculated from elements of the system use cases, with adjustment factors to account for technical and environmental considerations. The UCP for a project can then be used to calculate the estimated effort for a project. Thus UCP is only applicable when the documentation contains use cases (i.e., you have to write use cases for everything). UCP is also a highly subjective method, especially when it comes to establishing the Technical Complexity Factor and the Environmental Factor. Also, there is no standard way to write use cases.

UCP Productivity = (use case points completed) / (cost to write the code)
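To illustrate the subjectivity, here is a sketch based on the commonly cited formulation of Use Case Points, in which the unadjusted use case and actor weights are multiplied by a Technical Complexity Factor and an Environmental Factor. The constants below are the ones usually quoted for that formulation, not something taken from this post, and the input ratings for the two raters are hypothetical.

```python
def use_case_points(
    use_case_weight: float,
    actor_weight: float,
    technical_total: float,
    environmental_total: float,
) -> float:
    """UCP as commonly formulated: unadjusted size times two adjustment factors."""
    # Commonly quoted constants for the two factors (assumed here).
    technical_complexity_factor = 0.6 + 0.01 * technical_total
    environmental_factor = 1.4 - 0.03 * environmental_total
    unadjusted = use_case_weight + actor_weight
    return unadjusted * technical_complexity_factor * environmental_factor


# Two hypothetical raters score the same system. They agree on the use cases
# and actors but judge the technical and environmental factors differently.
rater_1 = use_case_points(100, 10, technical_total=30, environmental_total=15)
rater_2 = use_case_points(100, 10, technical_total=45, environmental_total=10)

print(f"Rater 1: {rater_1:.2f} use case points")
print(f"Rater 2: {rater_2:.2f} use case points")
```

With identical use cases, the two raters end up roughly a third apart in size, which carries straight through to any productivity figure computed from it.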

Subjective Measures of Productivity

As we have seen, useful, practical, comparative, objective measures of software development productivity simply do not exist. We’ve been searching for them for many decades to no avail. I believe the core reason for this is that software development is knowledge work and a complex creative endeavor. Vincent Van Gogh produced more than 2,000 artworks, consisting of around 900 paintings and 1,100 drawings and sketches. Frida Kahlo in her shorter lifetime produced approximately 200 paintings, drawings, and sketches. Which painter was more productive? Maybe one artist’s paintings took longer to create because they were more complex or required more careful thought or experimentation. How would one go about analyzing this to come up with a reasonable productivity metric?

In the absence of useful objective productivity measures, we must turn to subjective measures. I’ll discuss these approaches in a future post.