The Code Climate team relies on many Open Source tools to help our application give the best feedback to our customers. These tools often depend on ideas with fascinating histories, and investigating these histories can teach us how to use these tools properly. In this post we’d like to focus on the origins of one of the main features of Code Climate - measuring code complexity. We’ll look at the original paper that introduced the idea, and discuss how we discovered that understanding the role of intuition in quantifying code complexity is crucial to correctly interpreting complexity measurements.
While Code Climate doesn’t use this exact measurement, the history of quantifying the complexity of a computer program can be traced to an algorithm known as “cyclomatic complexity.” This concept was introduced in “A Complexity Measure,” a 1976 paper by Thomas J. McCabe, a United States Department of Defense employee who was involved in many large scale programming and programming management efforts during his career. As is the case with most enduring concepts in computer science and software engineering, the problems that motivated McCabe’s original work are still relevant today. The text of the paper begins:
“There is a critical question facing software engineering today: How to modularize a software system so the resulting modules are both testable and maintainable?”
While we have more ideas about modularity, testability, and maintainability than we did in 1976, this is still at the heart of what makes programming complex and challenging for modern programmers, product managers, stakeholders, and more. In order to answer this critical question, McCable makes the claim that:
“What is needed is a mathematical technique that will provide a quantitative basis for modularization and allow us to identify software modules that will be difficult to test or maintain.”
Charged with the task of how to identify complex programs in order to reduce testing time and maintenance costs, McCabe takes a page from his experience with graph theory and provides a framework for determining the complexity of a computer program based on the idea of graph theoretic complexity.
The details of the algorithm aren’t terribly important, but the basic idea is that the connectedness of a graph relates to its complexity, and that the notion of complexity is independent of size. Here’s the author’s description of the strategy:
“The overall strategy will be to measure the complexity of a program by computing the number of linearly independent paths, control the “size” of programs by setting an upper limit to these paths (instead of using just physical size), and use the cyclomatic complexity as the basis for a testing methodology.”
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
In the first function we can see that there are unnecessary conditional statements that cloud the intent of the (admittedly trivial) function. By removing these if statements and compacting the function to its essentials, we intuitively have a less complex function. By that token, the cyclomatic complexity score would be lower.
While the concept of applying graph theory to the structure of computer programs is novel and would have alone made this paper a landmark, the true measure of its genius lies in the desire of the author to “illustrate the correlation between intuitive complexity and graph-theoretic complexity.” In other words, the author was aware that the intuition of a programmer with respect to their notion of complexity is a powerful one that is worth preserving, and instead of seeking an algorithm that programmers could lean on, he sought one that would confirm what they already believed.
Modern static analysis tools deal with a larger volume of code than McCabe probably ever imagined possible, and so their goals have to be somewhat aligned with modern practice. Tools like Code Climate that analyze entire code bases are going to be more powerful in the historical view than in the moment - there is simply too much information to consume to be able to review a quality report every time new code is pushed to a repository. Instead, the onus is on modern tools to provide a glimpse into how things are changing.Essentially, for a tool to be applicable, it must be trustworthy, and to be trustworthy, it must conform to the underlying assumptions that engineers have about the material that they work with: code.
For Code Climate and other code quality tools to measure their success, they should look to see if they are, for the most part, conforming to the intuition of developers. Where they’re not doing so, they should be opening interesting conversations that help programmers get to the heart of the designs they are implementing. Can you think of examples when static analysis has confirmed or contradicted your intuition? Leave us some examples in the comments below and we’ll share them.
Stay tuned for the next post in which we’ll explore some of the visualizations McCabe used to express code complexity, and look at some real world code examples to see how lowering cyclomatic complexity can make code intuitively less complex.
Works Cited McCabe, T.J. A Complexity Measure IEEE Transactions on Software Engineering, Vol. SE-2 No.4, December 1976. Department of Defense, National Security Agency.