I hate long methods. Not just in the academic "Oh, I've read Refactoring and long methods are kind of ugly" sense, but rather in a deep visceral sense. I see more than my share of them when I help people with legacy code issues. It's like stepping into a new yet familiar swamp every single time.

Scroll, scroll, scroll. "Oh I'm in that method." Scroll, scroll, scroll. "Oh, that's happening in this method too?"

You get tired of it. You start to search desperately for answers.

A friend of mine, Alan Francis, had an interesting idea a while back. He thought it would be nice to have some language enforcement for method size. Imagine a language which only allowed methods with, say, three statements? Wouldn't it all be better then? Sadly, I don't think it would. People would resort to something I once saw at a company I visited. The edict came down: methods could only be X lines long. The results that policy produced were predictable. The code was littered with artificial sub-methods. Imagine a method named processTransactions() calling methods named processTransactions1(), processTransactions2(), and processTransactions3(). It was that bad.

It's easy to imagine that this is just ignorance; that in the right, capable hands all methods would be well named and less than ten lines long. But when was the last time you saw a code base like that? Does it ever happen?

A few years ago, I read an interesting post on a mailing list. Someone mentioned that method size in projects is a long tail distribution.

Before reading that, I hadn't given it much thought. I'd assumed that, in a typical code base, the vast majority of methods might be about 20 lines or so, with an equal number of smaller and larger methods. The normal distribution holds in so many other areas, I'd just assumed that it held for method size. But, it turns out that it doesn't.

I saw this for myself when I produced a frequency histogram of method size for a Python project I was working on. It looked like a power-law distribution. I had dozens of one-line methods, fewer two- and three-line methods, and the curve bottomed out at a couple of 20-line methods. As I looked at these 20-line monstrosities (indulge me), it did seem that each of them was long for a reason. Sure, I could have broken them down, but in each case it would've been worse. In some cases, I just couldn't name the sub-parts well enough. In other cases, the logic was just clearer in one piece. But, regardless, I knew that if I did break up each of these methods I wouldn't have a distribution that looked substantially different. It was as if I were going to whack a rock repeatedly with a hammer. I'd still end up with a couple of big chunks outnumbered by smaller chunks.
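If you want to try this on your own code, here is a minimal sketch of how such a histogram might be produced. It is not the script I used back then. The names `method_lengths`, `print_histogram`, and the `SAMPLE` class are made up for illustration. It assumes Python 3.8 or later (for `end_lineno`) and counts a method's size as the span of lines in its body, so blank lines, comments, and nested functions inside a body all count toward the enclosing method.

```python
import ast
from collections import Counter


def method_lengths(source: str) -> list[int]:
    """Return the body length, in lines, of every function or method in source."""
    tree = ast.parse(source)
    lengths = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Measure from the first statement of the body to the last line
            # of the function, ignoring the "def" line and any decorators.
            first_body_line = node.body[0].lineno
            lengths.append(node.end_lineno - first_body_line + 1)
    return lengths


def print_histogram(lengths: list[int]) -> None:
    """Print a crude text histogram: one row per method size."""
    counts = Counter(lengths)
    for size in sorted(counts):
        print(f"{size:4d} | {'#' * counts[size]}")


# A tiny hypothetical class, just to have something to measure.
SAMPLE = '''
class Account:
    def balance(self):
        return self._balance

    def is_empty(self):
        return self._balance == 0

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._balance += amount
'''

if __name__ == "__main__":
    print_histogram(method_lengths(SAMPLE))
```

To run it over a real project you would read each `.py` file, pass its text to `method_lengths`, and pool the results before printing.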

Statistics has always fascinated and frustrated me. When I was in school it was never enough for me to know what distribution applied in a problem; I always wanted to know why. What physical process was at work? It's something that the textbooks never seem to address. And I've been having the same frustration with method size. Fortunately, there is some speculation about the processes that lead to power law distributions.

Power law distributions seem to arise in processes where there is preferential attachment. The "hub and spoke" routing system that airlines use is a good example. If you are an airline running flights from "backwoods country airport", you are usually better off if you route flights through a larger airport rather than direct to a neighboring little airport. The flights you schedule to the larger airport can carry people who want to travel locally and people who want to travel further along in the network. You get economies of scale. If you graph a frequency histogram of airports by number of routes, you'll get the familiar curve.

It turns out that there has been quite a bit of power-law research recently. People have noticed that power law distributions, or distributions that look like power laws, are all around us: in biological systems, social systems, and so on. A while back, Gareth Baxter, James Noble and a few associates published a paper on a project of theirs called the 'Shape of Java'. They'd assembled a large corpus of closed and open source Java code and run an incredible number of measurements upon it, looking at distributions for method size, class size, number of implemented interfaces, etc. One of the observations that they made was that program qualities that programmers tend to be aware of tend to not quite fit the power law mold. There's always some limit that programmers won't go past. I can confirm, at least, that in my years of looking at legacy code, I've never seen a one hundred thousand line method, for instance. But that long tail effect does seem to be there. Why?

One theory that I have is that there is a preferential attachment process that we adhere to, whether we are conscious of it or not, when we are coding. When we add a method we have to call it from some other method. Do we commonly pick bigger methods to call from or smaller ones? On balance, I think we choose the bigger ones because they are closer to our original intention, our goal when we are programming. If we see work that has to happen before or after some other work, it is easy to go to an existing method and add the call there. Sure, we refactor and break our bigger methods down, but that doesn't seem to change the shape of the distribution; it may just jiggle the coefficients. It's like hitting that rock with a hammer again.
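The theory is easy to play with in a toy simulation. The sketch below is only an illustration under loud assumptions, not a measurement of any real code base: every new method starts as a single line, each new method is called from exactly one existing method, and the caller is picked with probability proportional to its current length (that is the "preferential attachment" part). The name `simulate_growth` and its parameters are invented for this example.

```python
import random
from collections import Counter


def simulate_growth(n_methods: int, seed: int = 0) -> list[int]:
    """Grow a toy code base and return the final length of each method."""
    rng = random.Random(seed)
    sizes = [1]        # sizes[i] is the length of method i; start with one method
    line_owner = [0]   # one entry per line of code, naming the method it lives in

    for _ in range(n_methods - 1):
        # Picking a random *line* and taking its method selects a method
        # with probability proportional to its size: bigger methods are
        # more likely to receive the new call.
        caller = rng.choice(line_owner)
        sizes[caller] += 1
        line_owner.append(caller)

        # The newly written method itself starts out one line long.
        sizes.append(1)
        line_owner.append(len(sizes) - 1)

    return sizes


if __name__ == "__main__":
    counts = Counter(simulate_growth(2000))
    for size in sorted(counts):
        print(f"{size:4d} lines: {counts[size]} methods")
```

In runs of this model most methods stay tiny while a handful grow much larger, which is the same rough shape as the histogram described earlier. Real code bases have forces this model ignores, such as refactoring and the limits programmers impose on themselves, so treat it as a way to build intuition rather than as evidence.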

I still hate long methods, but I've given up hope of getting rid of all of them. It seems that software adheres to some natural laws. A good code base consists of a large number of small methods and a dwindling number of larger ones. And, if that's the case, we shouldn't strive for the absence of long methods; we should strive to make them rare.

Maybe those last few long methods in each code base can serve a purpose - they can remind us of what we'd like to avoid.