Where leading programmers explain how they find
unusual and carefully designed solutions

Recent Posts

Michael Feathers

In my last blog I waxed poetic about C. It was heart-felt, but it was also a little bit embarrassing. I do love the language but it does have its share of trouble.

The problem that I run into most often in C is testability, particularly unit-testability. In theory, there should be no problem. It's easy to test simple functions in C. You just call them from a test harness and away you go. The "not so simple functions" are the ones that cause trouble.

In a procedural language, there are at least two types of functions: leaves and branches. Leaves sort of "bottom out"; they don't call anything else other than library functions, and they are relatively easy to test as long as they don't have side effects outside the system. Branches are tougher: they call leaves and other branches. If you test a few branches up from the leaves, you are not really unit testing any more - you're component testing. And, if those leaf functions touch a database or argue across a socket, you're probably system testing through your unit tests, whether you want to be or not.

So, how can we handle this? How can we easily unit test a branch function without going all the way down the call graph?

The nice thing about C is that it presents you with options. You can play clever tricks with the preprocessor and #define various lower-level calls, but it's ugly.

Another option that comes up is the link seam. If you have functions that you want to replace you can place them in another library. When you are testing, you can link to a library of stubs. When you are building your production code, you link in the real library. This approach can work, but it leads to a lot of complication. You can only stub functions that you place in a different compilation unit and your makefiles become a mess.

Fortunately, C has a third way of breaking dependencies: function pointers. Let's look at a radically simplified example.

    void process_message(struct message *in)
    {
        struct message out;
        encode(in->header, strlen(in->header), out.header);
        encode(in->payload, strlen(in->payload), out.payload);

        send_message(&out);
    }

Can we stub out the send_message function?

How about this? We can introduce a struct that holds a pointer to the send_message function and an instance of that struct type:

    typedef struct {
        void (*send_message)(struct message *in);
    } message_port_type;

    extern message_port_type message_port;

Then, we can change our call to this:

    void process_message(struct message *in)
    {
        struct message out;
        encode(in->header, strlen(in->header), out.header);
        encode(in->payload, strlen(in->payload), out.payload);

        message_port.send_message(&out);
    }

All we have to do now is make sure that we initialize the function pointers at startup. If we need to mock those functions, we can set the function pointers to the appropriate mocks prior to each test case.

It looks suspiciously like C++, doesn't it?

I often show this style to C teams, and most of the time, they regard it with suspicion initially. Some (but not all) C programmers regard OO as hype; they think that it muddies the design. But, the testing problem is real, and this technique can help. It's a thin layer of objectness that makes unit testing manageable.

Michael Feathers

I have something to admit. I love C. I've known it for a while, but it's easy to forget. I've gone away to Ruby, Haskell, OCaml, C++, Java, and C# - I spend time with many languages, but when I go back to C, it's like coming home.

I remember flipping through the Kernighan and Ritchie book decades ago, trying to pick up the language. I remember a lot of frustration, but I also remember a lot of satisfaction. C has its quirks, but in retrospect, they are a lot less mysterious than the quirks of many other languages. They don't require deep reasoning. The behavior of a construct is either defined or it isn't, and since undefined behavior has such a high cost, people are careful. They have to be.

A friend of mine, Kevlin Henney, talks quite a bit about affordances in software. An affordance is, essentially, the envelope of perceived action possibilities that a thing offers. People often think about this in terms of "what the software allows you to do" but I think there is a more expansive way of looking at this that explains C. C is prickly, so its affordance is care. If you don't approach your C code with care, you're dead in the water. C's conceptual simplicity and its prickliness are a powerful combination.

Another nice thing about C, from my perspective, is recovery. I don't like repeating this over and over, but I think it gives a bit of perspective to say that I spend most of my time visiting extremely ugly code bases and helping people get past issues in them, and of all of the languages that I frequently encounter code in, the one which is hardest to recover in is C++. There are quite a few reasons, and I won't go into them here but, really, at a certain point in a degenerated C++ code base, the language is actively fighting you. I can't count the number of dependency nightmares I've seen - systems where any attempt at a build takes an insane amount of time - where the cure is at least as painful as the path to disease. C, on the other hand, seems to be more recoverable. Or, maybe I've just been lucky.

The problem with C++ (and it's not unique to C++ by any means) is that it tried hard to make things better within an existing framework rather than jumping into a new one. It tried to tune toward better and often that is a disaster. Yet we keep trying. We try to mutate languages like Java and C# into things that they aren't - and, without knowledge or respect for what is there, we fail.

There's something deep in software development that not everyone gets but the people at Bell Labs certainly did. It's an undercurrent of "the New Jersey Style", "Worse is Better", and "the Unix philosophy" - and it's not just a feature of Bell Labs software either. You see it in the original Ethernet specification, where packet collision was considered normal, and the same sort of idea is in the internet protocol. It's a deep awareness of design ramification - a willingness to live with a little less to avoid the bigger mess, and a willingness to see elegance in the real rather than the vision.

I was reading a magazine article a few weeks ago whose author implied that the designers of the internet had just stumbled across an approach that embodied sound systems design principles. It really bothered me. I wanted to throw the article across the room. No, those guys knew. They had deep systems thinking skills and they acquired them the hard way. That sort of design only happens when you reflect deeply on a problem and figure out how to work with its grain. C is a language that works with a grain.

I've never met Dennis Ritchie, Doug McIlroy, Ken Thompson, or any of the other pioneers but my impression from afar is that they didn't believe in the perfectibility of man or software - they were beyond that trap. They saw limitations and they worked with them - they didn't overextend.

The fact that C embodies that style is one of the things that makes it special to me. It's a style that deserves more recognition and emulation.

Michael Feathers

I hate long methods. Not just in the academic "Oh, I've read Refactoring and long methods are kind of ugly" sense, but rather in a deep visceral sense. I see more than my share of them when I help people with legacy code issues. It's like stepping into a new yet familiar swamp every single time.

Scroll, scroll, scroll. "Oh I'm in that method." Scroll, scroll, scroll. "Oh, that's happening in this method too?"

You get tired of it. You start to search desperately for answers.

A friend of mine, Alan Francis, had an interesting idea a while back. He thought it would be nice to have some language enforcement for method size. Imagine a language which only allowed methods with, say, three statements? Wouldn't it all be better then? Sadly, I don't think it would. People would resort to something I once saw at a company I visited. The edict came down: methods could only be X lines long. The results that policy produced were predictable. The code was littered with artificial sub-methods. Imagine a method named processTransactions() calling methods named processTransactions1(), processTransactions2(), and processTransactions3(). It was that bad.

It's easy to imagine that this is just ignorance; that in the right and capable hands all methods would be well named and less than ten lines long, but when was the last time you saw a code base like that? Does it ever happen?

A few years ago, I read an interesting post on a mailing list. Someone mentioned that method size in projects is a long tail distribution.

Before reading that, I hadn't given it much thought. I'd assumed that, in a typical code base, the vast majority of methods might be about 20 lines or so, with an equal number of smaller and larger methods. The normal distribution holds in so many other areas, I'd just assumed that it held for method size. But, it turns out that it doesn't.

I saw this for myself when I produced a frequency histogram of method size for a Python project I was working on. It looked like a power-law distribution. I had dozens of one line methods, fewer two and three line methods, and the curve bottomed out at a couple of 20 line methods. As I looked at these 20 line monstrosities (indulge me), it did seem that each of them was long for a reason. Sure, I could have broken them down but in each case it would've been worse. In some cases, I just couldn't name the sub-parts well enough. In other cases, the logic was just clearer in one piece. But, regardless, I knew that if I did break up each of these methods I wouldn't have a distribution that looked substantially different. It was as if I was going to whack a rock repeatedly with a hammer. I'd still end up with a couple of big chunks outnumbered by smaller chunks.

Statistics has always fascinated and frustrated me. When I was in school it was never enough for me to know what distribution applied in a problem, I always wanted to know why. What physical process was at work? It's something that the textbooks never seem to address. And, I've been having the same frustration with the method size. Fortunately, there is some speculation about the processes that lead to power law distributions.

Power law distributions seem to arise in processes where there is preferential attachment. The "hub and spoke" routing system that airlines use is a good example. If you are an airline running flights from "backwoods country airport", you are usually better off if you route flights through a larger airport rather than direct to a neighboring little airport. The flights you schedule to the larger airport can carry people who want to travel locally and people who want to travel further along in the network. You get economies of scale. If you graph a frequency histogram of airports by number of routes, you'll get the familiar curve.

It turns out that there has been quite a bit of power-law research recently. People have noticed that power law distributions or distributions that look like power law are all around us, in biological systems, social systems etc. A while back, Gareth Baxter, James Noble and a few associates published a paper (PDF) on a project of theirs called the 'Shape of Java'. They'd assembled a large corpus of closed and open source Java code and ran an incredible number of measurements upon it, looking at distributions for method size, class size, number of implemented interfaces, etc. One of the observations that they made was that program qualities that programmers tend to be aware of tend to not quite fit the power law mold. There's always some limit that programmers won't go past. I can confirm, at least, that in my years of looking at legacy code, I've never seen a one hundred thousand line method, for instance. But, that long tail effect does seem to be there. Why?

One theory that I have is that there is a preferential attachment process that we adhere to, whether we are conscious of it or not, when we are coding. When we add a method we have to call it from some other method. Do we commonly pick bigger methods to call from or smaller ones? On balance, I think we choose the bigger ones because they are closer to our original intention, our goal when we are programming. If we see work that has to happen before or after some other work, it is easy to go to an existing method and add the call there. Sure, we refactor and break our bigger methods down, but that doesn't seem to change the shape of the distribution, it may just jiggle the coefficients. It's like hitting that rock with a hammer again.
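One way to poke at this hypothesis is a toy simulation. The Ruby sketch below is only a model under stated assumptions, not a measurement of any real code base: every step adds one new one-line method plus one call to it, and the caller is picked with probability proportional to its current size. The function name `grow_code_base`, the step count, and the seed are all made up for illustration.

```ruby
# Toy model (an assumption, not data): each step writes a new one-line
# method and adds a call to it inside an existing method. Bigger methods
# are more likely to be chosen as the caller.
def grow_code_base(steps, rng)
  sizes = [1]          # start with a single one-line method
  total_lines = 1
  steps.times do
    # Picking a random existing line picks its method with probability
    # proportional to that method's size.
    target = rng.rand(total_lines)
    index = 0
    while target >= sizes[index]
      target -= sizes[index]
      index += 1
    end
    sizes[index] += 1  # the caller grows by one call
    sizes << 1         # the new method itself
    total_lines += 2
  end
  sizes
end

sizes = grow_code_base(2_000, Random.new(42))

# Frequency histogram: how many methods have each size.
sizes.tally.sort.first(10).each do |size, count|
  puts format("%3d lines: %d methods", size, count)
end
puts "longest method: #{sizes.max} lines"
```

If preferential attachment is doing what the theory suggests, most methods should stay at one line while a handful grow much longer. Swapping the size-weighted pick for a uniform one is a quick way to see how much of the tail depends on that assumption.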

I still hate long methods, but I've given up hope of getting rid of all of them. It seems that software adheres to some natural laws. A good code base consists of a large number of small methods and a dwindling number of larger ones. And, if that's the case, we shouldn't strive for the absence of long methods, we should strive to make them rare.

Maybe those last few long methods in each code base can serve a purpose - they can remind us of what we'd like to avoid.

Michael Feathers

As I mentioned earlier, I've been doing some programming in OCaml and Haskell, a couple of very operator-rich languages, but they have a cousin that has been getting a lot of attention recently: Scala.

Scala is an OO/functional hybrid language designed by Martin Odersky and his group at École Polytechnique Fédérale de Lausanne (EPFL). If I had to describe it in a sentence or two (and that's no way to do it justice), I'd say Scala is an attempt to help mainstream programming catch up with modern type theory. It tries to make programming safer and (the feeling is) more scalable, by introducing local type inference, a full-fledged generics system, and a variety of functional programming features. And, following OCaml and Haskell's example, it lets you define your own operators.

Is that good?

I think it is. But, there is a problem.

People confuse operator definition with operator overloading, and it appears that we're going to have to listen to quite a few arguments based upon experience with operator overloading in C++ - arguments that just may not apply to languages like OCaml, Haskell, and Scala. It's one thing to be shocked by an unintuitive overload of plus ('+') and yet another to have to look up an operator like '+<' because you haven't seen it before. I'm not saying that operator definition doesn't have its own set of problems. I think it does, but the fact is, many of those problems are different.

But, that aside, people are going to be defining more operators. It's inevitable. The thing that I'm wondering now, is what best practice will look like in a few years.

The other day, Bill Venners posted a blog over on Artima announcing ScalaTest, his unit testing framework for Scala. I love testing frameworks, so despite having too much else to do, I downloaded it and gave it a peek.

One of the interesting things that Bill did was define a new operator for test comparison. The operator is '==='. Here is an example which depicts its semantics:

    assert(7 === 3 + 4)

In this code, the operator does a comparison of two values and emits an informative error message if the comparison fails. The expected value is on the left side of the operator and the actual is on the right (there were some rather deep discussions about whether the order should be expected followed by actual or vice versa, but I think they were settled).

It seems like an interesting use of operator definition, doesn't it? Well, I didn't like it. It felt wrong, but I didn't know why. It took a bit of reflection to figure it out.

The thing that I didn't like about the '===' operator was its appearance. It seemed like a lie to me. How? Well, for one thing, it's lexically symmetric but its semantics are not. If you switch the order of the operands, the message that you get will have the expected and actual terms switched.

There are ways around this. One is to be order agnostic and emit an error message that just tells you that two values didn't match, but if we want to emit an expected/actual style message, the operator just looks off to me. Somehow, in my gut, lexical symmetry implies semantic symmetry.
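The two message styles are easy to compare side by side. This is a small Ruby sketch with hypothetical helper names (`mismatch_message` and `expectation_message`); it is not ScalaTest's implementation, just the two policies in miniature.

```ruby
# Order-agnostic: swapping the arguments changes nothing that matters.
# Returns nil when the values match.
def mismatch_message(a, b)
  "values differ: #{a.inspect} and #{b.inspect}" unless a == b
end

# Expected/actual: the argument order is part of the meaning.
# Returns nil when the values match.
def expectation_message(expected, actual)
  "expected #{expected.inspect} but got #{actual.inspect}" unless expected == actual
end

puts mismatch_message(7, 3 + 5)
puts expectation_message(7, 3 + 5)
puts expectation_message(3 + 5, 7)   # same comparison, misleading report
```

The last two calls compare the same pair of values, yet only one of them reports the failure correctly. A symmetric-looking operator gives the reader no hint about which one they wrote.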

Is my gut right? It's true that C/Java style equality ('==') is symmetric, but many single character operators ('-', '/') aren't. It does seem, though, that lexically symmetric multi-character operators tend to be either semantically symmetric, or something that conveys the semantics of concatenation. Operators like Haskell's '++' fall into this category. The musical composition operators I described from Paul Hudak's DSL do also.

Regardless, I was a bit thrown by the operator. Part of it might be because I've seen an asymmetric operator in HUnit which does the same thing ('@=?'), expected is on the left and actual is on the right, and the asymmetry is clear. Somehow, the '===' operator violated one of my internal least surprise conditions.

Unfortunately, this sort of thing happens to me often. Something looks wrong to me and then when I reflect, I realize what it is, and it's usually because it's inconsistent with something else that many people aren't looking at.

So the question becomes: if people aren't, for instance, looking at the lexical symmetry of an operator as an indicator of semantic symmetry, are they any worse off for it? How much should we pay attention to these little nuances?

The thing I do know, is that I do notice it when people are aware of these little things, and their code reads very well to me. It's like they're speaking my language.

Michael Feathers

In my last blog, I hypothesized that symbolic notations were a better fit for structural programming.

What is structural programming? It's a term which describes a concept that I haven't seen named before. A structural program (or program snippet) is a program that is essentially a data structure on a page. The structure both dominates and conveys the semantics. Code like this is particularly amenable to a structural approach:

    private TreeNode createNodes() {
        DefaultMutableTreeNode root;
        DefaultMutableTreeNode grandparent;
        DefaultMutableTreeNode parent;

        root = new DefaultMutableTreeNode("San Francisco");

        grandparent = new DefaultMutableTreeNode("Potrero Hill");
        root.add(grandparent);

        parent = new DefaultMutableTreeNode("Restaurants");
        grandparent.add(parent);

        return root;
    }

All that is happening in it is node creation and node addition. If we had an operator which meant add node we could probably write it in a briefer way without much loss of clarity. For instance, I could imagine the code looking like this:

        TreeNode createNodes() {
            return node("San Francisco") 
                          +< node("Potrero Hill") 
                              +< node("Restaurants");
        }

It would read a bit better to me than:

        TreeNode createNodes() {
            return node("San Francisco").addNode( 
                           node("Potrero Hill").addNode( 
                               node("Restaurants")));
        }
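Java cannot define operators, so the `+<` version is imaginary there. In a language with operator methods the idea can be tried directly. Below is a minimal Ruby sketch that borrows `<<` for "add node"; the `Node` class and the `node` helper are assumptions made for this example.

```ruby
# Hypothetical tree node; `<<` plays the role of the imagined `+<` operator.
class Node
  attr_reader :label, :children

  def initialize(label)
    @label = label
    @children = []
  end

  # Add a child and return self, so the expression evaluates to the parent.
  def <<(child)
    @children << child
    self
  end

  # Render the tree as indented lines so the structure is visible.
  def to_lines(depth = 0)
    ["  " * depth + label] + children.flat_map { |c| c.to_lines(depth + 1) }
  end
end

def node(label)
  Node.new(label)
end

# `<<` associates to the left, so the nesting needs explicit parentheses.
tree = node("San Francisco") <<
  (node("Potrero Hill") <<
    node("Restaurants"))

puts tree.to_lines
```

Without the parentheses, both inner nodes would become direct children of the root. The layout in the imagined `+<` example reads as nesting, which suggests the operator would have to be right-associative to work unparenthesized; that is a detail the operator's designer has to settle.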

Let's push this idea further in a domain that most of us are familiar with: unit testing.

Many of the xUnit testing frameworks self-register tests. For instance, if you have a JUnit test like this:

    public class PatternTest extends Assert {

        @Test
        public void emptyPattern() {
            assertEquals(0, new Pattern().getSize());
        }

        @Test
        public void singleElementPattern() {
            assertEquals(1, new Pattern("1").getSize());
        }
    }

the framework will find that class and all of the test case classes in a package, register them as test suites, and then use reflection to register all of the tests within each test case class.

In some other languages, we aren't as lucky. In C++, for instance, we either have to put the test case functions into a collection ourselves or use some template/macro magic.

The OCaml programming language is another language without reflection, and its xUnit (OUnit) handles the problem a bit differently. It lets users group test cases and test suites in lists, and it defines three operators that can be used to label them:

    val (>:) : string -> test -> test
    val (>::) : string -> (unit -> unit) -> test
    val (>:::) : string -> test list -> test

We can use them like this:

    "test1" >: TestCase((fun _ -> ()))
    "test2" >:: (fun _ -> ())

    "test-suite" >::: ["test2" >:: (fun _ -> ());]

It might not seem like much, but consider that the last line would actually look like this if we didn't have the operators:

    TestLabel("test-suite", TestList(
            [TestLabel("test2", TestCase((fun _ -> ())))]))

There are some regrettable things about the operator syntax. One is that we can't use a single operator for all of the different test types, but I'd imagine that the operators would be very convenient if we decided to create a large list of tests in a file. We could use indentation and, essentially, make them read like a data structure.

Looking back on this blog, I'm wondering if any readers will think that I'm getting operator happy. I don't think I am. There are many ways to abuse operator overloading, but it seems that most of them involve the use of an operator to hide tricky deep semantics. The semantics of something like, say, add or label are not deep. When those operations are used, they are used in bulk, and structurally, with the layout of the code conveying much of the semantics.

Michael Feathers

My last blog entry was on my mind quite a bit last night. I kept coming back to the same questions - when are narrative languages appropriate? And when are they overkill?

I started to think a bit about some of the great APIs I've seen that live in a completely different world. One is an embedded DSL for music composition from Paul Hudak's book The Haskell School of Expression. Here's a little example from the book:

    funkGroove
        = let p1 = perc LowTom qn
              p2 = perc AcousticSnare en
          in Tempo 3 (Instr Percussion (cut 8 (repeatM
                  ((p1 :+: qnr :+: p2 :+: qnr :+: p2 :+:
                    p1 :+: p1 :+: qnr :+: p2 :+: enr)
                   :=: roll en (perc ClosedHiHat 2))
             )))

It is a little scary at first glance, but it becomes clearer once you know that en is an eighth note, qnr is a quarter note rest, and the symbols :+: and :=: are sequential and parallel composition; i.e., play after another phrase and play at the same time as another phrase, respectively.

The thing that is very cool about this DSL is that you can easily format your code to see phrases above or below phrases that will play at the same time.

The variable names p1 and p2 aren't very informative, but I do think that they aren't quite as shocking as they would be in a conventional program. This DSL is completely about structure. It's about building up a piece of music and being able to tell, at a glance, whether it is what you want it to be. The abbreviations for notes and rests actually look like they work well within it. It pains me to say that, because I usually hate abbreviations with a passion.

No, it's hard for me to imagine a narrative, fluent, or English-like API for this problem that I would like better than the symbolic approach. I guess it has something to do with the domain. People don't talk about how one note follows another as much as they listen for notes or see them on a page. For people who read music, it is a very spatial domain.

But, maybe it's more than that. Maybe the reason why the symbolic approach works so well here is because these programs are about structure. The order and arrangement of the notes has a very direct meaning. Other domains like time and money, are less structural and more semantic, so it seems that they might be more suited to the narrative approach.

Perhaps the choice between narrative and symbolic is more than just style?

Michael Feathers

It's hard to pick out trends in the industry because most things that we would call trends are really swings of a pendulum. This was brought home to me a few months ago when I was sitting across a table from someone who told me that the problem with IT is that we just don't have enough meta-data. If we had meta-data for everything, all of our problems would go away. I tried to tell him that this had been attempted a number of times in the software industry and that it was quixotic - sort of like programming in pictures. But memes rise and fall. We each have to learn the hard way that the light we see from a distance isn't always as illuminating as we expect once we get to it.

Recently, another meme has been on the rise. The meme I'm talking about is the notion that code becomes better as it approaches English. I'm seeing this in a couple of different places now. One is the Domain Specific Languages community. While it's true that DSLs don't have to become English-like, there are quite a few people who try.

The Behavior Driven Development (BDD) community also tends toward English-esque readability. Here is an example in rspec from Luke Redpath's blog:

  specify "should be invalid without a password" do
    @user.email = 'joe@bloggs.com'
    @user.username = 'joebloggs'
    @user.should_not_be_valid
    @user.password = 'abcdefg'
    @user.should_be_valid
  end

It's fine, it reads like English. But, is that good?

I think that, fundamentally, we have at least two different ways of understanding text. One is narrative and the other is symbolic. In the narrative mode, we approach text like prose and read through it the way we would read through English or any other natural language - it tickles the verbal centers of our brain. The symbolic mode is more visual. It helps us understand code like this even if we would fumble when giving a verbal explanation:

  primes = sieve [ 2.. ]
           where
           sieve (p:x) = p : sieve [ n | n <- x, n `mod` p > 0 ]

We have to decode it, but the decoding is often swift and more precise than what we are used to with natural language.

I like to see narrative approaches in tools that can be user facing, like rspec's Story Runner, and FIT's DoFixture, but in code written by and for developers, I have mixed feelings. This code:

  5.days.fromTomorrow.at(2.am)

reads very well, and it's hard to imagine a good symbolic alternative, but there are times when a symbolic approach can make code comprehension a breeze. The C++ iostream library is a good example. With the use of a couple of operators, much of the arcana of I/O is stripped away.

Ultimately, I think that some domains are more suited to the narrative approach than others but the criteria are tricky. On the one hand, if you are working in a domain like dates or scientific units, with settled definitions and nomenclature, it can pay to use a natural language style in your API. When definitions are settled, there is less chance that the machinery of a DSL will have to be refactored over time. On the other hand, once definitions are settled and well understood, you can eliminate a lot of cruft by moving toward a symbolic approach. The price, of course, is a steeper learning curve. It seems that there is a sweet spot between those two poles and I have the sense that it might be narrower than we expect.

The narrative meme is rising. Many people are trying to write English in Java, C# and Ruby. I just hope that we learn where it is applicable and where it isn't. Natural language should not be the only direction we explore when we look for expressivity. Hopefully, after a few years, we'll walk away with a stronger sense of where natural language helps and hinders.

Michael Feathers

Sometimes you find an elegant solution for a problem that doesn't exist. So, what do you do then? In my case, I just forgot about it for a while.

About five or six years ago, I was invited to a testing summit. There were plenty of testing gurus in attendance: Cem Kaner, Brian Marick, Bret Pettichord, and James Bach were all there. I was there along with a few other developers to talk about TDD, refactoring and some of the more agile development techniques. We all sat together and worked on tests and testing approaches for an application that Pragmatic Dave Thomas had written. It was a blog publication system written in Ruby.

We sat, coded, and discussed things into the wee hours of the night. And, like most geeks, we didn't really go outside much. When we had breaks, we huddled around our computers and read the web to see if the world had changed outside.

During one of these breaks, I started playing with an idea that was inspired by the method_missing method in Ruby.

What if we had a class like this:

  class Pebble
    def method_missing(meth, *args)
      puts meth
      return self
    end
  end

We could find some arbitrary method in an application that we don't understand, and just drop a Pebble into it.

  controller = BasicController.new
  controller.accept_renderer(Pebble.new)

Then we'd see a trace of all of the calls to that object, and we'd see traces to calls to all of the returned objects as well. It would be kind of like dropping a pebble into a cave. You drop it, and listen to it on the way down. As you listen you can tell how deep the cave is, and get a sense of its layout. Hokey? Yes, but give us a break. It was late.

We sat and played with this idea a bit. I forget who the other developers were but Jeremy Stell-Smith was there. The code, unfortunately, is long gone. And, it's a shame. We ran into some interesting cases when the pebbles bottomed out at operator calls and primitives in Ruby.

We had pebbles generate other pebbles, and we built up a list of calls with appropriate indentation. And, although we didn't have time to code it, we thought about bringing up a GUI with the source and allowing people to type in the return values of various calls as a pebble dropped.
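Since the original code is gone, what follows is only a guess at its shape: a minimal Ruby sketch in which each pebble answers every call with a deeper pebble and appends an indented line to a shared log. `TracingPebble`, `__trace__`, and `render_page` are hypothetical names, and the sketch ignores the operator and primitive cases mentioned above.

```ruby
# A hypothetical reconstruction, not the original code. Inheriting from
# BasicObject leaves almost no real methods, so nearly every call is traced.
class TracingPebble < BasicObject
  def initialize(log = [], depth = 0)
    @log = log       # shared by every pebble spawned from the first one
    @depth = depth   # how many calls deep this pebble was returned
  end

  # Record the call, indented by depth, and answer with a deeper pebble.
  def method_missing(name, *args, &block)
    @log << ("  " * @depth) + name.to_s
    ::TracingPebble.new(@log, @depth + 1)
  end

  # The one real method: hand back the indented trace.
  def __trace__
    @log
  end
end

# Stand-in for code we do not understand yet (invented for this example).
def render_page(renderer)
  renderer.begin_page
  renderer.header.write("title")
  renderer.body.paragraph.write("text")
  renderer.end_page
end

pebble = TracingPebble.new
render_page(pebble)
puts pebble.__trace__
```

Each level of indentation in the output marks a call made on an object that an earlier call returned, which is the "listening on the way down" part of the idea.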

Unfortunately, we didn't see much practical use to the idea but it still seemed kind of neat. It's just a shame that it was a solution to a problem that didn't exist.

Michael G Schwern

When Andy first asked me to write on "Beautiful Code" just two lines of Perl immediately sprang to mind.

    use LWP::Simple;
    my $page = get("http://www.google.com");

I see beauty in this code at many layers. Let's start with the first, the beauty of its simple interface.

Whether or not you know Perl, or even know how to program, you can figure out what that code does and that is beautiful. Why is that code so readily obvious to anyone who's used a web browser? Because the Gulf of Execution is so narrow.

Michael Feathers

What is the correct way to implement object equality? There is no dearth of advice on this subject. If you google it, you’ll find a number of blogs and articles on the topic.

I remember that in Java, in the earliest days, the advice was to implement object equality like this:

public class Contract
{
     private String name;
     private String identifier;
     ...

     public boolean equals(Object other) {
         if (!(other instanceof Contract))
             return false;
         Contract otherContract = (Contract)other;
         return name.equals(otherContract.name)
             && identifier.equals(otherContract.identifier);
     }
}

But then someone discovered that, in the presence of subclassing, that implementation isn’t symmetric, so we eventually ended up with this as good practice:

public class Contract
{
     private String name;
     private String identifier;
     ...

     public boolean equals(Object other) {
         if (other == null || other.getClass() != getClass())
             return false;
         Contract otherContract = (Contract)other;
         return name.equals(otherContract.name)
             && identifier.equals(otherContract.identifier);
     }
}

Which is fine, but of course, we have to remember to write a hashCode function that is consistent with equals, just on the odd chance that we put our objects into a HashMap. But will we? And, what do we need equals for anyway?

It’s a valid question.

Often, people implement equals on speculation. They feel that it will be useful at some point, but the times when equals is useful are surprisingly rare. In fact, I implement it only when I need an object to play well with a preexisting framework or container class.
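Container classes are where the consistency rule bites. As a sketch of the same obligation outside Java, here it is in Ruby, where `Hash` relies on `eql?` and `hash` the way `HashMap` relies on `equals` and `hashCode`. The field values are invented for the example.

```ruby
class Contract
  attr_reader :name, :identifier

  def initialize(name, identifier)
    @name = name
    @identifier = identifier
  end

  # Same class and same fields, mirroring the getClass() version above.
  def ==(other)
    other.class == self.class &&
      name == other.name &&
      identifier == other.identifier
  end
  alias eql? ==

  # Must agree with eql?: equal contracts need equal hash values.
  def hash
    [self.class, name, identifier].hash
  end
end

prices = { Contract.new("Acme", "C-1") => 100 }

# A second, equal Contract finds the entry stored under the first.
puts prices[Contract.new("Acme", "C-1")].inspect
```

Delete the `hash` method and the lookup will almost certainly come back `nil`, even though `==` still reports the two contracts as equal.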

The fact is, equals has a problem, and it’s a rather large one. The problem with equals is that there may be no one definition of equality for a particular class. Consider our Contract class. In one context, we may consider contracts equal if their names match. In another context, we may want to match names and identifiers. Equality is a mathematical concept, not an object concept. It’s contextual.

public class Contract
{
     private String name;
     private String identifier;
     ...

     public boolean matchesByName(Contract other) {
         return name.equals(other.name);
     }
}

So, what is the correct implementation of equals? Sometimes it’s no implementation at all.