In Design Sense, Michael Feathers ponders how to teach what is and is not a good design. He mentions an exercise to help think like an artist; flip a picture upside-down. This eliminates the bias of recognition; it's not a face or a flower, it's just a series of lines and shapes. Is there anything similar for coding? Where are our biases?
Programmers tend to be overly detail oriented. We get hung up on what the code does and how it does it. I've observed that struggling programmers focus on (and argue about) syntax details. Is it "if ( condition )" or "if( condition )" or "if (condition)"? Do I use camelCase or underscore_delimited? Should I use a regex or substr()? Can I tweak this line to make it run faster?
Experienced programmers look at structure. How deeply nested is the code? How well encapsulated? Long subroutines with lots of nesting raise red flags and are immediate targets for extract method refactoring. Compound statements are looked upon with skepticism; would a temp variable make it simpler?
How do you think structurally? As with the artist-to-be turning their picture upside-down to eliminate the distraction of recognition, programmers must eliminate their bias towards syntax details. A simple way to do this is to zoom out, literally. Make your font so small the syntax details are no longer visible, only the indentation. There's nothing left to consider but structure.
Let's take a look at an example from ExtUtils::MakeMaker, a very old and very complicated Perl module. Look at one of the most complicated sub-modules, ExtUtils::MM_Unix. Colored syntax highlighting helps, so you might want to have a look in your favorite editor. Lower the font size until the letters are almost indistinguishable; 7 point Monaco works well for me.

Structural quirks stand out immediately. c_o() contains an odd, repeating indentation structure with duplication.
push @m, '
.c.s:
$(CCCMD) -S $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c
';
push @m, '
.c$(OBJ_EXT):
$(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c
';
push @m, '
.C$(OBJ_EXT):
$(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.C
' if !$Is_OS2 and !$Is_Win32 and !$Is_Dos; #Case-specific
push @m, '
.cpp$(OBJ_EXT):
$(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cpp
.cxx$(OBJ_EXT):
$(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cxx
.cc$(OBJ_EXT):
$(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cc
';
join "", @m;
First thing is to clear up all the clutter caused by the duplication and put in some vertical whitespace.
my $command = '$(CCCMD)';
my $flags = '$(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE)';
push @m, qq{
.c.s:
$command -S $flags $*.c
};
push @m, qq{
.c$(OBJ_EXT):
$command $flags $*.c
};
push @m, qq{
.C$(OBJ_EXT):
$command $flags $*.C
} if !$Is_OS2 and !$Is_Win32 and !$Is_Dos; #Case-specific
push @m, qq{
.cpp$(OBJ_EXT):
$command $flags $*.cpp
.cxx$(OBJ_EXT):
$command $flags $*.cxx
.cc$(OBJ_EXT):
$command $flags $*.cc
};
return join "", @m;
This turns out to be a series of pushes onto an array with a special case in the middle. A little restructuring moves the special case to the end and collapses two pushes into one.
my $command = '$(CCCMD)';
my $flags = '$(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE)';
push @m, qq{
.c.s:
$command -S $flags \$*.c
.c$(OBJ_EXT):
$command $flags \$*.c
.cpp$(OBJ_EXT):
$command $flags \$*.cpp
.cxx$(OBJ_EXT):
$command $flags \$*.cxx
.cc$(OBJ_EXT):
$command $flags \$*.cc
};
push @m, qq{
.C$(OBJ_EXT):
$command \$*.C
} if !$Is_OS2 and !$Is_Win32 and !$Is_Dos; #Case-specific
return join "", @m;
Now deeper issues can be considered; does that special case at the end belong in an OS-specific sub-class? Peer closer and you notice a discrepancy between what the condition is doing (checking the presence of various operating systems) and what the comment says (looking for a case-sensitive filesystem). Hmmm!
cflags() spans more than a page and is an obvious candidate for breaking up. The condition commented "Expand hints for this extension via the shell" begs for extraction.
my($name);
( $name = $self->{NAME} . "_cflags" ) =~ s/:/_/g ;
if ($prog = $Config{$name}) {
# Expand hints for this extension via the shell
print STDOUT "Processing $name hint:\n" if $Verbose;
my(@o)=`cc=\"$cflags{cc}\"
ccflags=\"$cflags{ccflags}\"
optimize=\"$cflags{optimize}\"
perltype=\"$cflags{perltype}\"
optdebug=\"$cflags{optdebug}\"
eval '$prog'
echo cc=\$cc
echo ccflags=\$ccflags
echo optimize=\$optimize
echo perltype=\$perltype
echo optdebug=\$optdebug
`;
my($line);
foreach $line (@o){
chomp $line;
if ($line =~ /(.*?)=\s*(.*)\s*$/){
$cflags{$1} = $2;
print STDOUT " $1 = $2\n" if $Verbose;
} else {
print STDOUT "Unrecognised result from hint: '$line'\n";
}
}
}
The long, embedded lists in constants() also stand out.
for my $macro (qw(
AR_STATIC_ARGS DIRFILESEP DFSEP
NAME NAME_SYM
VERSION VERSION_MACRO VERSION_SYM DEFINE_VERSION
XS_VERSION XS_VERSION_MACRO XS_DEFINE_VERSION
INST_ARCHLIB INST_SCRIPT INST_BIN INST_LIB
INST_MAN1DIR INST_MAN3DIR
MAN1EXT MAN3EXT
INSTALLDIRS INSTALL_BASE DESTDIR PREFIX
PERLPREFIX SITEPREFIX VENDORPREFIX
),
(map { ("INSTALL".$_,
"DESTINSTALL".$_)
} $self->installvars),
qw(
PERL_LIB
PERL_ARCHLIB
LIBPERL_A MYEXTLIB
FIRST_MAKEFILE MAKEFILE_OLD MAKE_APERL_FILE
PERLMAINCC PERL_SRC PERL_INC
PERL FULLPERL ABSPERL
PERLRUN FULLPERLRUN ABSPERLRUN
PERLRUNINST FULLPERLRUNINST ABSPERLRUNINST
PERL_CORE
PERM_RW PERM_RWX
) )
{
next unless defined $self->{$macro};
# pathnames can have sharp signs in them; escape them so
# make doesn't think it is a comment-start character.
$self->{$macro} =~ s/#/\\#/g;
push @m, "$macro = $self->{$macro}\n";
}
They suggest they would be better off in their own methods. The change improves not only readability, but also flexibility, because a subclass can now override with their own lists.
...
for my $macro ( $self->list_of_constants )
{
next unless defined $self->{$macro};
# pathnames can have sharp signs in them; escape them so
# make doesn't think it is a comment-start character.
$self->{$macro} =~ s/#/\\#/g;
push @m, "$macro = $self->{$macro}\n";
}
...
sub list_of_constants {
my $self = shift;
return qw(
AR_STATIC_ARGS DIRFILESEP DFSEP
NAME NAME_SYM
VERSION VERSION_MACRO VERSION_SYM DEFINE_VERSION
XS_VERSION XS_VERSION_MACRO XS_DEFINE_VERSION
INST_ARCHLIB INST_SCRIPT INST_BIN INST_LIB
INST_MAN1DIR INST_MAN3DIR
MAN1EXT MAN3EXT
INSTALLDIRS INSTALL_BASE DESTDIR PREFIX
PERLPREFIX SITEPREFIX VENDORPREFIX
),
(map { ("INSTALL".$_,
"DESTINSTALL".$_)
} $self->installvars),
qw(
PERL_LIB
PERL_ARCHLIB
LIBPERL_A MYEXTLIB
FIRST_MAKEFILE MAKEFILE_OLD MAKE_APERL_FILE
PERLMAINCC PERL_SRC PERL_INC
PERL FULLPERL ABSPERL
PERLRUN FULLPERLRUN ABSPERLRUN
PERLRUNINST FULLPERLRUNINST ABSPERLRUNINST
PERL_CORE
PERM_RW PERM_RWX
);
}
The "in and out and in and out..." indentation of init_dirscan() is caused by a long chain of if/elsif/else conditions that could be simplified.
if (-d $name){
next if -l $name; # We do not support symlinks at all
next if $self->{NORECURS};
$dir{$name} = $name if (-f $self->catfile($name,"Makefile.PL"));
} elsif ($name =~ /\.xs\z/){
my($c); ($c = $name) =~ s/\.xs\z/.c/;
$xs{$name} = $c;
$c{$c} = 1;
} elsif ($name =~ /\.c(pp|xx|c)?\z/i){ # .c .C .cpp .cxx .cc
$c{$name} = 1
unless $name =~ m/perlmain\.c/; # See MAP_TARGET
} elsif ($name =~ /\.h\z/i){
$h{$name} = 1;
} elsif ($name =~ /\.PL\z/) {
($pl_files{$name} = $name) =~ s/\.PL\z// ;
} elsif (($Is_VMS || $Is_Dos) && $name =~ /[._]pl$/i) {
# case-insensitive filesystem, one dot per name, so foo.h.PL
# under Unix appears as foo.h_pl under VMS or fooh.pl on Dos
local($/); open(PL,$name); my $txt = ; close PL;
if ($txt =~ /Extracting \S+ \(with variable substitutions/) {
($pl_files{$name} = $name) =~ s/[._]pl\z//i ;
}
else {
$pm{$name} = $self->catfile($self->{INST_LIBDIR},$name);
}
} elsif ($name =~ /\.(p[ml]|pod)\z/){
$pm{$name} = $self->catfile($self->{INST_LIBDIR},$name);
}
Since the code uses an inline documentation style, with each method's documentation appearing just before the method itself, it's easy to identify undocumented and under-documented methods. Color makes the large comment blocks in dynamic_lib() stand out.
my $libs = '$(LDLOADLIBS)';
if (($Is_NetBSD || $Is_Interix) && $Config{'useshrplib'}) {
# Use nothing on static perl platforms, and to the flags needed
# to link against the shared libperl library on shared perl
# platforms. We peek at lddlflags to see if we need -Wl,-R
# or -R to add paths to the run-time library search path.
if ($Config{'lddlflags'} =~ /-Wl,-R/) {
$libs .= ' -L$(PERL_INC) -Wl,-R$(INSTALLARCHLIB)/CORE -Wl,-R$(PERL_ARCHLIB)/CORE -lperl';
} elsif ($Config{'lddlflags'} =~ /-R/) {
$libs .= ' -L$(PERL_INC) -R$(INSTALLARCHLIB)/CORE -R$(PERL_ARCHLIB)/CORE -lperl';
}
}
Any code with a story attached is worth some investigation. Long comments mean the code doesn't explain itself. Sometimes this means bad code, sometimes this means code to handle an odd situation, but it always means complexity. That means get it out into its own subroutine where it can be better tested, documented and not interrupt the narrative of the main code flow. A simple extract method takes care of that.
my $libs = '$(LDLOADLIBS)';
$libs .= $self->_special_shared_lib_flags;
...
# Use nothing on static perl platforms, and to the flags needed
# to link against the shared libperl library on shared perl
# platforms. We peek at lddlflags to see if we need -Wl,-R
# or -R to add paths to the run-time library search path.
sub _special_shared_lib_flags {
my $self = shift;
my $flags = '';
if (($Is_NetBSD || $Is_Interix) && $Config{'useshrplib'}) {
if ($Config{'lddlflags'} =~ /-Wl,-R/) {
$flags = ' -L$(PERL_INC) -Wl,-R$(INSTALLARCHLIB)/CORE -Wl,-R$(PERL_ARCHLIB)/CORE -lperl';
} elsif ($Config{'lddlflags'} =~ /-R/) {
$flags = ' -L$(PERL_INC) -R$(INSTALLARCHLIB)/CORE -R$(PERL_ARCHLIB)/CORE -lperl';
}
}
return $flags;
}
After looking through the routines for a while one wonders if alphabetical order is best. The large size of the class (3700+ lines) and the sheer amount of time it takes to look through it all suggests it is too large and consideration should be given to breaking it down.
Like an artist stepping back from her work, this simple method of lowering the font size allows the programmer to forget for a while the syntactical details and business rules. Instead you can observe nothing but the structure and, by doing so, spot points in the code that need extra care. I've maintained ExtUtils::MM_Unix for years now, yet this simple exercise revealed a pile of simple and obvious improvements.


Comments (3)
Excellent advice there. It reminded me of a cool feature in one of my favorite free editors : SciTE. It lets you zoom in and out just by holding the Ctrl key and spinning the mouse wheel, or using the + and - keys on the keypad. It's exactly what you need for that kind of job !
Posted by Miklos | August 17, 2007 8:02 AM
Ways to achieve artistic perspective:
- Turn on line numbers
- Set your IDE text window to be exactly the maximum length of a function at your default font size
- Try renaming class variables and see how many "hits" you get
- organize code as stack traces, so that the entire call stack for a particular function is together, depth-first. In Visual Studio, wrap deep stacks with #regions, and see at a glance how many lines of code the whole stack takes.
That's all I got for now
Posted by Mike G Burton | August 20, 2007 5:08 PM
Several years ago I wrote a viewer for VB code that takes this idea far (perhaps too far); each individual routine is a one-pixel high line whose length to the left indicates the number of lines in the routine, and to the right indicates the number of characters per line (with gray density shading that shows patterns of periodicity like c_o() has). I didn't get a whole lot of use from it, but I found it occasionally helpful. Here's a screenshot.
Posted by Carl Manaster | August 28, 2007 4:37 PM