Refactoring a switch statement
This article was originally published in my blog (affectionately referred to as blargh) on . The original blog no longer exists as I've migrated everything to this wiki.
The original URL of this post was at https://tmont.com/blargh/2011/11/refactoring-a-switch-statement. Hopefully that link redirects back to this page.
One day I found myself happily writing code, and then I realized I was adding a case
to a
switch
statement that took up the entire screen. I'm pretty sure the Gang of Four had some kind
of postulate that said (paraphrasing): "If you can't see the closing curly brace, you're doing it
wrong." Or something like that.
Since I've read all those stupid design pattern books and spent several years doing SOA and writing
"enterprise applications" using other useless acronyms I assumed I was well suited for refactoring a
switch
statement. And if
ReSharper has taught me anything, it's that I'm an
absolute pro at refactoring.
It turns out refactoring a switch
statement isn't very cut and dry. If
you google it, you'll find lots of articles with intelligent-sounding words and phrases like
polymorphism
and factory pattern. But in most cases they were either moving a switch
statement to
a different place, or replacing an easily-understood switch
statement with a less-readable design pattern
with more layers of abstraction and indirection. So which way is the right way?
How switch
statements work
Obviously, this varies from language to language, but switch
statements are internally represented
in one of several different ways: as a jump table or as a series of if..then..else
branches.
If a switch
statement has few case
blocks (like, say, less than 10) and the
case
values are close
together (like, say, integers between 0 and 10), then the compiler will probably convert a switch
statement to a jump table. A jump table, for the sake of simplicity, is basically a dictionary
which contains function pointers. For example, this switch
statement:
switch (foo) {
case 0: return 'foo';
case 1: return 'bar';
case 2: return 'baz';
}
would be represented internally by this jump (hash) table:
var switchOnFoo = {
'0': function() { return 'foo'; },
'1': function() { return 'bar'; },
'2': function() { return 'baz'; }
};
A jump table is more akin to an array, so lookup will take O(1)
. A hash table will take
(depending on the hashing function and the size of the table) O(1)
-ish. The point is
that it's faster than comparing each case
value, which would be O(n)
.
In the case where the values are not close together or there are many cases, the switch
statement
is often just represented internally as a series of if..then..else
branches. When
executed, a comparison must be made against each case
value in order, so the complexity is
O(n)
.
if (foo == '0') {
return 'foo';
} else if (foo == '1') {
return 'bar';
} else if (foo == '2') {
return 'baz';
}
Refactoring
So why have I gone through the trouble of describing the internal representation of a switch
statement? Well, it helps to understand how things work if you're going to refactor something. In the
general case, switch
statements are fast, because they use jump tables rather than branches.
So you don't want to refactor a switch
statement to something that performs worse just
because of ignorance.
Note that any performance gains to be had from optimizing your switch
statements are probably not
worth worrying about. Refactoring should ALWAYS be done with readability and maintainability in mind. Sometimes
compromises must be made between readability, maintainability and performance.
Anyway, the point is that you shouldn't refactor a switch
statement because you read
somewhere that they're bad and oh my god the cyclomatic complexity! Refactoring should
always have a purpose beyond "I read that it's a good idea." If you can't support the reason you're
refactoring with something concrete, then you have no business doing it.
Recognizing the problem
switch
statements can cause problems in several different ways. If a switch
statement
is duplicated somewhere else in the code, then that's a problem. If you add another case
, you have
to grep the code and update all other instances of that switch
statement. That can get out of
control quickly, particularly if the number of people committing code is greater than one or can't
read your mind. Also, it sucks and is no fun.
The other problem is simply readability. Giant blocks of code are hard to read, so if you find yourself
having to scroll your editor to find the end of a switch
statement, it's probably time for a cleanup.
So how do you solve the switch
statement problem? There are two ways.
Polymorphism
Ooooh. Look at all the words I know. Try not to be intimidated by my copious patois.
Polymorphism really only applies to object-oriented languages, but you can apply a bastardized form of polymorphism to pretty much any language that has the notion of an object.
You can look up the formal definition of polymorphism elsewhere, but basically the gist of this technique
is that you replace the switch
statement with a method call. Say you had this function that
contained a switch
statement that was duplicated in other parts of the code (let's do this one
in C♯):
public class CrappySwitch {
public string DoSomething(int foo) {
switch (foo) {
case 0: return "foo";
case 1: return "bar";
case 2: return "baz";
}
return null;
}
}
What you want is to extract that switch statement and move it higher up in the stack, so that other callers can you use the same code. This is accomplished by using everybody's favorite pattern: the factory.
public interface IFooable {
string DoSomething();
}
public class ZeroFooable : IFooable {
public string DoSomething() {
return "foo";
}
}
public class OneFooable : IFooable {
public string DoSomething() {
return "bar";
}
}
public class TwoFooable : IFooable {
public string DoSomething() {
return "baz";
}
}
public class DefaultFooable : IFooable {
public string DoSomething() {
return null;
}
}
public class FooableFactory {
public IFooable GetFooable(int foo) {
switch (foo) {
case 0: return new ZeroFooable();
case 1: return new OneFooable();
case 2: return new TwoFooable();
}
return new DefaultFooable();
}
}
public class CrappySwitch {
private readonly FooableFactory factory;
public CrappySwitch(FooableFactory factory) {
this.factory = factory;
}
public string DoSomething(int foo) {
return factory.GetFooable(foo).DoSomething();
}
}
So, obviously this is a lot more code. Particularly since our case
statements were pretty much
empty. But creating more code to mitigate potential disasters is a fair trade-off. To accomplish
what previously would have taken a code grep, you must now:
- Add a
case
statement in the factory method - Create a new implementation of
IFooable
corresponding to that case statement
That's not so bad, right? The point is that the switch
statements have been consolidated, and
the potential for catastrophe has been mitigated. Probably. Unless you're an idiot.
Hash tables
The other way to refactor a switch
statement is through the use of a hash table.
Or dictionary. Or array if you're using PHP. Using this method, your switch
statement
disappears and in its place a hash table emerges. In JavaScript this time:
function doSomething(foo) {
switch (foo) {
case 'f': return 'foo';
case 'b': return 'bar';
case 'z': return 'baz';
}
return null;
}
becomes
var fooTable = {
f: function() { return 'foo'; },
b: function() { return 'bar'; },
z: function() { return 'baz'; }
};
function doSomething(foo) {
var fooable = fooTable[foo];
if (fooable) {
return fooable();
}
return null;
}
In this method, the analog of adding a case
statement is adding another item
to the hash table (or dictionary, or array, or whatever).
Wrapping up
It should be noted that in both cases the amount of code didn't necessarily decrease. The point of refactoring is rarely to decrease the amount of code, but rather to improve the design of the current code.
Whichever method you choose is contingent upon the code you're working with. A couple common use cases are:
-
If you're trying to remove duplicated
switch
statements in separate classes/objects/files/scopes, try to use polymorphism - If you're trying to improve performance, use a hash table
- If you're trying to improve readablity only, use a hash table
-
If your
switch
statement is embarrasingly huge and only used in one place, use a hash table (or figure out why it's so huge and fix that)