<glacius:metadata>
<title>Refactoring a switch statement</title>
<description>Methods for refactoring a switch statement</description>
<category>Legacy blog posts</category>
<category>Programming</category>
<category>Refactoring</category>
<category>JavaScript</category>
<category>C#</category>
</glacius:metadata>
<glacius:macro name="legacy blargh banner">
<properties>
<originalUrl>https://tmont.com/blargh/2011/11/refactoring-a-switch-statement</originalUrl>
<originalDate>2011-11-29T06:12:47.000Z</originalDate>
</properties>
</glacius:macro>
<p>
One day I found myself happily writing code, and then I realized I was adding a <code>case</code> to a
<code>switch</code> statement that took up the entire screen. I'm pretty sure the Gang of Four had some kind
of postulate that said (paraphrasing): "If you can't see the closing curly brace, you're doing it
wrong." Or something like that.
</p>
<p>
Since I've read all those stupid design pattern books and spent several years doing SOA and writing
"enterprise applications" using other useless acronyms I assumed I was well suited for refactoring a
<code>switch</code> statement. And if
<a href="https://www.jetbrains.com/resharper/">ReSharper</a> has taught me anything, it's that I'm an
absolute pro at refactoring.
</p>
<p>
It turns out refactoring a <code>switch</code> statement isn't very cut and dried. If
you google it, you'll find lots of articles with intelligent-sounding words and phrases like
<strong>polymorphism</strong>
and <strong>factory pattern</strong>. But in most cases they were either moving a <code>switch</code> statement to
a different place, or replacing an easily-understood <code>switch</code> statement with a less-readable design pattern
with more layers of abstraction and indirection. So which way is the right way?
</p>
<h3>How <code>switch</code> statements work</h3>
<p>
Obviously, this varies from language to language, but <code>switch</code> statements are typically represented
internally in one of two ways: as a jump table or as a series of <code>if..then..else</code> branches.
</p>
<p>
If the <code>case</code> values of a <code>switch</code> statement are close
together (like, say, integers between 0 and 10), then the compiler will probably convert the <code>switch</code>
statement to a jump table. A jump table, for the sake of simplicity, is basically a dictionary
which contains function pointers. For example, this <code>switch</code> statement:
</p>
<glacius:code lang="javascript"><![CDATA[switch (foo) {
    case 0: return 'foo';
    case 1: return 'bar';
    case 2: return 'baz';
}]]></glacius:code>
<p>
would be represented internally by this jump (hash) table:
</p>
<glacius:code lang="javascript"><![CDATA[
var switchOnFoo = {
    '0': function() { return 'foo'; },
    '1': function() { return 'bar'; },
    '2': function() { return 'baz'; }
};
]]></glacius:code>
<p>
A jump table is more akin to an array, so lookup will take <code>O(1)</code>. A hash table will take
(depending on the hashing function and the size of the table) <code>O(1)</code>-ish. The point is
that it's faster than comparing each <code>case</code> value, which would be <code>O(n)</code>.
</p>
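<p>
To make the array comparison concrete, here's a rough sketch of a jump table modeled as a plain array.
The names <code>jumpTable</code> and <code>dispatch</code> are made up for illustration, and a real compiler
emits machine code rather than JavaScript, so treat this as an analogy rather than what actually happens
under the hood.
</p>

```javascript
// Illustrative analogy only: dense case values 0..2 become array indexes.
var jumpTable = [
    function() { return 'foo'; }, // case 0
    function() { return 'bar'; }, // case 1
    function() { return 'baz'; }  // case 2
];

function dispatch(foo) {
    // Only whole numbers can be used as an index into the table.
    if (typeof foo !== 'number' || foo % 1 !== 0) {
        return undefined;
    }
    // Range check: anything outside the table matches no case.
    if (0 > foo || foo >= jumpTable.length) {
        return undefined;
    }
    // One indexed lookup, no matter how many cases there are.
    return jumpTable[foo]();
}
```

<p>
The cost of <code>dispatch</code> is the same whether the table has three entries or three hundred,
which is the whole appeal.
</p>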
<p>
In the case where the values are not close together, the <code>switch</code> statement
is often just represented internally as a series of <code>if..then..else</code> branches. When
executed, a comparison must be made against each <code>case</code> value in order, so the complexity is
<code>O(n)</code> in the worst case. (Some compilers arrange those comparisons as a binary search instead,
which brings it down to <code>O(log n)</code>.)
</p>
<glacius:code lang="javascript"><![CDATA[
// switch compares with strict equality, so the equivalent branches use ===
if (foo === 0) {
    return 'foo';
} else if (foo === 1) {
    return 'bar';
} else if (foo === 2) {
    return 'baz';
}
]]></glacius:code>
<h3>Refactoring</h3>
<p>
So why have I gone through the trouble of describing the internal representation of a <code>switch</code>
statement? Well, it helps to understand how things work if you're going to refactor something. In the
general case, <code>switch</code> statements are fast, because they can use jump tables rather than chains
of branches. So you don't want to refactor a <code>switch</code> statement to something that performs worse just
because of ignorance.
</p>
<p>
Note that any performance gains to be had from optimizing your <code>switch</code> statements are probably not
worth worrying about. Refactoring should ALWAYS be done with readability and maintainability in mind. Sometimes
compromises must be made between readability, maintainability and performance.
</p>
<p>
Anyway, the point is that you shouldn't refactor a <code>switch</code> statement because you read
somewhere that they're bad and oh my god the cyclomatic complexity! Refactoring should
always have a purpose beyond "I read that it's a good idea." If you can't support the reason you're
refactoring with something concrete, then you have no business doing it.
</p>
<h4>Recognizing the problem</h4>
<p>
<code>switch</code> statements can cause problems in a couple of different ways. If a <code>switch</code> statement
is duplicated somewhere else in the code, then that's a problem. If you add another <code>case</code>, you have
to grep the code and update all other instances of that <code>switch</code> statement. That can get out of
control quickly, particularly if more than one person is committing code and none of them can
read your mind. Also, it sucks and is no fun.
</p>
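<p>
As a contrived sketch of how that goes wrong (the function names and values here are hypothetical),
imagine two functions that switch on the same value, where somebody added a <code>case</code> to one
and forgot about the other:
</p>

```javascript
// Hypothetical example: two functions switching on the same set of keys.
function labelFor(kind) {
    switch (kind) {
        case 'f': return 'foo';
        case 'b': return 'bar';
        case 'z': return 'baz'; // added later
    }
    return null;
}

function priceFor(kind) {
    switch (kind) {
        case 'f': return 1;
        case 'b': return 2;
        // 'z' was never added here, so the two functions now disagree
    }
    return null;
}
```

<p>
Nothing fails loudly here. <code>priceFor('z')</code> just quietly falls through to the default.
</p>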
<p>
The other problem is simply readability. Giant blocks of code are hard to read, so if you find yourself
having to scroll your editor to find the end of a <code>switch</code> statement, it's probably time for a cleanup.
</p>
<p>
So how do you solve the <code>switch</code> statement problem? There are two ways.
</p>
<h4>Polymorphism</h4>
<p>
Ooooh. Look at all the words I know. Try not to be intimidated by my copious patois.
</p>
<p>
Polymorphism really only applies to object-oriented languages, but you can apply a bastardized
form of polymorphism to pretty much any language that has the notion of an object.
</p>
<p>
You can look up the formal definition of polymorphism elsewhere, but basically the gist of this technique
is that you replace the <code>switch</code> statement with a method call. Say you had this function that
contained a <code>switch</code> statement that was duplicated in other parts of the code (let's do this one
in C♯):
</p>
<glacius:code lang="csharp"><![CDATA[
public class CrappySwitch {
    public string DoSomething(int foo) {
        switch (foo) {
            case 0: return "foo";
            case 1: return "bar";
            case 2: return "baz";
        }

        return null;
    }
}
]]></glacius:code>
<p>
What you want is to extract that <code>switch</code> statement and move it higher up in the stack, so that
other callers can use the same code. This is accomplished by using everybody's favorite
pattern: the <a href="https://en.wikipedia.org/wiki/Factory_method_pattern">factory</a>.
</p>
<glacius:code lang="csharp"><![CDATA[
public interface IFooable {
    string DoSomething();
}

public class ZeroFooable : IFooable {
    public string DoSomething() {
        return "foo";
    }
}

public class OneFooable : IFooable {
    public string DoSomething() {
        return "bar";
    }
}

public class TwoFooable : IFooable {
    public string DoSomething() {
        return "baz";
    }
}

public class DefaultFooable : IFooable {
    public string DoSomething() {
        return null;
    }
}

public class FooableFactory {
    public IFooable GetFooable(int foo) {
        switch (foo) {
            case 0: return new ZeroFooable();
            case 1: return new OneFooable();
            case 2: return new TwoFooable();
        }

        return new DefaultFooable();
    }
}

public class CrappySwitch {
    private readonly FooableFactory factory;

    public CrappySwitch(FooableFactory factory) {
        this.factory = factory;
    }

    public string DoSomething(int foo) {
        return factory.GetFooable(foo).DoSomething();
    }
}
]]></glacius:code>
<p>
So, obviously this is a lot more code. Particularly since our <code>case</code> statements were pretty much
empty. But creating more code to mitigate potential disasters is a fair trade-off. To accomplish
what previously would have taken a code grep, you must now:
</p>
<ol>
<li>Add a <code>case</code> statement in the factory method</li>
<li>Create a new implementation of <code>IFooable</code> corresponding to that case statement</li>
</ol>
<p>
That's not so bad, right? The point is that the <code>switch</code> statements have been consolidated, and
the potential for catastrophe has been mitigated. Probably. Unless you're an idiot.
</p>
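<p>
For what it's worth, the same idea carries over to languages without interfaces. Here's a minimal
sketch in JavaScript, assuming plain objects stand in for the <code>IFooable</code> implementations and a
bare function stands in for the factory. The names are made up for illustration.
</p>

```javascript
// Hypothetical JavaScript counterpart of the C# example above.
// Each "implementation" is just an object with a doSomething method.
var zeroFooable = { doSomething: function() { return 'foo'; } };
var oneFooable = { doSomething: function() { return 'bar'; } };
var twoFooable = { doSomething: function() { return 'baz'; } };
var defaultFooable = { doSomething: function() { return null; } };

// The one remaining switch lives in the factory function.
function getFooable(foo) {
    switch (foo) {
        case 0: return zeroFooable;
        case 1: return oneFooable;
        case 2: return twoFooable;
    }
    return defaultFooable;
}

// Callers never see the switch; they just call the method.
function doSomething(foo) {
    return getFooable(foo).doSomething();
}
```

<p>
There's no compiler to complain if one of those objects is missing <code>doSomething</code>, which is
the "bastardized" part.
</p>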
<h4>Hash tables</h4>
<p>
The other way to refactor a <code>switch</code> statement is through the use of a hash table.
Or dictionary. Or array if you're using PHP. Using this method, your <code>switch</code> statement
disappears and in its place a hash table emerges. In JavaScript this time:
</p>
<glacius:code lang="javascript"><![CDATA[
function doSomething(foo) {
    switch (foo) {
        case 'f': return 'foo';
        case 'b': return 'bar';
        case 'z': return 'baz';
    }

    return null;
}
]]></glacius:code>
<p>becomes</p>
<glacius:code lang="javascript"><![CDATA[
var fooTable = {
    f: function() { return 'foo'; },
    b: function() { return 'bar'; },
    z: function() { return 'baz'; }
};

function doSomething(foo) {
    // only use keys defined on the table itself, not inherited ones like "toString"
    if (Object.prototype.hasOwnProperty.call(fooTable, foo)) {
        return fooTable[foo]();
    }

    return null;
}
]]></glacius:code>
<p>
In this method, the analog of adding a <code>case</code> statement is adding another item
to the hash table (or dictionary, or array, or whatever).
</p>
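<p>
As a rough sketch of that (the names <code>handlers</code> and <code>handle</code> are hypothetical),
here's a table that grows by one entry without the lookup function changing at all. It assumes an ES5
environment, since it uses <code>Object.create(null)</code> to get a table with no inherited keys.
</p>

```javascript
// Hypothetical names throughout. Object.create(null) yields an object with
// no prototype, so a lookup such as handlers['toString'] finds nothing.
var handlers = Object.create(null);
handlers.f = function() { return 'foo'; };
handlers.b = function() { return 'bar'; };

function handle(key) {
    var handler = handlers[key];
    return handler ? handler() : null;
}

// The analog of adding a case: one new entry, no change to handle().
handlers.z = function() { return 'baz'; };
```

<p>
Unlike a <code>switch</code>, the table can also be extended at runtime, for better or worse.
</p>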
<h3>Wrapping up</h3>
<p>
It should be noted that in both cases the amount of code didn't necessarily decrease. The point
of refactoring is rarely to decrease the amount of code, but rather to improve the design of
the current code.
</p>
<p>
Whichever method you choose is contingent upon the code you're working with. A few common
use cases are:
</p>
<ul>
<li>
If you're trying to remove duplicated <code>switch</code> statements in separate
classes/objects/files/scopes, try to use polymorphism
</li>
<li>
If you're trying to improve performance, use a hash table
</li>
<li>
If you're trying to improve readability only, use a hash table
</li>
<li>
If your <code>switch</code> statement is embarrassingly huge and only used in one place,
use a hash table (or figure out why it's so huge and fix that)
</li>
</ul>