Refactoring a switch statement @ r1

Contents hide

This article was originally published in my blog (affectionately referred to as blargh) on . The original blog no longer exists as I've migrated everything to this wiki.

The original URL of this post was at https://tmont.com/blargh/2011/11/refactoring-a-switch-statement. Hopefully that link redirects back to this page.

One day I found myself happily writing code, and then I realized I was adding a case to a switch statement that took up the entire screen. I'm pretty sure the Gang of Four had some kind of postulate that said (paraphrasing): "If you can't see the closing curly brace, you're doing it wrong." Or something like that.

Since I've read all those stupid design pattern books and spent several years doing SOA and writing "enterprise applications" using other useless acronyms I assumed I was well suited for refactoring a switch statement. And if ReSharper has taught me anything, it's that I'm an absolute pro at refactoring.

It turns out refactoring a switch statement isn't very cut and dry. If you google it, you'll find lots of articles with intelligent-sounding words and phrases like polymorphism and factory pattern. But in most cases they were either moving a switch statement to a different place, or replacing an easily-understood switch statement with a less-readable design pattern with more layers of abstraction and indirection. So which way is the right way?

How switch statements work

Obviously, this varies from language to language, but switch statements are internally represented in one of several different ways: as a jump table or as a series of if..then..else branches.

If a switch statement has few case blocks (like, say, less than 10) and the case values are close together (like, say, integers between 0 and 10), then the compiler will probably convert a switch statement to a jump table. A jump table, for the sake of simplicity, is basically a dictionary which contains function pointers. For example, this switch statement:

javascript
switch (foo) {
  case 0: return 'foo';
  case 1: return 'bar';
  case 2: return 'baz';
}

would be represented internally by this jump (hash) table:

javascript
var switchOnFoo = {
  '0': function() { return 'foo'; },
  '1': function() { return 'bar'; },
  '2': function() { return 'baz'; }
};

A jump table is more akin to an array, so lookup will take O(1). A hash table will take (depending on the hashing function and the size of the table) O(1)-ish. The point is that it's faster than comparing each case value, which would be O(n).

In the case where the values are not close together or there are many cases, the switch statement is often just represented internally as a series of if..then..else branches. When executed, a comparison must be made against each case value in order, so the complexity is O(n).

javascript
if (foo == '0') {
  return 'foo';
} else if (foo == '1') {
  return 'bar';
} else if (foo == '2') {
  return 'baz';
}

Refactoring

So why have I gone through the trouble of describing the internal representation of a switch statement? Well, it helps to understand how things work if you're going to refactor something. In the general case, switch statements are fast, because they use jump tables rather than branches. So you don't want to refactor a switch statement to something that performs worse just because of ignorance.

Note that any performance gains to be had from optimizing your switch statements are probably not worth worrying about. Refactoring should ALWAYS be done with readability and maintainability in mind. Sometimes compromises must be made between readability, maintainability and performance.

Anyway, the point is that you shouldn't refactor a switch statement because you read somewhere that they're bad and oh my god the cyclomatic complexity! Refactoring should always have a purpose beyond "I read that it's a good idea." If you can't support the reason you're refactoring with something concrete, then you have no business doing it.

Recognizing the problem

switch statements can cause problems in several different ways. If a switch statement is duplicated somewhere else in the code, then that's a problem. If you add another case, you have to grep the code and update all other instances of that switch statement. That can get out of control quickly, particularly if the number of people committing code is greater than one or can't read your mind. Also, it sucks and is no fun.

The other problem is simply readability. Giant blocks of code are hard to read, so if you find yourself having to scroll your editor to find the end of a switch statement, it's probably time for a cleanup.

So how do you solve the switch statement problem? There are two ways.

Polymorphism

Ooooh. Look at all the words I know. Try not to be intimidated by my copious patois.

Polymorphism really only applies to object-oriented languages, but you can apply a bastardized form of polymorphism to pretty much any language that has the notion of an object.

You can look up the formal definition of polymorphism elsewhere, but basically the gist of this technique is that you replace the switch statement with a method call. Say you had this function that contained a switch statement that was duplicated in other parts of the code (let's do this one in C♯):

C♯
public class CrappySwitch {
  public string DoSomething(int foo) {
    switch (foo) {
      case 0: return "foo";
      case 1: return "bar";
      case 2: return "baz";
    }
  
    return null;
  }
}

What you want is to extract that switch statement and move it higher up in the stack, so that other callers can you use the same code. This is accomplished by using everybody's favorite pattern: the factory.

C♯
public interface IFooable {
  string DoSomething();
}

public class ZeroFooable : IFooable {
  public string DoSomething() {
    return "foo";
  }
}

public class OneFooable : IFooable {
  public string DoSomething() {
    return "bar";
  }
}

public class TwoFooable : IFooable {
  public string DoSomething() {
    return "baz";
  }
}

public class DefaultFooable : IFooable {
  public string DoSomething() {
    return null;
  }
}

public class FooableFactory {
  public IFooable GetFooable(int foo) {
    switch (foo) {
      case 0: return new ZeroFooable();
      case 1: return new OneFooable();
      case 2: return new TwoFooable();
    }

    return new DefaultFooable();
  }
}

public class CrappySwitch {
  private readonly FooableFactory factory;

  public CrappySwitch(FooableFactory factory) {
    this.factory = factory;
  }

  public string DoSomething(int foo) {
    return factory.GetFooable(foo).DoSomething();
  }
}

So, obviously this is a lot more code. Particularly since our case statements were pretty much empty. But creating more code to mitigate potential disasters is a fair trade-off. To accomplish what previously would have taken a code grep, you must now:

  1. Add a case statement in the factory method
  2. Create a new implementation of IFooable corresponding to that case statement

That's not so bad, right? The point is that the switch statements have been consolidated, and the potential for catastrophe has been mitigated. Probably. Unless you're an idiot.

Hash tables

The other way to refactor a switch statement is through the use of a hash table. Or dictionary. Or array if you're using PHP. Using this method, your switch statement disappears and in its place a hash table emerges. In JavaScript this time:

javascript
function doSomething(foo) {
  switch (foo) {
    case 'f': return 'foo';
    case 'b': return 'bar';
    case 'z': return 'baz';
  }

  return null;
}

becomes

javascript
var fooTable = {
  f: function() { return 'foo'; },
  b: function() { return 'bar'; },
  z: function() { return 'baz'; }
};

function doSomething(foo) {
  var fooable = fooTable[foo];
  if (fooable) {
    return fooable();
  }
  
  return null;
}

In this method, the analog of adding a case statement is adding another item to the hash table (or dictionary, or array, or whatever).

Wrapping up

It should be noted that in both cases the amount of code didn't necessarily decrease. The point of refactoring is rarely to decrease the amount of code, but rather to improve the design of the current code.

Whichever method you choose is contingent upon the code you're working with. A couple common use cases are:

  • If you're trying to remove duplicated switch statements in separate classes/objects/files/scopes, try to use polymorphism
  • If you're trying to improve performance, use a hash table
  • If you're trying to improve readablity only, use a hash table
  • If your switch statement is embarrasingly huge and only used in one place, use a hash table (or figure out why it's so huge and fix that)