Refactoring a switch statement | Source

<glacius:metadata>
    <title>Refactoring a switch statement</title>
    <description>Methods for refactoring a switch statement</description>
    <category>Legacy blog posts</category>
    <category>Programming</category>
    <category>Refactoring</category>
    <category>JavaScript</category>
    <category>C#</category>
</glacius:metadata>
<glacius:macro name="legacy blargh banner">
    <properties>
        <originalUrl>https://tmont.com/blargh/2011/11/refactoring-a-switch-statement</originalUrl>
        <originalDate>2011-11-29T06:12:47.000Z</originalDate>
    </properties>
</glacius:macro>
<p>
  One day I found myself happily writing code, and then I realized I was adding a <code>case</code> to a
  <code>switch</code> statement that took up the entire screen. I'm pretty sure the Gang of Four had some kind
  of postulate that said (paraphrasing): "If you can't see the closing curly brace, you're doing it
  wrong." Or something like that.
</p>
<p>
  Since I've read all those stupid design pattern books and spent several years doing SOA and writing
  "enterprise applications" using other useless acronyms, I assumed I was well suited for refactoring a
  <code>switch</code> statement. And if
  <a href="https://www.jetbrains.com/resharper/">ReSharper</a> has taught me anything, it's that I'm an
  absolute pro at refactoring.
</p>
<p>
  It turns out refactoring a <code>switch</code> statement isn't very cut and dried. If
  you google it, you'll find lots of articles with intelligent-sounding words and phrases like
  <strong>polymorphism</strong>
  and <strong>factory pattern</strong>. But in most cases they were either moving a <code>switch</code> statement to
  a different place, or replacing an easily-understood <code>switch</code> statement with a less-readable design pattern
  with more layers of abstraction and indirection. So which way is the right way?
</p>
<h3>How <code>switch</code> statements work</h3>
<p>
  Obviously, this varies from language to language (and from compiler to compiler), but <code>switch</code>
  statements are typically represented internally in one of two ways: as a jump table or as a series of
  <code>if..then..else</code> branches.
</p>
<p>
  If the <code>case</code> values of a <code>switch</code> statement are close together (like, say, integers
  between 0 and 10), then the compiler will probably convert the <code>switch</code> statement to a jump table.
  For the sake of simplicity, think of a jump table as a dictionary of function pointers. In reality it's
  usually an array of code addresses indexed by the <code>case</code> value. For example, this
  <code>switch</code> statement:
</p>
<glacius:code lang="javascript"><![CDATA[function lookup(foo) {
  switch (foo) {
    case 0: return 'foo';
    case 1: return 'bar';
    case 2: return 'baz';
  }
}]]></glacius:code>
<p>
  would be represented internally by something like this jump (hash) table:
</p>
<glacius:code lang="javascript"><![CDATA[
var switchOnFoo = {
  '0': function() { return 'foo'; },
  '1': function() { return 'bar'; },
  '2': function() { return 'baz'; }
};
]]></glacius:code>
<p>
  A jump table is more akin to an array, so lookup will take <code>O(1)</code>. A hash table will take 
  (depending on the hashing function and the size of the table) <code>O(1)</code>-ish. The point is
  that it's faster than comparing each <code>case</code> value, which would be <code>O(n)</code>.
</p>
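<p>
  To make the "akin to an array" point concrete, here's a rough JavaScript model of a jump table: an array of
  functions indexed directly by the <code>case</code> value, with a range check in front. It's only a sketch.
  A compiler's jump table holds code addresses rather than closures, and the names <code>jumpTable</code> and
  <code>switchViaJumpTable</code> are made up for illustration.
</p>
<glacius:code lang="javascript"><![CDATA[
// Hypothetical model of a jump table: the case value IS the array index,
// so there's no per-case comparison, just one range check and one lookup.
var jumpTable = [
  function() { return 'foo'; }, // case 0
  function() { return 'bar'; }, // case 1
  function() { return 'baz'; }  // case 2
];

function switchViaJumpTable(foo) {
  // Compiled jump tables do a similar range check before indexing;
  // anything outside 0..length-1 goes to the default branch.
  if (Number.isInteger(foo) && foo >= 0 && foo < jumpTable.length) {
    return jumpTable[foo]();
  }
  return undefined; // like falling off the end of a switch with no default
}
]]></glacius:code>
<p>
  Note that <code>switchViaJumpTable('1')</code> misses, just as the original <code>switch</code> would,
  because JavaScript's <code>switch</code> compares values with strict equality.
</p>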
<p>
  When the values are far apart (say, 1, 500 and 100000), a jump table would be mostly empty, so the
  <code>switch</code> statement is often just represented internally as a series of <code>if..then..else</code>
  branches instead. When executed, a comparison must be made against each <code>case</code> value in order, so
  the complexity is <code>O(n)</code>.
</p>
<glacius:code lang="javascript"><![CDATA[
function lookup(foo) {
  // switch compares with strict equality (===), so these branches do too
  if (foo === 0) {
    return 'foo';
  } else if (foo === 1) {
    return 'bar';
  } else if (foo === 2) {
    return 'baz';
  }
}
]]></glacius:code>
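<p>
  The <code>O(n)</code> behavior is easy to see if you model the branch chain as a linear scan that counts how
  many comparisons it makes before it finds a match. Again, this is an illustrative sketch; the
  <code>cases</code> list and <code>switchViaBranches</code> are hypothetical names.
</p>
<glacius:code lang="javascript"><![CDATA[
// Hypothetical model of a switch compiled to if/else branches:
// each case is tested in order until one matches.
var cases = [
  { value: 0, result: 'foo' },
  { value: 1, result: 'bar' },
  { value: 2, result: 'baz' }
];

function switchViaBranches(foo) {
  var comparisons = 0;
  for (var i = 0; i < cases.length; i++) {
    comparisons++;
    if (foo === cases[i].value) {
      return { result: cases[i].result, comparisons: comparisons };
    }
  }
  // No match: every case was compared, which is the O(n) worst case.
  return { result: undefined, comparisons: comparisons };
}
]]></glacius:code>
<p>
  Matching the last case (or nothing at all) costs one comparison per case, which is where the
  <code>O(n)</code> comes from.
</p>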
<h3>Refactoring</h3>
<p>
  So why have I gone through the trouble of describing the internal representation of a <code>switch</code>
  statement? Well, it helps to understand how things work if you're going to refactor something. When the
  compiler can use a jump table, a <code>switch</code> statement is fast, because it avoids comparing against
  every <code>case</code> value. So you don't want to refactor a <code>switch</code> statement into something
  that performs worse just out of ignorance.
</p>
<p>
  Note that any performance gains to be had from optimizing your <code>switch</code> statements are probably not
  worth worrying about. Refactoring should ALWAYS be done with readability and maintainability in mind. Sometimes
  compromises must be made between readability, maintainability and performance.
</p>
<p>
  Anyway, the point is that you shouldn't refactor a <code>switch</code> statement because you read
  somewhere that they're bad and oh my god the cyclomatic complexity! Refactoring should
  always have a purpose beyond "I read that it's a good idea." If you can't support the reason you're
  refactoring with something concrete, then you have no business doing it.
</p>
<h4>Recognizing the problem</h4>
<p>
  <code>switch</code> statements can cause problems in a couple of different ways. If a <code>switch</code>
  statement is duplicated somewhere else in the code, then that's a problem: every time you add another
  <code>case</code>, you have to grep the code and update all other instances of that <code>switch</code>
  statement. That can get out of control quickly, particularly if more than one person is committing code,
  or if your teammates can't read your mind. Also, it sucks and is no fun.
</p>
<p>
  The other problem is simply readability. Giant blocks of code are hard to read, so if you find yourself
  having to scroll your editor to find the end of a <code>switch</code> statement, it's probably time for a cleanup.
</p>
<p>
  So how do you solve the <code>switch</code> statement problem? There are two ways.
</p>
<h4>Polymorphism</h4>
<p>
  Ooooh. Look at all the words I know. Try not to be intimidated by my copious patois.
</p>
<p>
  Polymorphism really only applies to object-oriented languages, but you can apply a bastardized
  form of polymorphism to pretty much any language that has the notion of an object.
</p>
<p>
  You can look up the formal definition of polymorphism elsewhere, but the gist of this technique
  is that you replace the <code>switch</code> statement with a method call. Say you had this function
  containing a <code>switch</code> statement that was duplicated in other parts of the code (let's do this one
  in C&#x266f;):
</p>
<glacius:code lang="csharp"><![CDATA[
public class CrappySwitch {
  public string DoSomething(int foo) {
    switch (foo) {
      case 0: return "foo";
      case 1: return "bar";
      case 2: return "baz";
    }
  
    return null;
  }
}
]]></glacius:code>
<p>
  What you want is to extract that <code>switch</code> statement and move it higher up in the stack, so that
  other callers can use the same code. This is accomplished by using everybody's favorite
  pattern: the <a href="https://en.wikipedia.org/wiki/Factory_method_pattern">factory</a>.
</p>
<glacius:code lang="csharp"><![CDATA[
public interface IFooable {
  string DoSomething();
}
public class ZeroFooable : IFooable {
  public string DoSomething() {
    return "foo";
  }
}
public class OneFooable : IFooable {
  public string DoSomething() {
    return "bar";
  }
}
public class TwoFooable : IFooable {
  public string DoSomething() {
    return "baz";
  }
}
public class DefaultFooable : IFooable {
  public string DoSomething() {
    return null;
  }
}
public class FooableFactory {
  public IFooable GetFooable(int foo) {
    switch (foo) {
      case 0: return new ZeroFooable();
      case 1: return new OneFooable();
      case 2: return new TwoFooable();
    }
    return new DefaultFooable();
  }
}
public class CrappySwitch {
  private readonly FooableFactory factory;
  public CrappySwitch(FooableFactory factory) {
    this.factory = factory;
  }
  public string DoSomething(int foo) {
    return factory.GetFooable(foo).DoSomething();
  }
}
]]></glacius:code>
<p>
  So, obviously, this is a lot more code, particularly since our <code>case</code> blocks were nearly
  trivial. But writing more code to head off potential disasters is a fair trade-off. To accomplish
  what previously would have taken a code grep, you must now:
</p>
<ol>
  <li>Add a <code>case</code> statement in the factory method</li>
  <li>Create a new implementation of <code>IFooable</code> corresponding to that case statement</li>
</ol>
<p>
  That's not so bad, right? The point is that the <code>switch</code> statements have been consolidated, and
  the potential for catastrophe has been mitigated. Probably. Unless you're an idiot.
</p>
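<p>
  For the non-C&#x266f; crowd, here's roughly the same refactoring in JavaScript, using the object-based
  (bastardized) polymorphism mentioned earlier. Each "implementation" is just an object with a
  <code>doSomething</code> method, and the factory is still the one place that switches. This is a sketch;
  the names mirror the C&#x266f; example but are otherwise made up.
</p>
<glacius:code lang="javascript"><![CDATA[
// Each "implementation" is a plain object exposing the same method,
// which is all the polymorphism JavaScript needs.
var zeroFooable = { doSomething: function() { return 'foo'; } };
var oneFooable = { doSomething: function() { return 'bar'; } };
var twoFooable = { doSomething: function() { return 'baz'; } };
var defaultFooable = { doSomething: function() { return null; } };

// The factory is now the only place that switches on foo.
function getFooable(foo) {
  switch (foo) {
    case 0: return zeroFooable;
    case 1: return oneFooable;
    case 2: return twoFooable;
  }
  return defaultFooable;
}

// Callers no longer switch; they just call the common method.
function doSomething(foo) {
  return getFooable(foo).doSomething();
}
]]></glacius:code>
<p>
  Adding a case still takes the same two steps listed above: one new object and one new <code>case</code> in
  <code>getFooable</code>.
</p>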
<h4>Hash tables</h4>
<p>
  The other way to refactor a <code>switch</code> statement is through the use of a hash table.
  Or dictionary. Or array if you're using PHP. Using this method, your <code>switch</code> statement
  disappears and in its place a hash table emerges. In JavaScript this time:
</p>
<glacius:code lang="javascript"><![CDATA[
function doSomething(foo) {
  switch (foo) {
    case 'f': return 'foo';
    case 'b': return 'bar';
    case 'z': return 'baz';
  }
  return null;
}
]]></glacius:code>
<p>becomes</p>
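<glacius:code lang="javascript"><![CDATA[
var fooTable = {
  f: function() { return 'foo'; },
  b: function() { return 'bar'; },
  z: function() { return 'baz'; }
};
function doSomething(foo) {
  // Only look at the table's own keys, so inherited names such as
  // 'toString' aren't mistaken for cases
  var fooable = Object.prototype.hasOwnProperty.call(fooTable, foo) ? fooTable[foo] : null;
  if (fooable) {
    return fooable();
  }

  return null;
}
]]></glacius:code>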
<p>
  In this method, the analog of adding a <code>case</code> statement is adding another item
  to the hash table (or dictionary, or array, or whatever).
</p>
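<p>
  One caveat with using a plain object as the table is that it inherits properties such as
  <code>toString</code> and <code>constructor</code> from <code>Object.prototype</code>, so an unguarded lookup
  like <code>fooTable['toString']</code> finds a function even though nobody added that "case". A
  <code>Map</code> (ES2015) has no inherited keys, and adding a case is a single <code>set</code> call. Here's
  a minimal sketch reusing the keys from the example above; the extra <code>'q'</code> entry is a made-up case
  for illustration.
</p>
<glacius:code lang="javascript"><![CDATA[
// A Map only contains the entries you put in it; nothing is inherited.
var fooTable = new Map([
  ['f', function() { return 'foo'; }],
  ['b', function() { return 'bar'; }],
  ['z', function() { return 'baz'; }]
]);

function doSomething(foo) {
  var fooable = fooTable.get(foo); // undefined for unknown keys
  return fooable ? fooable() : null;
}

// The analog of adding a case: one new entry ('q' is hypothetical).
fooTable.set('q', function() { return 'qux'; });
]]></glacius:code>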
<h3>Wrapping up</h3>
<p>
  It should be noted that in both cases the amount of code didn't necessarily decrease. The point
  of refactoring is rarely to decrease the amount of code, but rather to improve the design of
  the current code.
</p>
<p>
  Which method you choose depends on the code you're working with. A couple of common
  use cases:
</p>
<ul>
  <li>
    If you're trying to remove duplicated <code>switch</code> statements in separate 
    classes/objects/files/scopes, try to use polymorphism
  </li>
  <li>
    If you're trying to improve performance, use a hash table
  </li>
  <li>
    If you're trying to improve readability only, use a hash table
  </li>
  <li>
    If your <code>switch</code> statement is embarrassingly huge and only used in one place,
    use a hash table (or figure out why it's so huge and fix that)
  </li>
</ul>
