Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Thursday, March 29, 2012

#23 - Premature Optimization Part 2: Measure Twice Cut Once

Often when considering Knuth's "premature optimization" quote you will be tempted to take that as a warning to never optimize in code. Knuth's position wasn't that optimizations are evil, just doing them before you have any data or analysis to back them up is evil 97% of the time. What we thought may have been a quick optimization to reduce the number of modulus operations may actually backfire in many situations.

First though, we need to think through some of the numbers in the typical example of FizzBuzz. In that calculation there are potentially four modulus operations performed for each number. Multiply that by one hundred numbers and we get an upper bound of four hundred potential operations. The problem is that there are four different places that we may exit the loop, most of them result in less modulus operations. Let's consider the first exit point, numbers that are divisible by 3 and 5. In that case we exit the loop and we only do half as many operations. Multiply that by one hundred and we have a lower ideal bound of two hundred. So the total number of operations will be between two and four hundred.

Normally for a large program we cannot perfectly count the possible paths through the main calculation loop. Sometimes the choices are based on things external to the loop, like how much disk space we have left or how much data has been broadcast over the network. The next problem is usually that the problem space is so incredibly large there is no way we can possibly enumerate the entire solution. Usually a software engineer would have to resort to statistics to model what a typical scenario may be. For FizzBuzz all of loop has no external considerations, and the problem space is extraordinarily small for today's machines: only 100 different loops. For comparison your typical video game image has hundreds of calculations per pixel, and there nearly a million pixels on a screen. And at nearly 60 images per second one hundred iterations is tiny.

So for each of the steps of the loop we need to count how many times each is triggered. There are six different cases where the number is divisible by three and five. Then there are twenty cases where the number is divisible by five, but six have already been counted leaving us with fourteen. Then there are thirty-three numbers between one and one hundred divisible by three, discounting the six that are also divisible by five leaves us with twenty-seven cases remaining. The rest of number account for fifty-three numbers.
For each step a different amount of operations is also performed. For the first case, there are two in the unoptimized form. For the second, three, and for the third and fourth there are four.

6*2 + 14*3 + 27*4 + 53*4 = 374 operations.

Considering the case where we changed the first modulus to be fifteen we drop one modulus from each of the operations, yielding 100 less operations for a total of 274 operations, trimming 28% of the operations, but still 37% away from the ideal.

The irony of this premature operation is that if we let the compiler and the runtime do the optimizations and just code the initial version the compiler would have found the optimal solution requiring 200 calculations. The Java compiler that initially compiles the code will totally miss the optimization. But Java has two compilers, one that runs directly against the code and one that runs as late as possible, or Just In Time (the compiler can run even later than that, but the how and why are topics that are much to advanced for simple FizzBuzz solutions).

- |- -| - +

In this implementation of FizzBuzz we will recreate one possible compiler optimizations for FizzBuzz called a "Common Subexpression Elimination." Don't worry, this wont be on the test.

public class TypicalCommonSubexpression {

public static void main(String[] args) {
  for (int i = 1; i <= 100; i++) {
    boolean fizz = i % 3 == 0;
    boolean buzz = i % 5 == 0;
    if (fizz && buzz) {
      System.out.println("FizzBuzz");
    } else if (fizz) {
      System.out.println("Fizz");
    } else if (buzz) {
      System.out.println("Buzz");
    } else {
      System.out.println(i);
    }
  }
}

}

When Java's second "JIT" compiler does run it looks at every conceivable permutation of the code and tries to remove redundancies and re-order operations to make the microchip happy and zippy. One of the checks it does is looking for things that can never change, but are calculated multiple times. When it finds one of those cases it will remember the result of the calculation and use it again later. In this case if we had kept the original program then the compiler would have seen that we are calculating the modulus of 3 and 5, store the result, and then use it where it would have been before.

This program always executes both modulus operations per cycle through the loop, resulting in a total of 200 operations, which was my ideal minimum. By taking a clever shortcut early in the coding process and replacing the first test with a modulus of fifteen the developer would have actually prevented the compiler from doing an optimization that would have resulted in an optimal solution. The optimization made at the last possible moment would have been the correct one.

No comments:

Post a Comment