Evolving software and improving algorithms

Posted on 13. Jan, 2011 by Guilherme Silveira in agile

Recently Bob Martin has posted a series of transformations that aid developers whilst implementing an algorithm. Following steps and in order to improve our code and achieve something is key to refactoring, TDD and other tools are out there that gives up that kind of feedback.

Bob posts a few questions that are important for the community to find better ways to solve problems. One of them is whether it is possible to find an ordered set of simple transformations that, applied in a series, favoring the ones at the top whenever possible, until all tests are passing, gives us the best algorithm?

An important point is that simple steps might give the best solution but do not guarantee it, as it envolves knowledge from the developer that he might not have prior to that.

It is better (or simpler) to change a constant into a variable than it is to add an if statement. So when making a test pass, you try to do so with transformations that are simpler (higher on the list) than those that are more complex.

Simple and better are not the same concept, so it might be simpler to change a constant into a variable but it might be better to add the if statement. In an, at least, amazing example, Bob shows us how one priority last is capable of giving the quicksort result to a sorting problem, while not using it might lead the developer to bubble sort (known to be worse, although one can argue that it is simpler to understand).

The priority list used for this example favors stateless code, code that does not change a variable pointer, as variable assignment is the lowest priority on the list. Another question posed is whether this priority list is “the best one”, and Bob mentions that it probably is not, eventhough it led to a great solution to the sample problem. In order to find a counter example for this priority list, one needs to pick an algorithm where maintaining state is important for a better solution which again, might, might not be the simplest. Calculating fibonacci(n) or other dynamic programming problems have this characteristic.

Taking Fibonnaci as an example and the following set of tests:

  1. f(0) =1
  2. f(1) =1
  3. f(2) =2
  4. f(3) =3
  5. f(4) =5
  6. f(5) =8

Following the sample priority list and implementing the constant rule (2nd rule), both tests f(0) and f(1) pass:

int f(int k) {
  return 1;
}

Adding an if (6th conditional rule) solves f(2):

int f(int k) {
  if(k<2) return 1;
  return 2;
}

Looking closely on the list priority list, one will find that adding an if is top priority than doing recursion so the solution would be the usage of the 6th rule again, thus the solution for all prior tests and f(3) is now:

int f(int k) {
  if(k<2) return 1;
  if(k==2) return 2;
  return 3;
}

Following the same priority list, it turns out that adding extra ifs is always a perferable step than recursion. As a single step, adding an if is simpler than recursion, but seen as a series of steps we are aware it is not, that’s why refactoring is important – allowing the change of the simple step decisions that were made in the past. In this example, finding the simplest solution is not the same as following the steps from that specific priority list. Still, we can rearrange the list and check if favoring recursion would give us the best solution:

int f(int k) {
  return k<2 ? 1 : f(k-1) + f(k-2);
}

Using the one where recursion had precedence over the ifs, one is able to end up with a simple solution to all tests – possibly the simplest solution ever. But recursion is known not to be the fastest solution for fibonacci. Following simple steps and solving our problem gave us a solution which is not the best one.

That only means that following the two lists do not guarantee the best (or the simplest) solution. Those two lists guided the developer through simple steps, which is positive thing, together with prior developer knowledge of the problem, its possible solutions and refactoring, one can improve the resulting code.

Creating an array (7th rule), during a loop (10th rule) and attributing different values to it than the default one (12th rule) would give the best (as in fastest – in the same way as the quicksort x bubblesort) solution. That means using a completely different list of priorities, a more dynamic one.

One could argue that assigning value to the array, overriding its default value, is not the use of the 12th rule (although it has already broken the priority list when following the 7th and 10th). In this case, other dynamic programming problem which envolves reassigning values to an array, such as the knapsack, would result in sub-optimal solutions, because the optimal solution involes reassigning values to variables (copying arrays would be suboptimal, again).

What’s more, when you pose a test, you try to pose one that allows simpler transformations rather than complex transformations; since the more complexity required by the test the larger the risk you take to get that test to pass.

Simple steps differ from simple and better solutions: the simplest step to take might not lead the code neither to the simplest nor the best solution, but they aid the developer to not take a huge step which might be hard to step back from. Together with refactoring, as described in Bob’s first paragraph, and problem solving knowledge, a priority list is a good tool to help avoiding some pitfalls, as shown on the bubble sort example.

The functions that takes the current code, the current set of tests, and a new test case, giving us the best, simplest and simple step change that solves the new set of tests always depend on the current code.

The impasse problem is somehow related to how TDD is done during DOJO sessions, where the steps are so small that they might not give you the direction to the best and simplest algorithm and when the developer realizes it, it is too late, all the code needs to be rewritten. This could have been avoided if the steps were not so small; instead of baby ones, children ones, or if the code was under constant refactoring. Test first, refactor all the time and priorizing simple steps over complex ones are important for good software quality and algorithm implementation, but they still rely on the developer knowledge. Someone who never implemented the knapsack problem might not come up with the solution in a reasonable timeframe, but the three practices (suggested in different parts of time by Bob Martin) will be there to aid the developer find his way.

Tags: , , , , , , , , , , ,

3 Responses to “Evolving software and improving algorithms”

  1. James Hart

    02. Feb, 2011

    I’m still troubled by the idea that a series of specific tests can in any way lead you to the correct algorithm, without you bringing to bear at least some knowledge of the algorithm you’re trying to implement. The trouble is, that sequence of integers you’ve specified in your tests could result from any number of algorithms other than fibonacci (see http://oeis.org/search?q=1%2C1%2C2%2C3%2C5%2C8&language=english&go=Search for a list). Specific examples don’t automatically generalise in only one particular way.

  2. Guilherme Silveira

    02. Feb, 2011

    Hi James,

    Totally agreed. The example here is a counter-proof for the TPP as given on Uncle Bob’s first and second posts. Our example is in, no way, a proof that such a list exists. Sorry if it was not clear in the post.

    Mauricio Aniche mentioned the same idea: would someone that has never heard of quicksort implement it by following a specific set of rules, for a specific set of tests, for a specific language? And as you pointed out, would the resulting algorithm even be the correct one?

    As a mathematician, I can imagine that such a function might exist. Given the current code and the problem set, the function gives you back a new piece of code that solves the problem in the best way. Anything is possible. But the odds of finding such a function are probably really small…

    Regards.

  3. James Hart

    03. Feb, 2011

    Oh, I sensed you were trying to provide a counter-example. As I said, I’m fundamentally troubled that Bob Martin is proposing the existence of just such a function, and kind of disappointed that he seems to believe he’s onto something.

    The problem, I guess, is this fixation in unit testing on saying ‘test a few arbitrary inputs, test a few boundary conditions, and assume it works for everything else’. That’s just not good enough.

    One promising alternative is some sort of analysis technology like Pex (http://research.microsoft.com/en-us/projects/pex/). A while ago, Reg Braithwaite wrote an excellent post about the problem of the difference between the ‘best’ implementation of an algortihm and an ‘obviously correct’ implementation: https://github.com/raganwald/homoiconic/blob/master/2010/08/final.md – in which he said “It seems to me that it would make programs far more readable if you could use implementations as a kind of method contract: ‘All implementations of this method will behave just like this.’”

    Well, with a tool like Pex that actually becomes possible. Pex will explore the implementations looking for inputs that expose the boundary conditions and behavioral variations of two implementations and seek out inputs that result in different output. For a great example of this in action, try out some of the coding duels on Pex for Fun (http://www.pexforfun.com/).

    Maybe with some combination of a manually-coded ‘naive but obviously correct’ implementation, Pex to generate the next failing test, and Uncle Bob’s magic list of rules, you really could totally automate the process of generating the most optimal solution…

    … but I doubt it.

Leave a Reply