Saturday, October 3, 2009

On solving Project Euler - Part 3

In the first two parts of this post (first one, second one), we focused on Project Euler problems that can easily be solved with naive algorithms and an adequate programming language. But this is just a small part of the problems available on the website. Fortunately, far more interesting problems are waiting for you in the Project Euler challenges database, and you will need more mathematical and technical skills to solve them.

This time, I would like to focus on a category of problems almost as simple as the previous ones, but for which you will have to resort to dynamic programming. If you do not know what dynamic programming is, here's the definition from Wikipedia:
In mathematics and computer science, dynamic programming is a method of solving complex problems by breaking them down into simpler steps. It is applicable to problems that exhibit the properties of overlapping subproblems and optimal substructure.
Let's take the simple example of the Fibonacci numbers. The nth Fibonacci number is defined by the recurrence F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1. Implementing such a function is very simple in any language, but unfortunately it is impractical due to performance issues. Indeed, the complexity of this algorithm is exponential, so trying to compute values beyond F(30) starts to take a lot of time...
By writing down on paper the process of computing a small Fibonacci number such as the 5th one, you would quickly notice that intermediate values are computed several times, which is a waste of time. This is where dynamic programming comes to our help!
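For instance, a direct, naive transcription of the definition into Clojure might look something like this:

    ;; Naive recursive implementation: correct but exponential, since
    ;; (fib 5) recomputes (fib 3) twice, (fib 2) three times, and so on.
    (defn fib [n]
      (if (< n 2)
        n
        (+ (fib (- n 1)) (fib (- n 2)))))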

A first approach, the bottom-up one, is to always remember F(n-1) and F(n-2), starting with 0 and 1. Then, we compute the next value, store it as the new F(n-1), and shift the previous F(n-1) into F(n-2). Repeating this process up to F(n) gives us its value with a time complexity of O(n), which is far better than the exponential complexity of the naive implementation we saw before. Here's how you could implement this approach using a Clojure lazy infinite sequence:
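A minimal sketch along those lines, building a sequence called fibs with iterate and shifting a pair of consecutive values at each step:

    ;; Lazy infinite sequence of Fibonacci numbers: 0, 1, 1, 2, 3, 5, 8, ...
    ;; Each step keeps only the last two values, so taking the first n
    ;; elements costs O(n).
    (def fibs
      (map first (iterate (fn [[a b]] [b (+ a b)]) [0 1])))

    ;; (take 10 fibs) => (0 1 1 2 3 5 8 13 21 34)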

Now that we have an efficient implementation of the Fibonacci sequence, the solution to problem 2 is straightforward:
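Assuming the fibs sequence defined above, a sketch of the solution might be:

    ;; Project Euler problem 2: sum of the even-valued Fibonacci terms
    ;; whose values do not exceed four million.
    (def problem2
      (reduce + (filter even? (take-while #(<= % 4000000) fibs))))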

Another approach to the Fibonacci function implementation is the top-down one. This approach basically consists of memoizing intermediate results so that they can be reused in later steps of the computation. Languages such as Factor make this really easy by providing native support for memoization with the keyword MEMO:, as you can see in the following code:
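A minimal sketch of such a definition, assuming the memoize vocabulary is loaded:

    USING: kernel math memoize ;

    ! Top-down Fibonacci: MEMO: caches every result, so each value is
    ! computed only once even though the definition is recursive.
    MEMO: fib ( n -- f )
        dup 1 > [ [ 1 - fib ] [ 2 - fib ] bi + ] when ;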

We can now make use of this function in order to solve problem 25 with the following code:
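Problem 25 asks for the index of the first Fibonacci term containing 1000 digits. A sketch that simply counts digits by converting each term to a string (assuming number>string from the math.parser vocabulary):

    USING: kernel math math.parser memoize sequences ;

    ! Index of the first Fibonacci term with 1000 digits. Since fib is
    ! called with increasing arguments, each call only recurses a couple
    ! of levels before hitting the memo table.
    : problem25 ( -- n )
        1 [ dup fib number>string length 1000 < ] [ 1 + ] while ;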

Now that we have seen how dynamic programming can be used to solve trivial problems related to recursively defined numbers such as the Fibonacci numbers, let's take a more interesting example with problem 81, stated as follows:

In the 5 by 5 matrix below, the minimal path sum from the top left to the bottom right, by only moving to the right and down, is indicated in bold red and is equal to 2427.

Find the minimal path sum, in matrix.txt, a 31K text file containing a 80 by 80 matrix, from the top left to the bottom right by only moving right and down.

Considering the previous matrix as a graph, with each cell being a node and each possible right or down move being an edge, this problem becomes a shortest-path problem.
A naive approach would be to try every single path... and if you have already solved problem 15, you know that there are 160!/(80!80!) of them! For the record, that is a 47-digit number.
Obviously, this naive algorithm is the wrong way to go, as it doesn't scale at all.
But again, this is a field where dynamic programming can help us a lot, through Dijkstra's algorithm. This algorithm is a form of dynamic programming, as it calculates the shortest path from the start node to a new node by reusing the previously calculated shortest path from the start node to an intermediate node.
With the graph data stored as a matrix in a variable m, the implementation is quite straightforward in Ruby:
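Since moves only go right or down, the relaxation can be performed in a single row-by-row pass over the grid; here is a sketch, with m read from matrix.txt as an array of arrays of integers:

    # m holds the 80x80 matrix read from matrix.txt.
    m = File.readlines("matrix.txt").map { |line| line.split(",").map(&:to_i) }

    # Accumulate the minimal path sum in place: after this double loop,
    # m[i][j] holds the cheapest cost of reaching cell (i, j) from the
    # top-left corner, moving only right and down.
    (0..79).each do |i|
      (0..79).each do |j|
        next if i.zero? && j.zero?
        candidates = []
        candidates << m[i - 1][j] if i > 0   # coming from the cell above
        candidates << m[i][j - 1] if j > 0   # coming from the cell on the left
        m[i][j] += candidates.min
      end
    end

    puts m[79][79]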

Finally, the minimal path sum is given by the accumulated value in the bottom-right cell (m[79][79]).

We saw in this post how dynamic programming can radically improve the performance of certain types of algorithms which would not scale at all on large data sets. This method can be applied to several Project Euler challenges, as we saw with problems 2, 25 and 81. The basic way of proceeding is to find sub-problems whose intermediate results can be reused, leading naturally to the final result.
In the next post, I will introduce other general approaches that can be used to solve Project Euler challenges.