Saturday, October 3, 2009

On solving Project Euler - Part 3

In the first two parts of this post (first one, second one), we focused on Project Euler problems that could easily be solved using naive algorithms and an adequate programming language. But these are just a small part of the problems available on the website. Fortunately, far more interesting problems are waiting for you in the Project Euler challenge database, and you will need more mathematical and technical skills to solve them.

This time, I would like to focus on a category of problems almost as simple as the previous ones, but for which you will have to resort to dynamic programming in order to solve them. If you do not know what dynamic programming is, here's the definition from Wikipedia:
In mathematics and computer science, dynamic programming is a method of solving complex problems by breaking them down into simpler steps. It is applicable to problems that exhibit the properties of overlapping subproblems and optimal substructure.
Let's take the simple example of the Fibonacci numbers. The nth Fibonacci number is defined by the recurrence F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1. Implementing this definition directly as a recursive function is very simple in any language, but unfortunately it is impractical due to performance issues. Indeed, the complexity of this algorithm is exponential. Thus, trying to compute values after F(30) starts to take a lot of time...
By writing on paper the process to compute a small Fibonacci number such as the 5th one, you would quickly notice that intermediate values are computed several times, which is a waste of time. This is where dynamic programming comes to our help!
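To make the repeated work visible, here is a minimal Ruby sketch of the naive recursion with a call counter added. The names naive_fib and calls are illustrative only and do not come from the original listings.

```ruby
# Naive recursive Fibonacci, instrumented with a call counter.
# `calls` is a hypothetical hash used only to expose the repeated work.
def naive_fib(n, calls)
  calls[n] += 1
  return n if n < 2
  naive_fib(n - 1, calls) + naive_fib(n - 2, calls)
end

calls = Hash.new(0)
result = naive_fib(5, calls)
puts "F(5) = #{result}"
calls.sort.each { |n, count| puts "F(#{n}) was requested #{count} time(s)" }
```

Even for such a small input, the values at the bottom of the recursion are requested several times, and the duplication grows quickly as n grows.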

A first approach, the bottom-up one, would be to always remember F(n-1) and F(n-2), starting with 0 and 1. Then, we would compute the next value, store it as the new F(n-1), and shift the previous F(n-1) into F(n-2). Repeating this process until we reach F(n) gives us its value in O(n) time, which is far better than the exponential complexity of the naive implementation we saw before. Here's how you could implement this approach using a lazy infinite sequence in Clojure:
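The same bottom-up idea can also be sketched in Ruby, with an Enumerator standing in for the lazy sequence. This is an illustration under that assumption, not a transcription of the Clojure code; fib_sequence and fib are hypothetical names.

```ruby
# Bottom-up Fibonacci as an infinite sequence that is evaluated on demand.
# Only the last two values (a and b) are kept at any time.
def fib_sequence
  Enumerator.new do |yielder|
    a, b = 0, 1
    loop do
      yielder << a
      a, b = b, a + b
    end
  end
end

# F(n), obtained by walking the sequence once: O(n) additions.
def fib(n)
  fib_sequence.take(n + 1).last
end

p fib_sequence.take(10)  # => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
puts fib(30)             # => 832040
```

Because the enumerator only produces values when asked, taking the first few terms of an infinite sequence is safe.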

Now that we have an efficient implementation of the Fibonacci sequence, the solution to problem 2 is straightforward:
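As a rough Ruby counterpart, and assuming the usual statement of problem 2 (sum the even-valued Fibonacci terms that do not exceed four million), the solution could look like the sketch below. The helper names fib_sequence and even_fib_sum are hypothetical.

```ruby
# Infinite Fibonacci sequence, produced on demand (bottom-up).
def fib_sequence
  Enumerator.new do |yielder|
    a, b = 0, 1
    loop do
      yielder << a
      a, b = b, a + b
    end
  end
end

# Sum of the even-valued terms that do not exceed `limit`.
def even_fib_sum(limit)
  fib_sequence.take_while { |f| f <= limit }.select(&:even?).reduce(0, :+)
end

puts even_fib_sum(100)        # 2 + 8 + 34 => 44
puts even_fib_sum(4_000_000)
```

Starting the sequence at 0 instead of 1 does not change the result, since 0 contributes nothing to the sum.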

Another approach to implementing the Fibonacci function is the top-down one. This approach basically consists in memoizing intermediate results so that they can be reused in later steps of the computation. Languages such as Factor make this really easy by providing native support for memoization with the keyword MEMO:, as you can see in the following code:
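Ruby has no keyword equivalent to MEMO:, but the top-down idea can be approximated by threading a hash through the recursion. This is only a sketch of the technique; memo_fib and its memo parameter are hypothetical names.

```ruby
# Top-down Fibonacci with memoization.
# `memo` caches every F(k) already computed, so each value is computed once.
def memo_fib(n, memo = {})
  return n if n < 2
  memo[n] ||= memo_fib(n - 1, memo) + memo_fib(n - 2, memo)
end

puts memo_fib(30)  # => 832040
puts memo_fib(90)
```

One caveat with this form: a cold call with a very large n recurses n levels deep, so it can exhaust the stack where the bottom-up version would not.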

We can now make use of this function in order to solve problem 25 with the following code:
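For readers following along in Ruby, here is a hedged sketch, assuming problem 25 asks for the index of the first Fibonacci term with 1000 digits (with F(1) = F(2) = 1). The cache is shared across calls, so asking for F(1), F(2), F(3)... in order never recurses more than a couple of levels. Function names are hypothetical.

```ruby
# Memoized Fibonacci sharing an explicit cache between calls.
def memo_fib(n, memo)
  return n if n < 2
  memo[n] ||= memo_fib(n - 1, memo) + memo_fib(n - 2, memo)
end

# Index of the first Fibonacci number with at least `digits` decimal digits.
def first_fib_index_with_digits(digits)
  memo = {}
  threshold = 10**(digits - 1)  # smallest number with that many digits
  (1..Float::INFINITY).find { |n| memo_fib(n, memo) >= threshold }
end

puts first_fib_index_with_digits(3)     # F(12) = 144 => 12
puts first_fib_index_with_digits(1000)
```

Comparing against a power of ten avoids converting every term to a string just to count its digits.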

Now that we have seen how dynamic programming can be used to solve trivial problems related to recursively defined numbers such as the Fibonacci numbers, let's take a more interesting example with problem 81, explained as follows:

In the 5 by 5 matrix below, the minimal path sum from the top left to the bottom right, by only moving to the right and down, is indicated in bold red and is equal to 2427.

Find the minimal path sum, in matrix.txt, a 31K text file containing a 80 by 80 matrix, from the top left to the bottom right by only moving right and down.

Considering the previous matrix as a graph, each cell being a node, and each possible right or down move being the edges, this problem becomes a shortest-path problem.
A naive approach would be to try every single path... and if you have already solved problem 15, you will know that there are 158!/(79!79!) of them, since every path is made of 79 moves to the right and 79 moves down. For information, it is a 47-digit number.
Obviously, this naive algorithm is the wrong approach, as it doesn't scale at all.
But again, this is a field where dynamic programming can help us a lot, with Dijkstra's algorithm. This algorithm is a form of dynamic programming, as it calculates the shortest path from an intermediate node to another node by reusing the previously calculated shortest path from the start node to this intermediate node.
Having the graph data as a matrix stored in a variable m, the implementation is quite obvious in Ruby:
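What follows is a minimal sketch rather than the original listing. Because only right and down moves are allowed, it uses a plain in-place dynamic programming pass instead of a general Dijkstra implementation: each cell is replaced by the cheapest cost of reaching it. The function name accumulate_min_path! is hypothetical, the sample matrix is the 5 by 5 example from the problem statement, and the commented-out loading line assumes matrix.txt holds comma-separated rows.

```ruby
# Replaces each cell of m with the minimal path sum from the top-left
# corner to that cell (moving only right and down), then returns the
# value accumulated in the bottom-right cell.
def accumulate_min_path!(m)
  rows = m.size
  cols = m.first.size
  rows.times do |i|
    cols.times do |j|
      next if i.zero? && j.zero?
      from_above = i.zero? ? Float::INFINITY : m[i - 1][j]
      from_left  = j.zero? ? Float::INFINITY : m[i][j - 1]
      m[i][j] += [from_above, from_left].min
    end
  end
  m[rows - 1][cols - 1]
end

example = [
  [131, 673, 234, 103,  18],
  [201,  96, 342, 965, 150],
  [630, 803, 746, 422, 111],
  [537, 699, 497, 121, 956],
  [805, 732, 524,  37, 331]
]
puts accumulate_min_path!(example)  # => 2427

# For the real input (assumed format: one comma-separated row per line):
# m = File.readlines("matrix.txt").map { |line| line.split(",").map(&:to_i) }
# puts accumulate_min_path!(m)
```

Each cell is visited once, so the 80 by 80 matrix needs only 6400 steps instead of an enumeration of every path.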

Finally, the length of the shortest path is given by the accumulated value in the bottom-right cell (m[79][79]).

We saw in this post how dynamic programming can radically improve the performance of certain types of algorithms, which wouldn't scale at all on large data sets. This method can be applied to several Project Euler challenges, as we saw in problems 2, 25 and 81. The basic way of proceeding is to find sub-problems whose intermediate results can be reused, leading naturally to the final result.
In the next post, I will introduce another general approach that can be used to solve Project Euler challenges.

Sunday, September 27, 2009

On solving Project Euler - Part 2

In the previous post, we saw that some of the Project Euler challenges could be solved using naive brute-force algorithms directly transcribed from their description. Today, I would like to introduce another category of problems, which is an extension to the previous one: problems that can be easily solved using mathematical software.

Some of the Project Euler challenges target mathematical fields using very large numbers: for example factorials. As you might know, in programming languages such as C, all the built-in mathematical operations can only be performed on a limited set of number types: int, long, etc. In order to perform operations on bigger numbers, you will have to rely on specific libraries such as GNU MP Bignum, and use a different syntax (no more "+" or "*" operators). This results in ugly code, and more trouble just to write simple algorithms.

For this category of problems, if you still want to implement naive algorithms, you should choose a programming language which naturally handles big numbers. Actually, most of the "recent" programming languages like Ruby, Clojure and Factor can handle them without you even noticing.

Let's take problem 16 as a first example:

2^15 = 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26.

What is the sum of the digits of the number 2^1000?

One can imagine that 2 raised to the 1000th power is a very large number: indeed, it actually has 302 digits! This is far more than what the standard long type can handle in C: 2^32 - 1 (4,294,967,295) for an unsigned 32-bit long.
Fortunately, it is a piece of cake for Ruby to handle these numbers. As you will see in the source code below, no special syntax is required to perform operations like additions on large numbers: you can use the usual operator "+":
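A minimal sketch of such a solution, with digit_sum as a hypothetical helper name:

```ruby
# Sum of the decimal digits of a non-negative integer of any size.
def digit_sum(n)
  n.to_s.each_char.map(&:to_i).reduce(0, :+)
end

puts digit_sum(2**15)            # => 26, as in the problem statement
puts((2**1000).to_s.length)      # number of digits of 2^1000
puts digit_sum(2**1000)
```

The expression 2**1000 is an ordinary integer here: Ruby switches to arbitrary-precision arithmetic on its own, and "+" keeps working on the result.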

I would like to give a special mention to the J programming language which, apart from handling large numbers as easily as Ruby, also provides a very concise syntax when it comes to number processing, as shown in the following code (solution to the previous problem):

Another problem that can be solved easily with languages that naturally handle large numbers is problem 20:

n! means n x (n - 1) x ... x 3 x 2 x 1

Find the sum of the digits in the number 100!

The pattern of this challenge being the same as that of problem 16, the solutions in Ruby and J are quite straightforward again:
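On the Ruby side, a sketch along the same lines could be (factorial and digit_sum are hypothetical helper names):

```ruby
# n! computed as a plain product; Ruby promotes to big integers as needed.
def factorial(n)
  (1..n).reduce(1, :*)
end

# Sum of the decimal digits of a non-negative integer.
def digit_sum(n)
  n.to_s.each_char.map(&:to_i).reduce(0, :+)
end

puts digit_sum(factorial(10))   # 3628800 => 27
puts digit_sum(factorial(100))
```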

If you take a quick look at the challenges list, you will notice that many of them involve prime numbers. Even more than large number handling, these prime numbers are a big reason to choose one language over another in order to solve certain problems. For example, J provides built-in functions to compute prime numbers efficiently. Of course, you could use libraries like mathn in Ruby, or implement the functions that you need in Clojure, but it would just add some more trouble, and usually the efficiency can't be compared to the native functions provided by J, although it is not always an issue. So let's test the power of J by solving problem 7:

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10001st prime number?
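Before looking at built-ins, here is what the hand-rolled route might look like in Ruby. It is a plain trial-division sketch, not an efficient method, and the names prime_given? and nth_prime are hypothetical.

```ruby
# True if `candidate` has no divisor among `known_primes`.
# Assumes known_primes holds, in increasing order, every prime below candidate.
def prime_given?(candidate, known_primes)
  known_primes.each do |q|
    return true if q * q > candidate       # no divisor up to sqrt(candidate)
    return false if (candidate % q).zero?
  end
  true
end

# The nth prime (1-indexed), found by testing 2, 3, 4, ... in order.
def nth_prime(n)
  primes = []
  candidate = 2
  while primes.size < n
    primes << candidate if prime_given?(candidate, primes)
    candidate += 1
  end
  primes.last
end

puts nth_prime(6)      # => 13, as in the problem statement
puts nth_prime(10001)
```

This is noticeably more code than a built-in call, which is exactly the trade-off discussed above.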

J's built-in function p: returns the nth prime number, so the solution couldn't be simpler than:

In the same field, Mathematica is incredibly powerful too, but unfortunately it is a proprietary and quite expensive system... It also provides a built-in function to get the nth prime number, and again the following code is enough to solve the previous problem:

Problem 10 also requires prime number computation, and can be solved using a naive brute-force algorithm:

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.

Find the sum of all the primes below two million.
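For comparison with the built-in approaches, a language without prime primitives needs something like a sieve of Eratosthenes. The Ruby sketch below is one possible version; sum_of_primes_below is a hypothetical name.

```ruby
# Sum of all primes strictly below n, using a sieve of Eratosthenes.
def sum_of_primes_below(n)
  return 0 if n < 3
  is_prime = Array.new(n, true)
  is_prime[0] = is_prime[1] = false
  (2..Math.sqrt(n).floor).each do |i|
    next unless is_prime[i]
    # Cross out every multiple of i, starting from i * i.
    (i * i).step(n - 1, i) { |j| is_prime[j] = false }
  end
  (2...n).select { |i| is_prime[i] }.reduce(0, :+)
end

puts sum_of_primes_below(10)         # => 17, as in the problem statement
puts sum_of_primes_below(2_000_000)
```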

J provides the inverse function of p:, which is p:^:_1 (you should read it as p: raised to the power -1): the mathematical function called pi, which gives the number of primes less than a given number n. It can be used in the following way to solve this challenge:

Again, Mathematica provides several functions related to prime numbers, such as PrimeQ, a prime testing function, which can be used to solve the same problem in a slightly different way:

Mathematica's useful functions are not limited to the field of prime numbers. Problem 65, dealing with continued fractions, is another nice example showing how good Mathematica is at solving Project Euler problems in a clean way. Here's the self-explanatory source code:
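For readers without Mathematica, the same computation can be approximated in Ruby with the Rational type. This sketch assumes the usual statement of problem 65: e has the continued fraction [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, ...], and the task is the digit sum of the numerator of its 100th convergent. The names e_term, e_convergent and digit_sum are hypothetical.

```ruby
# i-th partial quotient (0-indexed) of the continued fraction of e:
# 2, then the repeating pattern 1, 2k, 1 for k = 1, 2, 3, ...
def e_term(i)
  return 2 if i.zero?
  i % 3 == 2 ? 2 * (i + 1) / 3 : 1
end

# Convergent built from the first n partial quotients, folded from the right.
def e_convergent(n)
  terms = (0...n).map { |i| e_term(i) }
  terms.reverse.reduce { |acc, a| a + Rational(1, acc) }
end

def digit_sum(n)
  n.to_s.each_char.map(&:to_i).reduce(0, :+)
end

p e_convergent(10)                         # => (1457/536)
puts digit_sum(e_convergent(10).numerator) # 1 + 4 + 5 + 7 => 17
puts digit_sum(e_convergent(100).numerator)
```

Rational keeps every intermediate fraction exact, so no precision is lost along the way.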

We saw in this post that the field related to the challenge can drive our choice of language to solve it in a very simple way, for example with the native and transparent support of big numbers, or with powerful built-in functions to deal with prime numbers and continued fractions.
In the next post, I would like to introduce some simple and common ways of solving problems more efficiently than naive brute-force.

Sunday, September 20, 2009

On solving Project Euler - Part 1

For those of you who have never heard about it, Project Euler is a website that publishes new programming and mathematical challenges almost every week. It has been almost one year since I discovered it, and I must say that I have really become addicted to it.
I would recommend it to any programmer who does not hate mathematics, as it can be used for several purposes: challenging your mathematical and programming skills, learning new mathematical concepts, and especially practicing a new programming language. Of course, it won't help you become better at designing software, but it might help you improve your algorithm writing skills.

As I said, both mathematical and programming skills are needed to solve Project Euler problems. While some of them might be solved with just a simple brute-force algorithm or, conversely, with a clever formula, most of the problems will require you to use both your mathematical and algorithm implementation skills.

On the listing page, the problems are ordered by publishing date. Usually, trying to solve them in this order is better, as concepts introduced in older problems are often reused in newer problems, which makes them easier to solve than starting from zero.
The listing can also be sorted by the difficulty of the problems (you will need to register), which is determined by the number of people who solved them: indeed, once you have solved a problem, you can input your answer in a form and verify whether it is correct. If it is, you will then be able to access the forum thread of this problem, where people usually post their solutions. This is a very interesting feature of Project Euler, because you can see which other algorithms can be used, in several programming languages.

We could also classify the problems in different categories based on how they can be solved, and which languages are the most efficient to solve them.

A first category could be the one of problems which can be solved by implementing the obvious algorithm transcribed from the problem itself. An important point that I didn't mention before is that all the problems have been designed according to a "one-minute rule", meaning that there always exists an algorithm that can solve the problem in less than one minute even on modest hardware. Although this rule will usually lead you to rethink your first (slow) brute-force algorithm in order to make your solution comply with it, some problems can just be brute-forced as I said before.

Problem 1 is a nice and simple example from this category:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

An obvious brute-force algorithm in Ruby could be:
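A minimal sketch of that brute-force approach, with sum_of_multiples as a hypothetical name:

```ruby
# Sum of the natural numbers below `limit` that are multiples of 3 or 5.
def sum_of_multiples(limit)
  (1...limit).select { |n| (n % 3).zero? || (n % 5).zero? }.reduce(0, :+)
end

puts sum_of_multiples(10)    # 3 + 5 + 6 + 9 => 23
puts sum_of_multiples(1000)
```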

This simple algorithm complies with the one-minute rule, but might not scale well for bigger values, for which we might need a better algorithm or formula. In the same naive way, the following Clojure code gives the answer to problem 6:
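The same naive transcription can be sketched in Ruby, assuming the usual statement of problem 6: find the difference between the square of the sum and the sum of the squares of the first one hundred natural numbers. The name sum_square_difference is hypothetical.

```ruby
# (1 + 2 + ... + n)^2 - (1^2 + 2^2 + ... + n^2), computed directly.
def sum_square_difference(n)
  square_of_sum  = (1..n).reduce(0, :+)**2
  sum_of_squares = (1..n).map { |k| k * k }.reduce(0, :+)
  square_of_sum - sum_of_squares
end

puts sum_square_difference(10)   # 3025 - 385 => 2640
puts sum_square_difference(100)
```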

This problem can also be solved with the following Factor code:

We will see in the next posts the other categories of challenges available on Project Euler, and the way to solve them efficiently. Indeed, implementing brute-force algorithms is not the real fun of Project Euler, and is really not what you need to improve your problem solving skills!

As an exercise, you can try to solve problem 187, which seems simple to solve with a naive brute-force algorithm, but there is little chance that such a solution complies with the one-minute rule!
A more clever algorithm can solve this problem in seconds, and a more advanced one can solve it in an instant...