Blog: 2009

Saturday, October 3, 2009

On solving Project Euler - Part 3

In the first two parts of this post (first one, second one), we focused on Project Euler problems that could easily be solved using naive algorithms and adequate programming language. But this is just a small part of the problems available on the website. Fortunately, far more interesting problems are waiting for you in Project Euler challenges database, and you will need more mathematical and technical skills to solve them.

This time, I would like to focus on a category of problems almost as simple as the previous ones, but for which you will have to resort to dynamic programming in order to solve them. If you do not know what dynamic programming is, here's the definition from Wikipedia:

In mathematics and computer science, dynamic programming is a method of solving complex problems by breaking them down into simpler steps. It is applicable to problems that exhibit the properties of overlapping subproblems and optimal substructure.

Let's take the simple example of the Fibonacci numbers. The n^th number of Fibonacci, is defined by the recurring function F_n = F_n-1 + F_n-2, with F₀ = 0 and F₁ = 1. Implementing such a function is very simple in any language, but unfortunately it is impractical due to performance issues. Indeed, the complexity of this algorithm is exponential. Thus, trying to compute values after F₃₀ starts to take a lot of time...
By writing on paper the process to compute a small Fibonacci number such as the 5^th one, you would quickly notice that intermediate values are computed several times, which is a loss of time. This is where dynamic programming comes to our help!

A first approach, the bottom-top one, would be to always remember F_n-1 and F_n-2, starting with 0 and 1. Then, we would compute the next value, and store it as the new F_n-1, and shift the previous F_n-1 in F_n-2. Repeating this process until F_n would give us its value in time complexity O(n), which is far better than the exponential complexity of the naive implementation we saw before. Here's how you could implement this approach using Clojure lazy infinite sequence:

Now that we have an efficient implementation of the Fibonacci sequence, the solution to problem 2 is straightforward:

Another approach to the Fibonacci function implementation is the top-bottom one. This approach basically consist in memoizing intermediate results so that they can be reused in later steps of the computation. Language such as Factor make this really easy, by providing native support of memoization with the keyword MEMO:, as you can see in the following code:

We can now make use of this function in order to solve problem 25 with the following code:

Now that we saw how dynamic programming could be used to solve trivial problems related to recursively defined number such as Fibonacci numbers, let's take a more interesting example with the problem 81, explained as follows:

In the 5 by 5 matrix below, the minimal path sum from the top left to the bottom right, by only moving to the right and down, is indicated in bold red and is equal to 2427.
Find the minimal path sum, in matrix.txt, a 31K text file containing a 80 by 80 matrix, from the top left to the bottom right by only moving right and down.

Considering the previous matrix as a graph, each cell being a node, and each possible right or down move being the edges, this problem becomes a shortest-path problem.
A naive approach would be to try every single path... and if you already solved the problem 15, you would know that there are 160!/(80!80!) of them! For information, it is a 47-digit number.
Obviously using this naive algorithm is a wrong way, as it doesn't scale at all.
But again, this is a field where dynamic programming can help us a lot, with Dijkstra's algorithm. This algorithm is a form a dynamic programming, as it calculates the shortest path from an intermediate node to another node reusing the previously calculated shortest path from the start node to this intermediate node.
Having the graph data as a matrix stored in a variable m, the implementation is quite obvious in Ruby:

Finally, the shortest path is given by the accumulated value in most the bottom right cell (m[79][79]).

We saw in this post how dynamic programming can radically improve the performance of certain type of algorithms, which wouldn't scale at all on large data sets. This method can be applied to several Project Euler challenges, as we saw in problems 2, 25 and 81. The basic way of proceeding is to find sub-problems which can reuse intermediate results leading naturally to the final result.
In the next post, I will introduce other general approach that can be used to solve Project Euler challenges.

Sunday, September 27, 2009

On solving Project Euler - Part 2

In the previous post, we saw that some of the Project Euler challenges could be solved using naive brute-force algorithms directly transcribed from their description. Today, I would like to introduce another category of problems, which is an extension to the previous one: problems that can be easily solved using mathematical software.

Some of the Project Euler challenges target mathematical fields using very large numbers: for example factorials. As you might know, in programming languages such as C, all the built-in mathematical operations can only be performed on a limited set of numbers: int, long, etc. In order to perform operations on bigger numbers, you will have to rely on specific libraries such as GNU MP Bignum, and use a different syntax (no more "+" or "*" operators). This results in ugly code, and more trouble only to write simple algorithms.

For this category of problems, if you still want to implement naive algorithms, you should choose a programming language which naturally handles big numbers. Actually, most of the "recent" programming languages like Ruby, Clojure and Factor can handle them without you notice it.

Let's take the problem 16 as a first example:

2¹⁵ = 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26.

What is the sum of the digits of the number 2¹⁰⁰⁰?

One can imagine that 2 raised to the 1000^th power is a very large number: indeed, it actually has 603 digits! This is far more that what standard the standard long type can handle in C: 2³² -1 (4,294,967,295).
Fortunately, it is a peace of cake for Ruby to handle these numbers. As you will see in the source code below, no special syntax is required to perform operations like additions on large numbers, you can use the usual operator "+":

I would like to give a special mention to the J programming language which, apart from handling large numbers as easily as Ruby, also provides a very concise syntax when it comes to number processing, as shown in the following code (solution to the previous problem):

Another problem that can be solved easily with languages that naturally handle large number is the problem 20:

n! means n x (n - 1) x ... x 3 x 2 x 1

Find the sum of the digits in the number 100!

The pattern of this challenge being the same as the 20^th, the solutions in Ruby and J are quite straightforward again:

If you give a quick look at the challenges list, you will notice that many of them involve prime numbers. More than large number handling, these prime numbers are a big reason to choose one language over another in order to solve certain problems. For example, J provides built-in functions to compute efficiently prime numbers. Of course, you could use libraries like mathn in Ruby, or implement the functions that you need in Clojure, but it would just add some more trouble, and usually the efficiency can't be compared to the native functions provided by J, although it is not always an issue. So let's test the power of J by solving the problem 7:

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6^th prime is 13.

What is the 10001^st prime number?

J built-in function p: returns the n^th prime number, so the solution can't be more simple than:

In the same field, Mathematica is incredibly powerful too, but unfortunately it is a proprietary and quite expensive system... It also provides a built-in function to get the n^th prime number, and again the following code is enough to solve the previous problem:

The problem 10 is also one of those which requires prime number computation, and can be solved using a naive brute-force algorithm:

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.

Find the sum of all the primes below two million.

J provides the inverse function of p:, which is p:^:_1 (you should read it p:^-1): the mathematical function called pi, which gives the number of primes less than a given number n. It can be used in the following way to solve this challenge:

Again, Mathematica provides several functions related to prime numbers, such as PrimeQ, a prime testing function, which can be used to solve the same problem in a slightly different way:

Mathematica useful functions are not limited to the field of prime numbers. The problem 65, dealing with continued fractions, is another nice example showing how good Mathematica is at solving Project Euler in a clean way. Here's the self-explained source code:

We saw in this post that the field related to the challenge can drive our choice of language to solve it in a very simple way, for example with the native and transparent support of big numbers, or with powerful built-in functions to deal with prime numbers and continued fractions.
In the next post, I would like to introduce some simple and common ways of solving problems more efficiently than naive brute-force.

Sunday, September 20, 2009

On solving Project Euler - Part 1

For those of you who have never heard about it, Project Euler is a website publishing almost every week new programming and mathematical challenges. It has been almost one year since I discovered it, and I must tell that I have really become addicted to it.
I would recommend it to any programmer who do not hate mathematics, as it can be used for several purposes: challenging your mathematical and programming skills, learning new mathematics concepts, and it is especially a good practice for learning a new programming language. Of course, it won't help you to become better at designing software, but it might help help you to improve your algorithm writing skills.

As I said, both mathematical and programming skills are needed to solve Project Euler problems. While some of them might be solved with just a simple brute force algorithm, or on the opposite, with a clever formula, most of the problems will require to use both your mathematical and algorithm implementation skills.

On the listing page, the problems are ordered by publishing date. Usually, trying to solve them in this order is better, as concepts introduced in older problems are often reused in newer problems, and it will make them easier to solve than just starting from zero.
The listing can also be sorted by the difficulty of the problems (you will need to register), which is determined by the number of people who solved them: indeed, once you solved a problem, you can input your answer in a form and verify if it is correct. In the case it is, you will then be able to access the forum thread of this problem, where people usually post their solution. This is a very interesting feature of Project Euler, because you can see what are the other algorithms that can be used, in several programming languages.

We could also classify the problems in different categories based on how they can be solved, and what are the most efficient languages to solved them.

A first category could be the one of problems which can be solved by implementing the obvious algorithm transcribed from the problem itself. An important point that I didn't mention before is that all the problems have been designed according to a "one-minute rule", meaning that there always exist an algorithm that can solve the problem in less that one minute even on modest hardware. Although this rule will usually lead you to rethink your first (slow) brute-force algorithm in order to make your solution comply to it, some problems can just be brute-forced as I said before.

The problem 1 is a nice and simple example from this category:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

An obvious brute-force algorithm in Ruby could be:

This simple algorithm comply with the one-minute rule, but might not scale well for bigger values, for which we might need a better algorithm or formula. In same same naive way, the following Clojure code gives the answer to the problem 6:

This problem can also be solved with the following Factor code:

We will see in the next posts the other categories of challenges available on Project Euler, and the way to solve them efficiently. Indeed, implementing brute-force algorithms is not the real fun of Project Euler, and is really not what you need to improve your problem solving skills!

As an exercise, you can try to solve the problem 187, which seem simple to solve with a naive brute-force algorithm, but there is little chance that it complies with the one-minute rule!
A more clever algorithm can solve this problem in seconds, and a more advanced one can solve it in an instant...

Saturday, September 12, 2009

Takeshi's Komanechi University Mathematics

I suppose some of you might already know Takeshi Kitano as a filmmaker, who shot great movies like Sonatine and Kikujiro.

Here in Japan, he is very famous, but not for his movies. More that a filmmaker, he is popular as a comedian and appears a lot on TV, know as Beat Takeshi.

But what you might not know, even living in Japan, is the Math Kitano the mathematician. Indeed, during his childhood, Takeshi's mother was very strict concerning education. He went to the renown Meiji University and usually says that he would have liked to become a mathematician if he didn't become a comedian.

This passion for mathematics gave birth in 2006 to the TV program "Takeshi's Komanechi University Mathematics" (たけしのコマネチ大学数学科).
Aired every Thursday between 25:15 and 25:45 on Fuji Television, this program aims at showing the charm of mathematics while being as well an entertainment program.
Every week, three teams are trying to solve a problem (of university entry level): Takeshi, aka "Math Kitano", two pretty student girls from Tokyo University, and a a group of four comedians.

Surprisingly, Math Kitano performs very well and sometimes outperforms Tokyo University students, especially in geometry related problems, which must be one of the reasons why the way he shots his movies is so geometric.
On the other side, the team of four comedians always use very weird but funny ways to solve problems, and it is quite enjoying to watch, even from a mathematician point of view.

After discovering this TV program, I just became addicted, and found myself solving these problems every week. I think this is one good way to keep one's mathematics abilities, and I would recommend it to any people living in Japan.

To give you a taste of it, here is a link the some problems on my Picasa galleries:

たけしのコマネチ大学数学科

And here's a link to a YouTube search results page.

I hope I will find some time solve problems in the next posts.

Saturday, August 22, 2009

Think twice before naming your new programming language

Smart people invent new programming languages, with powerful features, that perform really well. What else do we need?

When you want to learn and use these new programming languages, you might want to buy books or to search for tutorials, code examples, blog articles, etc.

There comes the name: you might have plenty of good reasons for choosing a certain name for your programming language, but before deciding on it please think about people who are willing to use it.

I am one of those: recently, I am learning the Factor and J programming languages, or I should say "I am trying to learn". I suppose that you can imagine my frustration when I try to search for information using Google, or for books using Amazon.

Just think about what kind of results you will get when looking for something related to Factor on Google...

Of course you could tell me that there is the Factor documentation. But I don't want to limit myself to the official documentation: I want to read books if they exist, I want to read code snippets shared by other people, I want to read blog posts, search on GitHub (Factor is not registered yet), and the reality is that I just can't. This is so frustrating, especially for language such as J, which are even more difficult to search for, and I don't need to explain you why...

Some work have been done to make C and C++ easy to search on the internet, but nothing yet for J yet. I think the same goes for R, and if I read this wiki page, it seems that I could quote almost the whole alphabet!

So please, I beg you, the next time you create a great programming language such as Factor or J, think carefully about a name that will be easy to search for on the internet. Do not use a one-character or two-character name, only use characters from the Latin alphabet, do not use a common word, and just make it as unique as possible.

Sunday, July 26, 2009

Project Euler and OCaml prime number list generator

Half a year has passed... The main reason why I didn't write anything concerning Ruby golfing is mainly because I stopped practicing quite a while ago, so I do not have any fresh tips to write about now.

Recently, I am mostly spending my time learning the programming languages Clojure and Factor, using them to solve Project Euler problems. Clojure reminds of Scheme, but I am very new to the concatenative paradigm of Factor, as I never learned Forth or any similar language.
Providing a fairly succinct syntax, they perform very well, which is the main reason why I'm now using them over Ruby.

Some weeks ago, this article in which are analyzed the results of the programming language benchmarks game made me want to use again the OCaml language, that I had previously learned and used during my studies at the university. This analysis shows it as an almost ideal programming language in terms of code size vs performance trade-off.

Several problems of the Project Euler deal with prime numbers, so I needed some code to generate them. Unfortunately, I couldn't find any simple prime generating code on the internet: it seems that finding OCaml code snippets is not an easy task...
Here is quite simple implementation of the Sieve of Eratosthenes which is quite fast (even when running only bytecode) and should be enough for most of the problems that require to use prime numbers in the int range:

Which will for example produce the following result:

# let primes = primes_upto 4000000 in
  List.nth primes ((List.length primes) - 1);;
- : int = 3999971

For problems requiring bigger primes, but not an exhaustive list of them, using algorithms such as the Miller-Rabin primality test is a better choice.

Sunday, January 18, 2009

Code Golfing in Ruby - part 1

About two years ago, while looking for some material about Ruby programming, I came across some website related to golf. If this is the first time you hear about golfing as a subject related to programming, which we can shortly define as "trying to solve a coding problem using the least number of keystrokes". I only know two golf related websites still in activity:

- anarchy golf, which is quite active, where one can challenge the problems with any of the 66 supported languages,
- Codegolf, which supports only 4 languages (Perl, PHP, Python and Ruby). Even if my golfing experience is quite short, I still would like to introduce some tips for beginners to get started in golfing. Some people might say that code golfing is a bad practice, but I think there are some useful things to learn from it.

When you start solving a problem, you will firstly implement an obvious algorithm, and then try to shorten your code the most you can. Then you might start to look for alternative syntax with the same meaning, for useless processing, and maybe think about another way to solve the problem, etc.

During all this process, you will first learn a lot about the syntax of the language you are using to solve you problem. Do not misinterpret my words: I really do not recommend you to practice golfing in order to learn a new language! In this case, I would better recommend to solve problems from the Project Euler. What I am trying to say is that to solve a golfing problem, you will certainly have to explore the language syntax, and implement multiple algorithms, which is a profitable experience.

Now let's get to the point of this post, which is to introduce these ruby golfing tips.

In this first post, we will focus on making use of predefined variables.
You will find in this ruby language quick reference page that there are a few predefined variables with a short name starting by the dollar sign. Let's have a look at the following ones:

$/  # The input record separator, newline by default.
$*  # Command line arguments given for the script sans args.
$$  # The process number of the Ruby running this script.

As written in this documentation, $$ is a non-assignable variable containing a number which is usually bigger than 1000. Let's say that you need to iterate over a certain large number of times, and you need the number of the current iteration. In this case, you can use this already initialized variable as follows:

$$.times{|i|}

# here's the value I have on my current irb session:
rb(main):001:0> p$$
16434
=> nil

#You can also make a big range like this:
(0..$$)

Then suppose than you need to write a newline character into you script, using the predefined variable $/ will be one byte shorter than "\n".

irb(main):001:0> p$/
"\n"
=> nil

The predefined non-assignable variable $*, which contains the command line argument can be used when you need an empty array and can limit the number of calls to it. The advantage of this variable is that it contains an already initialized array. In a normal case, you would need to initialize an array like this: a=[]. Even if the variable name "$*" is 2 bytes long, it can be useful if you find a way to use it without having to name it too often. Look by yourself in irb:

irb(main):001:0> p$*
[]
=> nil

It was a little short for this first part, but I hope some people will find it useful. You can look for other predefined variables, and imagine the ways you can use them properly to shorten your code length, and I am pretty sure there is still a lot to do in this field.

See you soon for the second part of this article!

Thursday, January 15, 2009

First post

One of my new year's resolutions is to create this blog, and share my (still short) experience in the IT field with all of you. As the language I use the most is Ruby (my favorite one), it might be the main subject of my posts. Still, as I am quite a curious person, there will surely be exceptions, and I hope I will be able to produce some useful content.

To introduce myself in a few words, I am 24, French, currently living and working in Japan. My work consists basically in developing and maintaining a log management software in Java.