Solving the Google Code Jam "countPaths" problem in Python
Wednesday, 16. August 2006, 13:54:26
I found this article about Haskell:
Solving the Google Code Jam "countPaths" problem in Haskell
Recently Guido van Rossum announced that Python is one of the supported languages in the next Google Code Jam. I decided to write a solution to the puzzle in Python. I haven't looked at the Haskell code. It is too strange anyway.(*) Neither looked at the C code from the other site - too long. Here it is in Python in 25 lines. And it is very fast.
Original Problem Statement here
The author of the Haskell article Tom Moertel investigates this extreme case:
"to find a word composed of 50 “A” letters within a 50×50 grid of “A” cells.".
Let us see if we remove the check for exceeding 1 billion what will happen?
The result is 303835410591851117616135618108340196903254429200 and this is the same
value that Tom Moertel found.
The calculation took whole 8 seconds on oooold Athlon 1000Mhz with Win XP.
Nice.
Well not really. The Google Code Jam rules are as Tom Moertel pointed out "All submissions have a maximum of 2 seconds of runtime per test case". If Goggle is running the tests on 4Ghz Athlon we will be almost in limits. But lets not take chances. We have to stop the run earlier. That unfortunately will increase our code size. One very straightforward solution is with exceptions:
And the result on 1Ghz is the blazing speed of:
So the conclusion is that Python can be used in Google's Code Jam, but one must be carefull with the time limits!
(*) Update. Some people are commenting on reddit my ignorance of the Haskell code. I actually learned most of Haskell some time ago, until I got to Monads. Then I found 267 articles expalining what Monads are from which 27 just tutorials. At that moment I tought "Enough! Maybe I will read it later.". The so called "Maybe" monad. That was 1.5 years ago.
Solving the Google Code Jam "countPaths" problem in Haskell
Recently Guido van Rossum announced that Python is one of the supported languages in the next Google Code Jam. I decided to write a solution to the puzzle in Python. I haven't looked at the Haskell code. It is too strange anyway.(*) Neither looked at the C code from the other site - too long. Here it is in Python in 25 lines. And it is very fast.
class WordPath:
def howMany(self, (x,y), word):
if x <>=self.N or y<0>=self.M or self.grid[x][y] != word[0]:
return 0
if len(word) == 1:
return 1
s = 0
for a in (x-1, x, x+1):
for b in (y-1, y, y+1):
if not (a == x and b ==y):
if (a, b, word) not in self.cache:
self.cache[ (a,b,word) ] = self.howMany( (a,b), word[1:])
s += self.cache[ (a,b,word) ]
return s
def countPaths( self, grid, word):
self.grid = grid
self.N = len(grid)
self.M = len(grid[0])
self.cache = {}
s = sum ( [ self.howMany( (x,y), word) for x in range (self.N)
for y in range(self.M)] )
if s > 1000000000:
s = -1
return s
#
test = WordPath()
assert 1 == test.countPaths( ("ABC","FED","GHI"), "ABCDEFGHI")
assert 108 == test.countPaths( ("AA","AA"), "AAAA")
assert 2 == test.countPaths( ("ABC","FED","GAI"), "ABCDEA")
assert 0 == test.countPaths( ("ABC","DEF","GHI"), "ABCD")
assert 56448 == test.countPaths( ("ABABA","BABAB","ABABA","BABAB","ABABA"), "ABABABBA")
assert -1 == test.countPaths( ("AAAAA","AAAAA","AAAAA","AAAAA","AAAAA"), "AAAAAAAAAAA")
Original Problem Statement here
The author of the Haskell article Tom Moertel investigates this extreme case:
"to find a word composed of 50 “A” letters within a 50×50 grid of “A” cells.".
Let us see if we remove the check for exceeding 1 billion what will happen?
print test.countPaths( [ "A"*50 for a in range(50)], "A"*50)
The result is 303835410591851117616135618108340196903254429200 and this is the same
value that Tom Moertel found.
The calculation took whole 8 seconds on oooold Athlon 1000Mhz with Win XP.
Nice.
Well not really. The Google Code Jam rules are as Tom Moertel pointed out "All submissions have a maximum of 2 seconds of runtime per test case". If Goggle is running the tests on 4Ghz Athlon we will be almost in limits. But lets not take chances. We have to stop the run earlier. That unfortunately will increase our code size. One very straightforward solution is with exceptions:
import timeit
class WordPath:
def howMany(self, (x,y), word):
if x <>=self.N or y<0>=self.M or self.grid[x][y] != word[0]:
return 0
if len(word) == 1:
return 1
s = 0
for a in (x-1, x, x+1):
for b in (y-1, y, y+1):
if not (a == x and b ==y):
if (a, b, word) not in self.cache:
self.cache[ (a,b,word) ] = self.howMany( (a,b), word[1:])
s += self.cache[ (a,b,word) ]
if s > 1000000000:
raise OverflowError ('spam', 'eggs')
return s
def countPaths( self, grid, word):
self.grid = grid
self.N = len(grid)
self.M = len(grid[0])
self.cache = {}
try:
s = sum ( [ self.howMany( (x,y), word) for x in range (self.N)
for y in range(self.M)] )
if s > 1000000000:
s = -1
return s
except OverflowError :
return -1
#
t = timeit.Timer(stmt='WordPath().countPaths( [ "A"*50 for a in range(50)], "A"*50)',
setup = 'from __main__ import WordPath')
print "%.2f sec/pass" % (t.timeit(number=100)/100)
And the result on 1Ghz is the blazing speed of:
0.03 sec/pass
So the conclusion is that Python can be used in Google's Code Jam, but one must be carefull with the time limits!
(*) Update. Some people are commenting on reddit my ignorance of the Haskell code. I actually learned most of Haskell some time ago, until I got to Monads. Then I found 267 articles expalining what Monads are from which 27 just tutorials. At that moment I tought "Enough! Maybe I will read it later.". The so called "Maybe" monad. That was 1.5 years ago.
Comments
I do have one question though - it appears to me that the WordPath class is used only for running the two methods. Wouldn't it be cleaner (two fewer lines and no "selfs") to just have the two methods in the global name space? What does the class structure bring to the party?
Don't get me wrong; I'm not trying to trash the code. I'm sure I couldn't do as well. I'm just trying to improve my understanding of good python style.
Thanks.
By steve_g, # 17. August 2006, 01:35:01
The solution is implemented with a class because of the Google Code Jam rules. They give the class structure and the competitor have to implement it. For this task the Java interface was:
By ipeev, # 17. August 2006, 04:41:58
Runs a little faster for the tests (on my machine anyway) but starts running a lot faster if you remove the "> 1000000000" tests and use large test values.
EDIT: Made it a bit more efficient
By almostobsolete, # 17. August 2006, 14:48:40
After looking for several minutes at the code I think I understand now why and how this solution works.
But not sure why it is faster. It aparently doesn't use cache. Maybe because it doesn't use recursion. I wish Python supports better recursive functions some day.
Anyway. Measured 2.9 seconds on the same computer. Removing the check for "> 1000000000" doesn't improve the time from what I see.
Probably the trick with the early interruption can be used here too, because it is far behind the fastest 0.03 seconds solution with the exceptions.
By ipeev, # 17. August 2006, 17:04:33
By almostobsolete, # 17. August 2006, 19:05:31
I was coding this in the statistical language R (which is not fast enough, but comes surprisingly close with sparse transition matrix operations) and this got me thinking about the overflow exception case.
I don't think that all A's is the hardest case by a long way. Suppose you have a 50x50 grid of As with a B in the last entry and that the test word is 49 As and a B. You won't know until you look at the last entry whether there are $BIGNUM solutions or none.
By anonymous user, # 21. August 2006, 14:49:54
The first test is with all "A"s and the second is with "A"s and only 1 "B" at the end.
Here are the measured results:
We see that indeed the time increased about 100 times. But it is still under 1 second and much better than 2.4545 sec/pass for my first solution without the exceptions.
But let see how the solution provided by almostobsolete will handle this case. Running the same 2 tests gives:
His algorythm handles a little better the new extreme case. Apparently it is using time to sum all solutions in the "A"s only case.
By ipeev, # 22. August 2006, 06:53:36