Saturday, April 14, 2007

Peter Norvig's spell checker

I think that Peter Norvig changed slightly the implementation of his spell checker, because of me: http://www.norvig.com/spell-correct.html

That happened during discussion on the reddit's comments thread about his spell checker: http://programming.reddit.com/info/1gb59/comments



ipeev 2 points 4 days ago*

I see a small problem in the Norvig's code. This part is supposed to make 26n alterations:

[word[0:i]+c+word[i+1:] for i in range(n) for c in string.lowercase] + ## alteration

But not really. This is what string.lowercase returns on my computer:

import string

string.lowercase

'abcdefghijklmnopqrstuvwxyz\x83\x9a\x9c\x9e\xaa\xb5\xba\xdf\xe0 \xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0 \xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

Total of 65 characters. Same for insertion.

Is this a real problem? It could be! One has to be aware of the Python's battery included philosophy.


Then someone responds:



vetinari 1 point 2 days ago

Check your locale, mine is fine:

>>> import string
>>> [c for c in string.lowercase]
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
>>>


He is right too, but I am not going to change my locale, so I respond:


ipeev 1 point 2 days ago

My point is that the Norvig's code is not aware of the localization features of Python. If he wants his program to run on any computer, the simplest think to do is use the first 26 letters only:

string.lowercase[:26]



But Norvig didn't exactly listened to me and implemented different solution:


alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
n
= len(word)
return set([word[0:i]+word[i+1:] for i in range(n)] + # deletion
[word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + # transposition
[word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet] + # alteration
[word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet]) # insertion






So instead of choosing my cryptic syntax he implemented it with explicit alphabet instead. Apparently he also thinks that explicit is better than implicit.

At the end of his page he writes:
Originally my program was 20 lines, but a reader pointed out that I had used string.lowercase, which in some locales in some versions of Python, has more characters than just the a-z I intended. So I added the variable alphabet to make sure.


I think I might have been that reader. Just for information the locale in question is Bulgarian. Sorry for causing the trouble with all those regional alphabets, I know it a pain in the ... . I like his site very much and enjoyed the "Tutorial on Good Lisp Programming Style" and even I have it printed on paper at home!

UPDATE: I've sent an email to Norvig and asked him about this and he responded that indeed, he did read my comment on reddit and thats why he changed the implementation. Now my name is there too which is cool!

Originally my program was 20 lines, but Ivan Peev pointed out that I had used string.lowercase, which in some locales in some versions of Python, has more characters than just the a-z I intended. So I added the variable alphabet to make sure. I could have used string.ascii_lowercase.

Tuesday, March 20, 2007

Solving the "Mr.S and Mr.P" puzzle by John McCarthy in Python

I have found recently an interesting article on reddit, about solving a logic puzzle in Haskel:
http://okmij.org/ftp/Haskell/Mr-S-P.lhs
While the technique presented is interesting itself, I find it transferable to Python with even improved readability.
And here is the original statement:

Formalization of two Puzzles Involving Knowledge
McCarthy, John (1987).
http://www-formal.stanford.edu/jmc/puzzles.html

We pick two numbers a and b, so that a>=b and both numbers are within
the range [2,99]. We give Mr.P the product a*b and give Mr.S the sum
a+b.

The following dialog takes place:

Mr.P: I don't know the numbers
Mr.S: I knew you didn't know. I don't know either.
Mr.P: Now I know the numbers
Mr.S: Now I know them too

Can we find the numbers a and b?



Follows the solution written in Python 2.5.

# all pairs of a,b
pairs = [ (a,b) for a in range(2,99+1) for b in range(2,99+1) if a>=b ]

# calculates map of solutions
def calc_map(oper):
M={}
for a,b in pairs:
m = oper(a,b)
if not m in M:
M[m] = []
M[m].append( (a,b) )
return M

# function that tests for single solution
single = lambda lx: len(lx) == 1

# maps that hold the sum and the product solutions,
# dictionaries with list values
S = calc_map(lambda a,b: a+b)
P = calc_map(lambda a,b: a*b)

# Rules list
rule_MrP_dont_know = lambda p: not single (P[p])

rule_MrS_dont_know = lambda s: not single (S[s])

rule_MrS_knew_MrP_doesnt_know = lambda s: all( [rule_MrP_dont_know( a*b )
for a,b in S[s] ] )

rule_MrP_now_knows = lambda p: single( [ (a,b) for a,b in P[p]
if rule_MrS_knew_MrP_doesnt_know(a+b) ])

rule_MrS_knows_MrP_now_know = lambda s: single([ (a,b) for a,b in S[s]
if rule_MrP_now_knows(a*b) ])
# Solve it
for a, b in pairs:
s,p = a+b, a*b
if rule_MrP_dont_know(p) \
and rule_MrS_dont_know(s) \
and rule_MrS_knew_MrP_doesnt_know(s)\
and rule_MrP_now_knows(p) \
and rule_MrS_knows_MrP_now_know(s):
print "Answer is:" , a,b



And if we run the program the answer we get is:


Answer is: 13 4



I used the "all" function to increase readability and this is the only reason a 2.5 version is needed. Use own "all" function for a 2.4 version.

Now what we have here is almost a short implementation of a rule engine with forward-chaining. It certainly misses a lot of functionality of a real rule engine, but the actual process of rules execution is probabbly the same. Is Python a good language for this kind of tasks? I think it is.

The Python solution does not follow exactly the Haskell route neither the the paper by McCarthy. MsCarthy I find rather difficult to read and Haskel has its own oddity that makes me screem sometimes in the middle of the night.

The Python rules try to be more directly mapped to the conditions in the task. Readability counts and of course beautiful is better than ugly.