5. ...compose us a piece of customized wordhacking, e.g.:
pithier
legible
emerald
alleged
spumoni
excerpt
legible
emerald
alleged
spumoni
excerpt
That sounds like an interesting task and here we will try solve it using Python and more specifically the intersection operation of the set type. For example one solution to the text "reddit is now wordhacking" is :
thrown
daemon
coders
iodide
weighs
catnap
stitch
lusaka
bandit
scorns
sewage
And another:
runways
echoing
doormat
deadsea
itching
tetanus
icecold
soakers
namings
oddness
wriggle
Follows the source. The function that do all the work is make_wordhacking. When run for the first time the program will get a dictionary with words from Internet. The program will print only one (the first) solution for each combination of columns and word size.
from __future__ import division
from urllib2 import urlopen
from collections import defaultdict
from itertools import combinations
from zipfile import ZipFile
import os
def iterate_in_group_by(num, cx):
"iterate_in_group_by(3, 'ABCDEFG') --> ABC DEF G"
c = 0
while c < len(cx):
yield(cx[c:c+num])
c += num
def get_words():
fn = "corncob_lowercase.zip"
if not os.path.isfile(fn):
data = urlopen( "http://www.mieliestronk.com/" + fn ).read()
with open(fn,"w") as fw:
fw.write(data)
fz = ZipFile(fn)
return fz.open(fz.namelist()[0]).read().split()
def make_sets(W):
S=defaultdict(lambda: defaultdict(set))
for word in W:
for i,c in enumerate(word):
S[c][i].add(word)
return S
def factors(n):
for k in range(2,n//2+1):
if n%k == 0:
yield k
def print_reddit_comment_solution(R,P):
print "---------------------------------------------------"
print "We have a solution"
print P
for v in R.values():
for w in [list(a) for a in v]:
for p in range(len(w)):
if p in P:
w[p] = "`[%s](http://)`"%w[p]
print "".join(("\n`%s`" % "".join(w)).split("``"))
break
def print_html_solution(R,P):
print """\n<br/><div style="font-family: monospace; text-align: center; font-size: x-large; line-height: 1em;">"""
for v in R.values():
for w in [list(a) for a in v]:
for p in range(len(w)):
if p in P:
w[p] = """<span style="font-family:monospace; font-weight: bold; background-color: rgb(255, 221, 221);">%s</span>"""%w[p]
else:
w[p] = """<span style="font-family:monospace; font-weight: bold; background-color: rgb(221, 255, 255);">%s</span>""" % w[p]
print "<br/>%s" % "".join(w)
break
print """</div>"""
def make_wordhacking(text,W):
text = [x for x in text if x.isalpha()]
for colnum in factors(len(text)):
print colnum
for w_len in range(colnum,15):
S = make_sets( (word for word in W if len(word)==w_len) )
T = list(iterate_in_group_by(len(text)//colnum,text))
for P in combinations(range(w_len),colnum):
R={}
for i in range(len(T[0])):
C = [col[i] for col in T]
R[i] = S[C[0]][P[0]]
for ind,pos in enumerate(P[1:]):
R[i] = R[i] & S[C[ind+1]][pos]
if not R[i]:
break
else :
print_reddit_comment_solution(R,P)
if __name__ == "__main__":
W =get_words()
text = "reddit is now wordhacking"
make_wordhacking(text,W)
Sometimes it will generate a lot of solutions, sometimes not at all. Try to choose an input text that has a length with a lot of factors.
Comments
I do have one question though - it appears to me that the WordPath class is used only for running the two methods. Wouldn't it be cleaner (two fewer lines and no "selfs") to just have the two methods in the global name space? What does the class structure bring to the party?
Don't get me wrong; I'm not trying to trash the code. I'm sure I couldn't do as well. I'm just trying to improve my understanding of good python style.
Thanks.
By steve_g, # 17. August 2006, 01:35:01
The solution is implemented with a class because of the Google Code Jam rules. They give the class structure and the competitor have to implement it. For this task the Java interface was:
By ipeev, # 17. August 2006, 04:41:58
Runs a little faster for the tests (on my machine anyway) but starts running a lot faster if you remove the "> 1000000000" tests and use large test values.
EDIT: Made it a bit more efficient
By almostobsolete, # 17. August 2006, 14:48:40
After looking for several minutes at the code I think I understand now why and how this solution works.
But not sure why it is faster. It aparently doesn't use cache. Maybe because it doesn't use recursion. I wish Python supports better recursive functions some day.
Anyway. Measured 2.9 seconds on the same computer. Removing the check for "> 1000000000" doesn't improve the time from what I see.
Probably the trick with the early interruption can be used here too, because it is far behind the fastest 0.03 seconds solution with the exceptions.
By ipeev, # 17. August 2006, 17:04:33
By almostobsolete, # 17. August 2006, 19:05:31
I was coding this in the statistical language R (which is not fast enough, but comes surprisingly close with sparse transition matrix operations) and this got me thinking about the overflow exception case.
I don't think that all A's is the hardest case by a long way. Suppose you have a 50x50 grid of As with a B in the last entry and that the test word is 49 As and a B. You won't know until you look at the last entry whether there are $BIGNUM solutions or none.
By anonymous user, # 21. August 2006, 14:49:54
The first test is with all "A"s and the second is with "A"s and only 1 "B" at the end.
Here are the measured results:
We see that indeed the time increased about 100 times. But it is still under 1 second and much better than 2.4545 sec/pass for my first solution without the exceptions.
But let see how the solution provided by almostobsolete will handle this case. Running the same 2 tests gives:
His algorythm handles a little better the new extreme case. Apparently it is using time to sum all solutions in the "A"s only case.
By ipeev, # 22. August 2006, 06:53:36