I'm sure you've all seen posts like
Olny srmat poelpe can raed tihs. I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, ...
Now this is mildly amusing, but not all that interesting on itself. However, I began to wonder - is it possible to automatically unscramble texts like the above? Surprisingly, it (almost) is!
Of the 72945 unique words in my /usr/share/dict/words
(not counting possesive forms, eg "storekeeper's"), only 630 have a non-unique scrambled form, if you ignore plural forms. What do I mean by this? Well, if you take the letters in those words and sort them, there is more than one word with the same sequence of sorted letters. However, for 606 of those, there are just two possible options. Only for 24 words there are groups of 3 words with the same possible representation, 6 of which are quite similar:
bread, bared, beard
field, filed, flied
bleary, barely, barley
petard, parted, prated
staled, salted, slated
staling, salting, slating
storied, sortied, steroid
paternal, prenatal, parental
If you include plural forms (ie anything with "s" as last letter), you get considerably more duplicates - 9 groups of 4 words each, 22 groups of 3 and 585 of 2, for 1272 non-unique words in total.
It turns out that this is not quite enough to allow for naive unscrambling - the snippet from above is turned into this (thanks for "Zzzz" pointing out how to improve it a bit):
only smart people can read this i cloud not believe that i cloud actually understand what i was reading the phenomenal power of the human mind
Enough to get the meaning of the text, but not perfect. The accuracy would certainly improve if my algorithm always chose the most common word, but, alas, it's just a few lines of scala (also, my dictionary contains no information about word frequency):
object Scramble extends App {
val words = io.Source.fromFile("/usr/share/dict/words").getLines()
val uniques = words.filter(! _.contains("'")).toList
// group words by what you get when you sort their individual characters
val scrambled_lookup = uniques.map(w => w.toList.sorted.mkString).zip(uniques).groupBy(_._1).map {
case (scrambled, ws) => (scrambled, ws.map(_._2))
}
// words whose sorted versions collide
val non_uniques = scrambled_lookup.toList.map(_._2).sortWith(_.length > _.length).filter(_.length > 1)
// words which have non-unique sorted representation even if you fix first and last letter
val not_recognizable = non_uniques.map {
// remove the first filter to include plural words
case anagrams => anagrams.filter(_.last != 's').combinations(2).toList.filter {
case first :: second :: nil =>
first.head == second.head && first.last == second.last
} .flatten.groupBy(w => w).toList.map(_._1)
} .filter(_.length > 0)
// all the non-unique words
println(not_recognizable.sortWith(_.length > _.length))
// number of words with same sorted representation, grouped by size of group
println(not_recognizable.groupBy(l => l.length).toList.map {
case (length, list_of_ws) => (length, list_of_ws.length)
})
def unscramble(text: String) = text.replaceAll("[\\.,]", "").split(" ").map(_.toLowerCase).map {
case w => scrambled_lookup(w.toList.sorted.mkString).filter{
case p => p.head == w.head && p.last == w.last
}.head
}.mkString(" ")
val txt = "Olny srmat poelpe can raed tihs. I cluod not blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The pheonmneal pweor of the hmuan mnid"
println(unscramble(txt))
}
Play around with it if you like - maybe you know of a better dictionary file?