Yesterday, I was shown the idea of Code Kata. Here we have a collection of interesting exercise that no self-respecting coder could resist having a go at. I had a spare hour, so I picked an interesting problem about anagrams, fired up my editor, and had a go. As I went I jotted down the thought processes – it was quite interesting.
I started by thinking as a person would when comparing two words to see if they were anagrams. I would work through the first word a letter at a time, ticking off the letters in the second. If they all matched, we had an anagram. This sounded difficult to code (a bit of experience, there), so I thought about it more programatically. It didn’t take long to hit on the idea of sorting the letters in each word, and seeing if the two sequences thus generated were the same. Easy to code, so off I went – four lines of Python.
So I can take two words and see if they’re anagrams. Good start. Obviously a couple of nested loops through the word list will find all the anagram pairs for me. The word list has 340,000 words; square that and you’re looking at a lot of comparisons. (I actually coded this quickly, for a laugh, and watched the poor computer struggle).
So I have to think of a completely different approach. One coffee later, and I’ve hit on what I might call an algorithm. I will look at every word in turn, and sort its letters into something I will call a signature. I will count up the number of times each signature occurs. And all the ones that occur more than once are anagrams, so for each of these I will look through the list again, picking out the matching words.
Now, this is much better, and looks as if it will actually run. But there must be a better way …