Sunday, February 28, 2010

The Elusive Nature of Randomness and Random shuffles

After a lengthy legal battle, Microsoft and the European antitrust officials recently agreed on the implementation of a so-called ballot or choice screen, to be included in the rollout of Windows 7.

This approach will give European Windows 7 users an opportunity to download their prferred browser from a list of Microsoft's rivals.

These choices include Google's Chrome, Apple's Safari, Mozilla's Firefox and Opera, as possible alternatives to Microsoft’s own Internet Explorer.

Randon Shuffle
The browser choice screen requires what we call a “random shuffle”. You start with an array of values and return those same values, but in a randomised order but how can you guarantee a 'truly random' result. Well, this computational problem has been known to programmers since the earliest days of computing.

Known approaches
There are 5 well-known approaches: 3 good solutions, 1 acceptable solution that is slower than necessary and 1 bad approach that doesn’t really work. Ubfortunately from this selection Microsoft appears to have picked the bad approach but it more likely to have been caused by inexperience or bad judgement rather than any ill-intention.

It is more in the nature of a “naive” algorithm, akin to the simple bubble sort. Something that inexperienced programmers inevitably fall headlong into when attempting to solve a given problem.

Inevitably, if we gave this same problem to 100 newly qualified computer scienctists, 1 or more of them would make the same basic mistake. Fortunately, with education and experience, programmers can learn about these things and certainly, one of the things they should learn about early on, is to reach for Donald Knuth's book on algorithms, programming and random shuffles.

The Art of Computer Programming
, Vol. 2, section 3.4.2 “Random sampling and shuffling” describes two solutions:
  1. If the number of items to sort is small, then simply put all possible orderings in a table and select one ordering at random. In our case, with 5 browsers, the table would need 5! = 120 rows.
  2. “Algorithm P” which Knuth attributes to Moses and Oakford (1963), but is now known to have been anticipated by Fisher and Yates (1938) so it is now called the Fisher-Yates Shuffle.

No comments:

Post a Comment