Thursday, October 12, 2006

Random does not imply Equally Distributed

Via Slashdot a few days ago: Steven Levy on the secrets of the iPod shuffle. Executive summary: the iPod "shuffle" feature really is random, but since our big hairless ape brains are supreme pattern finders, we see patterns where there are none.

Most interesting tidbit:

But the non-randomness illusion was so prevalent that ultimately Apple felt compelled to address it. In the version of iTunes rolled out in September 2005, there appeared a new feature: smart shuffle. It presents iPodders with a scroll bar that "allows you to control how likely you are to hear multiple songs in a row by the same artists or on the same album". If you pull the lever to the right, the iPod will mess with its usual distribution pattern, intentionally spacing out songs by a given artist. As Jobs explained it in his presentation the day the new iTunes rolled out, he gave what he hoped would be the last word on the Great iPod Randomness Controversy: "We're making it less random to make it feel more random."

This is something that's driven me nuts for years: random does not imply equally distributed. In fact, it's randomness that leads to us seeing patterns. To borrow an example from Stephen Jay Gould, think of constellations: evenly distributed stars would not have any patterns -- they'd be in a grid or something. It is, in fact, the bunchiness that comes from a random distribution that leads to patterns emerging.
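If you'd like to see the bunchiness rather than take my word for it, here is a rough sketch. The library (20 artists with 10 songs each) is made up for illustration, and the function names are mine; this is not how the iPod does it, just a plain uniform shuffle that counts how often a song lands right after another by the same artist.

```python
import random

def adjacent_same_artist(playlist):
    """Count positions where a song directly follows one by the same artist."""
    return sum(1 for a, b in zip(playlist, playlist[1:]) if a == b)

def average_repeats(num_artists=20, songs_per_artist=10, trials=1000, seed=0):
    """Average number of back-to-back same-artist pairs over many shuffles.

    The library shape is a hypothetical one: every artist has the same
    number of songs, and each song is represented only by its artist id.
    """
    rng = random.Random(seed)
    library = [artist for artist in range(num_artists)
               for _ in range(songs_per_artist)]
    total = 0
    for _ in range(trials):
        playlist = library[:]      # fresh copy so every trial starts the same
        rng.shuffle(playlist)      # plain uniform shuffle, no "smart" spacing
        total += adjacent_same_artist(playlist)
    return total / trials

if __name__ == "__main__":
    print(average_repeats())
```

For this made-up library the expected count works out to exactly 9 back-to-back pairs per pass through the 200 songs (each of the 199 adjacent slots matches with probability 9/199). So a perfectly honest shuffle hands you the same artist twice in a row several times per listen, which is plenty for a pattern-finding ape brain to notice.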

An example:

While I was a grad student at Harvard, the Administration (headed by the Dean, who was a computer scientist [remember this]) decided to switch to pure random placement of non-freshmen into the Harvard Houses. The previous placement method had been a weird ad-hoc system of "non-ordered choice", with a random component to spice things up; this had in turn replaced "ordered choice", which had replaced "apply to the House and see if the upper class twits would accept you". Or some such wonderfully egalitarian system.

In any case, while I was there (and a Tutor in Leverett House, if you're curious), It Was Decided that non-ordered choice had to go. (My interpretation: under non-ordered choice, a significant number of black students managed to end up living in the same House. And that Was Not To Be Borne.) So they went for random choice. (Ironically, one of the major justifications for it was "It's the way that Yale does it", which strikes me as simply insane, since Harvard's justification for 95% of everything else it does is "We're Harvard, we're not like other places, and especially we're not like Yale.")

In the Random Choice method, each freshman would join a rooming group (of up to 20 students) and rooming groups would be randomly placed in a House. Details of The Placement Algorithm were tightly guarded. (Clearly it couldn't be pure randomness -- the Houses all had different capacities and whatever assignments were made had to fit the number of rooms.)

The first year after they implemented the Random Choice method, they ended up with Houses with drastically skewed gender ratios in their sophomore cohorts: the combination of the large rooming groups (which were largely single-sex) and the random placement meant that some Houses got a lot more men than others, and some few got equal numbers of men and women. (Harvard was one of the last universities in the country to reach gender parity in its student body, and it hadn't yet done so at that time, so no House got more women than men, as far as I know.)

There was a great kerfuffle. Perhaps even a hullabaloo. "How could the gender ratios have gotten so far off?" came the cries, with much rending of garments. "It was random!"

Because, you see, random does not imply equally distributed. Especially for small numbers.
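A quick simulation makes the small-numbers point concrete. Everything here is an assumption on my part, since the real Placement Algorithm was secret: 12 Houses of equal size, 800 men and 640 women, strictly single-sex rooming groups, and a shuffle-then-deal-round-robin step standing in for whatever they did to respect House capacities.

```python
import random

def place(n_men=800, n_women=640, group_size=20, n_houses=12, seed=0):
    """Return the share of men in each house after one random placement.

    Hypothetical model: every rooming group is single-sex and the same size,
    and shuffled groups are dealt round-robin so each house gets the same
    number of groups.
    """
    rng = random.Random(seed)
    groups = ["M"] * (n_men // group_size) + ["W"] * (n_women // group_size)
    rng.shuffle(groups)
    shares = []
    for h in range(n_houses):
        house = groups[h::n_houses]   # every n_houses-th group goes to house h
        shares.append(house.count("M") / len(house))
    return shares

def average_spread(group_size, trials=200):
    """Average gap between the most-male and least-male house."""
    total = 0.0
    for seed in range(trials):
        shares = place(group_size=group_size, seed=seed)
        total += max(shares) - min(shares)
    return total / trials

if __name__ == "__main__":
    for size in (1, 20):
        print(size, round(average_spread(size), 2))
```

With a group size of 1, each House is a draw of 120 individuals and the Houses all land fairly close to the overall ratio. With groups of 20, each House is really a draw of just 6 single-sex blocks, so the gap between the most-male and least-male House gets much wider, even though the placement is every bit as random.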

Now the Dean of Harvard at the time, you may recall (if you read carefully above -- you didn't realize there was going to be a quiz, did you?) was a computer scientist. One of the major problems in computer science is the generation of random (or more accurately pseudo-random) numbers. It undergirds much of cryptography, after all. So my question for years has been: how the heck did Harvard's computer scientist Dean not understand this?

Clearly, being a Harvard professor may mean you're smart, but it doesn't mean you're wise.
