Appeasing My Inner Statistics Geek

I was about to start this entry by asserting that instructional designers are well served by some proficiency with statistics, which helps with tasks like crunching course evaluation data. But then, I think everyone would be well served by an interest in statistics.

I learned to use SPSS in my graduate program a few years ago and so purchased a full copy at work when I graduated. Unfortunately, the DRM (digital rights management) on the software makes it an ordeal to reinstall on new computers when I upgrade, which I did recently. Not looking forward to going through that again, I decided to check out what alternatives exist nowadays.

I decided to give PSPP, an open source clone of SPSS, a try.

I sometimes find myself collecting statistics about the world around me, so I fed some into PSPP to test it out. Recently, I found myself wondering which of two routes from my bus stop to work was quicker. It’s only a couple of blocks, so it’s not like the difference would be huge, but when the wind chill is -40, like it was earlier this week, seconds count! I started collecting data. I alternated days (one day one route, the next the other) and jotted down the results each day.

This was a simple comparison between two means, no sweat for PSPP. I found that one route was faster on average (2.48 minutes versus 2.71), but also had greater variability, meaning a higher standard deviation (0.46 versus 0.25).
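
For anyone who wants to check a comparison like this by hand, here is a minimal Python sketch of a two-sample t statistic computed from summary numbers alone. It uses Welch's version of the test, which does not assume equal variances; that is my choice for illustration, not necessarily the line I read off the PSPP output. The means and standard deviations come from above, but the sample size of 10 walks per route is purely a made-up placeholder, and `welch_t` is a hypothetical helper name.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance t-test from summary statistics."""
    # Squared standard error of each sample mean.
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    # Difference in means divided by its combined standard error.
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Means and standard deviations are the ones reported in the post.
# ASSUMPTION: n = 10 walks per route is a placeholder, not the real count.
t, df = welch_t(2.48, 0.46, 10, 2.71, 0.25, 10)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Turning t and df into a p-value takes a t-distribution table or a statistics package, and the answer depends heavily on the real number of walks, so treat the output as an illustration of the arithmetic only.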


(As you can see from the histogram, I should take the north route [in red*] unless I absolutely, positively have to be at work in under 4 minutes. The south route [blue] is more reliable, though typically a little longer.)

Unfortunately, my conclusions are only statistically significant at the 93% confidence level, short of the conventional 95%. If this were academic research, I’d have to claim that the means are not significantly different. But it’s good enough for me. Now I always take the (on average) shorter way.

To be sure, PSPP has problems. It has no undo. It’s not entirely stable. It gives me no control over the formatting of histograms. That actually turned out to be kind of a good thing, as it forced me to figure out how to build them in Excel, where I discovered it’s really easy to combine two histograms into one for easy visual comparison of data spread.

Even with its shortcomings, I still like PSPP better than SPSS.

* My histogram has at least three design problems. One, I used color as a sole cue. Two, I did not label the bars. Three, the bins on the x-axis are not clearly labeled; they should say something like “2.00 to 2.09.”
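
The third fix, labeling each bin with its range, is easy to script. Below is a rough Python sketch that counts times into bins and builds labels in the “2.00 to 2.09” style. It assumes times are recorded to the hundredth of a minute; `labeled_histogram` is a hypothetical helper, and the sample times are invented for illustration, not my actual walking data.

```python
def labeled_histogram(times, start_hundredths, width_hundredths, n_bins):
    """Count times (minutes, recorded to 0.01) into bins labeled with their ranges."""
    counts = [0] * n_bins
    for t in times:
        # Work in whole hundredths of a minute to avoid floating-point edge cases.
        h = round(t * 100)
        i = (h - start_hundredths) // width_hundredths
        if 0 <= i < n_bins:  # values outside the binned range are ignored
            counts[i] += 1
    labels = []
    for i in range(n_bins):
        lo = start_hundredths + i * width_hundredths
        hi = lo + width_hundredths - 1  # last hundredth that falls in this bin
        labels.append(f"{lo / 100:.2f} to {hi / 100:.2f}")
    return list(zip(labels, counts))


# ASSUMPTION: made-up times, only to show the labels.
sample_times = [2.00, 2.09, 2.10, 2.30, 2.35]
for label, count in labeled_histogram(sample_times, 200, 10, 4):
    print(f"{label}: {'#' * count}")
```

The label and count pairs can be pasted into a spreadsheet as the category axis and series for a bar chart.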


