Tuesday, July 13, 2010

A spoonful of sugar?

I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Why do we have to do statistics? I'm a Biochemist and I don't need to know all this stuff. We didn't do it that way at A level, we had to use a calculator. Will this be on the exam?

There was lots of very useful discussion after my post yesterday about statistics teaching, both from students and colleagues, via Twitter and Friendfeed.

As a result of suggestions, I went away and had another look at Stata. It's quite nice. For the uninitiated (and possibly uninterested), Stata is a commercial statistics package available for multiple platforms. Although the command line interface is prominent, there is also a menu-driven graphical user interface (GUI) which gives access to nearly all built-in commands. It's a bit pricey, but less so than other packages such as SPSS and SAS, discounts are available for students, and the documentation is pretty good. (In the midst of yesterday's discussion but unconnected to it, a student retaking the module over the summer contacted me to ask how to get the Analysis ToolPak on Excel 2008. Another good reason to drop Excel as quickly as possible.) Overall, Stata is easier to use than R, which is exactly why the rest of this post is about why it might not be the right solution for teaching statistics.

In the discussion yesterday, students raised questions about our blended module delivery, which consists of online notes and screen capture videos demonstrating procedures within the software, supported by face-to-face help sessions. We know that statistics, like all areas of mathematics, is challenging for most of our students. They would prefer to be "taught statistics" - sit in a lecture theatre and emerge knowing how to do it, possibly revising it for the exam. But statistics doesn't work like that. Even if you know how to perform the procedures, you don't know when and why (or why not). If students are faced with a menu command saying "Histogram" or "t test", they click it. And why not? Except that they don't know why they clicked it, whether they should have done, and what the significance of the output is (in the context of their particular dataset). And that's why Stata is the wrong answer for us - it makes doing statistics too easy, without contributing to understanding.

On a personal level, my major problem is that "statistics" is a huge field, a degree course in itself. It doesn't fit into the 20 contact hours I'm allotted to "teach" it, so I'm only ever going to be scratching the surface. Ironically, using "difficult" software might make it easier for me to help students understand they're not going to be experts after one module, and that that's not a failure for them or me. That raises an interesting question, which also emerged from yesterday's discussions. Should I tell students "Statistics is difficult. R is difficult. We can't teach you everything in such a short time, but we plan to make a start", or should I follow Neil's advice and say "Welcome to stats class. We'll be using R. Open R. Read CSV file. Couple of simple descriptive stats and a simple plot. That was lesson 1. Log out. Don't even mention that R is supposed to be hard - just do something with it, quickly and say 'wasn't that easy'"? I'm looking for input on this.
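
For what it's worth, Neil's "lesson 1" might look something like the sketch below. This is only an illustration: the data are invented, and the names `csv_path`, `enzyme`, `group` and `activity` are hypothetical. The script writes its own small CSV file first so that it runs anywhere; in class the students would be handed a real file.

```r
# Invented example data (not real measurements), written to a temporary
# CSV file so the script is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "group,activity",
  "control,4.1",
  "control,3.8",
  "control,4.4",
  "control,4.0",
  "treated,5.2",
  "treated,5.6",
  "treated,4.9",
  "treated,5.3"
), csv_path)

# Step 1: read the CSV file into a data frame.
enzyme <- read.csv(csv_path)

# Step 2: a couple of simple descriptive statistics.
print(summary(enzyme$activity))                        # min, quartiles, mean, max
group_means <- tapply(enzyme$activity, enzyme$group, mean)
print(group_means)                                     # mean activity per group

# Step 3: one simple plot.
boxplot(activity ~ group, data = enzyme,
        ylab = "Activity (arbitrary units)")
```

Three steps, three base R functions (`read.csv`, `summary`, `boxplot`) and no extra packages, which is roughly what "just do something with it, quickly" would need.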

In practical terms, we won't be abandoning our blended model for the undergraduate module. My proposed session structures for the revised module look something like:
  • Mini-"lecture" on statistical principles consisting of VLE notes, short video, Friendfeed support for Q&A. The issue here is that we know many students will skip this and go directly to the assessment.
  • Assessed task consisting of dataset plus screencast of procedure being carried out.
  • Ongoing feedback and support via computer lab help sessions, Friendfeed.
  • Possibly rolling the whole thing into some sort of "handout" (format?) students can take away with them so they can continue to use it after they get shut out of the VLE.
To allow for the increased difficulty of supporting R, the topics covered will be scaled back to a more suitable introductory level we can adequately support in the time allotted for the first year module:
  1. Introduction - this module (ethos); Using R - why R?
  2. EDA with R - the normal frequency distribution
  3. Graphs: Boxplots, histograms, +?
  4. Descriptive statistics.
  5. Comparing groups - Student's t test and chi-square.
  6. Looking ahead: How you can become an R guru - online help and forums.
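
As a rough idea of what topic 5 could look like in R, here is a minimal sketch, not the actual module material. It assumes the built-in `ToothGrowth` dataset for the t test and an invented 2x2 table of counts for the chi-square test; the names `t_result`, `counts` and `chi_result` are hypothetical.

```r
# Student's t test: compare tooth length between the two supplement groups
# in R's built-in ToothGrowth dataset. var.equal = TRUE requests the classic
# Student's test; R's default is the Welch version, which does not assume
# equal variances.
t_result <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
print(t_result)

# Chi-square test on an invented 2x2 table of counts (illustration only).
counts <- matrix(c(30, 10,
                   20, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(treatment = c("A", "B"),
                                 outcome   = c("yes", "no")))
chi_result <- chisq.test(counts)   # applies Yates' correction for 2x2 tables
print(chi_result)
```

Both calls are a single line each, which rather makes the point above: running the test is the easy part, and knowing whether it was the right test for the data is what the module has to teach.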

Comments welcome!