Friday, April 15, 2011

Statistics workflow: Use real data

I'm currently writing a new statistics course (more on that shortly), and trying to follow the exhortation to "use real data for authenticity". I've tried this in the past and failed to find enough useful examples to illustrate procedures. I now have a new workflow which is working for me, and I want to document it here:

  1. Go to PubMed (I'm writing for biomedical students; if you are writing for another discipline, substitute another database).
  2. Type in the name of the test, e.g. "Fisher's Exact test" or whatever.
  3. Click on "Free Full Text".
  4. Find candidate papers and see how statistics were used/reported. Find one with a good (engaging) scenario on which to base a question.
  5. It's rare to find the original data; usually only the summary statistics are published. But with these numbers (e.g. n, mean, SD), you can use R to simulate a plausible data set, e.g. using rnorm(), rlnorm(), etc.
  6. Write the question based on the introduction to the paper (background, rationale), give the raw data, write feedback.
  7. Credit the authors and cite the publication while making it clear that the data used is only based on the original paper and has been modified for the test.
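
As a minimal sketch of step 5, here is roughly how the reconstruction might look in R. The values of n, mean and SD below are made up for illustration and are not taken from any paper, and the variable names are my own. The rescaling step is an optional extra, on the assumption that you want the simulated data to reproduce the published summary statistics exactly rather than approximately.

```r
# Hypothetical summary statistics, as a paper might report them.
# These numbers are invented for illustration only.
n           <- 30
target_mean <- 5.2
target_sd   <- 1.1

set.seed(42)  # so the same "raw data" are generated every time

# Draw n values from a normal distribution with the reported mean and SD.
x <- rnorm(n, mean = target_mean, sd = target_sd)

# A random sample will not match the targets exactly. Optionally standardise
# and rescale so the sample mean and SD equal the published values.
x_exact <- target_mean + target_sd * (x - mean(x)) / sd(x)

# For right-skewed measurements, rlnorm() takes its parameters on the log
# scale, so convert the reported (raw-scale) mean and SD first.
sdlog   <- sqrt(log(1 + (target_sd / target_mean)^2))
meanlog <- log(target_mean) - sdlog^2 / 2
y <- rlnorm(n, meanlog = meanlog, sdlog = sdlog)

# Check the reconstructed data against the targets.
round(c(mean = mean(x_exact), sd = sd(x_exact)), 3)
```

Rounding the values to the precision used in the paper (e.g. round(x_exact, 1)) makes the data look more like real measurements, at the cost of shifting the mean and SD very slightly.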

Works for me! Another bridge crossed.
