Archive for the 'data analysis' Category

Assorted Links

Tuesday, April 26th, 2011

Thanks to Peter Spero and VeganKitten.

Andrew Gelman’s Top Statistical Tip

Tuesday, March 30th, 2010

Andrew Gelman writes:

If I had to come up with one statistical tip that would be most useful to you–that is, good advice that’s easy to apply and which you might not already know–it would be to use transformations. Log, square-root, etc.–yes, all that, but more! I’m talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something [more] continuous (those “total scores” that we all love). And not doing dumb transformations such as the use of a threshold to break up a perfectly useful continuous variable into something binary. I don’t care if the threshold is “clinically relevant” or whatever–just don’t do it. If you gotta discretize, for Christ’s sake break the variable into 3 categories.

I agree (and wrote an article about it). Transforming data is so important that intro stats texts should have a whole chapter on it, but instead they barely mention it. A good discussion of transformation would also include the use of principal components to boil many variables down to a much smaller number. (You should do this twice: once with your independent variables, once with your dependent variables.) Many researchers measure many things (e.g., a questionnaire with 50 questions, a blood test that measures 10 components) and then foolishly correlate every independent variable with every dependent variable. They end up testing dozens of likely-to-be-zero correlations for significance, which in effect throws all their data away: when you do dozens of such tests, none of them can be trusted.
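Boiling many variables down to a few can be sketched as a principal component analysis computed from NumPy's singular value decomposition. This is a minimal illustration, not the author's procedure. The function name is hypothetical, and the data are random numbers standing in for real measurements. Each variable is standardized first, an assumed choice that keeps measures on large scales from dominating. To see why the reduction matters, take the text's example of 50 questions and 10 blood components. Correlating everything with everything means 50 × 10 = 500 tests, and at the usual 0.05 threshold about 25 of them would come out "significant" by chance even if every true correlation were zero.

```python
import numpy as np

def principal_components(X, k):
    """Boil the columns of X (rows = people, columns = measures) down to k scores.

    Minimal PCA via singular value decomposition. Assumes no column is constant.
    Returns the (n, k) component scores and the fraction of total variance
    explained by each of the k components.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each measure so that ones on large scales don't dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes, ordered by variance explained.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:k].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:k]

# Hypothetical random data standing in for 200 people.
rng = np.random.default_rng(0)
questionnaire = rng.normal(size=(200, 50))  # one set of variables, e.g. 50 questions
blood = rng.normal(size=(200, 10))          # the other set, e.g. 10 blood components

# Reduce each set separately, then correlate a few component scores
# instead of all 50 x 10 = 500 raw pairs.
q_scores, _ = principal_components(questionnaire, 2)
b_scores, _ = principal_components(blood, 2)
r = np.corrcoef(q_scores[:, 0], b_scores[:, 0])[0, 1]
```

The component scores within each set are uncorrelated with one another, so a handful of them summarize each set without redundancy.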

My explanation why this isn’t taught differs from Andrew’s. I think it’s pure Veblen: professors dislike appearing useful and like showing off. Statistics professors, like engineering professors, do less useful research than you might expect, so they are less aware than you might expect of how useful transformations are. And because most transformations don’t involve esoteric math, writing about them doesn’t allow you to show off.

In my experience, not transforming your data is at least as bad as throwing half of it away, in the sense that your tests will be that much less sensitive.
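The kinds of transformation Gelman lists (log and square root, splitting a continuous variable into several indicators, summing items into a total score) can each be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not Gelman's or the author's code. The function names are hypothetical, and cutting at the sample tertiles is just one assumed way to get three categories. A square-root transform works like the log shown here (`np.sqrt`, which needs non-negative values).

```python
import numpy as np

def log_transform(x):
    # Log compresses a long right tail; it needs strictly positive values.
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log transform needs positive values")
    return np.log(x)

def three_categories(x):
    """Split a continuous variable at its tertiles into 0/1 indicator columns.

    Returns an (n, 3) array with columns (low, middle, high).
    """
    x = np.asarray(x, dtype=float)
    cut1, cut2 = np.quantile(x, [1 / 3, 2 / 3])
    # right=True: group 0 if x <= cut1, 1 if cut1 < x <= cut2, 2 if x > cut2.
    group = np.digitize(x, [cut1, cut2], right=True)
    return np.eye(3, dtype=int)[group]

def total_score(items):
    # Sum several discrete items (e.g., 1-5 ratings) per row into one score.
    return np.asarray(items).sum(axis=1)

ages = np.array([22, 35, 47, 51, 63, 70, 29, 41, 58])  # hypothetical voters
age_groups = three_categories(ages)  # columns: younger, middle, older
```

Each indicator column can enter a regression as its own predictor, with one column dropped as the baseline. That lets the model fit a bend, such as in the relation between age and voting, that a single straight-line term would miss.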

Obesity and Your Commute

Thursday, November 19th, 2009

In the 1950s — before the invention of BMI (Body Mass Index) — Jean Mayer and others did a study of obesity at a factory in India. They divided workers by how much exertion their job required. Almost everyone, even desk clerks, was thin, with the exception of the most sedentary. It appeared that walking one hour per day (to and from work) was enough to get almost all the weight loss possible with exercise. Doing more had greatly diminished returns. A study with rats suggested the same thing. Bottom line: If you’re sedentary, you can easily lose weight via exercise, which can be as simple as walking to work. If not, it’s hard.

This month GOOD has a kind of update of that ancient study: a scatterplot, each point a different country, that plots the percentage of people who are obese against the fraction of commutes that are active (bike or walk). It supports what Mayer and others found: how you get to work makes a difference. If you fitted a line to the data it would have a negative slope (more active commutes, less obesity). America has the most obesity and relatively few active commutes; Switzerland has the most active commutes and relatively little obesity. The graph also suggests that other factors matter a lot. Although Australia has fewer active commutes than America, it also has less obesity.
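The slope of a fitted line is easy to compute by hand. For ordinary least squares it is the covariance of x and y divided by the variance of x. The sketch below is a generic Python illustration. It assumes x is a country's share of active commutes and y its obesity percentage, and it does not reproduce GOOD's data.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs: sum of cross-deviations / sum of squared x-deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    return sxy / sxx
```

A negative result means that, on average, countries with more active commutes have less obesity. Points far from the line, like Australia, are where the other factors show up.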

John Tukey and GPS

Saturday, July 11th, 2009

In this amusing article Emily Yoffe tells about her troubles with GPS. She fails, unfortunately, to look on the bright side: to say that flawed GPS is better than no GPS. After John Tukey, the statistician, gave a talk at Berkeley, I told him that I had found the tools he wrote about in Exploratory Data Analysis really helpful. (For example, smoothing my data led me to discover that eating breakfast made me wake up too early.) Tukey replied that if the tools are helpful half the time, that's good. It isn't easy to make an interesting response to a compliment!

Something is better than nothing.
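Smoothing of the kind Tukey describes in Exploratory Data Analysis can be as simple as a running median of three. Each value is replaced with the median of itself and its two neighbors, so a single odd day cannot drag the curve around. This Python sketch is one simple variant, assumed for illustration: a single pass with the endpoints copied unchanged. Tukey also used repeated passes and more careful end rules, which are omitted here.

```python
from statistics import median

def running_median_3(values):
    """Single-pass running median of 3; the first and last values are kept as-is."""
    v = list(values)
    if len(v) < 3:
        return v
    smoothed = [v[0]]
    for i in range(1, len(v) - 1):
        # Median of the point and its two neighbors.
        smoothed.append(median(v[i - 1:i + 2]))
    smoothed.append(v[-1])
    return smoothed
```

Applied to a daily series, such as wake-up times, it damps one-day spikes so that a sustained shift stands out.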

Self-Tracking: What I’ve Learned

Friday, June 12th, 2009

I want to measure, day by day, how well my brain is working. After I saw big fast effects of flaxseed oil, I realized how well my brain works (a) depends on what I eat and (b) can change quickly. Maybe other things besides dietary omega-3 matter. Maybe large amounts of omega-6 make my brain work worse, for example. Another reason for this project is that I’m interested in how to generate ideas, a neglected part of scientific methodology. Maybe this sort of long-term monitoring can generate new ideas about what affects our brains.

So I needed a brain task that I would do every day. When I set out to devise a good task, here's what I already knew:

1. Many numbers, not one. A task that provides many numbers per test (e.g., many latencies) is better than a task that provides only one number (e.g., percent correct). Gathering many numbers per test allows me to look at their distribution and choose an efficient way of combining (i.e., averaging) them into one number, such as a harmonic mean, geometric mean, or trimmed mean. Gathering many numbers also allows me to calculate a standard error, which helps identify unusual scores.

2. Graded, not binary. Graded measures (e.g., latencies) are better than binary ones (e.g., right/wrong).

Every experimental psychologist knows this. What none of them know is how to make the task fun. If I’m going to do something every day, it matters a great deal whether I enjoy it or not. It might be the difference between possible and impossible. People enjoy video games, which is a kind of existence proof. Video games have dozens of elements; which matter? Here’s what I figured out by trial and error:

3. Hand-eye coordination. Making difficult movements that involve hand-eye coordination is fun. My bilboquet taught me this. Presumably this tendency originated during the tool-making hobbyist stage of human evolution; it caused people to become better and better at making tools. Ordinary typing involves skilled movement but not hand-eye coordination. This idea has worked. It led me to try one-finger typing (where I look at the keyboard while I type) instead of regular typing. And, indeed, I enjoy the one-finger typing task, whereas I didn't enjoy the ordinary typing tasks I've tried.

4. Detailed problem-by-problem feedback. Right/wrong is the crudest form of feedback; it doesn't do much. What I find much more motivating is finer-grained feedback based on performance on the same problem.

5. Less than 5 minutes. The longer the task the more data, sure, but also the more reluctant I am to do it. Three minutes seems close to ideal: long enough for the task to be a pleasant break but not so long that it seems like a burden.
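Point 1 can be made concrete with Python's standard `statistics` module, which provides the harmonic and geometric means directly (Python 3.8 or later). The trimmed-mean helper, the 10% trimming proportion, and the `is_unusual` rule (flag a day more than two standard errors from a typical value) are hypothetical choices for illustration, not the author's procedure. The latencies are made up.

```python
import math
import statistics

def trimmed_mean(xs, proportion=0.1):
    """Mean after dropping `proportion` (< 0.5) of the values from each end."""
    xs = sorted(xs)
    k = int(len(xs) * proportion)
    return statistics.fmean(xs[k:len(xs) - k])

def summarize_latencies(latencies):
    """Several ways to boil many per-trial latencies (seconds) down to one number."""
    return {
        "mean": statistics.fmean(latencies),
        "harmonic_mean": statistics.harmonic_mean(latencies),
        "geometric_mean": statistics.geometric_mean(latencies),
        "trimmed_mean": trimmed_mean(latencies, 0.1),
        # Standard error of the mean: sample standard deviation / sqrt(n).
        "standard_error": statistics.stdev(latencies) / math.sqrt(len(latencies)),
    }

def is_unusual(today_mean, today_se, typical_mean, threshold=2.0):
    """Hypothetical rule: flag a day more than `threshold` standard errors from a typical value."""
    return abs(today_mean - typical_mean) > threshold * today_se

# Hypothetical latencies from one session; note the one slow trial.
session = [0.50, 0.60, 0.55, 0.70, 2.00, 0.65, 0.60, 0.58, 0.62, 0.50]
summary = summarize_latencies(session)
```

The harmonic mean, geometric mean, and trimmed mean all give the one slow trial less weight than the ordinary mean does. Looking at the distribution of a day's latencies is what tells you which of them is the efficient choice.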

Experimental psychology is a hundred years old, yet small daily tests are an unexplored ecology that might have practical benefits.