Hard Tweets Explained: Correlation Coefficient

I honestly didn’t think this was a hard tweet, but someone asked me to explain it, so here we go!

I assume the text content of the tweet is not the tricky part. Music impacts mood, and mood impacts driving. Whether I meander along at just the speed limit or go faster than that is driven primarily by what music is playing. But you got that part already.

The joke here lies in the (r=0.867, σ=2.6). If you’ve ever read a research paper, you might have seen a notation like that. r is the letter we use for something called “correlation coefficient” and σ (that’s a Greek lowercase s, or “sigma”) is what statisticians use for standard deviation.

There is a message here, but I can't quite put my finger on it

Imagine that you have a bunch of data points on a graph, and they seem to be falling along a line. For example, suppose on one axis we plot time spent on Twitter, and on the other axis we plot the number of times your spouse sighs audibly or rolls their eyes. Suppose that we have a lot of samples, and we plot them on a graph. We notice that people who spend only a little time on Twitter get few sighs and eye rolls, but people who spend a lot of time on Twitter get a lot of sighs and eye rolls.

So these dots on the graph kind of form a line. But data samples are noisy, so we want to know how well these two things correlate. That’s what the correlation coefficient tells us. I’ll spare you the method of computation, but it boils line-ish-ness down to a single number. 1 means all the points are exactly on a line (perfect correlation), and 0 means they aren’t on a line at all (no correlation). The coefficient can also go negative, which means that as one thing increases the other decreases. -1 means perfectly correlated, just in the opposite direction.
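For anyone who would rather not be spared, here is a minimal Python sketch of the usual (Pearson) formula for r. The function name pearson_r and the Twitter-hours and sigh counts are made up purely for illustration; they are not data from any real study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each one moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable never changes")
    return co_movement / math.sqrt(spread_x * spread_y)


# Hypothetical samples: hours on Twitter vs. spousal sighs per evening.
hours = [1, 2, 3, 4, 5]
sighs_on_a_line = [2, 4, 6, 8, 10]   # exactly on a rising line
sighs_reversed = [10, 8, 6, 4, 2]    # exactly on a falling line
sighs_noisy = [2, 5, 4, 9, 8]        # roughly rising, with noise

print(pearson_r(hours, sighs_on_a_line))
print(pearson_r(hours, sighs_reversed))
print(pearson_r(hours, sighs_noisy))
```

The first two calls should land at the extremes described above (1 and -1), and the noisy one somewhere in between.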

Correlation coefficients are useful for letting you know whether one thing can be used to predict something else. For example, my father once did a study where he compared the SAT scores of black student athletes against how well those kids did in college. The correlation coefficient was negative: the worse you did on the SAT, the better you would do in college. The College Board was not a fan of my father’s work.

In real scientific studies, correlation coefficients are typically pretty small. Correlations as low as r=0.2 might be considered significant, depending on the number of samples.
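To see why the number of samples matters, here is a rough Python sketch using the standard t statistic for a correlation coefficient, t = r·sqrt((n − 2) / (1 − r²)). The function name t_statistic and the sample sizes are my own illustrative choices, and the "bigger than about 2" rule of thumb is only an approximation of the usual 5% two-sided cutoff, not a substitute for a proper test.

```python
import math


def t_statistic(r, n):
    """Return the t statistic for a correlation r observed on n samples."""
    if n < 3:
        raise ValueError("need at least 3 samples")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Same weak correlation, two hypothetical sample sizes.
for n in (30, 200):
    t = t_statistic(0.2, n)
    # Rule of thumb (an approximation): |t| above roughly 2 is
    # "significant" at the 5% level for moderately large n.
    print(n, round(t, 2), abs(t) > 2)
```

With only 30 samples, r=0.2 does not clear the bar; with 200 samples, the same r does.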

Correlation doesn’t tell you about causation. It just tells you if two things are moving in lockstep. Either might be causing the other, or some other thing might be causing both, or it could just be a complete coincidence.

So that’s r in the tweet. I’m saying that the music and drive time are highly correlated. Putting σ in there was just to make sure the people who read research papers would understand what I meant by r. The σ itself really makes no sense at all in this context.

Homework: Use your newfound knowledge of statistics to mock people on Facebook. Here’s an example:

