### Tuesday, May 22, 2007

## Standard Deviations of the Mean

In my day-to-day life, I run experiments on Database Management Systems, trying to understand their performance characteristics.

Recently, I've been trying to create graphs of average query response times with "standard deviations" --- to ascribe some level of confidence to my data. Unfortunately, I've been doing it all wrong. Fortunately, I have ample company: many people don't quite get it, even in the medical community.

This is the situation: you run an experiment that produces thousands of data points (x[i]'s). You want to average them together and ascribe a confidence to that average.

My first (and incorrect) idea was to compute the mean and the standard deviation from my data points. In other words:

mu := sum(x[i])/N

var := sum( (x[i]-mu)^2 )/(N-1)

stdev := sqrt(var)

(Dividing by N-1 rather than N gives the unbiased sample variance; with thousands of points the difference is negligible.)
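As a concrete sketch, here are those formulas in Python, using a made-up handful of response times (the values are purely illustrative):

```python
# A minimal sketch of the formulas above (hypothetical response times, in ms).
data = [12.0, 15.0, 11.0, 14.0, 13.0]
N = len(data)

mu = sum(data) / N                                # sample mean
var = sum((x - mu) ** 2 for x in data) / (N - 1)  # unbiased sample variance
stdev = var ** 0.5                                # sample standard deviation

print(mu, stdev)  # 13.0, ~1.581
```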

The problem here is that the stdev estimates the variation within the population; it says nothing about the accuracy of your estimate of the mean (mu). In other words, while your estimate of the mean should improve as you add data points, the standard deviation will not shrink no matter how many samples you take: it simply converges to the population's true spread.

What you really want is the Standard Error of the Mean (aka SE or SEM). This is simply the sample standard deviation (above) divided by the square root of the number of samples:

sem := stdev/sqrt(N)

Now you may use the SEM to build a confidence interval for the mean of your x[i]'s. By the central limit theorem, the sample mean is approximately normally distributed for large N, so there is roughly a 68% likelihood that the true mean lies within 1 SEM of mu, a 95% likelihood that it lies within 2 SEMs, and so forth.
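A small simulation makes the contrast vivid. Drawing samples from a hypothetical population (Normal with mean 100 and spread 15 --- numbers invented for illustration), the sample stdev hovers near 15 regardless of N, while the SEM shrinks like 1/sqrt(N):

```python
import math
import random

random.seed(0)

def stdev_and_sem(n, mean=100.0, spread=15.0):
    """Draw n points from a hypothetical Normal(mean, spread) population
    and return (sample stdev, standard error of the mean)."""
    sample = [random.gauss(mean, spread) for _ in range(n)]
    mu = sum(sample) / n
    var = sum((x - mu) ** 2 for x in sample) / (n - 1)
    stdev = math.sqrt(var)
    return stdev, stdev / math.sqrt(n)

sd_small, sem_small = stdev_and_sem(100)
sd_big, sem_big = stdev_and_sem(10_000)

# Both stdevs land near the population's spread of 15,
# but the SEM shrinks by roughly 10x when N grows by 100x.
# A rough 95% confidence interval for the true mean is mu +/- 2*SEM.
```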

So, in summary, there are two things of importance to you --- the Standard Deviation of your sample and the Standard Error of the Mean. These are very different beasts and should never be confused, no matter how tempted you are.