How to Calculate Variance


This article was co-written by Mario Banuelos, PhD. Mario Banuelos is an assistant professor of mathematics at California State University, Fresno. With over eight years of teaching experience, Mario specializes in mathematical biology, optimization, statistical modeling for genome evolution, and data science. Mario holds a bachelor’s degree in mathematics from California State University, Fresno, and a doctorate in applied mathematics from the University of California, Merced. Mario teaches at both the high school and college levels.

This article has been viewed 240,483 times.

Variance measures how spread out a data set is. It is useful when building statistical models: low variance can be a sign that a model is fitting random error or noise rather than an underlying relationship in the data. This article shows you how to calculate variance.


Calculating the variance of a sample


Write down your sample data set. In most cases, statisticians only have information about a sample, or subset, of the population they are studying. For example, instead of analyzing the population “cost of every car in Germany” as a whole, a statistician might find the cost of a random sample of a few thousand cars. The statistician can use this sample to get a good estimate of the cost of cars in Germany, but the estimate will most likely not match the actual population figures exactly.

Example: When analyzing the number of muffins sold each day at a coffee shop, you take a random six-day sample and get the following results: 17, 15, 23, 7, 9, 13. This is a sample, not a population, because you don’t have data for every day the shop is open.
If you have every data point in the population, skip to the method below.


Write the formula for sample variance. The variance of a data set indicates how spread out the data points are. The closer the variance is to zero, the more tightly the data points are grouped together. When working with sample data, use the following variance formula: [1]

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
$s^2$ is the variance. Variance is always measured in squared units.
$x_i$ represents each value in your data set.
∑, which means “sum”, tells you to compute the expression for each value $x_i$ and then add the results together.
x̅ is the mean of the sample.
n is the number of data points.
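The formula translates directly into a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `sample_variance` is our own, and it assumes the data is a plain list of at least two numbers.

```python
def sample_variance(data):
    """Return s^2 = sum((x_i - x_bar)^2) / (n - 1) for a list of numbers."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    x_bar = sum(data) / n                            # sample mean
    squared_devs = [(x - x_bar) ** 2 for x in data]  # (x_i - x_bar)^2 for each point
    return sum(squared_devs) / (n - 1)               # divide by n - 1, not n


# Muffin sales from the six-day sample in this method
muffins = [17, 15, 23, 7, 9, 13]
print(sample_variance(muffins))  # 33.2
```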


Calculate the mean of the sample. The symbol x̅, or “x-bar”, is used to indicate the sample mean. [2] Calculate it as you would any mean: add up all the data points and divide the total by the number of points.

Example: First, add the data points together: 17 + 15 + 23 + 7 + 9 + 13 = 84
Next, divide the sum by the number of data points, in this case six: 84 ÷ 6 = 14.
Sample mean = x̅ = 14.
You can think of the mean as the “center point” of the data. If the data points cluster tightly around the mean, the variance is low. If they are scattered far from the mean, the variance is high.
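Computing the mean takes one line in Python. The sketch below uses the built-in `sum` and `len`, and checks the result against the standard library’s `statistics.mean`; the variable names are illustrative.

```python
import statistics

muffins = [17, 15, 23, 7, 9, 13]

x_bar = sum(muffins) / len(muffins)  # add the points, then divide by how many there are
print(x_bar)                         # 14.0

# The standard library computes the same mean
assert statistics.mean(muffins) == x_bar
```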


Subtract the mean from each data point. Now calculate $x_i - \bar{x}$, where $x_i$ is each point in your data set. Each result is that point’s deviation from the mean, or simply put, its distance from the mean. [3]

For example:
$x_1 - \bar{x} = 17 - 14 = 3$
$x_2 - \bar{x} = 15 - 14 = 1$
$x_3 - \bar{x} = 23 - 14 = 9$
$x_4 - \bar{x} = 7 - 14 = -7$
$x_5 - \bar{x} = 9 - 14 = -5$
$x_6 - \bar{x} = 13 - 14 = -1$
Checking your work is easy, because these results must sum to 0. By the definition of the mean, the negative results (distances from the mean to smaller numbers) exactly cancel out the positive results (distances from the mean to larger numbers).
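This step and its zero-sum check can be sketched in Python as follows (variable names are illustrative). With messier data, floating-point rounding can leave a tiny remainder, so the check uses a small tolerance rather than exact equality.

```python
muffins = [17, 15, 23, 7, 9, 13]
x_bar = sum(muffins) / len(muffins)  # 14.0

# Deviation of each point from the mean: x_i - x_bar
deviations = [x - x_bar for x in muffins]
print(deviations)  # [3.0, 1.0, 9.0, -7.0, -5.0, -1.0]

# Check: deviations from the mean always sum to zero (up to rounding)
assert abs(sum(deviations)) < 1e-9
```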


Square all results. As noted above, the list of deviations ($x_i - \bar{x}$) sums to 0. That means the “average deviation” is also always zero and says nothing about how spread out the data is. To solve this problem, find the square of each deviation. The squares are all positive (or zero), so negative and positive values no longer cancel each other out. [4]

For example:
$(x_1 - \bar{x})^2 = 3^2 = 9$
$(x_2 - \bar{x})^2 = 1^2 = 1$
$(x_3 - \bar{x})^2 = 9^2 = 81$
$(x_4 - \bar{x})^2 = (-7)^2 = 49$
$(x_5 - \bar{x})^2 = (-5)^2 = 25$
$(x_6 - \bar{x})^2 = (-1)^2 = 1$
Now you have $(x_i - \bar{x})^2$ for each data point in the sample.
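The squaring step in Python, continuing the same example. The sketch also illustrates the point made above: the raw deviations cancel out, while the squared deviations cannot be negative.

```python
muffins = [17, 15, 23, 7, 9, 13]
x_bar = sum(muffins) / len(muffins)

deviations = [x - x_bar for x in muffins]
squared = [d ** 2 for d in deviations]  # (x_i - x_bar)^2 for each point
print(squared)  # [9.0, 1.0, 81.0, 49.0, 25.0, 1.0]

print(sum(deviations))               # 0.0: positives and negatives cancel
assert all(s >= 0 for s in squared)  # squares are never negative
```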


Find the sum of the squared values. Now it’s time to calculate the entire numerator of the formula: $\sum (x_i - \bar{x})^2$. The capital sigma, ∑, tells you to add up the expression that follows it for each value $x_i$. You have already calculated $(x_i - \bar{x})^2$ for each value $x_i$ in the sample, so all you have to do is add the results together.

For example: 9 + 1 + 81 + 49 + 25 + 1 = 166.
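In Python, the whole numerator ∑(xᵢ – x̅)² maps onto a single call to the built-in `sum` over a generator expression. A short sketch with the same example data:

```python
muffins = [17, 15, 23, 7, 9, 13]
x_bar = sum(muffins) / len(muffins)

# Sigma notation maps onto sum(): add (x_i - x_bar)^2 over every x_i
sum_of_squares = sum((x - x_bar) ** 2 for x in muffins)
print(sum_of_squares)  # 166.0
```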


Divide by n – 1, where n is the number of data points. In the past, statisticians simply divided by n when calculating sample variance. Doing so gives you the mean of the squared deviations, which is exactly the variance of that particular sample. However, remember that the sample is only an estimate of a larger population. If you took another random sample and did the same calculation, you would get a different result. It turns out that dividing by n – 1 instead of n gives a better estimate of the variance of the larger population, which is what you really care about. This correction is so widely used that it is now the accepted definition of sample variance. [5]

Example: There are six data points in the sample, so n = 6.
Sample variance = $s^2 = \frac{166}{6 - 1} = 33.2$
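To see what the denominator changes, the sketch below divides the same sum of squares by n and by n – 1, then checks both against Python’s standard `statistics` module, whose `variance` divides by n – 1 and whose `pvariance` divides by n.

```python
import math
import statistics

muffins = [17, 15, 23, 7, 9, 13]
n = len(muffins)
x_bar = sum(muffins) / n
sum_of_squares = sum((x - x_bar) ** 2 for x in muffins)  # 166.0

divide_by_n = sum_of_squares / n                # about 27.67, the older approach
divide_by_n_minus_1 = sum_of_squares / (n - 1)  # the sample variance

print(divide_by_n_minus_1)  # 33.2

# statistics.variance divides by n - 1; statistics.pvariance divides by n
assert math.isclose(statistics.variance(muffins), divide_by_n_minus_1)
assert math.isclose(statistics.pvariance(muffins), divide_by_n)
```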


Understand variance and standard deviation. Because the formula contains an exponent, variance is measured in the square of the original data’s units, which is hard to interpret intuitively. For this reason, the standard deviation is often more useful. Your work isn’t wasted, though, because the standard deviation is simply the square root of the variance. That is why the sample variance is written as $s^2$ and the sample standard deviation as $s$.

For example, the standard deviation of the sample above is s = √33.2 ≈ 5.76.
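The square-root step in Python, as a small sketch: `math.sqrt` takes the root of the sample variance, and `statistics.stdev` performs the whole sample calculation (dividing by n – 1) in one call, so the two should agree.

```python
import math
import statistics

muffins = [17, 15, 23, 7, 9, 13]

s_squared = statistics.variance(muffins)  # sample variance, 33.2
s = math.sqrt(s_squared)                  # standard deviation = square root of variance
print(round(s, 2))                        # 5.76

assert math.isclose(s, statistics.stdev(muffins))
```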

Calculating the variance of a population


Start with the population data set. The term “population” refers to the entire set of relevant observations. For example, if you are studying the ages of residents of Hanoi, your population would include the age of every individual living in Hanoi. You would normally create a spreadsheet for a large data set like this, but here is a smaller example:

Example: An aquarium room contains exactly six fish tanks. The tanks contain the following numbers of fish:
$x_1 = 5$
$x_2 = 5$
$x_3 = 8$
$x_4 = 12$
$x_5 = 15$
$x_6 = 18$


Write the formula for the population variance. Since the population contains all the data we need, this formula gives the exact variance of the population. To distinguish it from the sample variance (which is only an estimate), statisticians use different symbols: [6]

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$
$\sigma^2$ is the population variance. The symbol is a lowercase sigma, squared. Variance is measured in squared units.
$x_i$ represents each value in your data set.
The expression inside ∑ is calculated for each value $x_i$, and the results are then added together.
μ is the population mean.
n is the number of data points in the population.
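The population formula differs from the sample formula only in its denominator. A minimal sketch, where the function name `population_variance` is our own and the data is assumed to be the complete population, checked against the standard library’s `statistics.pvariance`:

```python
import math
import statistics


def population_variance(data):
    """Return sigma^2 = sum((x_i - mu)^2) / n for a complete population."""
    n = len(data)
    if n == 0:
        raise ValueError("population variance needs at least one data point")
    mu = sum(data) / n                           # population mean
    return sum((x - mu) ** 2 for x in data) / n  # divide by n, not n - 1


# Fish counts in the six tanks of the aquarium example
fish = [5, 5, 8, 12, 15, 18]
print(population_variance(fish))  # 24.25
assert math.isclose(population_variance(fish), statistics.pvariance(fish))
```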


Find the mean of the population. When analyzing the population, the symbol μ (“mu”) represents the arithmetic mean. To find the mean, add up all the data points, then divide by the number of points.

You can think of the mean as “average,” but be careful, because the word has many definitions in math.
Example: mean = $\mu = \frac{5 + 5 + 8 + 12 + 15 + 18}{6} = 10.5$


Subtract the mean from each data point. Data points closer to the mean will have differences closer to zero. Repeat the subtraction for all the data points, and you’ll probably start to get a feel for how spread out the data is.

For example:
$x_1 - \mu = 5 - 10.5 = -5.5$
$x_2 - \mu = 5 - 10.5 = -5.5$
$x_3 - \mu = 8 - 10.5 = -2.5$
$x_4 - \mu = 12 - 10.5 = 1.5$
$x_5 - \mu = 15 - 10.5 = 4.5$
$x_6 - \mu = 18 - 10.5 = 7.5$


Square each result. At this point, some of the results from the previous step are negative and some are positive. If you plotted the data on a number line, these two groups would represent the numbers to the left and right of the mean. As they are, they won’t help you calculate the variance, because the two groups cancel each other out. Instead, square them all so they are all positive.

For example:
$(x_i - \mu)^2$ for each value of i from 1 to 6:
$(-5.5)^2 = 30.25$
$(-5.5)^2 = 30.25$
$(-2.5)^2 = 6.25$
$(1.5)^2 = 2.25$
$(4.5)^2 = 20.25$
$(7.5)^2 = 56.25$


Find the average of your results. You now have a value for each data point that is related (indirectly) to how far that data point is from the mean. Take the average by adding these values together and dividing by the number of values you have.

For example:
Population variance = $\sigma^2 = \frac{30.25 + 30.25 + 6.25 + 2.25 + 20.25 + 56.25}{6} = \frac{145.5}{6} = 24.25$
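The four population steps (find the mean, subtract, square, average) can be traced in Python, printing each intermediate list so it can be compared line by line with the worked example above. The variable names are illustrative.

```python
fish = [5, 5, 8, 12, 15, 18]
n = len(fish)

mu = sum(fish) / n                      # step 1: population mean
deviations = [x - mu for x in fish]     # step 2: x_i - mu
squared = [d ** 2 for d in deviations]  # step 3: (x_i - mu)^2
sigma_squared = sum(squared) / n        # step 4: average of the squares

print(mu)             # 10.5
print(deviations)     # [-5.5, -5.5, -2.5, 1.5, 4.5, 7.5]
print(squared)        # [30.25, 30.25, 6.25, 2.25, 20.25, 56.25]
print(sigma_squared)  # 24.25
```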


Relate this to the formula. If you’re not sure how this matches the formula given at the start of this method, try writing out the whole problem by hand without abbreviating it:

After finding each difference from the mean and squaring it, you have $(x_1 - \mu)^2$, $(x_2 - \mu)^2$, and so on up to $(x_n - \mu)^2$, where $x_n$ is the last data point in the data set.
To find the mean of these values, add them up and divide by n: $\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_n - \mu)^2}{n}$
After rewriting the numerator in sigma notation, you have $\frac{\sum (x_i - \mu)^2}{n}$, the population variance formula.
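For the aquarium example, the longhand version reads as follows. This worked equation uses only the values already computed in this method, with n = 6 and μ = 10.5:

```latex
\sigma^2 = \frac{(x_1-\mu)^2 + (x_2-\mu)^2 + \cdots + (x_6-\mu)^2}{6}
         = \frac{(5-10.5)^2 + (5-10.5)^2 + (8-10.5)^2 + (12-10.5)^2 + (15-10.5)^2 + (18-10.5)^2}{6}
         = \frac{145.5}{6} = 24.25
         = \frac{\sum_{i=1}^{6} (x_i-\mu)^2}{6}
```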

Advice

Because variance is difficult to interpret, it is often calculated mainly as a starting point for finding the standard deviation.
Using “n – 1” instead of “n” in the denominator when analyzing a sample is a technique called Bessel’s correction. The sample is only an estimate of the complete population, and the sample mean is biased toward the sample’s own values, so deviations measured from it tend to be too small. This correction removes that bias. [7] It has to do with the fact that once n – 1 data points are known, the nth and final point is already determined, because only one value will produce the sample mean (x̅) used in the variance formula. [8]
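Bessel’s correction can be checked exactly on a small example. The sketch below is our own illustration, not taken from the cited sources: it treats the six aquarium tanks as a population, assumes samples of size 3 drawn with replacement, lists every possible ordered sample, and averages both candidate estimates using exact fractions. On average, dividing by n falls short of the true population variance, while dividing by n – 1 matches it exactly.

```python
from fractions import Fraction
from itertools import product

# Population from the aquarium example (number of fish in each tank)
population = [5, 5, 8, 12, 15, 18]


def mean(values):
    return Fraction(sum(values), len(values))


def sum_sq_dev(values):
    """Sum of squared deviations from the values' own mean, as an exact Fraction."""
    m = mean(values)
    return sum((Fraction(x) - m) ** 2 for x in values)


# True population variance: divide by N
sigma2 = sum_sq_dev(population) / len(population)

n = 3  # sample size, chosen only for illustration
samples = list(product(population, repeat=n))  # every ordered sample drawn with replacement

avg_divide_by_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2)                   # 97/4, i.e. 24.25
print(avg_divide_by_n)          # 97/6, about 16.17: too small on average
print(avg_divide_by_n_minus_1)  # 97/4: matches the population variance
```

Because every sample is enumerated and the arithmetic is exact, the result does not depend on random chance. Sampling without replacement would need a different correction, which is why the with-replacement assumption is stated here.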
