MEASURES OF CENTRAL TENDENCY
The Mean, Median and the Mode
(Grouped Data)
Once again, I welcome you, Power, to this refresher 'course' in preparation for our upcoming STAT131 test. :)
Before we begin solving questions with the electronic machine I programmed, ;D,
let's go through the formulas or formulae for a reminder and for comprehension.
So, let's get started. ;)
Formulae Explications:
The formulae for finding the mean, the median,
and the mode of any grouped data set are as given below:
1. The mean (coding method):
$$\bar{X} = A + \left( \frac{\sum f_i U_i}{\sum f_i} \right) \times C$$
Where;
A = the assumed mean
(The assumed mean is the value of any class mark you choose from the frequency table. Generally, it is advised that you take the median class mark as your assumed mean because it makes finding Ui easier. For instance, if you take an example, you'll notice how the values in the Ui column of the table start from 0 at the middle, decrease by 1 for each class going up the table (−1, −2, ...), and increase by 1 for each class going down the table (+1, +2, ...).
Note: Xi, the i'th class mark, is calculated by taking the average of the limits of the i'th class. As in this:
X4, i.e. the 4th class mark, will be equal to:
$$X_4 = \frac{UCL_4 + LCL_4}{2}$$
where UCL4 is the upper class limit of the 4th class, and LCL4 is the lower class limit of the same. I hope you understand, Power? Okay, let's continue);
fi = i'th class frequency
(Note: 'i' is a variable that serves as a placeholder. As in the above definition, the i'th class mark/frequency in a table could be the 4th class mark/frequency, the 6th class mark/frequency, etc. in that table. Just know that wherever you see 'i', it is a variable; it changes, nothing more. For instance, f4 (4 in place of 'i') is the frequency of the fourth class. Do you now comprende, Power? Good. We're making progress. ;))
$$U_i = \frac{X_i - A}{C}$$
(Xi is the i'th class mark, A is the assumed mean, C is the class size);
C = class size
(Calculated as: the difference between the upper class limit and the lower class limit of any class, plus 1.
Mathematically:
(Ulimit − Llimit) + 1.
Get the logic? Nice. Moving forward.)
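To make the coding method concrete, here is a minimal Python sketch of the mean formula above. It is only an illustration, not the program behind this page. The function name `grouped_mean` and the sample table (first class 10 - 19, frequencies 3, 7, 12, 6, 2) are made up, and the sketch assumes whole-number class limits, equal class sizes, and the middle class mark as the assumed mean.

```python
def grouped_mean(first_lower, first_upper, freqs):
    """Mean of grouped data by the coding (assumed-mean) method.

    Assumes whole-number class limits and equal class sizes.
    """
    c = (first_upper - first_lower) + 1            # class size C
    n_classes = len(freqs)
    # Class marks X_i: the average of each class's lower and upper limits.
    marks = [(first_lower + first_upper) / 2 + i * c for i in range(n_classes)]
    a = marks[n_classes // 2]                      # assumed mean A (middle class mark)
    u = [(x - a) / c for x in marks]               # U_i = (X_i - A) / C
    sum_fu = sum(f * ui for f, ui in zip(freqs, u))
    return a + (sum_fu / sum(freqs)) * c           # A + (sum f_i U_i / sum f_i) * C


# Hypothetical table: classes 10-19, 20-29, 30-39, 40-49, 50-59.
mean = grouped_mean(10, 19, [3, 7, 12, 6, 2])
print(round(mean, 2))
```

For this made-up table, A = 34.5, ΣfiUi = −3 and Σfi = 30, so the result should come to 33.5, which agrees with the direct calculation Σ(fi × Xi)/Σfi = 1005/30.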
2. The median:
$$\tilde{X} = L_{med} + \left( \frac{\frac{N}{2} - \sum f_{<med}}{f_{med}} \right) \times C$$
Where;
Lmed = lower class boundary of median class
(The lower class boundary of the median class is obtained by subtracting 0.5 from the lower class limit of the same. For instance, say we have median class limits of 55 - 67, the lower class boundary would then be 54.5);
N = Σf (read as summation-f or sigma-f)
(This is the same as the summation of the frequencies of all the classes);
Σf<med = cumulative frequency below the median class
(I like to call it "sigma-f below", i.e. the summation of the frequencies of all the classes below the median class);
fmed = frequency of median class
(This is obtained directly from the frequency distribution table);
C = class size
(Refer to class size under 'The mean' above).
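The median formula can be sketched the same way. This is a minimal Python illustration under stated assumptions: the name `grouped_median` and the sample table are hypothetical, class limits are whole numbers with equal class sizes, and the median class is taken to be the first class whose cumulative frequency reaches N/2.

```python
def grouped_median(first_lower, first_upper, freqs):
    """Median of grouped data: L_med + ((N/2 - sum_f_below) / f_med) * C."""
    c = (first_upper - first_lower) + 1            # class size C
    n = sum(freqs)                                 # N = sum of all frequencies
    if n <= 0:
        raise ValueError("the frequencies must add up to more than zero")
    half = n / 2
    cum_below = 0                                  # cumulative frequency below the current class
    for i, f in enumerate(freqs):
        if cum_below + f >= half:                  # N/2 falls inside this class
            l_med = (first_lower + i * c) - 0.5    # lower class boundary
            return l_med + ((half - cum_below) / f) * c
        cum_below += f


# Hypothetical table: classes 10-19, 20-29, 30-39, 40-49, 50-59.
median = grouped_median(10, 19, [3, 7, 12, 6, 2])
print(round(median, 2))
```

Here N/2 = 15 falls in the class 30 - 39 (cumulative frequencies 3, 10, 22, ...), so Lmed = 29.5, Σf<med = 10 and fmed = 12, giving about 33.67.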
3. The mode:
$$\hat{X} = L_{mod} + \left( \frac{D_1}{D_1 + D_2} \right) \times C$$
Where;
Lmod = lower class boundary of modal class
(The lower class boundary of the modal class is obtained by subtracting 0.5 from the lower class limit of the same. See Lmed);
D1 = frequency difference between the modal class and the next lower class
(This is obtained directly from the table. It is the difference between the frequencies of the modal class and the class just before it.);
D2 = frequency difference between the modal class and the next higher class
(This is also obtained directly from the table. It is the difference between the frequencies of the modal class and the class just after it.);
C = class size
(Refer to class size under 'The mean' above).
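And here is a minimal Python sketch of the mode formula. Again, the name `grouped_mode` and the sample table are hypothetical. Two further assumptions are mine and not from the notes above: if two classes tie for the highest frequency the first one is used, and a missing neighbour (when the modal class is the first or last class) is treated as having frequency 0.

```python
def grouped_mode(first_lower, first_upper, freqs):
    """Mode of grouped data: L_mod + (D1 / (D1 + D2)) * C."""
    c = (first_upper - first_lower) + 1            # class size C
    if not freqs or max(freqs) <= 0:
        raise ValueError("at least one class must have a positive frequency")
    m = freqs.index(max(freqs))                    # modal class (first one if tied)
    # Assumption: a neighbour that does not exist counts as frequency 0.
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - before                         # D1: modal class vs. class just before it
    d2 = freqs[m] - after                          # D2: modal class vs. class just after it
    l_mod = (first_lower + m * c) - 0.5            # lower class boundary
    return l_mod + (d1 / (d1 + d2)) * c


# Hypothetical table: classes 10-19, 20-29, 30-39, 40-49, 50-59.
mode = grouped_mode(10, 19, [3, 7, 12, 6, 2])
print(round(mode, 2))
```

For this table the modal class is 30 - 39, so Lmod = 29.5, D1 = 12 − 7 = 5 and D2 = 12 − 6 = 6, giving about 34.05.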
Okay now, Power, having refreshed our minds on these formulas, let's move on to solve any questions you'll be providing. ;)
The Distribution Computer:
In the first of the input boxes above, enter the class limits of the first class only. You don't have to enter the limits for all the classes. In the second, enter the frequencies of all the classes, in order. Don't forget to separate the values you enter ('23' is not the same as '2 3', you know).
Note: For the full width of the table when the result is displayed, rotate your screen. ;)
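To show why the limits of the first class are enough, here is a minimal Python sketch that rebuilds the whole frequency table from those two inputs. It is not the code behind this page, only an illustration. The function name `build_table`, the dictionary keys, the space-separated input format and the sample values are all assumptions; it also assumes whole-number limits and equal class sizes.

```python
def build_table(limits_text, freqs_text):
    """Rebuild a grouped frequency table from the first class limits and all frequencies."""
    first_lower, first_upper = (int(v) for v in limits_text.split())
    freqs = [int(v) for v in freqs_text.split()]   # "2 3" is two classes, "23" is one
    c = (first_upper - first_lower) + 1            # class size C
    middle = len(freqs) // 2                       # row whose class mark is the assumed mean
    rows, cumulative = [], 0
    for i, f in enumerate(freqs):
        lower = first_lower + i * c                # every class is one class size further on
        upper = first_upper + i * c
        cumulative += f
        rows.append({
            "limits": (lower, upper),
            "mark": (lower + upper) / 2,           # class mark X_i
            "f": f,                                # class frequency f_i
            "u": i - middle,                       # U_i: 0 at the middle, stepping by 1
            "fu": f * (i - middle),                # f_i * U_i
            "cum_f": cumulative,                   # cumulative frequency
        })
    return rows


# Hypothetical input: first class 10 - 19, five frequencies.
for row in build_table("10 19", "3 7 12 6 2"):
    print(row)
```

Every later class is just the first class shifted by the class size, which is why only the first pair of limits is needed.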
Let's solve:
Solving for the mean, from the values you provided, and from the table above, we have that:
A = A;
ΣfiUi = ΣfiUi;
Σfi = Σfi; and
C = C.
Therefore, using the mean formula:
$$\bar{X} = A + \left( \frac{\sum f_i U_i}{\sum f_i} \right) \times C$$
The mean, therefore, of the frequency distribution is ≈ MEAN.
In the same vein, solving for the median, we have:
Lmed = Lmed;
N = N;
Σf<med = Σf<med;
fmed = fmed;
C = C.
Therefore, using the median formula:
$$\tilde{X} = L_{med} + \left( \frac{\frac{N}{2} - \sum f_{<med}}{f_{med}} \right) \times C$$
Hence, the median of the distribution is ≈ MEDIAN.
Finally, solving for the mode, we have that:
Lmod = Lmod;
D1 = D1;
D2 = D2; and
C = C.
Therefore, using the mode formula:
$$\hat{X} = L_{mod} + \left( \frac{D_1}{D_1 + D_2} \right) \times C$$
Consequently, the mode of the distribution is ≈ MODE.
Alright, Power. Having understood all these, let's discuss a little bit further.
Further Discussion:
Alright. Now, Power, I'm sure you must have heard of these words that end with "iles" (hmm...what about we call them "the iles"? Cool, right? ;)). Okay. They are the quartiles, the deciles, and the percentiles.
When you meet them, do not let them scare you. They are easy to handle so long as you get the logic behind handling them well enough.
So, let's talk about them, about "the iles".
1. The quartiles:
The quartiles consist of four parts (just like the word 'quarter', which implies a fourth part of something): the first part, Q1, the second, Q2, the third, Q3, and the fourth, Q4.
Now, we are going to consider only two of the parts here and discard the other two. I'll tell you why. Let's continue.
The two parts we'll consider are Q1 and Q3.
From the elaboration of quartiles and quarter above, we can say that Q1 is the 1st part of the quartile (called the first quartile) and Q3 is the 3rd part of the quartile (called the third quartile). This means that Q1 = ¼ of Q4 and Q3 = ¾ of Q4. Notice I said, "...of Q4", because Q4 is itself the whole quartile. Also, Q2, the 2nd part, would be 2/4 of Q4, which is equal to ½ of Q4, which is the same as the median we discussed earlier, because the median marks the halfway point of a dataset. So, Q2 is the median (half of the quartile) and Q4 is the quartile itself, the full package. Now you see why we had to discard them both. Good. So, let's move forward.
Again, from the above illustration, we see how the quartiles are related to each other and to the median.
Take this logic and never forget it, because you'll use the same when solving for deciles and percentiles:
If you take a look at the formula for finding the median of grouped data, you'll notice there's a variable N involved in the calculation. This variable is the 'mother' quartile, I mean Q4; same with D10 for the deciles and P100 for the percentiles. Also, notice that the variable N is divided by 2, which is the same as saying ½ of N, which also implies ½ of Q4, which is, as well, the position of Q2, the median. Power? I hope you are grasping?
Now, using the same logic, we can compute the values of Q1 and Q3 by replacing '1/2' in the formula with '¼' and '¾' respectively, since Q4 = N.
1.A For Q1, the formula would look like this:
$$Q_1 = L_{Q_1} + \left( \frac{\left[\frac{1}{4} \times N\right] - \sum f_{<Q_1}}{f_{Q_1}} \right) \times C$$
Where;
LQ1 = lower class boundary of the first-quartile class
(You already know how to calculate lower class boundaries. The quartile class in this case is the class whose cumulative frequency has the value of [¼ × N] embedded in it, i.e. the value of [¼ × N] is greater than the cumulative frequency below the class but not greater than the cumulative frequency up to and including the class);
N = Q4 = Σf (read as summation-f or sigma-f)
(This is the same as the summation of the frequencies of all the classes);
Σf<Q1 = cumulative frequency below the Q1 class
(I.e. the summation of the frequencies of all the classes below the quartile class, Q1 (the first quartile class), in this case);
fQ1 = frequency of the Q1 class
(This is obtained directly from the frequency distribution table once you have the value of [¼ × N]);
C = class size
(This has already been discussed. Refer to class size under 'The mean' on the other page).
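The "replace ½ with another fraction" logic can be sketched as one small Python function. This is a minimal illustration under assumptions: the name `grouped_fractile`, its parameters `k` and `parts`, and the sample table are all hypothetical, class limits are whole numbers with equal class sizes, and the quartile class is taken as the first class whose cumulative frequency reaches k/parts × N.

```python
def grouped_fractile(first_lower, first_upper, freqs, k, parts):
    """k-th of `parts` divisions of grouped data (k=1, parts=4 gives Q1)."""
    if not 0 < k < parts:
        raise ValueError("k must be greater than 0 and less than parts")
    c = (first_upper - first_lower) + 1            # class size C
    n = sum(freqs)                                 # N = sum of all frequencies
    if n <= 0:
        raise ValueError("the frequencies must add up to more than zero")
    position = k * n / parts                       # e.g. 1/4 * N for Q1
    cum_below = 0                                  # cumulative frequency below the current class
    for i, f in enumerate(freqs):
        if cum_below + f >= position:              # the position falls inside this class
            boundary = (first_lower + i * c) - 0.5 # lower class boundary
            return boundary + ((position - cum_below) / f) * c
        cum_below += f


# Hypothetical table: classes 10-19, 20-29, 30-39, 40-49, 50-59.
freqs = [3, 7, 12, 6, 2]
q1 = grouped_fractile(10, 19, freqs, 1, 4)
q3 = grouped_fractile(10, 19, freqs, 3, 4)
print(round(q1, 2), round(q3, 2))
```

For this table Q1 should come to about 25.93 and Q3 to about 40.33. Calling the same function with k = 2 and parts = 4 gives the median again, which is the whole point of the logic above.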
1.B The semi-interquartile-range:
I'll try to make this as short as possible.
Power? I believe you know what 'range' is and how to calculate it: the range is the difference between the highest value and the lowest value in a dataset. I believe you also know what 'semi' is: half of something, as in semi-circle.
Therefore, to compute the semi-interquartile-range, let's first break the expression into two ('interquartile-range' and 'semi', in that order). So, to compute the interquartile-range, we take the lowest of the inter-quartiles (notice I broke the word here again) and subtract it from the highest of the inter-quartiles.
The reason I broke the word, 'interquartile', is this:
You have to know that Q4 is not 'a part' of the quartile. It is not 'inter' the quartile. It is the quartile itself. It is not included in the 'inter' quartiles. Therefore, Q3 becomes the highest interquartile and Q1 becomes the lowest interquartile. Hence, the range of the interquartiles would be Q3 − Q1, and the semi-interquartile-range would then be the interquartile-range halved. Mathematically:
$$Q_{range} = \frac{Q_3 - Q_1}{2}$$
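As a quick worked example with made-up values, suppose a distribution has Q1 ≈ 25.93 and Q3 ≈ 40.33 (these numbers are hypothetical, chosen only to show the arithmetic):

```latex
Q_{range} = \frac{Q_3 - Q_1}{2} = \frac{40.33 - 25.93}{2} = \frac{14.40}{2} = 7.20
```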
2. The deciles:
Just like the quartiles, the deciles also take the form of the median.
But before we continue, take note of the term "decile": 'dec-' has to do with ten or a tenth (10th). For instance, as we know, 'dec-imal', 'dec-imeter', 'dec-ade', etc. all have to do with ten.
Why I chipped that in is this: the i'th decile is calculated by just replacing '1/2' in the median formula with 'i/10', just as we did with the quartiles. As in this:
If we are asked to find the i'th decile, say the 9th decile, of grouped data, we simply do this: replace '1/2' with '9/10'.
Note: For deciles, 0 < i < 10 (i.e. i is greater than 0 and less than 10).
The formula would then look like this, for D9:
$$D_9 = L_{D_9} + \left( \frac{\left[\frac{9}{10} \times N\right] - \sum f_{<D_9}}{f_{D_9}} \right) \times C$$
Where;
LD9 = lower class boundary of the 9th-decile class
(The 9th-decile class is the class whose cumulative frequency has the value of [9/10 × N] in its range, i.e. the value of [9/10 × N] is greater than the cumulative frequency below the class but not greater than the cumulative frequency up to and including the class);
N = D10 = Σf (read as summation-f or sigma-f)
(This is the same as the summation of the frequencies of all the classes);
Σf<D9 = cumulative frequency below the D9 class
(I.e. the summation of the frequencies of all the classes below the decile class, D9 (the 9th decile class), in this case);
fD9 = frequency of the D9 class
(This is obtained directly from the frequency distribution table once you have the value of [9/10 × D10]);
C = class size
(See class size under 'The mean' on the other page).
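Here is a short worked substitution for D9, using a made-up table (all the numbers are hypothetical). Suppose N = 30, so [9/10 × N] = 27. Suppose also that this value falls in the class 40 - 49, which has a lower class boundary of 39.5, a frequency of 6, a cumulative frequency of 22 below it, and a class size of 10. Then:

```latex
D_9 = 39.5 + \left( \frac{27 - 22}{6} \right) \times 10
    = 39.5 + \frac{50}{6}
    \approx 47.83
```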
3. The percentiles:
Alright. This will be the last of all the things I've been discussing so far. It's not been easy getting these
done. I'm really tired now. I've got to rest. :)
So, finally, Power, the percentiles...
Just use the same logic as discussed above.
We know that 'percentile' is a derivative of the word 'percent', which means 'per hundred'. So, to find, say, the 86th percentile, we just replace '1/2' in the median formula with '86/100', then we solve. As easy as that. Mathematically:
$$P_{86} = L_{P_{86}} + \left( \frac{\left[\frac{86}{100} \times N\right] - \sum f_{<P_{86}}}{f_{P_{86}}} \right) \times C$$
Alright. So that will be all for now.
Thanks for taking time out and also for taking your time to study these for yourself. See you later, Power. Byeee! ;)
PS: Lest I forget...HAPPY NEW YEEEAR, 2018! ;D