MEASURES OF CENTRAL TENDENCY

The Mean, Median and the Mode

(Grouped Data)

Once again, I welcome you, Power, to this refresher 'course' in preparation for our upcoming STAT131 test. :)

Before we begin solving questions with the electronic machine I programmed, ;D, let's go through the formulas or formulae for a reminder and for comprehension.
So, let's get started. ;)

Formulae Explications:

The formulae for finding the mean, the median, and the mode of any grouped data set are as given below:

1. The mean (coding method):

\[ \bar{X} = A + \left( \frac{\sum f_i U_i}{\sum f_i} \right) \times C \]

Where;

A = the assumed mean

(The assumed mean is the value of any class mark you choose from the frequency table. Generally, it is advised that you take the class mark of the middle (median) class as your assumed mean because it makes finding U_i easier. For instance, if you work through an example, you'll notice that the values in the U_i column are 0 at the chosen class, run −1, −2, −3, ... as you move up the table towards the lower classes, and run +1, +2, +3, ... as you move down the table towards the higher classes, changing by 1 at each step.

Note: X_i, the i'th class mark, is calculated by taking the average of the limits of the i'th class. For example, X₄, i.e. the 4th class mark, is:

\[ X_4 = \frac{\mathrm{UCL}_4 + \mathrm{LCL}_4}{2} \]

where UCL₄ is the upper class limit of the 4th class, and LCL₄ is the lower class limit of the same. I hope you understand, Power? Okay, let's continue);

f_i = i'th class frequency

(Note: 'i' is a variable that serves as a placeholder. As in the definition above, the i'th class mark/frequency in a table could be the 4th class mark/frequency, the 6th class mark/frequency, etc. in that table. Just know that wherever you see 'i', it is a variable; it changes, nothing more. For instance, f₄ (4 in place of 'i') is the frequency of the fourth class. Do you now comprende, Power? Good. We're making progress. ;))

\[ U_i = \frac{X_i - A}{C} \]

(X_i is the i'th class mark, A is the assumed mean, C is the class size);

C = class size

(Calculated as: the difference between the upper class limit and the lower class limit of any class, plus 1.
Mathematically:
C = (U_limit − L_limit) + 1.
For instance, for class limits of 55 - 67, C = (67 − 55) + 1 = 13.
Get the logic? Nice. Moving forward.)
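To see the coding method at work, here is a minimal Python sketch. The frequency table in it (classes 10-19 up to 50-59 with made-up frequencies) is an assumption for illustration only; it is not taken from any question in this course.

```python
# Hypothetical frequency table (made up for illustration):
# classes 10-19, 20-29, 30-39, 40-49, 50-59
lower_limits = [10, 20, 30, 40, 50]
upper_limits = [19, 29, 39, 49, 59]
freqs = [4, 6, 10, 7, 3]

# Class size C = (upper limit - lower limit) + 1
C = (upper_limits[0] - lower_limits[0]) + 1

# Class marks X_i = (UCL_i + LCL_i) / 2
marks = [(u + l) / 2 for l, u in zip(lower_limits, upper_limits)]

# Assumed mean A: the class mark of the middle class
A = marks[len(marks) // 2]

# Coded values U_i = (X_i - A) / C
U = [(x - A) / C for x in marks]

sum_fU = sum(f * u for f, u in zip(freqs, U))
sum_f = sum(freqs)

mean = A + (sum_fU / sum_f) * C

print(U)               # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(round(mean, 4))  # 34.1667
```

Notice how the U_i column comes out as whole numbers centred on 0. That is the whole point of picking the middle class mark as A.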

2. The median:

\[ \tilde{X} = L_{\text{med}} + \left( \frac{\frac{N}{2} - \sum f_{<\text{med}}}{f_{\text{med}}} \right) \times C \]

Where;

L_med = lower class boundary of median class

(The median class is the first class whose cumulative frequency reaches or exceeds ᴺ/₂. Its lower class boundary is obtained by subtracting 0.5 from its lower class limit. For instance, say the median class has limits of 55 - 67; the lower class boundary would then be 54.5);

N = Σf (read as summation-f or sigma-f)

(This is the same as the summation of the frequencies of all the classes);

Σf_<med = cumulative frequency below median class

(I like to call it "sigma-f below", i.e. summation of frequencies below median class);

f_med = frequency of median class

(This is obtained directly from the frequency distribution table);

C = class size

(Refer to class size under 'The mean' above).
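The median steps can be sketched in Python as below. The table is a made-up example (an assumption, not course data), and the loop simply walks down the classes until the running total of frequencies reaches N/2.

```python
# Hypothetical frequency table (made up for illustration):
# classes 10-19, 20-29, 30-39, 40-49, 50-59
lower_limits = [10, 20, 30, 40, 50]
freqs = [4, 6, 10, 7, 3]
C = 10  # class size: (19 - 10) + 1

N = sum(freqs)
target = N / 2  # position of the median

cum_below = 0  # "sigma-f below": cumulative frequency below the current class
for lower, f in zip(lower_limits, freqs):
    if cum_below + f >= target:
        # This is the median class
        L_med = lower - 0.5
        median = L_med + ((target - cum_below) / f) * C
        break
    cum_below += f

print(L_med, cum_below, f, median)  # 29.5 10 10 34.5
```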

3. The mode:

\[ \hat{X} = L_{\text{mod}} + \left( \frac{D_1}{D_1 + D_2} \right) \times C \]

Where;

L_mod = lower class boundary of modal class

(The modal class is the class with the highest frequency. Its lower class boundary is obtained by subtracting 0.5 from its lower class limit. See L_med);

D₁ = frequency difference between modal class and next lower class

(This is obtained directly from the table. It is the difference between the frequency of the modal class and that of the class just before it.);

D₂ = frequency difference between modal class and next higher class

(This is also obtained directly from the table. It is the difference between the frequencies of the modal class and the class just after the modal class.);

C = class size

(Refer to class size under 'The mean' above).
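Here is a small Python sketch of the mode formula on a made-up table (an assumption for illustration). Treating a missing neighbour class as having frequency 0 is also my assumption, for the case where the modal class is the first or last class.

```python
# Hypothetical frequency table (made up for illustration):
# classes 10-19, 20-29, 30-39, 40-49, 50-59
lower_limits = [10, 20, 30, 40, 50]
freqs = [4, 6, 10, 7, 3]
C = 10  # class size: (19 - 10) + 1

# Modal class: the first class with the highest frequency
m = freqs.index(max(freqs))

# Neighbouring frequencies; a missing neighbour counts as 0 (assumption)
f_prev = freqs[m - 1] if m > 0 else 0
f_next = freqs[m + 1] if m < len(freqs) - 1 else 0

D1 = freqs[m] - f_prev  # modal class vs. the class just before it
D2 = freqs[m] - f_next  # modal class vs. the class just after it

L_mod = lower_limits[m] - 0.5
mode = L_mod + (D1 / (D1 + D2)) * C

print(D1, D2, round(mode, 4))  # 4 3 35.2143
```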

Okay now, Power, having refreshed our minds on these formulas, let's move on to solve any questions you'll be providing. ;)

The Distribution Computer:

In the input boxes above, for the first, input the class limits of only the first class. You don't have to input the limits for all the classes. And for the second, input the frequencies of all the classes in order. Don't forget to separate the values you input ('23' is not the same as '2 3', you know).
Note: For the full width of the table when the result is displayed, rotate your screen. ;)
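Why are the limits of the first class enough? Because every class has the same size, the rest can be worked out. The sketch below shows one way that could be done; it is my own illustration, not the actual code behind the computer, and `build_classes` is a hypothetical helper that assumes whole-number limits, equal class sizes, and values separated by spaces.

```python
def build_classes(first_class, frequencies):
    """Expand the first class's limits into (lower, upper, frequency) rows.

    Assumes whole-number limits, equal class sizes, and space-separated input.
    """
    lower, upper = (int(v) for v in first_class.split())
    freqs = [int(v) for v in frequencies.split()]
    size = (upper - lower) + 1  # class size C
    return [(lower + i * size, upper + i * size, f) for i, f in enumerate(freqs)]


table = build_classes("10 19", "4 6 10 7 3")
for low, high, f in table:
    print(f"{low}-{high}: {f}")

# '23' is one frequency, '2 3' is two:
print(len(build_classes("10 19", "23")), len(build_classes("10 19", "2 3")))  # 1 2
```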

Let's solve:

Solving for the mean, from the values you provided and from the table above, we read off the values of A, Σf_iU_i, Σf_i and C.

Therefore, using the mean formula:

\[ \bar{X} = A + \left( \frac{\sum f_i U_i}{\sum f_i} \right) \times C \]

we substitute those values and work out the right-hand side. The result is the mean of the frequency distribution.

In the same vein, solving for the median, we read off the values of L_med, N, Σf_<med, f_med and C.

Therefore, using the median formula:

\[ \tilde{X} = L_{\text{med}} + \left( \frac{\frac{N}{2} - \sum f_{<\text{med}}}{f_{\text{med}}} \right) \times C \]

we substitute those values and work out the right-hand side. Hence, we have the median of the distribution.

Finally, solving for the mode, we read off the values of L_mod, D₁, D₂ and C.

Therefore, using the mode formula:

\[ \hat{X} = L_{\text{mod}} + \left( \frac{D_1}{D_1 + D_2} \right) \times C \]

we substitute those values and work out the right-hand side. Consequently, we have the mode of the distribution.

Alright, Power. Having understood all these, let's discuss a little bit further. ;)

Further Discussion:

Alright. Now, Power? I'm sure you must have heard of these words that end with "iles" (hmm...what about we call them "the iles"? Cool, right? ;)). Okay. They are the quartiles, the deciles, and the percentiles.
When you meet them, do not let them scare you. They are easy to handle so long as you get the logic behind them well enough.
So, let's talk about them, about "the iles".

1. The quartiles:

The quartiles split a distribution into four equal parts (just like the word 'quarter', which implies a fourth part of something). Counting along the distribution, we get the first, Q1, the second, Q2, the third, Q3, and the fourth, Q4.
Now, we are going to consider only two of these here and discard the other two. I'll tell you why. Let's continue.
The two we'll consider are Q1 and Q3.

From the explanation of quartiles and quarters above, we can say that Q1 marks the 1st quarter of the data (called the first quartile) and Q3 marks the 3rd quarter (called the third quartile). In terms of position, Q1 sits ¼ of the way through Q4 and Q3 sits ¾ of the way through Q4. Notice I said, "...through Q4", because Q4 covers the whole dataset. Likewise Q2, the 2nd, sits ²/₄ of the way through, which is ½ of the way, which is the same as the median, because the median is the halfway point we discussed earlier. So, Q2 is the median and Q4 is the full package. Now you see why we had to discard them both. Good. So, let's move forward.

Again, from the above illustration, we see how the quartiles are related to each other and the median.

Take this logic and never forget it, because you'll use the same when solving for deciles and percentiles:
If you take a look at the formula for finding the median of grouped data, you'll notice there's a variable N involved in the calculation. This variable, the total frequency, plays the role of the 'mother' quartile, I mean Q4; it does the same job as D10 for the deciles and P100 for the percentiles. Also, notice that N is divided by 2, which is the same as saying ½ of N, which also implies ½ of Q4, which marks the position of Q2, the median. Power? I hope you are grasping?
Now, using the same logic, we can compute the values of Q1 and Q3 by replacing '¹/₂' in the formula with '¼' and '¾' respectively, since Q4 corresponds to N.

1.A For Q1, the formula would look like this:

\[ Q_1 = L_{Q1} + \left( \frac{\left[ \frac{1}{4} \times N \right] - \sum f_{<Q1}}{f_{Q1}} \right) \times C \]

Where;

L_Q1 = lower class boundary of the first-quartile class

(You already know how to calculate lower class boundaries. The quartile class in this case is the class that has the value of [¼ × N] embedded in its range of cumulative frequencies, i.e. the first class whose cumulative frequency reaches or exceeds [¼ × N]);

N = Q4 = Σf (read as summation-f or sigma-f)

(This is the same as the summation of the frequencies of all the classes);

Σf_<Q1 = cumulative frequency below Q1 class

(I.e. the summation of the frequencies of all the classes below the quartile class, Q1 (the first quartile class), in this case);

f_Q1 = frequency of Q1 class

(This is obtained directly from the frequency distribution table once you have the value of [¼ × N] and have located the Q1 class);

C = class size

(This has already been discussed. Refer to class size under 'The mean' above).
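To make the hunt for the quartile class concrete, here is a Python sketch that builds the cumulative frequency column first and then applies the formula for Q1 and Q3. The table is a made-up example and `quartile` is a hypothetical helper name, both assumptions for illustration.

```python
from itertools import accumulate

# Hypothetical frequency table (made up for illustration):
# classes 10-19, 20-29, 30-39, 40-49, 50-59
lower_limits = [10, 20, 30, 40, 50]
freqs = [4, 6, 10, 7, 3]
C = 10  # class size: (19 - 10) + 1

N = sum(freqs)
cum = list(accumulate(freqs))  # cumulative frequency column


def quartile(k):
    """k-th quartile (k = 1, 2 or 3) of the grouped data above."""
    target = k * N / 4  # e.g. 1/4 of N for Q1
    # Quartile class: first class whose cumulative frequency reaches the target
    i = next(idx for idx, c in enumerate(cum) if c >= target)
    below = cum[i - 1] if i > 0 else 0  # cumulative frequency below that class
    return (lower_limits[i] - 0.5) + ((target - below) / freqs[i]) * C


Q1, Q3 = quartile(1), quartile(3)
print(cum)                         # [4, 10, 20, 27, 30]
print(round(Q1, 4), round(Q3, 4))  # 25.3333 43.0714
```

Calling `quartile(2)` on the same table gives 34.5, which is exactly what the median formula gives. That is Q2 being the median, as discussed.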

1.B The semi-interquartile-range:

I'll try to make this as short as possible.

Power? I believe you know what 'range' is and how to calculate it?
The range is the difference between the highest value and the lowest value in a dataset. You also know what 'semi' is: half of something, as in semi-circle.
Therefore, to compute the semi-interquartile-range, let's first break the expression into two ('interquartile-range' and 'semi', in that order). To compute the interquartile-range, we take the lowest of the inter-quartiles (notice I broke the word here again) and subtract it from the highest of the inter-quartiles.
The reason I broke the word 'interquartile' is this:
You have to know that Q4 is not 'a part' of the quartile. It is not 'inter' the quartile. It is the whole of it. It is not included in the 'inter' quartiles. Therefore, Q3 becomes the highest interquartile and Q1 becomes the lowest interquartile. Hence, the range of the interquartiles is Q3 − Q1, and the semi-interquartile-range is the interquartile-range halved. Mathematically:

\[ Q_{\text{range}} = \frac{Q_3 - Q_1}{2} \]
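As a quick worked check, suppose a table gave Q1 ≈ 25.33 and Q3 ≈ 43.07 (made-up values, only for illustration). Then:

\[ Q_{\text{range}} = \frac{43.07 - 25.33}{2} = \frac{17.74}{2} = 8.87 \]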

2. The deciles:

Just like the quartiles, the deciles also take the form of the median.
But before we continue, take note of the term "decile": 'dec-' deals with ten or a tenth (10th). For instance, as we know, 'dec-imal', 'dec-imeter', 'dec-ade' etc. They all have to do with ten or a '10th'.
Why I chipped that in is this: the i'th decile is calculated by just replacing '¹/₂' with 'ⁱ/₁₀' accordingly, just as we did with quartiles. As in this:
If we are asked to find the i'th decile, say the 9th decile, of grouped data, we simply do this: replace '¹/₂' with '⁹/₁₀'.
Note: For deciles, 0 < i < 10 (i.e. i is greater than 0 and less than 10).
The formula would then look like this, for D9:

\[ D_9 = L_{D9} + \left( \frac{\left[ \frac{9}{10} \times N \right] - \sum f_{<D9}}{f_{D9}} \right) \times C \]

Where;

L_D9 = lower class boundary of the 9th-decile class

(The 9th-decile class is the first class whose cumulative frequency reaches or exceeds the value of [⁹/₁₀ × N]);

N = D10 = Σf (read as summation-f or sigma-f)

(This is the same as the summation of the frequencies of all the classes);

Σf_<D9 = cumulative frequency below D9 class

(I.e. the summation of the frequencies of all the classes below the decile class, D9 (the 9th decile class), in this case);

f_D9 = frequency of D9 class

(This is obtained directly from the frequency distribution table once you have the value of [⁹/₁₀ × N] and have located the D9 class);

C = class size

(See class size under 'The mean' above).
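Here is a worked example with made-up numbers (an assumption for illustration). Take classes 10-19, 20-29, 30-39, 40-49, 50-59 with frequencies 4, 6, 10, 7, 3, so N = 30, C = 10 and the cumulative frequencies are 4, 10, 20, 27, 30. Then [⁹/₁₀ × N] = 27, which is first reached in the class 40-49, so L_D9 = 39.5, Σf_<D9 = 20 and f_D9 = 7:

\[ D_9 = 39.5 + \left( \frac{27 - 20}{7} \right) \times 10 = 39.5 + 10 = 49.5 \]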

3. The percentiles:

Alright. This will be the last of all the things I've been discussing so far. It's not been easy getting these done. I'm really tired now. I've got to rest. :)

So, finally, Power, the percentiles...
Just use the same logic as discussed above.
We know that 'percentile' is a derivative of the word 'percent', which means per hundred. So to find, say, the 86th percentile, we just replace '¹/₂' in the median formula with '⁸⁶/₁₀₀', then we solve. As easy as that. Mathematically:

\[ P_{86} = L_{P86} + \left( \frac{\left[ \frac{86}{100} \times N \right] - \sum f_{<P86}}{f_{P86}} \right) \times C \]
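Since all "the iles" share one logic, one small routine can cover them all. The Python sketch below is my own illustration under stated assumptions: `grouped_ile` is a hypothetical name, the table is made up, and the classes are assumed to have equal size with whole-number limits (so each boundary is the lower limit minus 0.5).

```python
def grouped_ile(lower_limits, freqs, class_size, k, parts):
    """k-th of `parts` equal divisions of grouped data.

    (1, 2) is the median, (3, 4) is Q3, (9, 10) is D9, (86, 100) is P86.
    """
    if not 0 < k < parts:
        raise ValueError("k must satisfy 0 < k < parts")
    n = sum(freqs)
    target = k * n / parts  # e.g. 86/100 of N for P86
    cum_below = 0           # cumulative frequency below the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum_below + f >= target:
            boundary = lower - 0.5  # lower class boundary
            return boundary + ((target - cum_below) / f) * class_size
        cum_below += f
    raise ValueError("no class reached the target position")


# Hypothetical frequency table (made up for illustration):
# classes 10-19, 20-29, 30-39, 40-49, 50-59
lows = [10, 20, 30, 40, 50]
freqs = [4, 6, 10, 7, 3]

median = grouped_ile(lows, freqs, 10, 1, 2)
d9 = grouped_ile(lows, freqs, 10, 9, 10)
p86 = grouped_ile(lows, freqs, 10, 86, 100)
print(median, d9, round(p86, 4))  # 34.5 49.5 47.7857
```

Only the fraction k/parts changes from one "ile" to the next; everything else in the formula stays put.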

Alright. So that will be all for now.
Thanks for taking time out and also for taking your time to study these for yourself. See you later, Power. Byeee! ;)

PS: Lest I forget...HAPPY NEW YEEEAR, 2018! ;D
