Introduction to Measures of Spread - Percentiles
What are percentiles?
Let's understand percentiles through an example. Suppose a student scored 55 out of 100 on a test. How would you rate this performance? Is it bad? What if the questions were hard and most of the students scored along similar lines?
How do we rate this performance?
The answer is that we can use percentiles. The pth percentile of a sample is a value such that p percent of the values in the data are less than or equal to it.
Let's take the scores of 25 students, in sorted order:
19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51, 52, 54, 55, 55 | 56 | 59 | 60, 62, 62, 65, 66, 68 |
---|---|---|---|
first 17 elements | 18th element | 19th element | last 6 elements |
What is the 70th percentile of the scores?
Procedure for Computing Percentile
Process 1 to compute Percentile
- Sort the data
- Compute the location of the pth percentile
\(L_p = \frac{p}{100}(n+1) = \frac{70}{100}(25+1)=18.2\)
- What does position 18.2 mean?
Since 18.2 lies between 18 and 19, the 70th percentile should lie between the 18th value (56) and the 19th value (59): greater than 56, but closer to 56 than to 59.
Thus, the 70th percentile is \(56 + 0.2 *(59-56) = 56.6\)
In general:
- Let \(i_p\) be the integer part of \(L_p\)
  - If \(L_p\) is an integer: \(Y_p = x_{i_p}\)
- Let \(f_p\) be the fractional part of \(L_p\)
- Compute the pth percentile as
\(Y_p = x_{i_p}+f_p*(x_{i_p+1}-x_{i_p})\)
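The steps of Process 1 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `percentile_process1` is hypothetical, and clamping to the smallest or largest value when the location falls outside 1..n is an assumption that the procedure above does not specify.

```python
def percentile_process1(data, p):
    """p-th percentile via L_p = p/100 * (n + 1) with linear interpolation."""
    xs = sorted(data)                  # step 1: sort the data
    n = len(xs)
    location = p * (n + 1) / 100       # step 2: location L_p (1-based)
    i = int(location)                  # integer part i_p
    f = location - i                   # fractional part f_p
    # Assumption (not in the procedure above): clamp locations outside 1..n.
    if i < 1:
        return xs[0]
    if i >= n:
        return xs[-1]
    # xs is 0-based, so x_{i_p} is xs[i - 1] and x_{i_p + 1} is xs[i].
    # When f is 0 this reduces to x_{i_p}, the integer-location case.
    return xs[i - 1] + f * (xs[i] - xs[i - 1])


scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

# 70th percentile of the 25 scores: location 18.2, between 56 and 59.
print(round(percentile_process1(scores, 70), 2))
```

For the 25 scores, the location is 18.2, so the function interpolates between the 18th and 19th values exactly as in the worked example.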
Alternative 1 to compute Percentile
The pth percentile is the value in the data such that at least p percent of the values are less than or equal to it and at least (100-p) percent of the values are greater than or equal to it.
- Sort the data
- Compute the location of the pth percentile
\(L_p = \frac{p}{100}(n)\)
- Let \(i_p\) be the integer part of \(L_p\)
  - If \(L_p\) is an integer: \(Y_p = \frac{x_{L_p}+x_{L_p+1}}{2}\)
  - If \(L_p\) is not an integer: \(Y_p = x_{i_p+1}\)
For example, let the data be
19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51, 52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68 |
---|
For the 70th percentile, \(L_{70} = \frac{70}{100}*25 = 17.5\), which is not an integer:
- At least 17.5 values should be less than or equal to it (so the location should be 18 or higher).
- At least 7.5 values should be greater than or equal to it (so the location should be 18 or lower).
Thus, location 18 (i.e. \(i_p +1\)) is the only location that satisfies both conditions, and \(Y_{70} = x_{18} = 56\).
Similarly, to find the 80th percentile:
\(L_{80} = \frac{p}{100}(n) = \frac{80}{100}*25 = 20\)
Both locations 20 and 21 (i.e. \(L_p\) and \(L_p+1\)) satisfy the above conditions, so we take the average of these two values:
\(Y_{80} = \frac{x_{20}+x_{21}}{2} = \frac{60+62}{2}=61\)
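A small Python sketch of Alternative 1 is given here for illustration. The function name `percentile_alt1` is hypothetical, and restricting p to values strictly between 0 and 100 is an assumption made so that both locations always exist in the data.

```python
def percentile_alt1(data, p):
    """p-th percentile via L_p = p/100 * n, without interpolation."""
    if not 0 < p < 100:
        # Assumption: the rule needs both x_{L_p} and x_{L_p + 1} to exist.
        raise ValueError("p must be strictly between 0 and 100")
    xs = sorted(data)
    n = len(xs)
    location = p * n / 100
    i = int(location)                      # integer part i_p
    if location == i:
        # L_p is an integer: average x_{L_p} and x_{L_p + 1} (1-based).
        return (xs[i - 1] + xs[i]) / 2
    # L_p is not an integer: take x_{i_p + 1}, which is xs[i] in 0-based terms.
    return xs[i]


scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

print(percentile_alt1(scores, 70))   # location 17.5 -> 18th value
print(percentile_alt1(scores, 80))   # location 20 -> average of 20th and 21st values
```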
Alternative 2 to compute Percentile
- Sort the data
- Compute the location of the pth percentile
\(L_p = \frac{p}{100}(n+1)\) (Same as 1st Process)
- Let \(i_p\) be the integer part of \(L_p\)
  - If \(L_p\) is an integer: \(Y_p = x_{i_p}\) (Same as 1st Process)
- Let \(f_p\) be the fractional part of \(L_p\)
- Compute the pth percentile as
\(Y_p = x_{i_p}+0.5*(x_{i_p+1}-x_{i_p})\) (Same as 1st Process except that we use 0.5 instead of \(f_p\))
It's a loose approximation of the 1st Process.
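For comparison, Alternative 2 can be sketched in the same way. The function name `percentile_alt2` is hypothetical, and the clamping of locations outside 1..n is again an assumption rather than part of the stated procedure.

```python
def percentile_alt2(data, p):
    """p-th percentile via L_p = p/100 * (n + 1), using a fixed midpoint."""
    xs = sorted(data)
    n = len(xs)
    location = p * (n + 1) / 100
    i = int(location)                  # integer part i_p
    # Assumption (not in the procedure above): clamp locations outside 1..n.
    if i < 1:
        return xs[0]
    if i >= n:
        return xs[-1]
    if location == i:
        return xs[i - 1]               # integer location: x_{i_p}
    # Non-integer location: midpoint of x_{i_p} and x_{i_p + 1}.
    return xs[i - 1] + 0.5 * (xs[i] - xs[i - 1])


scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

print(percentile_alt2(scores, 70))   # midpoint of the 18th and 19th values
```

On the 25 scores this gives 57.5 for the 70th percentile, compared with 56.6 from the 1st Process, which illustrates how loose the approximation can be.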
In general, all of these approaches give reasonable results. In this course, we will stick to the 1st Process to find percentiles.
Frequently used Percentiles
Quartiles
- Quartiles divide the data into four equal parts
19, 28, 34, 36, 37 | 43, 44, 44, 46, 46 | 47, 48, 51, 52, 54 | 55, 55, 56, 59, 60 |
---|---|---|---|
first 25% of data | second 25% of data | third 25% of data | last 25% of data |
- The median is the same as the second quartile (Q2).
Quintiles
- Quintiles divide the data into five equal parts
19, 28, 34, 36 | 37, 43, 44, 44 | 46, 46, 47, 48 | 51, 52, 54, 55 | 55, 56, 59, 60 |
---|---|---|---|---|
first 20% of data | second 20% of data | third 20% of data | fourth 20% of data | last 20% of data |
Deciles
- Deciles divide the data into ten equal parts
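As an illustration, the cut points for quartiles, quintiles and deciles can be computed with `statistics.quantiles` from Python's standard library (Python 3.8+). With `method="exclusive"` it places the cut points at locations of the form \(\frac{i}{k}(n+1)\) and interpolates linearly, which should agree with the 1st Process. The example uses the 25 scores from earlier rather than the 20 values shown in the tables above.

```python
import statistics

scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

# k equal parts are separated by k - 1 cut points.
quartiles = statistics.quantiles(scores, n=4, method="exclusive")
quintiles = statistics.quantiles(scores, n=5, method="exclusive")
deciles = statistics.quantiles(scores, n=10, method="exclusive")

print(quartiles)                 # [43.5, 51.0, 59.5] -> Q1, Q2, Q3
print(len(quintiles))            # 4 cut points
print(len(deciles))              # 9 cut points

# Q2 coincides with the median of the data.
print(quartiles[1] == statistics.median(scores))
```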
Compute the Percentile rank of a value in the data
- The percentile rank of a value is the percentage of data values that are less than or equal to it, with values equal to it counted at half weight.
\(PR_s = \frac{c_s+0.5*f_s}{n}*100\)
where \(PR_s\) = percentile rank of the score s
\(c_s\) = number of values less than s
\(f_s\) = number of values equal to s
\(n\) = total number of values
For example, consider the following marks:
19, 28, 34, 36, 37, 43, 44, 44, 47, 57, 57, 58, 68, 73, 73, 75, 75, 76, 76, 88, 89, 89, 95, 95, 96, 97 |
---|
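Here 6 values are less than 44, 2 values are equal to 44, and the list contains 26 values, so \(PR_{44} = \frac{6+0.5*2}{26}*100 \approx 26.9\)
That is, a mark of 44 has a percentile rank of about 26.9: roughly 27% of the values lie at or below 44 when the two values equal to 44 are counted at half weight.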
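The percentile rank formula translates directly into code. The sketch below uses a hypothetical function name, `percentile_rank`, and takes n to be the length of the list passed in.

```python
def percentile_rank(data, s):
    """Percentile rank of score s: (c_s + 0.5 * f_s) / n * 100."""
    n = len(data)
    c_s = sum(1 for x in data if x < s)    # values strictly less than s
    f_s = sum(1 for x in data if x == s)   # values equal to s
    return (c_s + 0.5 * f_s) / n * 100


marks = [19, 28, 34, 36, 37, 43, 44, 44, 47, 57, 57, 58, 68, 73,
         73, 75, 75, 76, 76, 88, 89, 89, 95, 95, 96, 97]

print(len(marks))                          # n, the number of values in the list
print(round(percentile_rank(marks, 44), 1))
```

Counting tied values at half weight places a score in the middle of its group of ties instead of at the top of it.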
Effect of Transformation on Percentiles
Let's say we have transformed the data (scaled by a and shifted by c, with \(a > 0\) so that the sorted order is preserved);
then the new data will be \(x_{new} = a*x+c\)
As discussed earlier, the pth percentile is given by
\(Y_p = x_{i_p}+f_p*(x_{i_p+1}-x_{i_p})\)
The location \(L_p\) depends only on p and n, so \(i_p\) and \(f_p\) are unchanged. Thus, for the transformed data
\(Y_p^{new} = x_{i_p}^{new}+f_p*(x_{i_p+1}^{new}-x_{i_p}^{new})\)
Using \(x_{new} = a*x+c\)
\(Y_p^{new} = (a*x_{i_p}+c)+f_p*(a*x_{i_p+1}+c -(a*x_{i_p}+c))\)
\(= a*x_{i_p}+c+f_p*a*(x_{i_p+1} -x_{i_p})\)
\(= a*(x_{i_p}+f_p*(x_{i_p+1} -x_{i_p}))+c\)
\(Y_p^{new}= a*Y_p+c\)
Note: the new percentile is the old percentile scaled by a and shifted by c.
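The result can be checked numerically. The sketch below re-implements the 1st Process in a hypothetical helper, `percentile`, and compares both sides of \(Y_p^{new} = a*Y_p+c\) for illustrative values a = 2 and c = 5, chosen only for this example. It assumes a > 0, so that scaling does not reverse the sorted order.

```python
import math


def percentile(data, p):
    """p-th percentile via L_p = p/100 * (n + 1) with linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    location = p * (n + 1) / 100
    i = int(location)                  # integer part i_p
    f = location - i                   # fractional part f_p
    # Assumption: clamp locations that fall outside 1..n.
    if i < 1:
        return xs[0]
    if i >= n:
        return xs[-1]
    return xs[i - 1] + f * (xs[i] - xs[i - 1])


scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

a, c = 2, 5                            # illustrative scale and shift, a > 0
transformed = [a * x + c for x in scores]

for p in (25, 50, 70, 90):
    direct = percentile(transformed, p)        # percentile of the new data
    via_formula = a * percentile(scores, p) + c
    print(p, math.isclose(direct, via_formula))
```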