Introduction to Measures of Spread - Percentiles

What are percentiles?

Let's understand percentiles with an example. Suppose a student scored 55 out of 100 on a test. How would you rate this performance? Is it bad? What if the questions were hard and most of the students scored in a similar range?

How do we rate this performance?

The answer is that we could use percentiles. The pth percentile of a sample is a value such that p percent of the values in the data are less than or equal to it.

Let's take the scores of 25 students in sorted order:

19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51, 52, 54, 55, 55	56	59	60, 62, 62, 65, 66, 68
first 17 elements	18th element	19th element	last 6 elements

What is the 70th percentile of the scores?

Procedure for Computing Percentile

Process 1 to compute Percentile

✅ Sort the data

✅ Compute location of the pth percentile

\(L_p = \frac{p}{100}(n+1) = \frac{70}{100}(25+1)=18.2\)

⛔ What does position 18.2 mean?

As 18.2 lies between 18 and 19, the 70th percentile should be between 56 and 59 (the 18th and 19th values): greater than 56, but closer to 56 than to 59.

Thus, the 70th percentile is \(56 + 0.2 *(59-56) = 56.6\)

✅ Integer part of \(L_p = i_p\)

◾ If \(L_p\) is an integer: \(Y_p = x_{i_p}\)

✅ Fractional part of \(L_p = f_p\)

✅ Compute pth percentile as

\(Y_p = x_{i_p}+f_p*(x_{i_p+1}-x_{i_p})\)
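The steps of the 1st process can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `percentile_interpolated` is made up for this example, and clamping locations that fall outside 1..n to the smallest or largest value is an assumption the text does not cover.

```python
import math


def percentile_interpolated(data, p):
    """p-th percentile via L_p = p/100 * (n + 1) with linear interpolation."""
    xs = sorted(data)                 # step 1: sort the data
    n = len(xs)
    loc = p / 100 * (n + 1)           # step 2: location L_p (1-based)
    i = math.floor(loc)               # integer part i_p
    f = loc - i                       # fractional part f_p
    # Assumption (not in the text): clamp locations outside 1..n.
    if i < 1:
        return xs[0]
    if i >= n:
        return xs[-1]
    # xs is 0-based, so position i_p is xs[i - 1] and position i_p + 1 is xs[i].
    return xs[i - 1] + f * (xs[i] - xs[i - 1])


scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

print(round(percentile_interpolated(scores, 70), 2))  # 56.6
```

When \(L_p\) is an integer the fractional part is zero, so the same expression returns \(x_{i_p}\) without a separate branch.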

Alternative 1 to compute Percentile

The pth percentile is the value in the data such that at least p percent of the values are less than or equal to it and at least (100-p) percent of the values are greater than or equal to it.

✅ Sort the data

✅ Compute location of the pth percentile

\(L_p = \frac{p}{100}(n)\)

✅ Integer part of \(L_p = i_p\)

◾ If \(L_p\) is an integer: \(Y_p = \frac{x_{L_p}+x_{L_p+1}}{2}\)

◾ If \(L_p\) is not an integer: \(Y_p = x_{i_p+1}\)

For example, let's find the 70th percentile of the same data, where \(L_{70} = \frac{70}{100}*25 = 17.5\):

19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51, 52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68

✅ At least 17.5 values (70% of 25) should be less than or equal to it, so the location should be 18 or higher.

✅ At least 7.5 values (30% of 25) should be greater than or equal to it, so the location should be 18 or lower.

Thus, location 18 (i.e. \(i_p +1\)) is the only location that satisfies both conditions, and the 70th percentile is the 18th value, 56.

Similarly, to find the 80th percentile:

\(L_{80} = \frac{p}{100}(n) = \frac{80}{100}*25 = 20\)

Both locations 20 and 21 (i.e. \(L_p\) and \(L_p+1\)) satisfy the above conditions, so we take the average of these two values:

\(Y_{80} = \frac{x_{20}+x_{21}}{2} = \frac{60+62}{2}=61\)
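Alternative 1 can be sketched in the same way. The function name `percentile_alt1` is hypothetical, and the clamping for p = 0 and p = 100 is an assumption added so the sketch does not index outside the data.

```python
def percentile_alt1(data, p):
    """p-th percentile via L_p = p/100 * n, without interpolation."""
    xs = sorted(data)
    n = len(xs)
    loc = p * n / 100                 # location L_p (1-based)
    i = int(loc)                      # integer part i_p
    if loc == i:
        # L_p is an integer: average the values at positions L_p and L_p + 1.
        # Assumption (not in the text): clamp positions to 1..n at the extremes.
        low = xs[max(i, 1) - 1]
        high = xs[min(i + 1, n) - 1]
        return (low + high) / 2
    # L_p is not an integer: take position i_p + 1, which is index i_p in 0-based terms.
    return xs[i]


scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

print(percentile_alt1(scores, 70))  # 56
print(percentile_alt1(scores, 80))  # 61.0
```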

Alternative 2 to compute Percentile

✅ Sort the data

✅ Compute location of the pth percentile

\(L_p = \frac{p}{100}(n+1)\) (Same as 1st Process)

✅ Integer part of \(L_p = i_p\)

◾ If \(L_p\) is an integer: \(Y_p = x_{i_p}\) (Same as 1st Process)

✅ Fractional part of \(L_p = f_p\)

✅ Compute pth percentile as

\(Y_p = x_{i_p}+0.5*(x_{i_p+1}-x_{i_p})\) (Same as 1st Process except that we use 0.5 instead of \(f_p\))

It’s a loose approximation of the 1st process.
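Since Alternative 2 differs from the 1st process only in the interpolation weight, a sketch needs just one changed line. As before, the name `percentile_alt2` and the clamping at the extremes are assumptions made for illustration.

```python
import math


def percentile_alt2(data, p):
    """p-th percentile via L_p = p/100 * (n + 1), using the midpoint instead of f_p."""
    xs = sorted(data)
    n = len(xs)
    loc = p * (n + 1) / 100           # location L_p (1-based)
    i = math.floor(loc)               # integer part i_p
    # Assumption (not in the text): clamp locations outside 1..n.
    if i < 1:
        return xs[0]
    if i >= n:
        return xs[-1]
    if loc == i:                      # L_p is an integer: take x at position i_p
        return xs[i - 1]
    # Otherwise take the midpoint of positions i_p and i_p + 1.
    return xs[i - 1] + 0.5 * (xs[i] - xs[i - 1])


scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

print(percentile_alt2(scores, 70))  # 57.5
```

For the 70th percentile this gives 57.5 rather than the 56.6 obtained with the 1st process, which shows how loose the approximation can be on a small sample.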

In general, all these approaches give similar results. In this course we will stick to the 1st process to find percentiles.

Frequently used Percentiles

Quartiles

💠 Quartiles divide the data into four equal parts

19, 28, 34, 36, 37	43, 44, 44, 46, 46	47, 48, 51, 52, 54	55, 55, 56, 59, 60
first 25 % data	second 25 % data	third 25 % data	last 25 % data

🔥 The median is the same as the second quartile (Q2).

Quintiles

💠 Quintiles divide the data into five equal parts

19, 28, 34, 36	37, 43, 44, 44	46, 46, 47, 48	51, 52, 54, 55	55, 56, 59, 60
first 20 % data	second 20 % data	third 20 % data	fourth 20% data	last 20% data

Deciles

💠 Deciles divide the data into ten equal parts
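As a quick check, the cut points for quartiles, quintiles and deciles can be computed with Python's standard library. This sketch assumes the full 25-score sample from the beginning of the section and uses `statistics.quantiles` with `method="exclusive"`, which should correspond to the \(\frac{p}{100}(n+1)\) location rule of the 1st process.

```python
import statistics

scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

# n is the number of equal parts; the result has n - 1 cut points.
quartiles = statistics.quantiles(scores, n=4, method="exclusive")
quintiles = statistics.quantiles(scores, n=5, method="exclusive")
deciles = statistics.quantiles(scores, n=10, method="exclusive")

print(quartiles)                    # [43.5, 51.0, 59.5]
print(len(quintiles), len(deciles)) # 4 9
print(statistics.median(scores))    # 51, the same as Q2
```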

Compute the Percentile rank of a value in the data

✅ The percentile rank of a value is the percentage of data that are less than or equal to it, with values equal to it counted at half weight.

\(PR_s = \frac{c_s+0.5*f_s}{n}*100\)

where \(PR_s\) = percentile rank of the score s

\(c_s\) = number of values less than s

\(f_s\) = number of values equal to s

\(n\) = total number of values

For example, for the marks below

19, 28, 34, 36, 37, 43, 44, 44, 47, 57, 57, 58, 68, 73, 73, 75, 75, 76, 76, 88, 89, 89, 95, 95, 96, 97

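then, with \(n = 26\) values, \(PR_{44} = \frac{6+0.5*2}{26}*100 \approx 26.9\)

That means a score of 44 has a percentile rank of about 26.9: six values are less than 44, and the two values equal to 44 are counted at half weight.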
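The percentile rank formula can be sketched in Python as follows. The function name `percentile_rank` is made up for this example, and the example reuses the 25 test scores from the beginning of the section rather than the marks listed above.

```python
def percentile_rank(data, s):
    """Percentile rank of s: values below s count fully, values equal to s count half."""
    n = len(data)
    c = sum(1 for x in data if x < s)    # c_s: number of values less than s
    f = sum(1 for x in data if x == s)   # f_s: number of values equal to s
    return 100 * (c + 0.5 * f) / n


scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

print(percentile_rank(scores, 56))  # 70.0
```

Here 17 scores are below 56 and one score equals 56, so the rank is \(\frac{17+0.5*1}{25}*100 = 70\).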

Effect of Transformation on Percentiles

Let's say we have transformed the data (scaled by \(a > 0\) and shifted by c).

Then the new data will be

\[ x_{new} = a*x+c \]

As discussed earlier,

The pth percentile is given by

\(Y_p = x_{i_p}+f_p*(x_{i_p+1}-x_{i_p})\)

Thus, for the transformed data (a positive scale factor keeps the sorted order, so \(i_p\) and \(f_p\) are unchanged)

\(Y_p^{new} = x_{i_p}^{new}+f_p*(x_{i_p+1}^{new}-x_{i_p}^{new})\)

Using \(x_{new} = a*x+c\)

\(Y_p^{new} = (a*x_{i_p}+c)+f_p*(a*x_{i_p+1}+c -(a*x_{i_p}+c))\)

\(= a*x_{i_p}+c+f_p*a*(x_{i_p+1} -x_{i_p})\)

\(= a*(x_{i_p}+f_p*(x_{i_p+1} -x_{i_p}))+c\)

\(Y_p^{new}= a*Y_p+c\)

Note: the new percentile is the scaled and shifted version of the old percentile.
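The result can be checked numerically. The sketch below assumes the 1st process for computing percentiles and hypothetical values a = 2 and c = 5 chosen only for illustration; the helper name `percentile` is also made up, and the sketch assumes the location falls within 1..n-1.

```python
import math


def percentile(data, p):
    """p-th percentile via L_p = p/100 * (n + 1) with linear interpolation."""
    xs = sorted(data)
    loc = p / 100 * (len(xs) + 1)
    i = math.floor(loc)               # integer part i_p
    f = loc - i                       # fractional part f_p
    return xs[i - 1] + f * (xs[i] - xs[i - 1])


scores = [19, 28, 34, 36, 37, 43, 44, 44, 46, 46, 47, 48, 51,
          52, 54, 55, 55, 56, 59, 60, 62, 62, 65, 66, 68]

a, c = 2.0, 5.0                       # hypothetical scale (a > 0) and shift
transformed = [a * x + c for x in scores]

direct = percentile(transformed, 70)          # percentile of the transformed data
via_formula = a * percentile(scores, 70) + c  # a * Y_p + c

print(math.isclose(direct, via_formula))  # True
```

Both routes give approximately 118.2, i.e. \(2*56.6+5\).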