Descriptive Statistics
Different types of data
```mermaid
graph TD
Data --> Qualitative --> Nominal
Qualitative --> Ordinal
Data --> Quantitative --> Discrete
Quantitative --> Continuous
```
Let's take the example of a collection of shirts 👕 with different attributes:
- Color: 🟥 🟦 🟩
- Pattern: 🎯 ✳️
- Size: e.g. S, M, L
- Rating: ⭐ | ⭐⭐ | ⭐⭐⭐
- Price: 385 | 319.44 | 674.11
- Discount: 7.5% | 30.5% | 20%
Even this small dataset shows a lot of variety.
The attributes above can be divided into the following types.
Qualitative Data
- Color: 🟥 🟦 🟩
- Pattern: 🎯 ✳️
- Size: e.g. S, M, L
- Rating: ⭐ ⭐⭐ ⭐⭐⭐
Qualitative or categorical attributes are those which describe the object under consideration using a finite set of discrete classes.
Nominal:
Color and Pattern have no natural ordering: a red shirt is neither "more" nor "less" than a blue one.
Nominal attributes are those qualitative attributes in which there is no natural ordering among the values that an attribute can take.
Ordinal:
Size and Rating, on the other hand, do have a natural ordering (S < M < L and ⭐ < ⭐⭐ < ⭐⭐⭐).
Ordinal attributes are those qualitative attributes in which there is a natural ordering among the values that an attribute can take.
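Below is a minimal Python sketch of this difference. It assumes hypothetical size labels "S", "M", "L".

- For an ordinal attribute, the order of the levels can be written down and used for comparisons and sorting.
- For a nominal attribute such as color, the only meaningful comparison is equality.

```python
# Ordinal attribute: the order of the levels is part of the data type.
# Assumption: shirt sizes are labelled "S", "M", "L" (hypothetical labels).
SIZE_ORDER = ["S", "M", "L"]

def size_rank(size: str) -> int:
    """Position of a size in its natural order (0 = smallest)."""
    return SIZE_ORDER.index(size)

def is_smaller(a: str, b: str) -> bool:
    """Comparing ordinal values is meaningful: S < M < L."""
    return size_rank(a) < size_rank(b)

# Nominal attribute: colors have no natural order, so the only
# meaningful comparison is whether two values are the same.
def same_color(a: str, b: str) -> bool:
    return a == b

print(is_smaller("S", "L"))                    # True
print(same_color("red", "blue"))               # False
print(sorted(["L", "S", "M"], key=size_rank))  # ['S', 'M', 'L']
```

Colors could also be sorted alphabetically, but that order is an artifact of the labels, not a property of the attribute.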
Let's look at some examples of nominal and ordinal attributes across different domains:
Domain | Nominal | Ordinal
---|---|---
Employee | Gender (Male, Female, Other) | Income Range (Low, Medium, High)
Healthcare | Disease (Communicable, Non-communicable) | Health Risk (Small, Medium, Large)
Agriculture | Crop Type (Kharif, Rabi) | Farm Type (Small, Medium, Large)
Government | Nationality (Indian, Nepalese, etc.) | Opinion (Agree, Neutral, Disagree)
Quantitative Data
- Price: 385 319.44 674.11
- Discount: 7.5% 30.5% 20%
Both attributes have numerical values.
(Figure: Quick recap of types of numbers)
Quantitative attributes are those which have numerical values and are used to count or measure certain properties of a population.
Discrete:
- Number of buttons: 12 15 17
- Days of delivery: 1 4 6
Discrete attributes are those quantitative attributes which can take only a countable set of numerical values, typically integers (a shirt cannot have 12.5 buttons).
Continuous:
- Price: 385 319.44 674.11
- Discount: 7.5% 30.5% 20%
Continuous attributes are quantitative attributes which can take fractional values (real numbers).
The observed values need not all be fractional. A price of 385 is still continuous, because the attribute can also take values such as 319.44.
Domain | Continuous | Discrete
---|---|---
Employee | Income tax, gross salary | # of projects, # of family members
Healthcare | Cholesterol level, sugar level | Days of treatment, weeks of pregnancy
Agriculture | Total yield, acres | # of farmers, # of crops
Government | GDP, tax rates | # of citizens, # of villages
Ordinal vs. Discrete
Let's take the example of ratings where "very poor" is denoted by 1, "poor" by 2, and so on.
Why are ratings not discrete (quantitative)?
Although the ratings are expressed as numbers, the notion of distance between them is not well defined. The difference between "very poor" and "poor" need not be the same as the difference between "poor" and "okay", even though the numeric ratings differ by 1 in both cases.
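To see why treating ratings as ordinary numbers is risky, here is a small Python sketch with hypothetical ratings from two groups. Any numeric coding that keeps the order of the levels is an equally valid encoding of ordinal data. Yet the comparison of means can flip between two such codings, while the comparison of medians does not. With an odd number of ratings, the median is always one of the original levels. The groups and codings below are made up for illustration.

```python
from statistics import mean, median

# Hypothetical ratings from two groups of customers, stored as ordinal levels:
# 1 = very poor, 2 = poor, 3 = okay, 4 = good, 5 = very good.
group_a = [1, 1, 5]
group_b = [3, 3, 3]

# Two numeric codings that BOTH respect the order of the levels.
# coding_2 only stretches the gap between "good" and "very good".
coding_1 = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
coding_2 = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}

def coded(ratings, coding):
    """Replace each ordinal level by its number under the given coding."""
    return [coding[r] for r in ratings]

for name, coding in [("coding_1", coding_1), ("coding_2", coding_2)]:
    a, b = coded(group_a, coding), coded(group_b, coding)
    print(name,
          "mean A > mean B:", mean(a) > mean(b),
          "| median A > median B:", median(a) > median(b))
# coding_1: the means say A < B; coding_2: the means say A > B.
# The median comparison (A below B) is the same under both codings.
```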
But why bother about data types?
The type of statistical analysis that makes sense depends on the type of variable.
The examples below show how the analyses we can perform depend on the type of values.
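One way to make this concrete is a lookup table from attribute type to the summaries usually considered meaningful for it. The sketch below encodes a common rule of thumb, not an exhaustive list: each type allows everything the previous type allows, plus more. The names `MEANINGFUL_SUMMARIES`, `SHIRT_ATTRIBUTES` and `can_summarize` are hypothetical. The attribute types follow the classification of the shirt example in this section.

```python
# Rule-of-thumb table (an assumption, not exhaustive): summaries that are
# usually meaningful for each attribute type. For continuous attributes,
# "frequency" is normally counted over ranges of values.
MEANINGFUL_SUMMARIES = {
    "nominal": {"frequency", "mode"},
    "ordinal": {"frequency", "mode", "median"},
    "discrete": {"frequency", "mode", "median", "mean", "spread"},
    "continuous": {"frequency", "mode", "median", "mean", "spread"},
}

# Attribute types of the shirt example, as classified in this section.
SHIRT_ATTRIBUTES = {
    "color": "nominal",
    "pattern": "nominal",
    "size": "ordinal",
    "rating": "ordinal",
    "buttons": "discrete",
    "price": "continuous",
    "discount": "continuous",
}

def can_summarize(attribute: str, summary: str) -> bool:
    """Hypothetical helper: is `summary` meaningful for this shirt attribute?"""
    return summary in MEANINGFUL_SUMMARIES[SHIRT_ATTRIBUTES[attribute]]

print(can_summarize("color", "mean"))       # False: "average color" is meaningless
print(can_summarize("color", "frequency"))  # True
print(can_summarize("rating", "median"))    # True
print(can_summarize("price", "mean"))       # True
```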
For qualitative attributes, can we answer the questions below?
- What is the average color of all the shirts in my catalogue? (No: colors cannot be averaged.)
- What is the average nationality of students in this course? (No: nationalities cannot be averaged.)
- What is the frequency of the color red? (Yes: we can count how often red appears.)
Similarly, the nature of the data, qualitative or quantitative, determines which statistical tests we can perform. For qualitative attributes, we will learn some of these tests in later chapters:
- Regression Analysis
- Analysis of Variance (ANOVA)
- Chi-square test
For quantitative attributes, we can answer the following questions:
Quantitative Attributes (Discrete)
- What is the average value in the dataset?
- What is the spread of the data?
- What is the frequency of a given value?
- Regression Analysis
Quantitative Attributes (Continuous)
- What is the average value in the dataset?
- What is the spread of the data?
- What is the frequency of a given value? (Usually counted over a range of values, since exact values rarely repeat.)
- Regression Analysis
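The questions above can be sketched in Python on the values already used in this section (number of buttons and shirt prices). For spread, this sketch reports the range and the population standard deviation. These are two common choices, assumed here because the section does not yet say which measure to use.

```python
from collections import Counter
from statistics import mean, pstdev

# Values taken from the shirt examples in this section.
buttons = [12, 15, 17]           # discrete: number of buttons
prices = [385, 319.44, 674.11]   # continuous: price

for name, values in [("buttons", buttons), ("price", prices)]:
    print(
        name,
        "average:", round(mean(values), 2),
        "range:", round(max(values) - min(values), 2),
        # Population standard deviation: one common measure of spread.
        "std dev:", round(pstdev(values), 2),
    )

# Frequency of a given value is direct for a discrete attribute ...
print(Counter(buttons)[15])  # 1
# ... while for a continuous attribute we usually count values in a range.
print(sum(1 for p in prices if 300 <= p < 400))  # 2
```

The average number of buttons (about 14.67) is not itself a possible number of buttons. Averages of discrete attributes are still meaningful, they just need not be attainable values.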
How to describe Qualitative Data?
One way to describe qualitative data is by the frequency of each of its values.
Frequency of a value
- How many times does the color red appear?
The total number of times a value appears in the data is called its frequency.
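In Python, the standard library's `collections.Counter` computes exactly this count for every value at once. The list of colors below is hypothetical illustration data.

```python
from collections import Counter

# Hypothetical colors of six shirts in a catalogue (illustrative data).
colors = ["red", "blue", "green", "red", "blue", "red"]

frequency = Counter(colors)      # maps each value to its count
print(frequency["red"])          # 3: how many times does red appear?
print(frequency.most_common(1))  # [('red', 3)]: the most frequent value (the mode)
```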
In a frequency plot (a bar chart):
- Horizontal axis: values of the categorical attribute
- Vertical axis: counts of these values
- Height of each bar: proportional to the count
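As a dependency-free stand-in for a real chart, the sketch below prints a text bar chart. The layout is rotated compared with the description above: categories are listed down the side and bar length plays the role of bar height. The helper `text_frequency_plot` is a hypothetical name. A graphical version would typically use a plotting library such as matplotlib (`matplotlib.pyplot.bar`).

```python
from collections import Counter

def text_frequency_plot(values, symbol="#"):
    """Build a text bar chart: one row per category, sorted by count,
    with bar length equal to the count (so proportional to frequency)."""
    counts = Counter(values)
    width = max(len(str(v)) for v in counts)
    lines = []
    for value, count in counts.most_common():
        lines.append(f"{str(value):<{width}} | {symbol * count} {count}")
    return "\n".join(lines)

print(text_frequency_plot(["red", "blue", "green", "red", "blue", "red"]))
# red   | ### 3
# blue  | ## 2
# green | # 1
```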
The shape of a frequency plot tells us how the values are distributed. One common shape is the long-tailed distribution.
Frequency Plots (Long-Tailed Distributions)
- With bars sorted by count, a few tall bars appear at the beginning of the plot
- A large number of short bars make up the end (the long tail)
- Very common in many real-world scenarios
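One way to quantify a long tail is to ask what share of all observations the few most frequent values account for. The Python sketch below does this with hypothetical purchase data. `head_share` is a hypothetical helper name, not a standard function.

```python
from collections import Counter

def head_share(values, k):
    """Fraction of all observations covered by the k most frequent values.
    In a long-tailed distribution a small k already covers a large share."""
    counts = Counter(values)
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(k))
    return top / total

# Hypothetical purchases: two popular shirt designs and many rarely bought ones.
purchases = ["A"] * 50 + ["B"] * 30 + [f"rare_{i}" for i in range(20)]
print(len(set(purchases)))       # 22 distinct designs
print(head_share(purchases, 2))  # 0.8: 2 of 22 designs cover 80% of purchases
```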
Frequency Plots (Uniform Distribution)
- All values are equally likely, so all bars have roughly the same height
Relative Frequency Plots
What percentage of farms grow groundnut?
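Relative frequency is the count of a value divided by the total number of observations. Relative frequencies are easier to interpret than absolute frequencies, especially when comparing datasets of different sizes.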
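A minimal Python sketch of relative frequency, computed as count of a value divided by total number of observations, answers the groundnut question. The crop data and the `relative_frequencies` helper are hypothetical.

```python
from collections import Counter

def relative_frequencies(values):
    """Relative frequency of each value = its count / total number of observations."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

# Hypothetical main crop of 8 farms (illustrative data).
crops = ["groundnut", "rice", "groundnut", "wheat", "rice", "groundnut", "rice", "rice"]
rel = relative_frequencies(crops)
print(f"{rel['groundnut']:.1%} of farms grow groundnut")  # 37.5% of farms grow groundnut
```

The relative frequencies always add up to 1 (100%), which is what makes them comparable across datasets of different sizes.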
Grouped Frequency Bar Charts
Has the farming pattern changed across years?
- Compare different sets of data
- Each bar corresponds to one set
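The table behind a grouped frequency bar chart can be sketched in Python. For each set (here, each year) we compute the share of every category, so sets of different sizes can be compared side by side. The survey data and the helper name `grouped_relative_frequencies` are hypothetical.

```python
from collections import Counter

# Hypothetical main crop reported by surveyed farms in two years
# (illustrative data only; the two surveys have different sizes).
crops_by_year = {
    2019: ["rice", "rice", "wheat", "groundnut"],
    2020: ["rice", "groundnut", "groundnut", "groundnut", "wheat"],
}

def grouped_relative_frequencies(groups):
    """Return the sorted categories and, for each group, the share of
    observations in every category. Shares (not raw counts) keep groups
    of different sizes comparable."""
    categories = sorted({c for values in groups.values() for c in values})
    table = {}
    for group, values in groups.items():
        counts = Counter(values)
        table[group] = {c: counts[c] / len(values) for c in categories}
    return categories, table

categories, table = grouped_relative_frequencies(crops_by_year)
# One row per category and one "bar" per year, as in a grouped bar chart.
for c in categories:
    print(f"{c:<10}", "  ".join(f"{year}: {table[year][c]:.0%}" for year in table))
# e.g. groundnut: 25% of farms in 2019 vs 60% in 2020
```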
We will learn much more about plotting and types of distributions in later chapters.