Statistics Test Questions and Answers
Q1. What is statistics and what are the stages of analysis?
Ans: Statistics is the branch of scientific method which deals with the data obtained by
counting or measuring the properties of populations of natural phenomena. The 'natural
phenomena' includes all the happenings of the external world, whether human or not.
OR
Statistics is the branch of mathematics in which we collect, organize, analyze and present
data for better decision making. It is applied to many different problems and deals with
populations, samples, and the planning and design of studies.
There are two types of statistics
1) Descriptive Statistics
2) Inferential Statistics
Descriptive Statistics:
Descriptive statistics summarizes a collection of information/data. It describes the
sample data itself rather than drawing conclusions about the population that the sample
represents.
Inferential Statistics:
Inferential statistics is the process of data analysis in which we draw conclusions about
a population using sample data.
Stages of Data Analytics
1) Descriptive Analysis: What has happened in the organization or firm?
2) Diagnostic Analysis: Why did it happen?
3) Predictive Analysis: What will likely happen?
4) Prescriptive Analysis: What can be done?
Q2. Write and explain all the data types with two examples.
Ans: Different types of data
1) Quantitative Data
2) Qualitative Data
Qualitative Data
Information about something that can be sorted into different categories and can't be
described directly by numbers. With categorical data we can calculate statistics like
proportions.
Examples:
* Brands, Nationality
Quantitative Data
Information about something that is described by numbers. With numerical data we can
calculate statistics like the average.
Examples:
* Income, Age
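The difference between the two data types can be sketched in a few lines of Python. The brand names and ages below are hypothetical values chosen only for illustration; the point is that categorical data supports counts and proportions, while numerical data supports an average.

```python
from collections import Counter
from statistics import mean

# Hypothetical qualitative (categorical) data: brand chosen by four customers.
brands = ["Acme", "Acme", "Globex", "Initech"]

# Hypothetical quantitative (numerical) data: ages of the same customers.
ages = [23, 35, 41, 29]

# Categorical data: count each category and convert the counts to proportions.
counts = Counter(brands)
proportions = {brand: count / len(brands) for brand, count in counts.items()}

# Numerical data: the average is meaningful.
average_age = mean(ages)

print(proportions)   # {'Acme': 0.5, 'Globex': 0.25, 'Initech': 0.25}
print(average_age)   # 32
```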

Q3. State the case where the median is a better measure as compared to the mean.
Ans: When you have a symmetrical distribution for continuous data, the mean, median,
and mode are equal.
1) Both the mean and the median can be used to describe where the “center” of a dataset
is located.
2) It’s best to use the mean when the distribution of the data values is symmetrical and
there are no clear outliers.
3) It’s best to use the median when the distribution of data values is skewed or when
there are clear outliers.
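A small sketch of why the median is preferred when outliers are present. The income figures are hypothetical; the example only shows that one extreme value pulls the mean far away while the median barely moves.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands); roughly symmetrical, no outliers.
incomes = [30, 32, 35, 38, 40]

# The same data with one extreme value added.
incomes_with_outlier = incomes + [1000]

print(mean(incomes), median(incomes))   # 35 35

# The mean jumps to about 195.8, while the median only moves to 36.5.
print(mean(incomes_with_outlier), median(incomes_with_outlier))
```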
Q4. Look at the data given below. Plot the data, and find and remove the outliers.
Write down all the formulas required. Write five-number summary to plot BOX
plot.
Name of company - Measure X
Allied Signal - 24.23
Bankers Trust - 25.53
General Mills - 25.41
ITT Industries - 24.14
JPMorgan & Co. - 29.62
Lehman Brothers - 28.25
Marriott - 25.81
MCI - 24.39
Merrill Lynch - 40.26
Microsoft - 32.95
Morgan Stanley - 91.36
Sun Microsystems - 25.99
Travelers - 39.42
US Airways - 26.71
Warner-Lambert - 35.00
Ans: The outlier formula, also known as the 1.5 IQR rule, is a rule of thumb used for
identifying outliers. Outliers are extreme values that lie far from the other values in your
data set.

The outlier formula designates outliers based on an upper and a lower boundary (you can
think of these as cutoff points). Any value more than 1.5 x IQR above the third quartile
is designated as an outlier, and any value more than 1.5 x IQR below the first quartile is
also designated as an outlier.
Anything above Q3 + 1.5 * IQR is an outlier
Anything below Q1 - 1.5 * IQR is an outlier
To apply the outlier formula, we need to know the quartiles (Q1, Q2, and Q3) and
the interquartile range (IQR).
Quartiles (Q1, Q2, Q3) divide a data set into four groups, each containing about 25% (or
a quarter) of the data points. There are three quartiles: Q1, Q2, and Q3. Q1 (also known
as the first quartile or lower quartile) is the 25th percentile of the data. Q2 (the second
quartile) is the 50th percentile or median of the data. Q3 (the third or upper quartile) is
the 75th percentile of the data.

Interquartile Range (IQR) : The Interquartile Range (IQR) is the distance between the
first and third quartile. Subtract the first quartile from the third quartile to find the
interquartile range.
IQR = Q3 - Q1
Now that we know what the quartiles and the interquartile range are, let’s go step by step
through finding the outliers.
Sample Data (n = 15)
List = [24.23, 25.53, 25.41, 24.14, 29.62, 28.25, 25.81, 24.39, 40.26, 32.95, 91.36,
25.99, 39.42, 26.71, 35.00]
Arrange the data in order from smallest to largest.
List = [24.14, 24.23, 24.39, 25.41, 25.53, 25.81, 25.99, 26.71, 28.25, 29.62, 32.95, 35.00,
39.42, 40.26, 91.36]
Find the first quartile, Q1.
To find Q1, multiply 25/100 by the total number of data points (n). This will give you a
locator value, L. If L is a whole number, take the average of the Lth value of the data set
and the (L+1)th value. The average will be the first quartile. If L is not a whole number,
round L up to the nearest whole number and find the corresponding value in the data set.
That will be the first quartile.
L = (25/100)(n)
L = (25/100)(15)
L = 3.75
3.75 is not a whole number, so round up to the nearest whole number to get 4. The 4th
value in the data set is 25.41. Q1 = 25.41
Find the third quartile, Q3.
To find Q3, use the same method used to find Q1, except this time, multiply 75/100 by n
to get the locator value, L.
L = (75/100)(n) = (0.75)(15) = 11.25
11.25 is not a whole number, so round up to the nearest whole number to get 12.
The 12th value in the data set is 35.00. Q3 = 35.00
Find the interquartile range, IQR.
The interquartile range is the difference between Q3 and Q1.
IQR = Q3 - Q1
IQR = 35.00 - 25.41 = 9.59
Now, find the upper limit.
Upper limit = Q3 + 1.5*IQR = 35.00 + 1.5*9.59 = 49.385
Now, find the lower limit.
Lower limit = Q1 - 1.5*IQR = 25.41 - 1.5*9.59 = 11.025
Identify the outliers.
The outliers are any data points that lie above the upper limit or below the lower limit. In
this case, the only outlier is 91.36 (Morgan Stanley).
Data after removing the outlier:
24.14, 24.23, 24.39, 25.41, 25.53, 25.81, 25.99, 26.71, 28.25, 29.62, 32.95, 35.00,
39.42, 40.26
Five-number summary for the box plot (all 15 values): minimum = 24.14, Q1 = 25.41,
median = 26.71 (the 8th value), Q3 = 35.00, maximum = 91.36.
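The locator method and the 1.5 IQR rule described in this answer can be checked with a short Python sketch. The function name quartile_by_locator is a hypothetical helper written for this example, not a library function. It follows the rounding rule stated in the text (average two values when L is whole, otherwise round L up), applied to the 15 values from the table. Library functions often interpolate between neighbouring values instead, so they may report slightly different quartiles.

```python
import math

# The 15 "Measure X" values from the table, in table order.
data = [24.23, 25.53, 25.41, 24.14, 29.62, 28.25, 25.81, 24.39,
        40.26, 32.95, 91.36, 25.99, 39.42, 26.71, 35.00]


def quartile_by_locator(sorted_values, percent):
    """Hypothetical helper: locator method, L = (percent/100) * n, 0 < percent < 100."""
    n = len(sorted_values)
    locator = percent * n / 100
    if locator.is_integer():
        # L is whole: average the L-th and (L+1)-th values (1-based positions).
        k = int(locator)
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    # L is not whole: round up and take that value (1-based position).
    k = math.ceil(locator)
    return sorted_values[k - 1]


values = sorted(data)
q1 = quartile_by_locator(values, 25)   # L = 3.75  -> 4th value
q2 = quartile_by_locator(values, 50)   # L = 7.5   -> 8th value (median)
q3 = quartile_by_locator(values, 75)   # L = 11.25 -> 12th value
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = [x for x in values if x < lower_limit or x > upper_limit]
cleaned = [x for x in values if lower_limit <= x <= upper_limit]

# Five-number summary used to draw a box plot: min, Q1, median, Q3, max.
five_number_summary = (values[0], q1, q2, q3, values[-1])

print("Q1, Q3, IQR:", q1, q3, round(iqr, 2))
print("Limits:", round(lower_limit, 3), round(upper_limit, 3))
print("Outliers:", outliers)
print("Five-number summary:", five_number_summary)
```

With this rule, 11.25 rounds up to 12, so Q3 is the 12th sorted value and only 91.36 falls outside the limits.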
Q5. What is the first moment business decision, and where do we use it?
Ans: Measures of central tendency are also called the first moment business decision.
The first moment describes the center of the data and indicates where the majority
of data points lie. A measure of central tendency is a single value that attempts to
describe a set of data by identifying the central position within that set of data. As such,
measures of central tendency are sometimes called measures of central location. They
are also classed as summary statistics.
The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.
Mean: The mean (or average) is the most popular and well known measure of central
tendency. It can be used with both discrete and continuous data, although it is most
often used with continuous data. The mean is equal to the sum of all the values in the
data set divided by the number of values in the data set.
Median: The median is the middle score for a set of data that has been arranged in order of
magnitude. The median is less affected by outliers and skewed data.
Mode: The mode is the most frequent score in our data set. On a bar chart or histogram it
is represented by the highest bar. You can, therefore, sometimes consider the
mode as being the most popular option.
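The three measures can be computed with Python's standard statistics module. The scores below are hypothetical and were picked so that the mean, median and mode all differ.

```python
from statistics import mean, median, multimode

# Hypothetical test scores.
scores = [2, 3, 3, 5, 7, 10]

score_mean = mean(scores)        # sum of values / number of values = 30 / 6
score_median = median(scores)    # average of the two middle values, 3 and 5
score_modes = multimode(scores)  # list of the most frequent value(s)

print(score_mean, score_median, score_modes)   # 5 4.0 [3]
```

multimode returns a list because a data set can have more than one mode.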
Q7. What are population and sample in Inferential Statistics, and how are they different?
Ans: Inferential statistics uses a sample of data to draw conclusions about its
population. A sample is a smaller data set drawn from a larger data set called the
population. If the sample does not represent the population, one cannot make accurate
estimates about the population. The purpose of inferential statistics is to infer the
behavior of a population from a sample.
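The role of a representative sample can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the population is 10,000 simulated values, the seed is fixed only so the run is repeatable, and the "biased" sample is built by deliberately taking the 200 largest values.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values centred near 50 with spread 10.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# A random sample of 200 values drawn without replacement.
random_sample = rng.sample(population, 200)

# A non-representative sample: only the 200 largest values.
biased_sample = sorted(population)[-200:]

population_mean = mean(population)
random_sample_mean = mean(random_sample)
biased_sample_mean = mean(biased_sample)

print("population mean:   ", round(population_mean, 2))
print("random sample mean:", round(random_sample_mean, 2))
print("biased sample mean:", round(biased_sample_mean, 2))
```

The random sample mean lands close to the population mean, while the biased sample overestimates it badly.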
Q8. What is kurtosis?
Ans: Kurtosis in statistics describes the shape of a data set's distribution, in particular
its peak and tails. It is also called the fourth moment business decision. It depicts to
what extent the data points of a particular distribution differ from those of a normal
distribution. In addition, one may use it to determine whether a distribution contains
extreme values.
There are three types of kurtosis: Mesokurtic, Leptokurtic, and Platykurtic
Mesokurtic: Distributions that are moderate in breadth, with a medium-peaked curve. When
kurtosis is equal to 3 (excess kurtosis of 0), the distribution is mesokurtic, meaning its
kurtosis is the same as that of the normal distribution. The kurtosis of a
mesokurtic distribution is neither high nor low; rather, it is considered to be a baseline for
the two other classifications.

Leptokurtic: More values in the distribution tails and more values close to the mean.
Kurtosis greater than 3 (positive excess kurtosis) indicates that a distribution is peaked
and possesses thick tails. A leptokurtic distribution has a higher peak (thin bell) and
fatter, heavier tails than a normal distribution.

Platykurtic: Fewer values in the tails and fewer values close to the mean (i.e. the curve has
a flat peak and more dispersed scores with lighter tails).
When kurtosis is less than 3 (negative excess kurtosis), the distribution is platykurtic.
A platykurtic distribution is flatter (less peaked) when compared with the normal
distribution, with fewer values in its shorter (i.e. lighter and thinner) tails.
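The three classes can be told apart by computing kurtosis directly. The sketch below assumes the population (moment-based) definition, kurtosis = m4 / m2^2, where m2 and m4 are the second and fourth central moments. Under this definition the normal distribution has kurtosis 3. The function names and both data sets are hypothetical, and sample-corrected formulas used by some libraries give slightly different numbers.

```python
def kurtosis(values):
    """Population kurtosis: fourth central moment / squared second central moment."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n
    m4 = sum((x - centre) ** 4 for x in values) / n
    return m4 / m2 ** 2


def classify(values):
    """Label a data set by comparing its kurtosis with the normal baseline of 3."""
    k = kurtosis(values)
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical data: evenly spread values, no heavy tails.
flat = [1, 2, 3, 4, 5]

# Hypothetical data: most values at the centre plus two extreme values.
heavy_tailed = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]

print(round(kurtosis(flat), 2), classify(flat))                   # 1.7 platykurtic
print(round(kurtosis(heavy_tailed), 2), classify(heavy_tailed))   # 5.0 leptokurtic
```

Subtracting 3 from these values gives the excess kurtosis, which is negative for the first data set and positive for the second.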