# Statistics Test Questions and Answers

**Q1. What is statistics and what are the stages of analysis?**

**Ans:** Statistics is the branch of scientific method that deals with data obtained by counting or measuring the properties of populations of natural phenomena. Here 'natural phenomena' includes all the happenings of the external world, whether human or not.

OR

Statistics is the branch of mathematics in which we collect, organize, analyze and present data for better decision making. It is applied to many different problems and deals with populations, samples, planning and design.

There are two types of statistics:

1) Descriptive Statistics

2) Inferential Statistics

**Descriptive Statistics:**

Descriptive statistics summarizes or describes a collection of data. It characterizes the sample data itself rather than drawing conclusions about the population that the sample represents.

**Inferential Statistics:**

Inferential statistics is the process of drawing conclusions about a population using sample data.

**Stages of Data Analytics**

1) Descriptive Analysis: What has happened in the organization or firm?

2) Diagnostic Analysis: Why did it happen?

3) Predictive Analysis: What is likely to happen?

4) Prescriptive Analysis: What can be done about it?

**Q2. Write and explain all the data types with two examples.**

**Ans:** There are two main types of data:

1) Quantitative Data

2) Qualitative Data

**Qualitative Data**

Qualitative (categorical) data is information that can be sorted into categories but cannot be described directly by numbers. With categorical data we can calculate statistics such as proportions.

Examples:

* Brands, Nationality

**Quantitative Data**

Quantitative (numerical) data is information that is described by numbers. With numerical data we can calculate statistics such as the average.

Examples:

* Income, Age
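As a minimal sketch of the difference, the snippet below computes a proportion for a qualitative column and an average for a quantitative one. The records (brand names, nationalities, incomes, ages) are hypothetical values made up for illustration; they are not part of the question.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey records: (brand, nationality, income, age)
records = [
    ("Acme", "Indian", 42000, 29),
    ("Globex", "German", 55000, 41),
    ("Acme", "Indian", 38000, 35),
    ("Initech", "Brazilian", 61000, 52),
]

brands = [r[0] for r in records]   # qualitative: categories only
incomes = [r[2] for r in records]  # quantitative: numbers we can average

# Proportion of records in each brand category
brand_share = {b: c / len(brands) for b, c in Counter(brands).items()}

# Average of a numerical column
average_income = mean(incomes)

print(brand_share)
print(average_income)
```

Averaging the brand column would be meaningless, while a proportion can still be computed for numerical data once it is grouped into categories (for example, age bands).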

**Q3. State the case where the median is a better measure as compared to the mean.**

**Ans:** The median is the better measure when the distribution is skewed or contains clear outliers, because extreme values pull the mean toward them but barely move the median. When you have a symmetrical distribution for continuous data, the mean, median, and mode are equal.

1) Both the mean and the median can be used to describe where the “center” of a dataset is located.

2) It’s best to use the mean when the distribution of the data values is symmetrical and there are no clear outliers.

3) It’s best to use the median when the distribution of data values is skewed or when there are clear outliers.
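The effect of an outlier on the two measures can be sketched in a few lines. The salary figures here are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, median

# Hypothetical salaries in thousands; roughly symmetrical, no outliers
salaries = [30, 32, 35, 38, 40]

# The same data with one extreme value added
with_outlier = salaries + [500]

print(mean(salaries), median(salaries))          # 35 35
print(mean(with_outlier), median(with_outlier))  # 112.5 36.5
```

One extreme value moves the mean from 35 to 112.5, while the median only moves from 35 to 36.5, which is why the median is preferred for skewed data.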

**Q4. Look at the data given below. Plot the data, and find and remove the outliers.**

Write down all the formulas required. Write the five-number summary needed to draw the box plot.

| Name of company | Measure X |
| --- | --- |
| Allied Signal | 24.23 |
| Bankers Trust | 25.53 |
| General Mills | 25.41 |
| ITT Industries | 24.14 |
| JPMorgan & Co. | 29.62 |
| Lehman Brothers | 28.25 |
| Marriott | 25.81 |
| MCI | 24.39 |
| Merrill Lynch | 40.26 |
| Microsoft | 32.95 |
| Morgan Stanley | 91.36 |
| Sun Microsystems | 25.99 |
| Travelers | 39.42 |
| US Airways | 26.71 |
| Warner-Lambert | 35.00 |

**Ans:** The outlier formula, also known as the 1.5 × IQR rule, is a rule of thumb for identifying outliers. Outliers are extreme values that lie far from the other values in a data set.

The rule designates outliers using an upper and a lower boundary (you can think of these as cutoff points). Any value more than 1.5 × IQR above the third quartile is designated an outlier, and any value more than 1.5 × IQR below the first quartile is also designated an outlier.

Anything above Q3 + 1.5 * IQR is an outlier

Anything below Q1 - 1.5 * IQR is an outlier

To apply the outlier formula, we need to know what the quartiles (Q1, Q2, and Q3) and the interquartile range (IQR) are.

The three quartiles (Q1, Q2, Q3) divide a data set into four groups, each containing about 25% (a quarter) of the data points. Q1 (the first or lower quartile) is the 25th percentile of the data. Q2 (the second quartile) is the 50th percentile, or median, of the data. Q3 (the third or upper quartile) is the 75th percentile of the data.

**Interquartile Range (IQR):** The interquartile range is the distance between the first and third quartiles. Subtract the first quartile from the third quartile to find it.

IQR = Q3 - Q1

Now that we know what the quartiles and the interquartile range are, let’s go step by step through applying the outlier formula.

Sample Data (n = 15)

List = [24.23, 25.53, 25.41, 24.14, 29.62, 28.25, 25.81, 24.39, 40.26, 32.95, 91.36, 25.99, 39.42, 26.71, 35.00]

Arrange the data in order from smallest to largest.

List = [24.14, 24.23, 24.39, 25.41, 25.53, 25.81, 25.99, 26.71, 28.25, 29.62, 32.95, 35.00, 39.42, 40.26, 91.36]

Find the first quartile, Q1.

To find Q1, multiply 25/100 by the total number of data points (n). This gives a locator value, L. If L is a whole number, take the average of the Lth value of the data set and the (L+1)th value; that average is the first quartile. If L is not a whole number, round L up to the nearest whole number and take the corresponding value in the data set; that value is the first quartile.

L = (25/100)(n)

L = (25/100)(15)

L = 3.75

3.75 is not a whole number, so round up to the nearest whole number to get 4. The 4th value in the data set is 25.41. Q1 = 25.41

Find the third quartile, Q3.

To find Q3, use the same method used to find Q1, except this time multiply 75/100 by n to get the locator value, L.

L = (75/100)(n) = (0.75)(15) = 11.25

11.25 is not a whole number, so round up to the nearest whole number to get 12. The 12th value in the data set is 35.00. Q3 = 35.00

Find the interquartile range, IQR.

The interquartile range is the difference between Q3 and Q1.

IQR = Q3 - Q1

IQR = 35.00 - 25.41 = 9.59

Now, find the upper limit.

Upper limit = Q3 + 1.5 * IQR = 35.00 + 1.5 * 9.59 = 49.385

Now, find the lower limit.

Lower limit = Q1 - 1.5 * IQR = 25.41 - 1.5 * 9.59 = 11.025

Five-number summary for the box plot: minimum = 24.14, Q1 = 25.41, median = 26.71, Q3 = 35.00, maximum = 91.36.

Identify the outliers.

The outliers are any data points that lie above the upper limit or below the lower limit. In this case, the only outlier is 91.36.

24.14, 24.23, 24.39, 25.41, 25.53, 25.81, 25.99, 26.71, 28.25, 29.62, 32.95, 35.00, 39.42, 40.26, **91.36**

Removing the outlier leaves 14 values, from 24.14 to 40.26.
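The locator rule and the 1.5 × IQR limits can be checked with a short script. This is a sketch only: it applies the rule exactly as worded above to the 15 values from the table, and the helper name `locator_quartile` is an assumption introduced here for illustration.

```python
import math

# The 15 "Measure X" values from the table
values = [24.23, 25.53, 25.41, 24.14, 29.62, 28.25, 25.81, 24.39,
          40.26, 32.95, 91.36, 25.99, 39.42, 26.71, 35.00]


def locator_quartile(sorted_values, percent):
    """Percentile via the locator rule: L = (percent / 100) * n.

    Whole L: average the L-th and (L+1)-th values (1-based).
    Fractional L: round up and take that value.
    Intended for 0 < percent < 100.
    """
    n = len(sorted_values)
    loc = percent * n / 100
    if loc == int(loc):
        k = int(loc)
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[math.ceil(loc) - 1]


data = sorted(values)
q1 = locator_quartile(data, 25)
q2 = locator_quartile(data, 50)  # median
q3 = locator_quartile(data, 75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

five_number_summary = (data[0], q1, q2, q3, data[-1])
outliers = [x for x in data if x < lower or x > upper]
cleaned = [x for x in data if lower <= x <= upper]

print("Five-number summary:", five_number_summary)
print("Limits:", round(lower, 3), round(upper, 3))
print("Outliers:", outliers)
print("Values kept:", len(cleaned))
```

Plotting libraries often compute quartiles by interpolation rather than by this locator rule, so the box edges they draw may differ slightly from the values computed here.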

**Q5. What is the 1st-moment business decision, and where do we use it?**

**Ans:** Measures of central tendency are also called the first moment business decision. The first moment describes the center of the data and indicates where the majority of data points lie. A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics.

The mean (often called the average) is the most familiar measure of central tendency, but there are others, such as the median and the mode.

**Mean:** The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

**Median:** The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed data than the mean.

**Mode:** The mode is the most frequent score in our data set. On a bar chart or histogram it is represented by the highest bar. You can, therefore, sometimes consider the mode as being the most popular option.
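The three definitions can be written out by hand and compared with Python's standard `statistics` module. The scores below are hypothetical, and the manual median shown covers only the odd-length case.

```python
from collections import Counter
from statistics import mean, median, mode

scores = [7, 4, 9, 7, 11, 8, 10]  # hypothetical scores

# Mean: sum of the values divided by the number of values
manual_mean = sum(scores) / len(scores)

# Median: middle value after sorting (odd number of values assumed here)
ordered = sorted(scores)
manual_median = ordered[len(ordered) // 2]

# Mode: the most frequent value
manual_mode = Counter(scores).most_common(1)[0][0]

print(manual_mean, manual_median, manual_mode)      # 8.0 8 7
print(mean(scores), median(scores), mode(scores))   # 8 8 7
```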

**Q7. What are population and sample in Inferential Statistics, and how are they different?**

**Ans:** Inferential statistics uses a sample of data to draw conclusions about its population. A sample is a smaller data set drawn from a larger data set called the population. If the sample does not represent the population, one cannot make accurate estimates about the latter. The purpose of inferential statistics is to infer the behavior of a population from a sample.
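A small simulation can illustrate why the sample has to be representative. Everything in this sketch is an assumption for illustration: the population is 10,000 simulated heights, the random sample has 100 members, and the "biased" sample simply takes the 100 tallest people.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated heights in cm
population = [random.gauss(170, 10) for _ in range(10_000)]

# Representative sample: 100 members chosen at random
sample = random.sample(population, 100)

# Unrepresentative sample: only the 100 tallest members
biased_sample = sorted(population)[-100:]

population_mean = mean(population)
sample_mean = mean(sample)
biased_mean = mean(biased_sample)

print(f"population mean:    {population_mean:.2f}")
print(f"random sample mean: {sample_mean:.2f}")
print(f"biased sample mean: {biased_mean:.2f}")
```

The random sample's mean lands close to the population mean, while the biased sample's mean is far above it, so any conclusion drawn from the biased sample would misdescribe the population.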

**Q8. What is kurtosis?**

**Ans:** Kurtosis describes the shape of a distribution, in particular how heavy its tails are; it is also called the fourth moment business decision. It depicts to what extent the data points of a particular distribution differ from those of a normal distribution. In addition, one may use it to determine whether a distribution contains extreme values.

There are three types of kurtosis: Mesokurtic, Leptokurtic, and Platykurtic.

**Mesokurtic:** Distributions that are moderate in breadth, with curves of medium peak height. When kurtosis is equal to 3 (excess kurtosis of 0), the distribution is mesokurtic: its kurtosis is the same as that of the normal distribution. The kurtosis of a mesokurtic distribution is neither high nor low; rather, it is considered the baseline for the two other classifications.

**Leptokurtic:** More values in the distribution tails and more values close to the mean. Kurtosis greater than 3 (positive excess kurtosis) indicates that a distribution is peaked and possesses thick tails. A leptokurtic distribution has a higher peak (thin bell) and heavier (i.e. fatter) tails than a normal distribution.

**Platykurtic:** Fewer values in the tails and fewer values close to the mean (i.e. the curve has a flat peak and more dispersed scores with lighter tails). Kurtosis less than 3 (negative excess kurtosis) indicates that a distribution is flat and has thin tails. A platykurtic distribution is flatter (less peaked) than the normal distribution, with fewer values in its shorter (i.e. lighter and thinner) tails.
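The classification can be sketched with a moment-based kurtosis, computed as the fourth central moment divided by the squared variance. This sketch uses the population form with no small-sample correction. The helper names `kurtosis` and `classify`, the tolerance of 0.1 around 3, and both data sets are assumptions made for illustration.

```python
def kurtosis(data):
    """Population kurtosis: fourth central moment / (variance squared)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # variance
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2


def classify(k, tol=0.1):
    """Label a kurtosis value relative to the normal baseline of 3.

    The tolerance is an arbitrary choice for this sketch.
    """
    if abs(k - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


# Evenly spread values: flat shape, light tails
flat = list(range(1, 101))

# Mostly identical values with two extremes: sharp peak, heavy tails
heavy = [0] * 98 + [-10, 10]

print(round(kurtosis(flat), 2), classify(kurtosis(flat)))    # 1.8 platykurtic
print(round(kurtosis(heavy), 2), classify(kurtosis(heavy)))  # 50.0 leptokurtic
```

Subtracting 3 from these values gives the excess kurtosis, which is negative for the flat data and positive for the heavy-tailed data.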