What is Hypothesis Testing
It is a type of inferential statistics that involves extrapolating results from a (random) sample to the entire population. Decisions are made by comparing the p-value from a statistical test or model with a significance level (alpha), which is the accepted probability of a Type I error.
Type I Error: When we reject a true null hypothesis, it is called a type I error.
Type II Error: When we do not reject a false null hypothesis, it is called a type II error.
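To make the type I error concrete, here is a minimal simulation sketch. The seed, the sample size of 30 and the 2000 repetitions are arbitrary choices for illustration, not values from this post. Every sample is drawn from a population where the null hypothesis is true, so each rejection is a type I error.

```r
# Simulate the type I error rate of a one-sample t-test at alpha = 0.05.
set.seed(123)   # arbitrary seed so the run is reproducible
alpha <- 0.05
n_sim <- 2000   # number of simulated samples (assumed value)

# Each sample comes from N(0, 1), so H0: mu = 0 is true by construction
p_values <- replicate(n_sim, t.test(rnorm(30), mu = 0)$p.value)

# Share of true null hypotheses that were wrongly rejected
type1_rate <- mean(p_values < alpha)
type1_rate
```

The printed rate should land close to alpha, which is what "alpha is the probability of a type I error" means in practice.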
Hypothesis testing can be done using parametric or nonparametric methods/models.
Parametric: They make certain assumptions about the data (model) and/or the errors, and these must be validated before the results can be accepted.
Nonparametric: They are called nonparametric because they do not assume a particular distribution for the data (model) or the errors.
Why use a parametric test?
Because they are based on the mean, standard deviation, and normal distribution, parametric tests are regarded as “more powerful” than nonparametric tests/models, provided their assumptions hold. Nonparametric tests are based on the median, IQR, and non-normal distributions, so they are deemed “less powerful” than parametric tests/models.
Two Statistical Hypotheses
Null Hypothesis: It is also known as the hypothesis of no difference.
Alternative Hypothesis: It is complementary to the null hypothesis and is also known as the research hypothesis.
When to accept the Null or the Alternative Hypothesis
To accept (fail to reject) the null hypothesis from a parametric or nonparametric test, the p-value must be greater than 0.05. (This is the outcome we hope for in goodness-of-fit tests.)
To accept the alternative hypothesis from a parametric or nonparametric test, the p-value must be less than 0.05. (This is the outcome we hope for in research hypothesis tests!)
Some Commonly Used Parametric Tests Using R
One Sample Z Test on mtcars Data
In this blog I am only going to show how to run a one-sample z-test in R, without explaining what a z-test is or how it works, because I already covered that in a previous blog.
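# we need to define the parameters
muO <- 20      # hypothesised population mean
sigma <- 6     # population standard deviation, assumed known
xbar <- mean(mtcars$mpg)
n <- length(mtcars$mpg)
z <- sqrt(n)*(xbar - muO)/sigma
# two-sided p-value: twice the lower-tail area beyond -|z|
p_value <- 2*pnorm(-abs(z))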
Let’s check z value and p value,
z
## [1] 0.08544207
Hence, the value of z is 0.08544207.
p_value
## [1] 0.9319099
We found a p-value of 0.9319099, which is > 0.05, hence we do not reject the null hypothesis, i.e. the sample mean is consistent with a population mean of 20.
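The same steps can be wrapped in a small function for reuse. This is only a sketch: `one_sample_z` is a hypothetical helper name, not a base R function, and it assumes a two-sided test with a known population standard deviation.

```r
# Hypothetical helper: two-sided one-sample z-test with known population SD
one_sample_z <- function(x, mu0, sigma) {
  n <- length(x)
  z <- sqrt(n) * (mean(x) - mu0) / sigma
  p_value <- 2 * pnorm(-abs(z))   # area in both tails beyond |z|
  c(z = z, p_value = p_value)
}

res <- one_sample_z(mtcars$mpg, mu0 = 20, sigma = 6)
res
```

Called with the same inputs as above, it reproduces the z value and p-value from the step-by-step calculation.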
Why is there no one-sample z-test in the base R package?
Because the t-distribution behaves like the z-distribution for n >= 30, the t-test can be employed for both small and large samples. Thus, we don’t need a one-sample z-test in R!
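A quick way to see this convergence is to compare two-sided 5% critical values. The sketch below uses base R's `qt()` and `qnorm()`; the particular degrees of freedom are arbitrary picks for illustration.

```r
# Critical values for a two-sided test at the 5% level
df_values <- c(5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df = df_values)   # t-distribution
z_crit <- qnorm(0.975)                # standard normal

# The gap between the t and z critical values shrinks as df grows
data.frame(df = df_values, t_crit = t_crit, diff_from_z = t_crit - z_crit)
```

The t critical value is always the larger of the two, and the difference becomes small once the degrees of freedom reach about 30.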
One Sample t-test: It works for small samples as well as for large samples
t.test(mtcars$mpg, mu =20)
##
## One Sample t-test
##
## data: mtcars$mpg
## t = 0.08506, df = 31, p-value = 0.9328
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
## 17.91768 22.26357
## sample estimates:
## mean of x
## 20.09062
Here the p-value is 0.9328, which is greater than 0.05, so we do not reject the null hypothesis.
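If the numbers are needed for further work rather than just printed, they can be pulled out of the object that `t.test()` returns. A short sketch (the name `res_t` is just an illustrative variable):

```r
# t.test() returns a list of class "htest"; store it to reuse the pieces
res_t <- t.test(mtcars$mpg, mu = 20)

res_t$statistic   # t value
res_t$parameter   # degrees of freedom
res_t$p.value     # p-value
res_t$conf.int    # 95 percent confidence interval for the mean
```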
Two Sample t-test
It is used to compare the mean of a dependent variable between the two categories of a grouping (independent) variable. For instance, we can compare exam scores (dependent variable) between male and female groups of students!
Assumptions

For each category, the dependent variable must follow the normal distribution (test of normality, a GOF test)

The variance is homogeneous (i.e. equal) across the categories of the independent variable (test of equal variance, a GOF test)
What to do if the variance is not equal across the categories of the independent variable
In this case we use the Welch test.
Assumptions

For each category, the dependent variable must follow the normal distribution (test of normality, a GOF test)

The variance across the categories of the independent variable is not homogeneous, i.e. not equal.
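For reference, the Welch test is what `t.test()` runs when `var.equal` is left at its default of `FALSE`. A minimal sketch on the same mtcars variables used in this post (the name `welch` is only an illustrative variable):

```r
# Welch two-sample t-test: var.equal = FALSE is the default in t.test()
welch <- t.test(mpg ~ am, data = mtcars)

welch$method      # name of the test that was run
welch$parameter   # Welch-adjusted degrees of freedom (usually not an integer)
welch$p.value
```

Because the degrees of freedom are adjusted downward when the group variances differ, they come out smaller than the pooled value of n1 + n2 - 2.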
Let’s do a normality test on the mtcars data
with(mtcars, shapiro.test(mpg[am == 0]))
##
## Shapiro-Wilk normality test
##
## data: mpg[am == 0]
## W = 0.97677, p-value = 0.8987
Here, the p-value is 0.8987. Hence, we do not reject the null hypothesis of normality, which means mpg for am == 0 is consistent with a normal distribution.
with(mtcars, shapiro.test(mpg[am == 1]))
##
## Shapiro-Wilk normality test
##
## data: mpg[am == 1]
## W = 0.9458, p-value = 0.5363
This p-value is also above 0.05, so this group is consistent with a normal distribution too. Hence the first condition is satisfied, i.e. the dependent variable mpg follows the normal distribution in both groups.
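The two per-group calls can also be written as one, which helps when there are many groups. This sketch uses base R's `tapply()`; `p_by_am` is just an illustrative variable name.

```r
# Run the Shapiro-Wilk test within each level of am and keep only the p-values
p_by_am <- tapply(mtcars$mpg, mtcars$am,
                  function(x) shapiro.test(x)$p.value)
p_by_am

# TRUE where normality is not rejected at the 5% level
p_by_am > 0.05
```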
Variance Check
var.test(mpg ~ am, data = mtcars)
##
## F test to compare two variances
##
## data: mpg by am
## F = 0.38656, num df = 18, denom df = 12, p-value = 0.06691
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.1243721 1.0703429
## sample estimates:
## ratio of variances
## 0.3865615
We can see the p-value is 0.06691, which is greater than 0.05. Hence we do not reject the hypothesis that the variance is the same across the categories of the independent variable. Now we can use the two-sample Student's t-test.
t.test(mpg ~ am, var.equal = TRUE, data = mtcars)
##
## Two Sample t-test
##
## data: mpg by am
## t = -4.1061, df = 30, p-value = 0.000285
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -10.84837 -3.64151
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
Here the p-value is 0.000285, which is less than 0.05. Hence we reject H0, which means mileage (mpg) is statistically different between cars with automatic and manual transmission.
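To see where the t value comes from, the pooled-variance statistic can be computed by hand. This is a sketch of the standard equal-variance formula; the variable names are illustrative.

```r
# Split mpg by transmission type
x0 <- mtcars$mpg[mtcars$am == 0]
x1 <- mtcars$mpg[mtcars$am == 1]
n0 <- length(x0)
n1 <- length(x1)

# Pooled variance: weighted average of the two group variances
sp2 <- ((n0 - 1) * var(x0) + (n1 - 1) * var(x1)) / (n0 + n1 - 2)

# t statistic and two-sided p-value on n0 + n1 - 2 degrees of freedom
t_stat <- (mean(x0) - mean(x1)) / sqrt(sp2 * (1 / n0 + 1 / n1))
p_val <- 2 * pt(-abs(t_stat), df = n0 + n1 - 2)
c(t = t_stat, p_value = p_val)
```

The result should agree with the `t.test()` output for the equal-variance case.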
Let’s check the two-sample Student's t-test result against a simple linear regression model
summary(lm(mpg ~ am, data = mtcars))
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
The am coefficient (7.245) is the difference between the two group means. This difference is statistically significant and the p-value is the same as the one given by the two-sample t-test.
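The agreement between the two approaches can be checked in code rather than by eye. A sketch with illustrative variable names (`fit`, `tt`, `p_lm`):

```r
fit <- lm(mpg ~ am, data = mtcars)
tt <- t.test(mpg ~ am, var.equal = TRUE, data = mtcars)

# p-value of the am slope from the regression coefficient table
p_lm <- summary(fit)$coefficients["am", "Pr(>|t|)"]

# Same p-value as the equal-variance t-test
all.equal(p_lm, tt$p.value)

# The slope equals the difference between the two group means
all.equal(coef(fit)[["am"]], unname(diff(tt$estimate)))
```

The match holds only for the equal-variance t-test, because ordinary least squares also assumes one common error variance.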
Which test should we use to compare the means of more than two samples?
If we need to compare the means of more than two samples, we use the 1-way ANOVA test.
Assumptions

The dependent variable must be “normally distributed” within each category

The variance across categories must be the same
1-way ANOVA assumption checks
Normality by categories
with(mtcars, shapiro.test(mpg[gear == 3]))
##
## Shapiro-Wilk normality test
##
## data: mpg[gear == 3]
## W = 0.95833, p-value = 0.6634
Category 3 follows the normal distribution.
with(mtcars, shapiro.test(mpg[gear == 4]))
##
## Shapiro-Wilk normality test
##
## data: mpg[gear == 4]
## W = 0.90908, p-value = 0.2076
Category 4 also follows the normal distribution.
with(mtcars, shapiro.test(mpg[gear == 5]))
##
## Shapiro-Wilk normality test
##
## data: mpg[gear == 5]
## W = 0.90897, p-value = 0.4614
Category 5 also follows the normal distribution. So, the dependent variable follows the normal distribution in every category.
Let’s do the variance test
When there are more than two samples we do not use var.test(). Instead we use leveneTest(), available in the car package.
Let’s check. Before doing this we need to convert our independent variable into a factor.
library(car)
## Loading required package: carData
leveneTest(mpg ~ as.factor(gear), data=mtcars)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.4886 0.2424
## 29
Here, the p-value is 0.2424, which is greater than 0.05. Hence we do not reject the hypothesis that the variance is the same across categories. So, we can now use the classical one-way ANOVA test.
1-Way Classical ANOVA test
summary(aov(mpg ~ gear, data = mtcars))
## Df Sum Sq Mean Sq F value Pr(>F)
## gear 1 259.7 259.75 8.995 0.0054 **
## Residuals 30 866.3 28.88
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
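The p-value is less than 0.05. Hence we reject the null hypothesis, which means the sample means are not all equal. Note that gear is stored as a number in mtcars, so the call above fits it with 1 degree of freedom (a linear trend); wrapping it in as.factor(), as in the Levene test above and the TukeyHSD call below, treats it as three categories. When the null hypothesis is rejected, a post-hoc test (pairwise comparison) is required to find out which groups differ. For classical 1-way ANOVA, the TukeyHSD post-hoc test is a standard choice. Let’s use it.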
TukeyHSD(aov(mpg ~ as.factor(gear), data = mtcars))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = mpg ~ as.factor(gear), data = mtcars)
##
## $`as.factor(gear)`
## diff lwr upr p adj
## 4-3 8.426667 3.9234704 12.929863 0.0002088
## 5-3 5.273333 -0.7309284 11.277595 0.0937176
## 5-4 -3.153333 -9.3423846 3.035718 0.4295874
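The summary(aov(mpg ~ gear, ...)) call earlier passes gear as a number, while TukeyHSD() is run on the factor version. As a sketch, this is how the factor-based ANOVA table that matches the Tukey comparisons could be obtained (`fit_aov` and `anova_tab` are illustrative names):

```r
# One-way ANOVA with gear treated as a categorical variable
fit_aov <- aov(mpg ~ as.factor(gear), data = mtcars)

# summary() returns a list; its first element is the ANOVA table
anova_tab <- summary(fit_aov)[[1]]
anova_tab

# Degrees of freedom: groups - 1 for gear, n - groups for the residuals
anova_tab$Df
```

With three gear categories and 32 cars, the table has 2 and 29 degrees of freedom, the same split that the Levene test reports.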
Let’s check this result with a simple linear model
summary(lm(mpg ~ gear, data = mtcars))
##
## Call:
## lm(formula = mpg ~ gear, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.240 -2.793 -0.205 2.126 12.583
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.623 4.916 1.144 0.2618
## gear 3.923 1.308 2.999 0.0054 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.374 on 30 degrees of freedom
## Multiple R-squared: 0.2307, Adjusted R-squared: 0.205
## F-statistic: 8.995 on 1 and 30 DF, p-value: 0.005401
pairwise.t.test(mtcars$mpg, mtcars$gear, p.adj= "none")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: mtcars$mpg and mtcars$gear
##
## 3 4
## 4 7.3e-05 -
## 5 0.038 0.218
##
## P value adjustment method: none
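The comparison above uses unadjusted p-values. As a sketch of how an adjustment changes them, the same call can be run with the Bonferroni method, one of the options `pairwise.t.test()` accepts through `p.adjust.method` (the names `raw` and `bonf` are illustrative):

```r
# Matrices of pairwise p-values, without and with a Bonferroni correction
raw <- pairwise.t.test(mtcars$mpg, mtcars$gear,
                       p.adjust.method = "none")$p.value
bonf <- pairwise.t.test(mtcars$mpg, mtcars$gear,
                        p.adjust.method = "bonferroni")$p.value
raw
bonf
```

Adjusted p-values are never smaller than the raw ones, so a comparison that is borderline without adjustment may stop being significant after it.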
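When gear is entered as a factor, the gear = 3 category is omitted from the coefficient table because R codes the 3 categories of the gear variable (3, 4 and 5) as dummy variables, uses only the last two of them in the model and takes the first one as the reference. In the lm() call above gear is numeric, so a single slope is reported instead.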
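The reference-category behaviour can be seen by fitting the linear model with gear as a factor. A minimal sketch (`fit_f` is an illustrative name):

```r
# Linear model with gear as a categorical predictor
fit_f <- lm(mpg ~ as.factor(gear), data = mtcars)

# Only levels 4 and 5 get coefficients; level 3 is absorbed into the intercept
names(coef(fit_f))
coef(fit_f)
```

The intercept is the mean mpg of the gear = 3 group, and the other two coefficients are the differences of the gear = 4 and gear = 5 means from it, matching the 4-3 and 5-3 rows of the Tukey table.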