t test
History of Student's t-Distribution
t-statistic:
The t-statistic was first derived by the English statistician W. S. Gosset, writing under the pen name "Student", who published it in 1908 in his paper "The Probable Error of a Mean". It is therefore called Student's t-statistic. Later, Prof. R. A. Fisher developed and formally defined the t-statistic in 1926, so it is also called Fisher's t-statistic. (Source: the book PROBABILITY and INFERENCE, page 273)
Definition of Student's t-Statistic
Let a random sample $(x_1, x_2, \ldots, x_n)$ of size $n$ be drawn from a normal population with mean $\mu$ and variance $\sigma^2$. Then Student's t-statistic is defined as $t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}$,
which follows Student's t-distribution with $(n-1)$ degrees of freedom, where $\bar{x} = \frac{\sum x_i}{n}$ is the sample mean and $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$ is the sample variance.
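The definition can be checked numerically. The sketch below computes the statistic directly from the formula and compares it with R's built-in `t.test()`. The data (the `mpg` column of the built-in `mtcars` dataset) and the hypothesized mean of 20 are illustrative assumptions, and `t_manual` is a hypothetical helper name, not part of R.

```r
# Hypothetical helper: Student's t-statistic computed from the definition.
t_manual <- function(x, mu) {
  n <- length(x)
  x_bar <- sum(x) / n                    # sample mean
  s2 <- sum((x - x_bar)^2) / (n - 1)     # sample variance, divisor n - 1
  (x_bar - mu) / (sqrt(s2) / sqrt(n))    # t = (x_bar - mu) / (s / sqrt(n))
}

x <- mtcars$mpg                          # illustrative data (assumption)
t_value <- t_manual(x, mu = 20)

# The built-in test should report the same statistic.
t_builtin <- unname(t.test(x, mu = 20)$statistic)
print(c(manual = t_value, builtin = t_builtin))
```

The two numbers should agree up to floating-point error, which is a quick way to confirm that the formula has been read correctly.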
When to use a t-test
The t-test is a parametric test: it assumes the data are drawn from a normal population, and the t-distribution approaches the normal distribution as the sample size increases. When choosing a t-test, we need to consider two things: whether the groups being compared come from a single population or two different populations, and whether we want to test the difference in a specific direction.
Types of t-test
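 One-sample or paired t-test: If the data come from a single population (one group compared with a fixed value, or the same group measured twice).
 Two-sample t-test: If the groups come from two different populations.
 Multi-sample comparison: If the groups come from several populations. This case is usually handled with ANOVA or with pairwise t-tests rather than a single t-test.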
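In R, these variants all go through the same `t.test()` function, and the arguments decide which test is run. A minimal sketch, assuming the built-in `mtcars` and `sleep` datasets as stand-in data (the choice of `sleep` and the value `mu = 20` are illustrative assumptions):

```r
# One-sample: one group compared against a fixed value mu.
one_sample <- t.test(mtcars$mpg, mu = 20)

# Paired: the same 10 subjects measured under two drugs (built-in sleep data).
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
paired <- t.test(drug1, drug2, paired = TRUE)

# Two-sample: two independent groups (automatic vs manual cars),
# here with the equal-variance assumption.
two_sample <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# Directional (one-sided) test: is the mean greater than 20?
one_sided <- t.test(mtcars$mpg, mu = 20, alternative = "greater")

# Degrees of freedom differ by design: n - 1, pairs - 1, n1 + n2 - 2.
print(c(one_sample = unname(one_sample$parameter),
        paired     = unname(paired$parameter),
        two_sample = unname(two_sample$parameter)))
```

The `alternative` argument accepts `"two.sided"` (the default), `"less"` or `"greater"`, which covers the question of testing in a specific direction.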
Before Diving into the t-test
We need to know,
 Types of Hypothesis
 p-value
 Confidence Interval
Hypothesis Testing with t-test
There are two types of hypothesis:
 $H_0$ : The null hypothesis, also called the hypothesis of no difference. It should be a simple hypothesis. It is tested for possible rejection on the basis of sample observations drawn from the population. We fail to reject it when the p-value is greater than $\alpha$ (usually 0.05).
 $H_1$ : The alternative hypothesis is any hypothesis that is mutually exclusive of and complementary to the null hypothesis. We reject $H_0$ in favour of $H_1$ when the p-value is less than $\alpha$ (usually 0.05).
They are stated as follows:

$H_0$ : $\bar{x} = \mu$ (The sample mean and the population mean are equal, i.e. the sample is taken from that population.)

$H_1$ : $\bar{x} \neq \mu$ (The sample mean is not equal to the population mean, i.e. the sample is not taken from that population.)
p-value
In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. A smaller p-value means stronger evidence against the null hypothesis and in favor of the alternative hypothesis.
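For a two-sided t-test, "at least as extreme" means a t-statistic at least as far from zero as the observed one, in either tail. A minimal sketch of that calculation with base R's `pt()`, assuming `mtcars$mpg` as the sample and 20 as the hypothesized mean (both are illustrative choices):

```r
x <- mtcars$mpg
n <- length(x)
mu0 <- 20                                      # hypothesized mean (assumption)
t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # observed t-statistic

# Two-sided p-value: probability in both tails beyond |t_obs|.
p_two_sided <- 2 * pt(abs(t_obs), df = n - 1, lower.tail = FALSE)

# One-sided p-value for the alternative "mean is greater than mu0".
p_greater <- pt(t_obs, df = n - 1, lower.tail = FALSE)

print(c(two_sided = p_two_sided, greater = p_greater))
```

Both values should match the `p.value` field returned by `t.test()` with the corresponding `alternative` setting.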
Confidence Interval
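A confidence interval is a range of values around the sample estimate that is expected to contain the true parameter at a stated confidence level. For example, at the 95% level, about 95% of the intervals built this way from repeated samples would contain the true parameter.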
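For a mean, the t-based interval is the sample mean plus or minus a critical t value times the standard error. A minimal sketch, assuming `mtcars$mpg` as the sample and a 95% confidence level (illustrative choices):

```r
x <- mtcars$mpg
n <- length(x)
conf_level <- 0.95

se <- sd(x) / sqrt(n)                                # standard error of the mean
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # two-sided critical value
ci <- mean(x) + c(-1, 1) * t_crit * se               # lower and upper bounds

print(ci)
```

The result should agree with the `conf.int` field returned by `t.test(x)`.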
T Statistics
$ t = \frac{\bar{x} - \mu}{\frac{S}{\sqrt{n}}} $
where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $S$ is the sample standard deviation and $n$ is the sample size.
If the absolute value of the test statistic is above the critical value from the t-table, we reject the null hypothesis. This is the opposite of the p-value rule, where we reject the null hypothesis when the p-value is below $\alpha$.
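The table lookup can also be done with `qt()`, and the two decision rules always agree for a two-sided test. A small sketch with made-up inputs (`t_obs = 2.5` and `df = 20` are hypothetical values chosen only for illustration):

```r
alpha <- 0.05
df <- 20        # hypothetical degrees of freedom
t_obs <- 2.5    # hypothetical observed t-statistic

# Rule 1: compare |t| with the two-sided critical value from the t-distribution.
t_crit <- qt(1 - alpha / 2, df = df)
reject_by_t <- abs(t_obs) > t_crit

# Rule 2: compare the two-sided p-value with alpha.
p_value <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)
reject_by_p <- p_value < alpha

print(c(t_crit = t_crit, p_value = p_value))
print(c(reject_by_t = reject_by_t, reject_by_p = reject_by_p))
```

A larger |t| corresponds to a smaller p-value, which is why the comparison runs in opposite directions for the two rules.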
R code for testing the hypothesis with t-test
Here I use the built-in mtcars dataset to test hypotheses.
A data frame with 32 observations on 11 (numeric) variables.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
One Sample T test
I want to test the hypothesis that the mean mpg is equal (or very near) to 20. To find out whether the data support this claim, I will use a one-sample t-test. We are going to use a level of significance of 0.05, which is the probability of wrongly rejecting a true null hypothesis that we are willing to tolerate.
# Load the data as a data frame.
df <- data.frame(mtcars)
# One-sample t-test of H0: mean mpg = 20.
t.test(df$mpg, mu = 20)
One Sample t-test
data: df$mpg
t = 0.08506, df = 31, p-value = 0.9328
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
17.91768 22.26357
sample estimates:
mean of x
20.09062
The code above is an example of a one-sample t-test, where $\mu$ = 20 is our hypothesized mean. Looking over the different parts of the output:
 df = 31: We have 31 degrees of freedom (i.e. we have 32 rows and df = n - 1).
 t = 0.08506: Our level of significance is 0.05. We go to the t-table and find the two-sided critical value for df = 31 and alpha = 0.05, which was 2.042 (taken from here). This is the entry commonly tabulated for df = 30; the exact value for df = 31 is about 2.040. Since our calculated t-statistic is smaller than the tabulated value, we conclude that there is no evidence that the two means differ. In other words, we were unable to reject the null hypothesis.
 p-value = 0.9328: We compare this p-value with our level of significance, which is the level of error we are willing to accept. Since it is much larger than our level of significance, we conclude that there is no strong evidence against the null hypothesis.
In conclusion, there is no significant difference between the hypothesized mean and the mean of the data.
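The same reading of the output can be done programmatically, because `t.test()` returns a list-like object whose fields can be extracted by name. A minimal sketch, assuming the same data and a significance level of 0.05 (the variable names are illustrative):

```r
result <- t.test(mtcars$mpg, mu = 20)
alpha <- 0.05

t_value  <- unname(result$statistic)   # the t-statistic
df_value <- unname(result$parameter)   # degrees of freedom
p_value  <- result$p.value
ci       <- as.numeric(result$conf.int)

# Exact two-sided critical value, replacing the printed t-table lookup.
t_crit <- qt(1 - alpha / 2, df = df_value)

decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
cat(sprintf("t = %.5f, df = %.0f, critical t = %.3f, p = %.4f: %s\n",
            t_value, df_value, t_crit, p_value, decision))
```

Working from the returned object avoids copying numbers by hand and gives the critical value for the exact degrees of freedom.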
Two Sample T test
We use a two-sample t-test to test whether two samples come from populations with the same mean.
Group the mpg values of mtcars by the am column. Our goal is to find out whether the average mpg differs between the two groups of am. In am, 0 means automatic and 1 means manual; see the table above for more information.
# Two-sample t-test of mpg by transmission type, assuming equal variances.
t.test(mpg ~ am, var.equal = TRUE, data = mtcars)
Two Sample t-test
data: mpg by am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-10.84837 -3.64151
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
Here the t value is -4.1061 (negative because group 0 has the lower mean), with 30 degrees of freedom and a p-value of 0.000285. Since the p-value is smaller than our level of significance of 0.05, we have strong evidence against $H_0$. We reject the null hypothesis in favour of $H_1$ and conclude that the average mpg differs between the two groups of am, i.e. the two samples do not appear to come from populations with the same mean.
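With `var.equal = TRUE`, the test pools the two sample variances into one estimate. The sketch below recomputes the statistic by hand under that assumption and, for comparison, runs R's default Welch version, which does not assume equal variances. The variable names are illustrative.

```r
auto   <- mtcars$mpg[mtcars$am == 0]   # automatic transmission
manual <- mtcars$mpg[mtcars$am == 1]   # manual transmission
n1 <- length(auto)
n2 <- length(manual)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2)

# Pooled two-sample t-statistic, degrees of freedom and two-sided p-value.
t_pooled  <- (mean(auto) - mean(manual)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2
p_pooled  <- 2 * pt(abs(t_pooled), df = df_pooled, lower.tail = FALSE)
print(c(t = t_pooled, df = df_pooled, p = p_pooled))

# Welch's test (R's default, var.equal = FALSE) for comparison.
welch <- t.test(mpg ~ am, data = mtcars)
print(c(t = unname(welch$statistic), df = unname(welch$parameter),
        p = welch$p.value))
```

If the two groups have clearly different spreads, the Welch version is generally the safer choice; here both versions lead to the same conclusion.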