t-test
History of Student's t Distribution
t-statistic:
The t-statistic was first derived by the English statistician W. S. Gosset, who published it in 1908 under the pen name "Student" in his paper "The Probable Error of a Mean". It is therefore called Student's t-statistic. Later, Prof. R. A. Fisher developed and defined the t-statistic in 1926, and it is also called Fisher's t-statistic. (Source: the book Probability and Inference, page 273)
Definition of Student's t-Statistic
Let a random sample $(x_1, x_2, \ldots, x_n)$ of size $n$ be drawn from a normal population with mean $\mu$ and variance $\sigma^2$. Then Student's t-statistic is defined as
$$ t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} $$
which follows Student's t distribution with $(n - 1)$ degrees of freedom, where $\bar{x} = \frac{\sum x_i}{n}$ is the sample mean and $s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$ is the sample variance.
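The definition can be checked numerically. The sketch below computes the statistic term by term and compares it with R's built-in `t.test()`. The sample `x` and the hypothesized mean `mu0` are made-up values for illustration only, not data from this post.

```r
# Hypothetical sample and hypothesized mean, for illustration only
x   <- c(4.8, 5.1, 5.3, 4.9, 5.4, 5.2)
mu0 <- 5

n     <- length(x)
x_bar <- sum(x) / n                      # sample mean
s2    <- sum((x - x_bar)^2) / (n - 1)    # sample variance
t_manual <- (x_bar - mu0) / (sqrt(s2) / sqrt(n))

# Cross-check against R's built-in one-sample t-test
t_builtin <- unname(t.test(x, mu = mu0)$statistic)
all.equal(t_manual, t_builtin)
```

The statistic has `n - 1` degrees of freedom, which is 5 for this sample.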
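When to use the t-test
The t-test is a parametric test: it assumes that the data come from an approximately normal population. As the sample size increases, the t distribution approaches the normal distribution. When choosing a t-test, two considerations need to be taken into account: whether the groups being compared come from a single population or two different populations, and whether you want to test the difference in a specific direction (a one-tailed test) or in either direction (a two-tailed test).
Types of t-test
 One-sample t-test: If a single group is compared against a standard or hypothesized value.
 Paired t-test: If the groups come from a single population (for example, the same subjects measured before and after a treatment).
 Two-sample t-test: If the groups come from two different populations.
 More than two groups: The t-test compares at most two means; when the groups come from several populations, a test such as ANOVA is normally used instead.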
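As a rough guide, each case above maps to a different call of R's `t.test()`. The sketch below uses the built-in `mtcars` and `sleep` data sets; the choice of `sleep` for the paired case and the hypothesized mean of 20 are assumptions made for illustration.

```r
# One-sample: is the mean mpg different from a hypothesized value of 20?
one_sample <- t.test(mtcars$mpg, mu = 20)

# Two-sample: do automatic (am = 0) and manual (am = 1) cars differ in mpg?
# With R's defaults this is the Welch version, which does not assume equal variances.
two_sample <- t.test(mpg ~ am, data = mtcars)

# Paired: the same 10 subjects measured under two drugs (built-in 'sleep' data)
paired <- with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))

# One-sided: is the mean mpg greater than 20?
one_sided <- t.test(mtcars$mpg, mu = 20, alternative = "greater")

c(one_sample$method, two_sample$method, paired$method)
```

The `alternative` argument ("two.sided", "less" or "greater") is how the direction of the test is chosen.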
Before Diving into the t-test
We need to know:
 Types of Hypothesis
 p-value
 Confidence Interval
Hypothesis Testing with the t-test
There are two types of hypotheses:

Null Hypothesis ($H_0$): The null hypothesis is the hypothesis of no difference. It should be a simple hypothesis and is tested for possible rejection based on sample observations drawn from the population. We fail to reject it when the p-value is greater than $\alpha$ (usually 0.05).

Alternative Hypothesis ($H_1$): Any hypothesis that is mutually exclusive and complementary to the null hypothesis is called the alternative hypothesis. It is supported when the p-value is less than $\alpha$ (usually 0.05), in which case we reject $H_0$.
For a one-sample test they are stated as:

$H_0$: $\bar{x} = \mu$ (The sample mean and the population mean are equal, i.e. the sample is drawn from a population with mean $\mu$.)

$H_1$: $\bar{x} \neq \mu$ (The sample mean is not equal to the population mean, i.e. the sample is not drawn from a population with mean $\mu$.)
p-value
In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. A smaller p-value indicates stronger evidence against the null hypothesis.
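For a t-test, "at least as extreme" translates into a tail area of the t distribution, which R exposes through `pt()`. The sketch below uses the t value and degrees of freedom reported by the one-sample example on `mtcars$mpg` further down in this post. The variable names are illustrative.

```r
# Values reported by the one-sample test on mtcars$mpg later in this post
t_obs <- 0.08506
dof   <- 31

# Two-sided p-value: probability of a |t| at least this large under H0
p_two_sided <- 2 * pt(-abs(t_obs), df = dof)

# One-sided p-value (H1: mean greater than the hypothesized value)
p_greater <- pt(t_obs, df = dof, lower.tail = FALSE)

round(p_two_sided, 4)
```

Because the t distribution is symmetric, the one-sided p-value in the direction of the observed statistic is half the two-sided one.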
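Confidence Interval
A confidence interval is a range of values, computed from the sample, that is expected to contain the true parameter with a specified level of confidence (for example 95%). For a mean it is centered on the sample mean: $\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$.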
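A minimal sketch of how such an interval for a mean can be computed by hand, assuming the usual t-based formula (sample mean plus or minus the critical t value times the standard error). It uses the built-in `mtcars$mpg` column and compares the result with the interval returned by `t.test()`.

```r
x     <- mtcars$mpg
n     <- length(x)
alpha <- 0.05

se     <- sd(x) / sqrt(n)                  # standard error of the mean
t_crit <- qt(1 - alpha / 2, df = n - 1)    # two-sided critical value
ci     <- mean(x) + c(-1, 1) * t_crit * se

# Should agree with the interval reported by t.test()
ci_builtin <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)
all.equal(ci, ci_builtin)
```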
t-Statistic
$$ t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} $$
where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $s$ is the sample standard deviation and $n$ is the sample size.
If the absolute value of the test statistic is above the critical value from the t table, we reject the null hypothesis. This is the reverse of the p-value rule, where we reject the null hypothesis when the p-value is below $\alpha$.
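The two decision rules can be compared side by side. In the sketch below `qt()` plays the role of the printed t table. The observed statistic and degrees of freedom are assumed values chosen for illustration (a sample of 32 observations).

```r
alpha <- 0.05
dof   <- 31        # assumed for illustration: a sample of 32 observations
t_obs <- 0.08506   # assumed observed statistic

t_crit <- qt(1 - alpha / 2, df = dof)   # what a printed t table would give
reject_by_t <- abs(t_obs) > t_crit

p_val <- 2 * pt(-abs(t_obs), df = dof)
reject_by_p <- p_val < alpha

# For a two-sided test both rules lead to the same decision
identical(reject_by_t, reject_by_p)
```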
R code for testing the hypothesis with the t-test
Here I use the built-in mtcars data set to test hypotheses. It is a data frame with 32 observations on 11 (numeric) variables:
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
One-Sample t-test
We hypothesize that the mean 'mpg' is 20. To determine whether the data are consistent with this claim, we use a one-sample t-test with a significance level of 0.05, which is the probability of rejecting a true null hypothesis that we are willing to tolerate.
# Load the data as a data frame.
df <- data.frame(mtcars)
t.test(df$mpg, mu = 20)
One Sample t-test
data: df$mpg
t = 0.08506, df = 31, p-value = 0.9328
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
17.91768 22.26357
sample estimates:
mean of x
20.09062
The code above is an example of a one-sample t-test, where $\mu$ = 20 is our hypothesized mean. Looking at the different parts of the output:
 df = 31: We have 31 degrees of freedom (we have 32 rows and df = n - 1).
 t = 0.08506: Our level of significance is 0.05. Looking up df = 31 and a two-tailed alpha of 0.05 in the t table gives a critical value of about 2.04 (tables that jump from df = 30 to df = 40 list 2.042 for df = 30). Since our calculated t-statistic is smaller than the tabulated value, there is no evidence that the two means differ. In other words, we are unable to reject the null hypothesis.
 p-value = 0.9328: We compare this p-value with our level of significance, the level of error we are willing to accept. Since it is much larger than 0.05, there is no strong evidence against the null hypothesis.
In conclusion, there is no significant difference between the hypothesized mean and the mean of the data.
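Instead of reading the numbers off the printed output, the same quantities can be pulled out of the object that `t.test()` returns. This is a small sketch using the standard components of that result object; the names `res` and `decision` are illustrative.

```r
res <- t.test(mtcars$mpg, mu = 20)

t_obs <- unname(res$statistic)   # t statistic
dof   <- unname(res$parameter)   # degrees of freedom
p_val <- res$p.value             # two-sided p-value
ci    <- res$conf.int            # 95% confidence interval for the mean

alpha <- 0.05
if (p_val < alpha) {
  decision <- "reject H0"
} else {
  decision <- "fail to reject H0"
}
decision
```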
Two-Sample t-test
We use the two-sample t-test to test whether the means of two groups are equal, that is, whether the two samples could come from the same population.
We group the mpg values of mtcars by the am column. Our goal is to find out whether the average mpg differs between the two groups of am, where 0 is automatic and 1 is manual (see the variable list above).
t.test(mpg ~ am, var.equal = TRUE, data = mtcars)
Two Sample t-test
data: mpg by am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-10.84837 -3.64151
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
Here the absolute value of the t-statistic is 4.1061 with 30 degrees of freedom, and the p-value is 0.000285. As the p-value is smaller than our chosen level of significance (0.05), we have sufficient grounds to reject $H_0$ in favor of $H_1$. Therefore, we can conclude that there is a statistically significant difference in the average 'mpg' between the two groups: the two samples do not appear to come from populations with the same mean.
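The call above sets `var.equal = TRUE`, which assumes both groups share a common variance and pools the two sample variances. Without that argument, `t.test()` runs the Welch version instead. The sketch below reproduces the pooled calculation by hand; variable names such as `sp2` are illustrative.

```r
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
n0 <- length(auto)
n1 <- length(manual)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n0 - 1) * var(auto) + (n1 - 1) * var(manual)) / (n0 + n1 - 2)
se  <- sqrt(sp2 * (1 / n0 + 1 / n1))

t_manual <- (mean(auto) - mean(manual)) / se
dof      <- n0 + n1 - 2
p_manual <- 2 * pt(-abs(t_manual), df = dof)

# Cross-check against t.test() with var.equal = TRUE
res <- t.test(mpg ~ am, var.equal = TRUE, data = mtcars)
all.equal(t_manual, unname(res$statistic))
```

The statistic is negative because group 0 (automatic) has the lower mean mpg and the difference is taken as group 0 minus group 1.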