T-Test for Statistics and Brief Introduction in R

# t-test

## History of Student's t-Distribution

t-statistic:

The t-statistic was first introduced by the Englishman W. S. Gosset, writing under the pen name "Student", who published it in 1908 in his research paper entitled "The Probable Error of a Mean". It is therefore called Student's t-statistic. Later, Prof. R. A. Fisher developed and defined the t-statistic in 1926, and that form is called Fisher's t-statistic. (Source: the book PROBABILITY and INFERENCE, page 273)

## Definition of Student's t-Statistic

Let a random sample $(x_1, x_2, \ldots, x_n)$ of size $n$ be drawn from a normal population with mean $\mu$ and variance $\sigma^2$. Then Student's t-statistic is defined as $t = \frac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}$,

which follows Student's t-distribution with $(n-1)$ degrees of freedom, where $\bar{x} = \frac{\sum x_i}{n}$ is the sample mean and $s^2 = \frac{1}{n-1}\sum (x_i-\bar{x})^2$ is the sample variance.
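To make the definition concrete, here is a minimal sketch that computes the statistic term by term and checks it against R's `t.test()`. The sample values and the hypothesized mean of 5 are made up for illustration only.

```r
# Hypothetical sample of n = 8 measurements (illustrative values only).
x  <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 4.7)
mu <- 5                                   # hypothesized population mean (assumed)

n     <- length(x)
x_bar <- sum(x) / n                       # sample mean
s2    <- sum((x - x_bar)^2) / (n - 1)     # sample variance
t_manual <- (x_bar - mu) / (sqrt(s2) / sqrt(n))

# t.test() should report the same statistic and n - 1 degrees of freedom.
res <- t.test(x, mu = mu)
all.equal(t_manual, unname(res$statistic))
unname(res$parameter) == n - 1
```

Both checks should come back `TRUE`, since `t.test()` uses the same formula for the one-sample case.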

## When to Use a t-test

The t-test is a parametric test: it assumes the data come from a normal population, and the t-distribution itself approaches the normal distribution as the sample size increases. When choosing a t-test, we need to consider two things: whether the groups being compared come from a single population or two different populations, and whether we want to test the difference in a specific direction.
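One way to see the t-distribution approaching the normal distribution is to compare critical values. The sketch below uses an arbitrary set of degrees of freedom and the two-sided 5% level, both chosen only for illustration.

```r
# Two-sided 5% critical values of the t-distribution for growing df,
# compared with the corresponding standard normal critical value.
dfs    <- c(2, 5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)

# The gap between the t and normal critical values shrinks as df grows.
data.frame(df = dfs,
           t_crit = round(t_crit, 4),
           gap = round(t_crit - z_crit, 4))
```

The `gap` column stays positive but shrinks toward zero, which is why large samples give nearly the same answer with a t-test or a z-test.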

## Types of t-test

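• One-sample t-test: If a single group is compared against a known or hypothesized value.
• Paired t-test: If the groups come from a single population (for example, the same subjects measured before and after a treatment).
• Two-sample t-test: If the groups come from two different populations.
• More than two groups: If the groups come from several populations, a t-test compares only two of them at a time; comparing all of the means at once is usually done with an ANOVA rather than a "multisample" t-test.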
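As a rough sketch of how these choices map onto R's `t.test()`, the example below uses the built-in `sleep` data (extra hours of sleep for 10 subjects under two drugs). Using this dataset, and treating the same numbers as paired in one call and independent in another, is purely for illustration.

```r
# sleep: extra hours of sleep for the same 10 subjects under two drugs.
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

one_sample <- t.test(g1, mu = 0)                 # one group vs. a fixed value
paired     <- t.test(g1, g2, paired = TRUE)      # same subjects measured twice
two_sample <- t.test(g1, g2, var.equal = TRUE)   # treated as independent groups

# Testing in a specific direction: is the mean of g1 greater than 0?
one_sided <- t.test(g1, mu = 0, alternative = "greater")

# Degrees of freedom differ by design: n - 1, n - 1, and n1 + n2 - 2.
c(one_sample = unname(one_sample$parameter),
  paired     = unname(paired$parameter),
  two_sample = unname(two_sample$parameter))
```

The `alternative` argument also accepts `"less"` and the default `"two.sided"`.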

## Before Diving into T-test

We need to know,

• Types of Hypothesis
• p-value
• Confidence Interval

### Hypothesis Testing with t-test

There are two types of hypothesis:

• $H_0$ : The null hypothesis, also called the hypothesis of no difference. It should be a simple hypothesis. It is tested for possible rejection on the basis of sample observations drawn from the population. We fail to reject it when the p-value is greater than $\alpha$ (usually 0.05).
• $H_1$ : Any hypothesis that is mutually exclusive and complementary to the null hypothesis is called the alternative hypothesis. It is supported when the p-value is less than $\alpha$ (usually 0.05), in which case $H_0$ is rejected.

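For a one-sample test they are stated as:

• $H_0$ : population mean $= \mu$ (The sample is taken from a population whose mean is the hypothesized value $\mu$, so the sample mean $\bar{x}$ differs from $\mu$ only by chance.)

• $H_1$ : population mean $\neq \mu$ (The sample is not taken from a population with mean $\mu$, so the gap between $\bar{x}$ and $\mu$ is larger than chance alone would explain.)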

### p-value

In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. A smaller p-value means stronger evidence against the null hypothesis, and hence in favor of the alternative hypothesis.
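For a t-test, "at least as extreme" translates into tail areas of the t-distribution. The sketch below assumes an observed statistic of 2.3 with 15 degrees of freedom; both numbers are invented for illustration.

```r
# Illustrative values, not taken from real data.
t_obs <- 2.3
df    <- 15

# Two-sided p-value: P(|T| >= |t_obs|) under H0, i.e. both tails.
p_two_sided <- 2 * pt(-abs(t_obs), df = df)

# One-sided p-value: P(T >= t_obs) under H0, i.e. the upper tail only.
p_greater <- pt(t_obs, df = df, lower.tail = FALSE)

p_two_sided
p_greater
```

Because the t-distribution is symmetric, the two-sided p-value is exactly twice the one-sided one here.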

### Confidence Interval

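A confidence interval is a range of values, computed from the sample, that is expected to contain the true parameter at a stated confidence level. For example, if we repeated the sampling many times, about 95% of the 95% confidence intervals built this way would contain the true mean.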
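For the mean, such an interval can be sketched by hand as $\bar{x} \pm t_{1-\alpha/2,\,n-1} \cdot s/\sqrt{n}$. The data below are hypothetical, and the result is compared with the interval reported by `t.test()`.

```r
# Hypothetical measurements (illustrative values only).
x <- c(12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 9.5, 12.4, 10.9)
n <- length(x)

conf_level <- 0.95
alpha      <- 1 - conf_level

# Margin of error: t critical value times the standard error of the mean.
margin    <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
ci_manual <- c(mean(x) - margin, mean(x) + margin)

# t.test() reports the same interval in its conf.int component.
ci_builtin <- as.numeric(t.test(x, conf.level = conf_level)$conf.int)

ci_manual
all.equal(ci_manual, ci_builtin)
```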

## t-Statistic

$t = \frac{\bar{x} - \mu}{\frac{S} {\sqrt{n}}}$

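Where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $S$ is the sample standard deviation, and $n$ is the sample size.

If the absolute value of the test statistic is above the critical value from the t-table, we reject the null hypothesis. This runs in the opposite direction to the p-value rule, where we reject the null hypothesis when the p-value is below $\alpha$.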
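The table lookup and the p-value rule always lead to the same decision. A small sketch, assuming 20 degrees of freedom and a handful of made-up t-statistics, shows the two rules side by side; `qt()` plays the role of the printed t-table.

```r
alpha  <- 0.05
df     <- 20                              # assumed degrees of freedom
t_crit <- qt(1 - alpha / 2, df = df)      # two-sided critical value

# Hypothetical t-statistics checked under both decision rules.
t_vals <- c(-3.1, -1.2, 0.4, 2.0, 2.5)
p_vals <- 2 * pt(-abs(t_vals), df = df)

by_table <- abs(t_vals) > t_crit          # reject if |t| exceeds the table value
by_p     <- p_vals < alpha                # reject if p is below alpha

data.frame(t = t_vals, p = round(p_vals, 4),
           reject_by_table = by_table, reject_by_p = by_p)
```

The last two columns agree row by row: a large |t| and a small p-value are two views of the same evidence.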

# R code for testing the hypothesis with t-test

Here I have used the built-in dataset mtcars to test the hypotheses.

A data frame with 32 observations on 11 (numeric) variables.

| Column | Name | Description |
|--------|------|-------------|
| [, 1]  | mpg  | Miles/(US) gallon |
| [, 2]  | cyl  | Number of cylinders |
| [, 3]  | disp | Displacement (cu.in.) |
| [, 4]  | hp   | Gross horsepower |
| [, 5]  | drat | Rear axle ratio |
| [, 6]  | wt   | Weight (1000 lbs) |
| [, 7]  | qsec | 1/4 mile time |
| [, 8]  | vs   | Engine (0 = V-shaped, 1 = straight) |
| [, 9]  | am   | Transmission (0 = automatic, 1 = manual) |
| [,10]  | gear | Number of forward gears |
| [,11]  | carb | Number of carburetors |

## One Sample T test

I want to hypothesize that the mean mpg is very near to 20. To find out whether I can claim this, I will use a one-sample t-test with a level of significance of 0.05. The level of significance is simply the probability of wrongly rejecting a true null hypothesis that we are willing to tolerate.

# Load the data as a data frame.
df <- data.frame(mtcars)
t.test(df$mpg, mu = 20)

    One Sample t-test

data:  df$mpg
t = 0.08506, df = 31, p-value = 0.9328
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
17.91768 22.26357
sample estimates:
mean of x
20.09062 

The code above is an example of a one-sample t-test where $\mu = 20$ is our hypothesized mean. Looking over the different parameters:

• df = 31: We have 31 degrees of freedom (i.e. we have 32 rows and df = n-1).
• t = 0.08506: Our level of significance is 0.05, so we go to the t-table and find the critical value for it. Looking up df = 31 and a two-tailed alpha of 0.05 gave 2.042 (taken from here). Strictly, 2.042 is the table entry for df = 30, the nearest row in many printed tables; the exact value for df = 31 is about 2.040, which leads to the same decision. Since the absolute value of our calculated t-statistic is smaller than the tabulated value, we conclude that there is no evidence of the two means being different. In other words, we were unable to reject the null hypothesis.
• p-value = 0.9328: We compare this p-value with our level of significance, which is the level of error we are willing to accept. Since the p-value is much larger than our level of significance, we conclude that there is no strong evidence against the null hypothesis.

In conclusion, there is no significant difference between the hypothesized mean and the sample mean of the data.
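The same comparison can be done without a printed table. As a sketch, the object returned by `t.test()` exposes its statistic, degrees of freedom, and p-value, and `qt()` gives the exact critical value for df = 31; the variable names are only illustrative.

```r
res   <- t.test(mtcars$mpg, mu = 20)
alpha <- 0.05

t_stat <- unname(res$statistic)            # calculated t
df_val <- unname(res$parameter)            # degrees of freedom, n - 1
t_crit <- qt(1 - alpha / 2, df = df_val)   # exact two-sided critical value

reject_by_table <- abs(t_stat) > t_crit
reject_by_p     <- res$p.value < alpha

round(c(t = t_stat, df = df_val, t_crit = t_crit, p = res$p.value), 4)
c(reject_by_table = reject_by_table, reject_by_p = reject_by_p)
```

Both flags come out `FALSE`, matching the conclusion that the null hypothesis is not rejected.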

## Two Sample T test

We use a two-sample t-test to test whether two samples come from populations with the same mean.

We group the mpg values of mtcars by the am column. Our goal is to find whether the average mpg differs between the two am groups, where 0 is automatic and 1 is manual. See the table above for more info.

t.test(mpg ~ am, var.equal = TRUE, data = mtcars)
    Two Sample t-test

data:  mpg by am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-10.84837  -3.64151
sample estimates:
mean in group 0 mean in group 1
17.14737        24.39231 

Here the t-statistic is -4.1061 with 30 degrees of freedom, and the p-value is 0.000285. This p-value gives strong evidence against $H_0$, so we reject it in favor of $H_1$, which means that the two samples are not taken from populations with the same mean.

Since the p-value is smaller than our level of significance, we can reject the null hypothesis and conclude that there is a difference in average mpg between the two groups.
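The call above sets `var.equal = TRUE`, which assumes both groups have the same variance. As a hedged sketch of what happens without that assumption, the code below compares it with R's default, the Welch two-sample t-test; the variable names are illustrative.

```r
# Pooled-variance test (same as the call above) vs. R's default Welch test.
pooled <- t.test(mpg ~ am, var.equal = TRUE, data = mtcars)
welch  <- t.test(mpg ~ am, data = mtcars)   # var.equal = FALSE by default

# Welch adjusts the degrees of freedom downward when the variances differ.
data.frame(test = c("pooled", "welch"),
           t    = c(unname(pooled$statistic), unname(welch$statistic)),
           df   = c(unname(pooled$parameter), unname(welch$parameter)),
           p    = c(pooled$p.value, welch$p.value))
```

If the two rows lead to the same decision at the 0.05 level, the conclusion does not hinge on the equal-variance assumption.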