T-test when population mean is unknown

In statistics, the t-test is often used in research when the researcher wants to know whether there is a significant difference between the mean of a sample and the population mean, or between the means of two different groups. There are two types of t-tests: the one-sample t-test and the two-sample t-test. As data scientists, it is important for us to understand the concepts behind the t-test and how to use it in our data analysis. In this blog post, we will focus on the one-sample t-test and explain it with a formula and examples.

  • What is one-sample T-test?
  • One-sample T-test: Example
  • T-score / T-statistics for Estimating Population Mean
  • Summary

What is one-sample T-test?

The one-sample T-test is a statistical hypothesis testing technique in which the mean of a sample is tested against a hypothesized value, e.g., a population mean. The test determines whether the difference between the sample mean and the hypothesized value is statistically significant. The T-test is used for hypothesis testing of a one-sample mean when the population standard deviation is unknown and the sample size is small. The distribution used is the T-distribution with n – 1 degrees of freedom. A sample of fewer than 30 observations is conventionally considered a small sample.

T = (X̄ – μ) / (S / √n)

Where X̄ is the sample mean, μ is the hypothesized population mean, S is the standard deviation of the sample and n is the number of sample observations.

When working with the T-test, the T-distribution is used in place of the normal distribution. The t-distribution is a family of curves that are symmetrical about the mean and have heavier tails than the normal distribution. The variability decreases as the degrees of freedom increase, so the curve approaches the normal distribution for large samples. The t-test statistic (T) follows a t-distribution with n – 1 degrees of freedom, where n is the number of observations in the sample.
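As a minimal sketch of the formula above, the T-statistic can be computed from raw observations with Python's standard library. The function name `one_sample_t` and the data in `days` are made up purely for illustration; they do not come from the example in this post.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (T-statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    # statistics.stdev uses n - 1 in the denominator (sample standard deviation)
    s = statistics.stdev(sample)
    standard_error = s / math.sqrt(n)
    return (x_bar - mu0) / standard_error, n - 1


# Hypothetical observations, invented for this illustration only
days = [7, 9, 10, 8, 11, 6, 9, 12]
t_stat, df = one_sample_t(days, mu0=5)
print(f"T = {t_stat:.3f}, degrees of freedom = {df}")
```

The parentheses around S / √n matter: the difference in means is divided by the whole standard error, not by S alone.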

One-sample T-test: Example

Suppose a claim is made that the average number of days a person spends on vacation is more than 5 days (5 days being the hypothesized population mean), and we want to test it using a sample of 16 people whose mean came out to be 9 days. As a first step, we will formulate the null and alternate hypotheses.

Null hypothesis, H0: The population mean is 5 days; the difference between the sample mean and this hypothesized value is just an instance of chance occurrence.

Alternate hypothesis, Ha: The population mean is greater than 5 days; the difference between the sample mean and the hypothesized value is statistically significant.

We will use one-sample t-test to test this hypothesis. A right-tailed test will be performed.

T = (X̄ – μ) / (S / √n)

Where, X̄ is the sample mean, μ is the hypothesized population mean, S is the standard deviation of the sample and n is the number of observations in the sample.

A sample of 16 persons is taken. The mean number of days spent on vacation by the persons in the sample is found to be 9 days, with a sample standard deviation of 3 days.

T = (X̄ – μ) / (S / √n)

= (9 – 5)/(3/ √16)

= 5.33

At a level of significance of 0.05 and with 15 degrees of freedom (n – 1 = 16 – 1), the critical T-value for a right-tailed test comes out to be 1.75305. Since the calculated T-value of 5.33 is much larger than the critical value of 1.75305, the null hypothesis can be rejected. Thus, there is a statistically significant difference between the sample mean and the hypothesized population mean. A T-value calculator can be used to find the critical value of T for a given level of significance and degrees of freedom.

Another way to test is to calculate the p-value for the T-statistic of 5.33. A p-value calculator can be used to find the p-value for a given T-value, degrees of freedom and type of test (one-tailed or two-tailed). For a T-statistic of 5.33 with 15 degrees of freedom, the one-tailed p-value comes out to be 0.000042. This means that, if the null hypothesis were true, there would be a probability of only 0.000042 of obtaining a T-statistic at least this large. As this value is less than 0.05, one can reject the null hypothesis given the evidence of the current sample.
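The critical value and the p-value quoted above can be reproduced, approximately, without an online calculator. The sketch below integrates the t-distribution density numerically using only the standard library. The helper names `t_pdf`, `t_upper_tail` and `t_critical` are illustrative, and the integration settings are assumptions chosen for this example. If SciPy is available, `scipy.stats.t.ppf` and `scipy.stats.t.sf` provide the same quantities directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so the area to the right of 0 is 0.5
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Right-tailed critical value: the t with P(T >= t) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Summary statistics from the vacation example
x_bar, mu0, s, n = 9, 5, 3, 16
df = n - 1
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
critical = t_critical(0.05, df)
p_value = t_upper_tail(t_stat, df)

print(f"T = {t_stat:.2f}, critical value = {critical:.5f}, p-value = {p_value:.6f}")
print("Reject H0:", t_stat > critical)
```

Both decision rules agree here: the T-statistic exceeds the critical value, and the p-value falls below the 0.05 significance level.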

T-score / T-statistics for Estimating Population Mean

The population mean can be estimated as a function of the t-score using the following equation:

Population mean = Sample mean ± T*(Standard error of the mean)

Where T is a statistic that has a T-distribution with n – 1 degrees of freedom. The standard error of the mean (SE) is an estimate of the standard deviation of the sampling distribution of the sample mean. The T-statistic can be used to calculate confidence intervals for population means when the sample size is small and the population standard deviation is unknown. When the population standard deviation is known, we use Z-statistics and the Z-distribution instead of T-statistics.

The value of the standard error of the mean can be calculated as:

SE of the mean = S/√n

Where, S is the standard deviation of the sample and n is the number of observations in the sample.
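To make the estimate concrete, here is a small sketch of a 95% confidence interval for the population mean, computed as sample mean ± T × SE with the summary statistics from the vacation example (mean 9, standard deviation 3, n = 16). The critical value 2.131 is the standard two-sided 95% t-table entry for 15 degrees of freedom. The choice of a 95% level is an assumption made for this illustration, not something stated earlier in the post.

```python
import math

# Summary statistics from the vacation example
x_bar, s, n = 9, 3, 16

# Two-sided 95% critical value for n - 1 = 15 degrees of freedom (from a t-table)
t_crit = 2.131

standard_error = s / math.sqrt(n)   # SE of the mean = S / sqrt(n)
margin = t_crit * standard_error    # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(f"SE = {standard_error:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

The interval lies entirely above 5 days, which is consistent with rejecting the null hypothesis in the example.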

Summary

The one-sample t-test is a statistical test that can be used to determine whether there is a significant difference between the sample mean and a hypothesized population mean. The t-test statistic (T) follows a t-distribution with n – 1 degrees of freedom, where n is the number of observations in the sample. T-statistics can also be used to estimate the population mean and to calculate confidence intervals for it when the sample size is small and the population standard deviation is unknown.

I have been recently working in the area of Data analytics including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies including programming languages such as Java/JEE, Javascript, Python, R, Julia, etc, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. For latest updates and blogs, follow us on Twitter. I would love to connect with you on Linkedin.

Check out my latest book titled as First Principles Thinking: Building winning products using first principles thinking

Ajitesh Kumar

Do you need to know population mean for t

The one sample t test compares the mean of your sample data to a known value. For example, you might want to know how your sample mean compares to the population mean. You should run a one sample t test when you don't know the population standard deviation or you have a small sample size.

What do you do if the population standard deviation is unknown?

If the population standard deviation, sigma, is unknown, then the standardized sample mean follows a Student's t (t) distribution, and the sample standard deviation is used instead of the population standard deviation. The t here is the t-score obtained from the Student's t table.

Why t

In the situation where you have a sample and would like to know if the population represented by the sample has a mean different than some specification, then this is the test for you.

When should you use the t

A t-test can be used to compare a sample mean against a hypothesized value, or to compare the means of two groups (a.k.a. pairwise comparison). If you want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an ANOVA test or a post-hoc test.