How to compute a confidence interval for a population mean
Description
If we have a set of data that seems normally distributed, how can we compute a confidence interval for the mean? Assume we have already chosen a confidence level, such as 95%, which corresponds to a significance level of $\alpha=0.05$.
We will use the $t$-distribution because we have not assumed that we know the population standard deviation, and we have not assumed anything about our sample size. If you know the population standard deviation or have a large sample size (typically at least 30), then you can use $z$-scores instead; see how to compute a confidence interval for a population mean using z-scores.
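For reference, the interval computed in the solutions below can be written as follows. This is the standard $t$-based formula, stated here as a sketch; the symbols $\bar{x}$ (sample mean), $s$ (sample standard deviation, computed with $n-1$ in the denominator), and $n$ (sample size) are our notation and do not appear elsewhere on this page.

$$\bar{x} \pm t_{1-\alpha/2,\,n-1}\cdot\frac{s}{\sqrt{n}}$$

Here $t_{1-\alpha/2,\,n-1}$ is the $1-\alpha/2$ quantile of the $t$-distribution with $n-1$ degrees of freedom, and $s/\sqrt{n}$ is the standard error of the mean. The product of the two is the margin of error, so the interval is centered on $\bar{x}$.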
Related tasks:
- How to compute a confidence interval for a population mean using z-scores
- How to do a two-sided hypothesis test for a sample mean
- How to do a two-sided hypothesis test for two sample means
- How to compute a confidence interval for a mean difference (matched pairs)
- How to compute a confidence interval for a regression coefficient
- How to compute a confidence interval for a single population variance
- How to compute a confidence interval for the difference between two means when both population variances are known
- How to compute a confidence interval for the difference between two means when population variances are unknown
- How to compute a confidence interval for the difference between two proportions
- How to compute a confidence interval for the expected value of a response variable
- How to compute a confidence interval for the population proportion
- How to compute a confidence interval for the ratio of two population variances
Solution, in Julia
When applying this technique, you would have a series of data values for which you needed to compute a confidence interval for the mean. But in order to provide code that runs independently, we create some fake data below. When using this code, replace our fake data with your real data.
alpha = 0.05 # replace with your chosen alpha (here, a 95% confidence level)
data = [ 435,542,435,4,54,43,5,43,543,5,432,43,36,7,876,65,5 ] # fake
# Compute the confidence interval:
using HypothesisTests
confint( OneSampleTTest( data ), level=1-alpha, tail=:both )
(70.2984781107082, 350.05446306576243)
Note: The solution above assumes that the population is normally distributed, which is a common assumption in introductory statistics courses, but we have not verified that assumption here.
Content last modified on 24 July 2023.
Using SciPy, in Python
This solution uses a 95% confidence level, but you can change that in the first line of code by specifying a different alpha.
When applying this technique, you would have a series of data values for which you needed to compute a confidence interval for the mean. But in order to provide code that runs independently, we create some fake data below. When using this code, replace our fake data with your real data.
alpha = 0.05 # replace with your chosen alpha (here, a 95% confidence level)
data = [ 435,542,435,4,54,43,5,43,543,5,432,43,36,7,876,65,5 ] # fake
# We will use NumPy and SciPy to compute some of the statistics below.
import numpy as np
import scipy.stats as stats
# Compute the sample mean, as an estimate for the population mean.
sample_mean = np.mean( data )
# Compute the Standard Error for the sample Mean (SEM).
sem = stats.sem( data )
# The margin of error then has the following formula.
moe = sem * stats.t.ppf( 1 - alpha / 2, len( data ) - 1 )
# The confidence interval is centered on the mean with moe as its radius:
( sample_mean - moe, sample_mean + moe )
(70.29847811072423, 350.0544630657464)
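As a cross-check, the same interval can probably be obtained in a single call with `scipy.stats.t.interval`, which takes the confidence level, the degrees of freedom, and a location and scale. The sketch below reuses the fake data from above and passes the confidence level positionally, because the name of that first parameter has differed between SciPy versions; the variable names `lower` and `upper` are ours.

```python
import numpy as np
import scipy.stats as stats

alpha = 0.05  # replace with your chosen alpha (here, a 95% confidence level)
data = [435, 542, 435, 4, 54, 43, 5, 43, 543, 5, 432, 43, 36, 7, 876, 65, 5]  # fake

# Center the t-distribution on the sample mean and scale it by the
# standard error of the mean, with n - 1 degrees of freedom.
lower, upper = stats.t.interval(
    1 - alpha,           # confidence level, passed positionally
    len(data) - 1,       # degrees of freedom
    loc=np.mean(data),   # center of the interval
    scale=stats.sem(data),  # standard error of the mean
)
print((lower, upper))
```

This should agree with the step-by-step computation above up to floating-point rounding; the longer version has the advantage of exposing the margin of error as its own variable.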
Note: The solution above assumes that the population is normally distributed, which is a common assumption in introductory statistics courses, but we have not verified that assumption here.
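If you want a rough check of the normality assumption mentioned in the note, one option (our suggestion, not part of the original solution) is the Shapiro-Wilk test from `scipy.stats.shapiro`. The sketch below applies it to the same fake data; treat it as an illustration only, since no single test can confirm normality, especially for small samples.

```python
import scipy.stats as stats

data = [435, 542, 435, 4, 54, 43, 5, 43, 543, 5, 432, 43, 36, 7, 876, 65, 5]  # fake

# Shapiro-Wilk test: the null hypothesis is that the data come from
# a normally distributed population.
statistic, p_value = stats.shapiro(data)
print(statistic, p_value)

# A small p-value (for example, below your chosen alpha) is evidence
# against normality; a large one does not prove normality.
```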
Solution, in R
When applying this technique, you would have a series of data values for which you needed to compute a confidence interval for the mean. But in order to provide code that runs independently, we create some fake data below. When using this code, replace our fake data with your real data.
alpha <- 0.05 # replace with your chosen alpha (here, a 95% confidence level)
data <- c( 435,542,435,4,54,43,5,43,543,5,432,43,36,7,876,65,5 ) # fake
# If you need the two values stored in variables for later use, do:
answer <- t.test( data, conf.level=1-alpha )
lower_bound <- answer$conf.int[1]
upper_bound <- answer$conf.int[2]
# If you just need to see the results in a report, do this alone:
t.test( data, conf.level=1-alpha )
One Sample t-test
data: data
t = 3.1853, df = 16, p-value = 0.005753
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
70.29848 350.05446
sample estimates:
mean of x
210.1765
Note: The solution above assumes that the population is normally distributed, which is a common assumption in introductory statistics courses, but we have not verified that assumption here.
Opportunities
This website does not yet contain a solution for this task in any of the following software packages.
- Excel
If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.