

Estimation

Some definitions

Even efficient estimators will have some error when estimating the value of a population parameter. In many cases it is better to know an interval within which the population parameter can be found. This type of estimate is termed an interval estimate, i.e. we say that the population parameter lies within the interval

$\displaystyle \hat \theta_{L} < \theta < \hat \theta_{U}
$

Some features of interval estimates

Estimating the Mean

$\sigma $ known

The sampling distribution of $\overline X$ is centered at $\mu$ (the population mean), so $\overline X$ is an unbiased estimator of $\mu$, and its variance is smaller than that of other unbiased estimators of $\mu$. Furthermore, since the variance of $\overline X$ is defined as $\sigma^{2}_{\overline X} = \sigma^{2} / n$, the variance decreases with larger sample sizes. Thus $\overline x$ is the best point estimate of $\mu$.

Considering interval estimates, the Central Limit Theorem says that the sampling distribution of $\overline X$ is approximately normal with mean $\mu_{\overline X} = \mu$ and standard deviation $\sigma_{\overline
X} = \sigma / \sqrt{n}$. Thus if $z_{\alpha/2}$ is the $z$ value above which there is an area of $\alpha / 2$, then we can write

$\displaystyle P( -z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha
$

and since $Z$ is defined as

$\displaystyle Z = \frac{\overline X - \mu}{\sigma / \sqrt{n}}
$

we can substitute and manipulate to get

$\displaystyle P(
\overline x - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu <
\overline x + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}) = 1 - \alpha
$

Essentially this says that if $\overline x$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and standard deviation $\sigma $, then the $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

$\displaystyle \overline x - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu <
\overline x + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
$

The values of $\hat \theta_{L}$ and $\hat \theta_{U}$ are thus the left and right sides of the inequality.
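As a sketch of this computation, the interval above can be evaluated directly in Python. The sample values below are hypothetical; $z_{\alpha/2}$ is obtained from the standard normal quantile function in the standard library's statistics module.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample: n = 36 observations with sample mean 2.6,
# drawn from a population with known sigma = 0.3.
xbar, sigma, n = 2.6, 0.3, 36
alpha = 0.05  # for a 95% confidence interval

# z_{alpha/2}: the z value with area alpha/2 above it (about 1.96)
z = NormalDist().inv_cdf(1 - alpha / 2)
margin = z * sigma / sqrt(n)

lower, upper = xbar - margin, xbar + margin
print(f"{(1 - alpha) * 100:.0f}% CI for mu: ({lower:.3f}, {upper:.3f})")
```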

Thus for the mean we can say that if $\overline x$ is an estimator of $\mu$, then we can be $(1-\alpha)100\%$ sure that the error (i.e. $\vert \overline x - \mu \vert$) will not exceed $z_{\alpha/2}
\frac{\sigma}{\sqrt{n}}$.

The sample size required to achieve a $(1-\alpha)100\%$ confidence level for an error $e$ (i.e. to be $(1-\alpha)100\%$ sure that the error will not exceed $e$) is given by

$\displaystyle n = \left( \frac{z_{\alpha/2} \sigma}{e} \right)^{2}
$
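A short sketch of the sample-size formula, with hypothetical values for $\sigma$ and $e$; since $n$ must be a whole number of observations, the result is rounded up.

```python
from math import ceil
from statistics import NormalDist

# Hypothetical requirement: be 95% sure the error |xbar - mu|
# does not exceed e = 0.05, with known sigma = 0.3.
sigma, e, alpha = 0.3, 0.05, 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)
n = ceil((z * sigma / e) ** 2)  # round up to the next whole observation
print(n)
```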

$\sigma $ unknown

When we have a sample from a normal distribution with an unknown standard deviation then the variable defined as

$\displaystyle T = \frac{\overline X - \mu}{S / \sqrt{n}}
$

has a $t$ distribution with $ n - 1$ degrees of freedom. Proceeding as above we can conclude that if $\overline x$ and $s$ are the sample mean and standard deviation of a sample from a normal population with unknown variance, then a $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

$\displaystyle \overline x - t_{\alpha/2} \frac{s}{\sqrt{n}} < \mu <
\overline x + t_{\alpha/2} \frac{s}{\sqrt{n}}
$

where $t_{\alpha/2}$ is a $t$ value with $\nu = n - 1$ degrees of freedom.
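A sketch of the $t$ interval with a hypothetical sample; the critical value $t_{0.025}$ for $\nu = 9$ degrees of freedom is hard-coded from standard $t$ tables, since the standard library has no $t$ quantile function.

```python
from math import sqrt

# Hypothetical sample of n = 10 from a normal population with
# unknown sigma: sample mean 10.0 and sample standard deviation 0.283.
xbar, s, n = 10.0, 0.283, 10

# t_{0.025} with nu = n - 1 = 9 degrees of freedom, from standard tables.
t = 2.262

margin = t * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```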

An important distinction: when $\sigma $ is known we can use the Central Limit Theorem (i.e. a normal distribution), and when $\sigma $ is unknown we use the sampling distribution of $T$ (i.e. a $t$ distribution). In many cases when $\sigma $ is unknown and $n > 30$, $s$ can be used in place of $\sigma $ to give the interval

$\displaystyle \overline x \pm z_{\alpha/2} \frac{s}{\sqrt{n}}
$

This is termed the large-sample confidence interval.

Standard Error of a Point Estimate

We know that the variance of the estimator $\overline X$ is

$\displaystyle \sigma_{\overline X}^{2} = \frac{\sigma^{2}}{n}
$

The standard deviation of $\overline X$ is also termed the standard error. Thus confidence intervals can also be written as

$\displaystyle \overline x \pm z_{\alpha/2} \mathrm{s.e.}(\overline x)
$

Tolerance Limits

Estimating the Variance

When using $S^{2}$ as an estimator for the population variance $\sigma^{2}$, we can get an interval estimate of $\sigma^{2}$ via the statistic

$\displaystyle X^{2} = \frac{(n-1)S^{2}}{\sigma^{2}}
$

which has a $ \chi^{2}$ distribution with $ n - 1$ degrees of freedom (when the samples are taken from a normal population). Rearranging and proceeding as before, the $(1-\alpha)100\%$ confidence interval for $\sigma^{2}$ is

$\displaystyle \frac{(n-1)s^{2}}{\chi^{2}_{\alpha/2}} < \sigma^{2} <
\frac{(n-1)s^{2}}{\chi^{2}_{1-\alpha/2}}
$

where $\chi^{2}_{\alpha/2}$ and $\chi^{2}_{1-\alpha/2}$ are values of the $ \chi^{2}$ distribution with $\nu = n - 1$ degrees of freedom leaving areas of $\alpha/2$ and $1 - \alpha/2$ to the right.
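As a numerical sketch, inverting the $ \chi^{2}$ statistic gives the bounds $(n-1)s^{2}/\chi^{2}_{\alpha/2}$ and $(n-1)s^{2}/\chi^{2}_{1-\alpha/2}$; the sample values below are hypothetical, and the critical values for $\nu = 9$ degrees of freedom are hard-coded from standard $ \chi^{2}$ tables.

```python
# Hypothetical sample of n = 10 from a normal population,
# with sample variance s2 = 1.2.
n, s2 = 10, 1.2

# chi-square critical values for nu = n - 1 = 9, from standard tables
chi2_upper = 19.023  # chi^2_{0.025, 9}, area 0.025 to the right
chi2_lower = 2.700   # chi^2_{0.975, 9}, area 0.975 to the right

lower = (n - 1) * s2 / chi2_upper
upper = (n - 1) * s2 / chi2_lower
print(f"95% CI for sigma^2: ({lower:.3f}, {upper:.3f})")
```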
2003-08-29