

Sample Distributions

The probability distribution of a statistic is called the sampling distribution. Thus if $\overline X$ is the sample mean, then the probability distribution of $\overline X$ is the sampling distribution of the mean.


Sampling Distributions of the Mean

For $n$ observations taken from a normal distribution with mean $\mu$ and variance $\sigma^{2}$, each observation $X_{i}$ of the random sample has the same normal distribution as the population sampled. So

$\displaystyle \overline X = \frac{X_{1} + X_{2} + \cdots + X_{n}}{n}$

will have a normal distribution with mean

$\displaystyle \mu_{\overline X} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu$

and variance

$\displaystyle \sigma_{\overline X}^{2} = \frac{\sigma^{2} + \sigma^{2} + \cdots + \sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n}$
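These two results can be checked numerically. The sketch below (Python standard library only; the values $\mu = 10$, $\sigma = 2$, $n = 25$ and the trial count are arbitrary illustrative choices, not from the text) draws many samples of size $n$ and compares the mean and variance of the sample means against $\mu$ and $\sigma^{2}/n$:

```python
import random
import statistics

random.seed(0)

mu, sigma, n = 10.0, 2.0, 25   # illustrative population parameters
trials = 20000

# Draw many samples of size n and record each sample mean.
means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(trials)
]

print(statistics.fmean(means))     # close to mu = 10
print(statistics.variance(means))  # close to sigma^2 / n = 0.16
```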


Central Limit Theorem

If $\overline X$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and variance $\sigma^{2}$ then the limiting form of the distribution of

$\displaystyle Z = \frac{\overline X - \mu}{\sigma / \sqrt{n}}$

as $n \rightarrow \infty$ is the standard normal distribution.
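Notably, the theorem holds even when the population itself is not normal. As a sketch of this (Python standard library; the exponential population and $n = 40$ are arbitrary illustrative choices), the simulation below forms $Z$ from samples of a skewed population and checks that roughly 95% of the values fall within $\pm 1.96$, as the standard normal predicts:

```python
import math
import random

random.seed(1)

# Exponential population: mean = sigma = 1, and clearly not normal.
mu, sigma, n = 1.0, 1.0, 40
trials = 10000

zs = []
for _ in range(trials):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / math.sqrt(n)))

# For a standard normal, P(|Z| < 1.96) is about 0.95.
frac = sum(abs(z) < 1.96 for z in zs) / trials
print(round(frac, 3))
```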


Sampling Distribution of $(n-1)S^{2}/\sigma ^{2}$

Suppose $s^{2}$, the value of the statistic $S^{2}$, is the sample variance for a sample of size $n$ from a normal population with mean $\mu$ and variance $\sigma^{2}$. Then

$\displaystyle \chi^{2} = \frac{(n-1)S^{2}}{\sigma^{2}}$

is a statistic that has a chi squared distribution with $\nu = n - 1$ degrees of freedom.
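This too can be illustrated by simulation (Python standard library; the parameter values are arbitrary choices for illustration). A chi-squared variable with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$, so the simulated values of $(n-1)S^{2}/\sigma^{2}$ should average about $n - 1$:

```python
import random
import statistics

random.seed(2)

mu, sigma, n = 5.0, 3.0, 10   # illustrative population and sample size
trials = 20000

chi2_vals = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)        # sample variance S^2
    chi2_vals.append((n - 1) * s2 / sigma**2)

# Chi-squared with nu = n - 1 d.o.f.: mean nu, variance 2*nu.
print(statistics.fmean(chi2_vals))     # close to n - 1 = 9
print(statistics.variance(chi2_vals))  # close to 2*(n - 1) = 18
```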


t Distribution

We know that if a random sample is taken from the normal distribution (see pages 192 and 219 of Walpole & Myers) then the random variable defined as

$\displaystyle \sum_{i = 1}^{n} \frac{\left( X_{i} - \mu \right)^{2}}{\sigma^{2}}$

has a $\chi^{2}$ distribution with $n$ degrees of freedom. However, if we do not know the variance $\sigma^{2}$ of the population, we can replace it with $S^{2}$ in the statistic

$\displaystyle \frac{\overline X - \mu}{\sigma / \sqrt{n}}$

(where $\sigma / \sqrt{n}$ is the standard deviation of $\overline X$) to get

$\displaystyle \frac{\overline X - \mu}{S / \sqrt{n}}$

We thus now deal with the statistic defined as

$\displaystyle T = \frac{\overline X - \mu}{S / \sqrt{n}}$

which can be written as
$\displaystyle T = \frac{(\overline X - \mu) / (\sigma / \sqrt{n})}{\sqrt{S^{2}/\sigma^{2}}} = \frac{Z}{\sqrt{V / (n - 1)}}$

where $Z$ has the standard normal distribution and

$\displaystyle V = \frac{(n - 1)S^{2}}{\sigma^{2}}$

has a $\chi^{2}$ distribution with $n - 1$ degrees of freedom. The distribution of the $T$ statistic is termed the t distribution.

Features of the t distribution include
  1. it is symmetric about zero, like the standard normal distribution;
  2. it has heavier tails than the standard normal distribution, reflecting the extra uncertainty introduced by estimating $\sigma$ with $S$;
  3. as the degrees of freedom $\nu = n - 1$ increase, it approaches the standard normal distribution.
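The heavier tails are easy to see by simulation for small $n$ (a sketch using Python's standard library; $n = 5$ and the trial count are arbitrary illustrative choices):

```python
import math
import random
import statistics

random.seed(3)

mu, sigma, n = 0.0, 1.0, 5   # small n makes the heavy tails visible
trials = 40000

t_vals = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)   # sample standard deviation S
    t_vals.append((xbar - mu) / (s / math.sqrt(n)))

# For the standard normal, P(|Z| > 2) is about 0.046; for t with
# 4 degrees of freedom the tail probability is noticeably larger.
tail = sum(abs(t) > 2 for t in t_vals) / trials
print(round(tail, 3))
```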


F Distribution

The F statistic is defined as

$\displaystyle F = \frac{ U / \nu_{1}}{V / \nu_{2}}$

where $U$ and $V$ are independent random variables having $\chi^{2}$ distributions with $\nu_{1}$ and $\nu_{2}$ degrees of freedom respectively. Finally, we have the theorem that if $S_{1}^{2}$ and $S_{2}^{2}$ are the variances of independent random samples of size $n_{1}$ and $n_{2}$ from normal populations (note: we consider two different populations) with variances $\sigma_{1}^{2}$ and $\sigma_{2}^{2}$, then we have

$\displaystyle F = \frac{ S_{1}^{2} / \sigma_{1}^{2} }{S_{2}^{2} / \sigma_{2}^{2}}$

which has an F distribution with $\nu_{1} = n_{1} - 1$ and $\nu_{2} = n_{2} - 1$ degrees of freedom.
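A simulation sketch of this ratio (Python standard library; the two population variances and sample sizes are arbitrary illustrative values) shows why dividing each sample variance by its own population variance matters: the resulting $F$ values average near $\nu_{2}/(\nu_{2}-2)$, the mean of the F distribution, even though the populations have very different variances:

```python
import random
import statistics

random.seed(4)

sigma1, sigma2 = 2.0, 5.0   # two different populations
n1, n2 = 8, 12
trials = 20000

f_vals = []
for _ in range(trials):
    s1_sq = statistics.variance(random.gauss(0, sigma1) for _ in range(n1))
    s2_sq = statistics.variance(random.gauss(0, sigma2) for _ in range(n2))
    f_vals.append((s1_sq / sigma1**2) / (s2_sq / sigma2**2))

# F with (nu1, nu2) = (7, 11) d.o.f. has mean nu2/(nu2 - 2) = 11/9.
print(statistics.fmean(f_vals))
```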

Empirical Distribution

A cumulative distribution function (CDF) gives the probability that a random variable $X$ is less than or equal to a given value $x$ ($F(x) = \Pr\{X \leq x\}$). An empirical distribution function is quite similar, the only difference being that we work from data rather than theoretical functions. To build an empirical distribution function:
  1. Collect n (say 50) observations from the (say, service) process you want to observe.
  2. Enter your observations in a single column in a spreadsheet.
  3. Sort the observations in increasing order.
  4. In the next column enter 1/n in line 1, 2/n in line 2 and so forth. (This is the probability that the next observation is less than or equal to the corresponding value.)
  5. If you want to compare your empirical data to a theoretical distribution, enter the corresponding theoretical probabilities in column 3.
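The spreadsheet steps above can also be sketched in code (a minimal sketch using Python's standard library; the simulated exponential "service times" stand in for real observations):

```python
import random
from bisect import bisect_right

random.seed(5)

# Steps 1-2: collect n observations (here, simulated service times).
n = 50
obs = [random.expovariate(0.5) for _ in range(n)]

# Step 3: sort the observations in increasing order.
obs.sort()

# Step 4: the empirical CDF jumps by 1/n at each sorted observation,
# so the fraction of observations <= x plays the role of F(x).
def ecdf(x):
    """Fraction of observations less than or equal to x."""
    return bisect_right(obs, x) / n

print(ecdf(obs[0]))   # 1/n = 0.02 at the smallest observation
print(ecdf(obs[-1]))  # 1.0 at the largest
```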

2003-08-29