

Sample Distributions

The probability distribution of a statistic is called the sampling distribution. Thus if $\overline X$ is the sample mean, then the probability distribution of $\overline X$ is the sampling distribution of the mean.


Sampling Distributions of the Mean

For $n$ observations taken from a normal distribution with mean $\mu$ and variance $\sigma^{2}$, each observation $X_{i}$ of the random sample has the same normal distribution as the population sampled. So

$\displaystyle \overline X = \frac{X_{1} + X_{2} + \cdots + X_{n}}{n}$

will have a normal distribution with mean

$\displaystyle \mu_{\overline X} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu$

and variance

$\displaystyle \sigma_{\overline X}^{2} = \frac{\sigma^{2} + \sigma^{2} + \cdots + \sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n}$
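These two results can be checked numerically. The sketch below (Python standard library only; the values $\mu = 10$, $\sigma = 2$, $n = 25$ and the trial count are arbitrary illustrative choices, not from the text) draws many samples of size $n$ and compares the mean and variance of the sample means against $\mu$ and $\sigma^{2}/n$:

```python
import random
import statistics

random.seed(0)

mu, sigma, n = 10.0, 2.0, 25   # illustrative population parameters
trials = 20000

# Draw many samples of size n and record each sample mean.
means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(trials)
]

print(statistics.fmean(means))     # close to mu = 10
print(statistics.variance(means))  # close to sigma^2 / n = 0.16
```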


Central Limit Theorem

If $\overline X$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and variance $\sigma^{2}$ then the limiting form of the distribution of

$\displaystyle Z = \frac{\overline X - \mu}{\sigma / \sqrt{n}}$

as $n \rightarrow \infty$ is the standard normal distribution.
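Notably, the theorem holds even when the population itself is not normal. As a sketch of this (Python standard library; the exponential population and $n = 40$ are arbitrary illustrative choices), the simulation below forms $Z$ from samples of a skewed population and checks that roughly 95% of the values fall within $\pm 1.96$, as the standard normal predicts:

```python
import math
import random

random.seed(1)

# Exponential population: mean = sigma = 1, and clearly not normal.
mu, sigma, n = 1.0, 1.0, 40
trials = 10000

zs = []
for _ in range(trials):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / math.sqrt(n)))

# For a standard normal, P(|Z| < 1.96) is about 0.95.
frac = sum(abs(z) < 1.96 for z in zs) / trials
print(round(frac, 3))
```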


Sampling Distribution of $(n-1)S^{2}/\sigma ^{2}$

Suppose $s^{2}$, the value of the statistic $S^{2}$, is the sample variance for a sample of size $n$ from a normal population with mean $\mu$ and variance $\sigma^{2}$. Then

$\displaystyle \chi^{2} = \frac{(n-1)S^{2}}{\sigma^{2}}$

is a statistic that has a chi squared distribution with $\nu = n - 1$ degrees of freedom.
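This too can be illustrated by simulation (Python standard library; the parameter values are arbitrary choices for illustration). A chi-squared variable with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$, so the simulated values of $(n-1)S^{2}/\sigma^{2}$ should average about $n - 1$:

```python
import random
import statistics

random.seed(2)

mu, sigma, n = 5.0, 3.0, 10   # illustrative population and sample size
trials = 20000

chi2_vals = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)        # sample variance S^2
    chi2_vals.append((n - 1) * s2 / sigma**2)

# Chi-squared with nu = n - 1 d.o.f.: mean nu, variance 2*nu.
print(statistics.fmean(chi2_vals))     # close to n - 1 = 9
print(statistics.variance(chi2_vals))  # close to 2*(n - 1) = 18
```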


t Distribution

We know that if a random sample is taken from the normal distribution (see pages 192 and 219 of Walpole & Myers) then the random variable defined as

$\displaystyle \sum_{i = 1}^{n} \frac{\left( X_{i} - \mu \right)^{2}}{\sigma^{2}}$

has a $\chi^{2}$ distribution with $n$ degrees of freedom. However, if we do not know the variance $\sigma^{2}$ of the population, we can replace it with $S^{2}$ in the statistic

$\displaystyle \frac{\overline X - \mu}{\sigma / \sqrt{n}}$

(where $\sigma / \sqrt{n}$ is the standard deviation of $\overline X$) to get

$\displaystyle \frac{\overline X - \mu}{S / \sqrt{n}}$

We thus now deal with the statistic defined as

$\displaystyle T = \frac{\overline X - \mu}{S / \sqrt{n}}$

which can be written as
$\displaystyle T = \frac{(\overline X - \mu) / (\sigma / \sqrt{n})}{\sqrt{S^{2}/\sigma^{2}}} = \frac{Z}{\sqrt{V / (n - 1)}}$

where $Z$ has the standard normal distribution and

$\displaystyle V = \frac{(n - 1)S^{2}}{\sigma^{2}}$

has a $\chi^{2}$ distribution with $n - 1$ degrees of freedom. The distribution of the $T$ statistic is termed the t distribution.

Features of the t distribution include
  1. it is symmetric about zero, like the standard normal distribution;
  2. it has heavier tails than the standard normal distribution, reflecting the extra uncertainty introduced by estimating $\sigma$ with $S$;
  3. as the degrees of freedom $\nu = n - 1$ increase, it approaches the standard normal distribution.
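The heavier tails are easy to see by simulation for small $n$ (a sketch using Python's standard library; $n = 5$ and the trial count are arbitrary illustrative choices):

```python
import math
import random
import statistics

random.seed(3)

mu, sigma, n = 0.0, 1.0, 5   # small n makes the heavy tails visible
trials = 40000

t_vals = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)   # sample standard deviation S
    t_vals.append((xbar - mu) / (s / math.sqrt(n)))

# For the standard normal, P(|Z| > 2) is about 0.046; for t with
# 4 degrees of freedom the tail probability is noticeably larger.
tail = sum(abs(t) > 2 for t in t_vals) / trials
print(round(tail, 3))
```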


F Distribution

The F statistic is defined as

$\displaystyle F = \frac{ U / \nu_{1}}{V / \nu_{2}}$

where $U$ and $V$ are independent random variables having $\chi^{2}$ distributions with $\nu_{1}$ and $\nu_{2}$ degrees of freedom respectively. Finally, we have the theorem that if $S_{1}^{2}$ and $S_{2}^{2}$ are the variances of independent random samples of size $n_{1}$ and $n_{2}$ from normal populations (note: we consider two different populations) with variances $\sigma_{1}^{2}$ and $\sigma_{2}^{2}$, then we have

$\displaystyle F = \frac{ S_{1}^{2} / \sigma_{1}^{2} }{S_{2}^{2} / \sigma_{2}^{2}}$

which has an F distribution with $\nu_{1} = n_{1} - 1$ and $\nu_{2} = n_{2} - 1$ degrees of freedom.
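A simulation sketch of this ratio (Python standard library; the two population variances and sample sizes are arbitrary illustrative values) shows why dividing each sample variance by its own population variance matters: the resulting $F$ values average near $\nu_{2}/(\nu_{2}-2)$, the mean of the F distribution, even though the populations have very different variances:

```python
import random
import statistics

random.seed(4)

sigma1, sigma2 = 2.0, 5.0   # two different populations
n1, n2 = 8, 12
trials = 20000

f_vals = []
for _ in range(trials):
    s1_sq = statistics.variance(random.gauss(0, sigma1) for _ in range(n1))
    s2_sq = statistics.variance(random.gauss(0, sigma2) for _ in range(n2))
    f_vals.append((s1_sq / sigma1**2) / (s2_sq / sigma2**2))

# F with (nu1, nu2) = (7, 11) d.o.f. has mean nu2/(nu2 - 2) = 11/9.
print(statistics.fmean(f_vals))
```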

Empirical Distribution

A cumulative distribution function (CDF) gives the probability that a random variable $X$ is less than or equal to a given value $x$ ($F(x) = \Pr\{X \leq x\}$). An empirical distribution function is quite similar, the only difference being that we work from data rather than theoretical functions. To build an empirical distribution function:
  1. Collect n (say 50) observations from the (say, service) process you want to observe.
  2. Enter your observations in a single column in a spreadsheet.
  3. Sort the observations in increasing order.
  4. In the next column enter 1/n in line 1, 2/n in line 2 and so forth. (This is the probability that the next observation is less than or equal to the corresponding value.)
  5. If you want to compare your empirical data to a theoretical distribution, enter the corresponding theoretical probabilities in column 3.
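The spreadsheet steps above can also be sketched in code (a minimal sketch using Python's standard library; the simulated exponential "service times" stand in for real observations):

```python
import random
from bisect import bisect_right

random.seed(5)

# Steps 1-2: collect n observations (here, simulated service times).
n = 50
obs = [random.expovariate(0.5) for _ in range(n)]

# Step 3: sort the observations in increasing order.
obs.sort()

# Step 4: the empirical CDF jumps by 1/n at each sorted observation,
# so the fraction of observations <= x plays the role of F(x).
def ecdf(x):
    """Fraction of observations less than or equal to x."""
    return bisect_right(obs, x) / n

print(ecdf(obs[0]))   # 1/n = 0.02 at the smallest observation
print(ecdf(obs[-1]))  # 1.0 at the largest
```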

2003-08-29