Next: Regression Diagnostics Up: Statistical Definitions Previous: Correlation Coefficient   Contents


Regression Statistics


B & Beta Coefficients

In regression, B coefficients are the raw regression coefficients. They represent the independent contribution of each independent variable to the prediction of the dependent variable. However, their values are not directly comparable between variables because they depend on the units of measurement or ranges of the respective variables. Beta coefficients are the regression coefficients you would have obtained had you first standardized all of your variables to a mean of 0 and a standard deviation of 1. The advantage of Beta coefficients over the unstandardized B coefficients is therefore that their magnitudes allow you to compare the relative contribution of each independent variable to the prediction of the dependent variable.
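The relation between B and Beta coefficients can be sketched with NumPy; the data and variable names below are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: two predictors on very different scales.
rng = np.random.default_rng(0)
x1 = rng.normal(0, 1, 100)        # unit scale
x2 = rng.normal(0, 100, 100)      # much larger scale
y = 2.0 * x1 + 0.05 * x2 + rng.normal(0, 0.1, 100)

# Raw (B) coefficients from ordinary least squares with an intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Beta coefficients: refit after standardizing every variable to
# mean 0 and standard deviation 1 (no intercept is needed then).
Z = np.column_stack([(x1 - x1.mean()) / x1.std(),
                     (x2 - x2.mean()) / x2.std()])
yz = (y - y.mean()) / y.std()
beta, *_ = np.linalg.lstsq(Z, yz, rcond=None)

# Equivalently, beta_i = b_i * std(x_i) / std(y).
assert np.allclose(beta, b[1:] * np.array([x1.std(), x2.std()]) / y.std())
```

Here the raw coefficient on `x2` is small only because `x2` is measured on a large scale; the Beta coefficients put both predictors on a common footing.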

Residual Sum of Squares

This is also denoted by $ \chi^{2}$ and is termed, in general, the weighted sum of squared residuals. In the case of ADAPT the fitting is unweighted, so it reduces to the sum of squared residuals. It is defined as:

$\displaystyle \chi^{2} = \left( \mathbf{y - X c} \right)^{T} \mathbf{W} \left( \mathbf{ y - X c} \right)
$

where $\mathbf{y}$ is the vector of observations (i.e. the dependent variable values), $\mathbf{X}$ is the $N$ by $P$ matrix of predictor variables (i.e. the values of $P$ descriptors for $N$ molecules), $\mathbf{W}$ is the weight matrix, which for ADAPT is the identity matrix, and $\mathbf{c}$ is the vector of $P$ unknown best-fit parameters (i.e. the regression coefficients we are trying to find). The best fit is found by minimizing the value of $ \chi^{2}$.
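The unweighted case ($\mathbf{W}$ equal to the identity) can be sketched in NumPy; the data here are made up, and `lstsq` stands in for whatever solver the fitting code actually uses:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 50, 3
X = rng.normal(size=(N, P))              # N molecules, P descriptors
c_true = np.array([1.5, -2.0, 0.5])      # hypothetical true coefficients
y = X @ c_true + rng.normal(0, 0.1, N)   # observations vector

# Least squares minimizes chi^2 = (y - Xc)^T (y - Xc) over c.
c, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ c
chi2 = residuals @ residuals             # sum of squared residuals
```

Because the fitted `c` minimizes $\chi^{2}$, the residual sum of squares at `c` can never exceed its value at any other coefficient vector, including `c_true`.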

Overall F Statistic

The overall F statistic tests the null hypothesis

$\displaystyle \beta_{1} = \beta_{2} = \cdots = \beta_{p-1} = 0
$

where $p$ is the number of parameters. Note that the above equation does not include the intercept coefficient ($\beta_{0}$). It is defined by the equation

$\displaystyle F = \frac{\Sigma (\hat Y_{i} - \overline{Y})^{2} / (p-1)}
{\Sigma (Y_{i} - \hat Y_{i})^{2} / (n - p)}
$

where

$\displaystyle \overline{Y} = \frac{1}{n} \Sigma Y_{i}
$
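Using these definitions, the overall F statistic is the ratio of the mean square explained by the regression to the mean square of the residuals. A NumPy sketch with invented data (one predictor plus an intercept, so $p = 2$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=n)
y = 3.0 + 1.2 * x + rng.normal(0, 0.5, n)   # made-up data with real signal

X = np.column_stack([np.ones(n), x])        # p = 2 parameters
p = X.shape[1]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
y_bar = y.mean()

ss_regression = np.sum((y_hat - y_bar) ** 2)   # explained sum of squares
ss_residual = np.sum((y - y_hat) ** 2)         # residual sum of squares
F = (ss_regression / (p - 1)) / (ss_residual / (n - p))
```

A large value of `F` is evidence against the null hypothesis that all non-intercept coefficients are zero.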

t Statistic

If $\beta_{i}$ is the $i$'th parameter, then the t statistic tests the null hypothesis $\beta_{i} = 0$. It is calculated by

$\displaystyle t_{i} = \frac{\beta_{i}}{\sqrt{D(\beta)_{ii}}}
$

where $\beta_{i}$ is defined above and $D(\beta)_{ii}$ is the diagonal element of the covariance matrix corresponding to the $i$'th parameter. The statistic is assumed to follow the t distribution with $(n-p)$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of parameters.
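A minimal sketch of the t statistic, assuming the standard least-squares covariance estimate $D(\beta) = s^{2}(\mathbf{X}^{T}\mathbf{X})^{-1}$ with $s^{2}$ the residual sum of squares over $(n-p)$; the data are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(0, 1.0, n)     # made-up data

X = np.column_stack([np.ones(n), x])    # intercept + one predictor
p = X.shape[1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Assumed covariance estimate: D(beta) = s^2 (X^T X)^{-1}.
rss = np.sum((y - X @ beta) ** 2)
s2 = rss / (n - p)
cov = s2 * np.linalg.inv(X.T @ X)

# t_i = beta_i / sqrt(D(beta)_ii), one value per parameter.
t = beta / np.sqrt(np.diag(cov))
```

Each `t[i]` would then be compared against the t distribution with $(n-p)$ degrees of freedom to decide whether $\beta_{i}$ differs significantly from zero.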
2003-08-29