Introduction
In econometric modeling, a regression equation expresses the relationship between a dependent variable and one or more independent variables. This relationship is rarely exact, however: many factors that influence the dependent variable are either unobserved or deliberately excluded from the model. To account for these unknown influences, an error term is added to the regression equation, and this term plays a crucial role in estimation and inference.
Why is an Error Term Added to the Regression Model?
The error term, also called the disturbance term, captures the effect of all variables that influence the dependent variable but are not included in the regression model. (It should not be confused with the residual, which is its observable sample counterpart computed from the estimated model.)
For example, the basic simple linear regression model is:
Yi = α + βXi + ui
Where:
- Yi is the dependent variable
- Xi is the independent variable
- α and β are parameters to be estimated
- ui is the error term
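As a minimal sketch of this model in code (assuming the numpy and statsmodels libraries, with illustrative parameter values α = 2.0 and β = 0.5 that are not from the text), the relationship can be simulated and estimated by OLS:

```python
# Minimal sketch: simulate Y = alpha + beta*X + u and recover the
# parameters with OLS. The values alpha=2.0, beta=0.5 are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
alpha, beta = 2.0, 0.5          # true parameters (illustrative)
X = rng.uniform(0, 10, size=n)  # independent variable
u = rng.normal(0, 1, size=n)    # error term: everything the model omits
Y = alpha + beta * X + u        # dependent variable

model = sm.OLS(Y, sm.add_constant(X)).fit()
print(model.params)  # estimates of alpha and beta, close to (2.0, 0.5)
```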
Reasons to include an error term:
- To capture omitted variables
- To reflect measurement errors in the variables
- To account for randomness and unpredictable variation
- To represent model misspecification (e.g., wrong functional form)
Assumptions about the Error Term (Classical Linear Regression Model)
For the Ordinary Least Squares (OLS) estimators to possess desirable statistical properties and support valid inference, several assumptions are made about the error term:
1. Zero Mean
E(ui) = 0
This means that the average value of the error term is zero: the omitted influences cancel out on average, so the model does not systematically over- or under-predict the dependent variable. Any constant nonzero mean would simply be absorbed into the intercept.
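A brief continuation of the sketch above illustrates a practical subtlety: once an intercept is included, the OLS residuals average to exactly zero by construction, so this assumption concerns the unobservable population errors and cannot be tested from the residuals alone.

```python
# Continuing the simulation sketch above: with an intercept included,
# the OLS first-order conditions force the residuals to average to
# zero mechanically, whatever the population errors look like.
print(model.resid.mean())  # ~0 up to floating-point rounding
```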
2. Homoscedasticity
Var(ui) = σ²
The variance of the error term is constant for all observations. If not, the model suffers from heteroscedasticity.
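As a sketch of how this assumption is commonly checked (continuing the simulated model above, and assuming statsmodels), the Breusch-Pagan test regresses the squared residuals on the regressors; a small p-value is evidence of heteroscedasticity:

```python
# Breusch-Pagan test for heteroscedasticity: regresses the squared
# residuals on the explanatory variables and tests whether they have
# any explanatory power for the error variance.
from statsmodels.stats.diagnostic import het_breuschpagan

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(
    model.resid, model.model.exog)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.3f}")
```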
3. No Autocorrelation
Cov(ui, uj) = 0 for i ≠ j
Error terms of different observations are uncorrelated. Violation leads to autocorrelation, common in time series data.
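One standard diagnostic, sketched below on the simulated model (assuming statsmodels), is the Durbin-Watson statistic: values near 2 indicate no first-order autocorrelation, while values well below 2 suggest positive autocorrelation:

```python
# Durbin-Watson check for first-order autocorrelation in the residuals.
from statsmodels.stats.stattools import durbin_watson
print(durbin_watson(model.resid))  # ~2 when errors are uncorrelated
```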
4. No Correlation with Independent Variables
Cov(ui, Xi) = 0
The error term must be uncorrelated with the explanatory variables. If this assumption is violated, OLS estimators become biased and inconsistent.
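A small simulation (illustrative values, not from the text) makes the consequence concrete: when X is constructed to be correlated with u, the OLS slope settles away from the true value no matter how large the sample gets:

```python
# Sketch of endogeneity: build X so that Cov(X, u) != 0 and watch the
# OLS slope estimate drift away from the true beta = 0.5.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
u = rng.normal(0, 1, size=n)
X_endog = rng.normal(0, 1, size=n) + 0.8 * u   # correlated with the error
Y = 2.0 + 0.5 * X_endog + u

biased = sm.OLS(Y, sm.add_constant(X_endog)).fit()
print(biased.params)  # slope noticeably above 0.5, and it stays
                      # biased however large n becomes (inconsistency)
```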
5. Normality (for inference)
ui ~ N(0, σ²)
This is required for conducting t-tests and F-tests reliably. It’s especially important in small samples.
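A common check, sketched below on the simulated model (assuming statsmodels), is the Jarque-Bera test, which compares the residuals' skewness and kurtosis with those of a normal distribution:

```python
# Jarque-Bera normality test: a small p-value is evidence against
# normally distributed errors.
from statsmodels.stats.stattools import jarque_bera
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(model.resid)
print(f"Jarque-Bera p-value: {jb_pvalue:.3f}")
```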
Implications of These Assumptions
When all the above assumptions are satisfied, the OLS estimators possess the following desirable properties:
- Unbiasedness: On average, the estimated coefficients equal the true population parameters.
- Efficiency: The OLS estimators have the minimum variance among all linear unbiased estimators (BLUE – Best Linear Unbiased Estimators).
- Consistency: As the sample size increases, the estimators converge to the true parameter values.
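A quick Monte Carlo sketch (illustrative parameter values, not from the text) shows unbiasedness in action: re-drawing the sample many times and averaging the slope estimates recovers the true β:

```python
# Monte Carlo illustration of unbiasedness: the average slope estimate
# across many re-drawn samples sits on the true beta = 0.5.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
true_beta, n, reps = 0.5, 100, 2000
estimates = []
for _ in range(reps):
    X = rng.uniform(0, 10, size=n)
    Y = 2.0 + true_beta * X + rng.normal(0, 1, size=n)
    estimates.append(sm.OLS(Y, sm.add_constant(X)).fit().params[1])
print(np.mean(estimates))  # ~0.5
```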
What Happens When Assumptions Are Violated?
1. If E(ui) ≠ 0:
The regression model is misspecified. A constant nonzero mean is absorbed into the intercept estimate, biasing it; if the mean of the errors varies with X, the slope estimates are biased as well.
2. If Var(ui) is not constant (Heteroscedasticity):
The OLS estimators remain unbiased but are no longer efficient. Standard errors become incorrect, leading to unreliable hypothesis testing.
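A common remedy, sketched below with deliberately heteroscedastic simulated errors (illustrative values), is to keep the OLS point estimates but report heteroscedasticity-robust ("White") standard errors, which statsmodels exposes through its cov_type option:

```python
# Sketch with heteroscedastic errors: the error variance grows with X,
# so conventional standard errors are unreliable; robust ("HC1")
# standard errors correct the inference while leaving the point
# estimates unchanged.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
X = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n) * X        # variance increases with X
Y = 2.0 + 0.5 * X + u

fit = sm.OLS(Y, sm.add_constant(X)).fit()
robust = sm.OLS(Y, sm.add_constant(X)).fit(cov_type="HC1")
print(fit.bse)     # conventional standard errors (unreliable here)
print(robust.bse)  # heteroscedasticity-robust standard errors
```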
3. If ui are autocorrelated:
OLS estimates are still unbiased but not efficient. Standard errors are typically underestimated when the autocorrelation is positive, inflating t-statistics and increasing the chance of falsely declaring coefficients significant.
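The analogous repair for autocorrelation, sketched here with simulated AR(1) errors (illustrative values), is Newey-West (HAC) standard errors; the maxlags option sets how many lags of correlation to allow for:

```python
# Sketch with AR(1) errors: Newey-West (HAC) standard errors remain
# valid under autocorrelation (and heteroscedasticity).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
X = rng.uniform(0, 10, size=n)
e = rng.normal(0, 1, size=n)
u = np.zeros(n)
for t in range(1, n):                 # build autocorrelated errors
    u[t] = 0.7 * u[t - 1] + e[t]
Y = 2.0 + 0.5 * X + u

hac = sm.OLS(Y, sm.add_constant(X)).fit(
    cov_type="HAC", cov_kwds={"maxlags": 4})
print(hac.bse)  # autocorrelation-consistent standard errors
```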
4. If ui is correlated with Xi (Endogeneity):
This is the most serious violation. OLS estimators become biased and inconsistent. Common causes include omitted variables or simultaneity.
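A minimal sketch of the instrumental-variables remedy (two-stage least squares), using a hypothetical instrument Z and illustrative values: Z must be correlated with X but uncorrelated with u. Stage 1 regresses X on Z; stage 2 regresses Y on the fitted values of X:

```python
# Manual 2SLS sketch with a hypothetical instrument Z. The point
# estimate is consistent; note that the second-stage printed standard
# errors are NOT the correct 2SLS ones (dedicated IV estimators adjust
# for the generated regressor).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
u = rng.normal(0, 1, size=n)
Z = rng.normal(0, 1, size=n)                   # instrument (illustrative)
X = 1.0 * Z + 0.8 * u + rng.normal(0, 1, n)    # endogenous regressor
Y = 2.0 + 0.5 * X + u

stage1 = sm.OLS(X, sm.add_constant(Z)).fit()   # stage 1: X on Z
X_hat = stage1.fittedvalues
stage2 = sm.OLS(Y, sm.add_constant(X_hat)).fit()  # stage 2: Y on X_hat
print(stage2.params)  # slope back near the true 0.5
```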
5. If ui is not normally distributed:
OLS estimators are still unbiased and consistent, but the t-tests and F-tests may not be valid in small samples. In large samples, the central limit theorem makes these tests approximately valid even without normality.
Conclusion
The error term is a crucial component of any regression model, capturing the effects of all omitted or unknown influences on the dependent variable. The assumptions about the error term ensure that the OLS estimates are unbiased, efficient, and suitable for reliable inference. Violating these assumptions can lead to misleading interpretations, invalid statistical inference, and poor forecasts. Therefore, testing for violations and applying corrective measures, such as variable transformations or alternative estimators (e.g., Generalized Least Squares or instrumental variables), is necessary to improve model performance.