Introduction
In econometric modeling, a regression equation expresses the relationship between a dependent variable and one or more independent variables. This relationship is rarely exact, however: many factors that influence the dependent variable are either unobserved or deliberately excluded from the model. To account for these unknown influences, an error term is added to the regression equation, and this term plays a crucial role in estimation and inference.
Why is an Error Term Added to the Regression Model?
The error term, also called the disturbance term, captures the effect of all variables that influence the dependent variable but are not included in the regression model. (It should not be confused with the residual, which is its observable sample counterpart computed from the estimated model.)
For example, the basic simple linear regression model is:
Yi = α + βXi + ui
Where:
- Yi is the dependent variable
- Xi is the independent variable
- α and β are parameters to be estimated
- ui is the error term
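As a minimal sketch of this model in code (assuming the numpy and statsmodels libraries, with illustrative parameter values α = 2.0 and β = 0.5 that are not from the text), the relationship can be simulated and estimated by OLS:

```python
# Minimal sketch: simulate Y = alpha + beta*X + u and recover the
# parameters with OLS. The values alpha=2.0, beta=0.5 are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
alpha, beta = 2.0, 0.5          # true parameters (illustrative)
X = rng.uniform(0, 10, size=n)  # independent variable
u = rng.normal(0, 1, size=n)    # error term: everything the model omits
Y = alpha + beta * X + u        # dependent variable

model = sm.OLS(Y, sm.add_constant(X)).fit()
print(model.params)  # estimates of alpha and beta, close to (2.0, 0.5)
```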
Reasons to include an error term:
- To capture omitted variables
- To reflect measurement errors in the variables
- To account for randomness and unpredictable variation
- To represent model misspecification (e.g., wrong functional form)
Assumptions about the Error Term (Classical Linear Regression Model)
For the Ordinary Least Squares (OLS) estimators to possess desirable statistical properties and support valid inference, several assumptions are made about the error term:
1. Zero Mean
E(ui) = 0
This means that the average value of the error term is zero: the omitted influences cancel out on average, so the model does not systematically over- or under-predict the dependent variable. Any constant nonzero mean would simply be absorbed into the intercept.
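A brief continuation of the sketch above illustrates a practical subtlety: once an intercept is included, the OLS residuals average to exactly zero by construction, so this assumption concerns the unobservable population errors and cannot be tested from the residuals alone.

```python
# Continuing the simulation sketch above: with an intercept included,
# the OLS first-order conditions force the residuals to average to
# zero mechanically, whatever the population errors look like.
print(model.resid.mean())  # ~0 up to floating-point rounding
```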
2. Homoscedasticity
Var(ui) = σ²
The variance of the error term is constant for all observations. If not, the model suffers from heteroscedasticity.
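As a sketch of how this assumption is commonly checked (continuing the simulated model above, and assuming statsmodels), the Breusch-Pagan test regresses the squared residuals on the regressors; a small p-value is evidence of heteroscedasticity:

```python
# Breusch-Pagan test for heteroscedasticity: regresses the squared
# residuals on the explanatory variables and tests whether they have
# any explanatory power for the error variance.
from statsmodels.stats.diagnostic import het_breuschpagan

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(
    model.resid, model.model.exog)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.3f}")
```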
3. No Autocorrelation
Cov(ui, uj) = 0 for i ≠ j
Error terms of different observations are uncorrelated. Violation leads to autocorrelation, common in time series data.
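One standard diagnostic, sketched below on the simulated model (assuming statsmodels), is the Durbin-Watson statistic: values near 2 indicate no first-order autocorrelation, while values well below 2 suggest positive autocorrelation:

```python
# Durbin-Watson check for first-order autocorrelation in the residuals.
from statsmodels.stats.stattools import durbin_watson
print(durbin_watson(model.resid))  # ~2 when errors are uncorrelated
```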
4. No Correlation with Independent Variables
Cov(ui, Xi) = 0
The error term must be uncorrelated with the explanatory variables. If this assumption is violated, OLS estimators become biased and inconsistent.
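A small simulation (illustrative values, not from the text) makes the consequence concrete: when X is constructed to be correlated with u, the OLS slope settles away from the true value no matter how large the sample gets:

```python
# Sketch of endogeneity: build X so that Cov(X, u) != 0 and watch the
# OLS slope estimate drift away from the true beta = 0.5.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
u = rng.normal(0, 1, size=n)
X_endog = rng.normal(0, 1, size=n) + 0.8 * u   # correlated with the error
Y = 2.0 + 0.5 * X_endog + u

biased = sm.OLS(Y, sm.add_constant(X_endog)).fit()
print(biased.params)  # slope noticeably above 0.5, and it stays
                      # biased however large n becomes (inconsistency)
```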
5. Normality (for inference)
ui ~ N(0, σ²)
This is required for conducting t-tests and F-tests reliably. It’s especially important in small samples.
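A common check, sketched below on the simulated model (assuming statsmodels), is the Jarque-Bera test, which compares the residuals' skewness and kurtosis with those of a normal distribution:

```python
# Jarque-Bera normality test: a small p-value is evidence against
# normally distributed errors.
from statsmodels.stats.stattools import jarque_bera
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(model.resid)
print(f"Jarque-Bera p-value: {jb_pvalue:.3f}")
```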
Implications of These Assumptions
When all the above assumptions are satisfied, the OLS estimators possess the following desirable properties:
- Unbiasedness: On average, the estimated coefficients equal the true population parameters.
- Efficiency: The OLS estimators have the minimum variance among all linear unbiased estimators (BLUE – Best Linear Unbiased Estimators).
- Consistency: As the sample size increases, the estimators converge to the true parameter values.
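A quick Monte Carlo sketch (illustrative parameter values, not from the text) shows unbiasedness in action: re-drawing the sample many times and averaging the slope estimates recovers the true β:

```python
# Monte Carlo illustration of unbiasedness: the average slope estimate
# across many re-drawn samples sits on the true beta = 0.5.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
true_beta, n, reps = 0.5, 100, 2000
estimates = []
for _ in range(reps):
    X = rng.uniform(0, 10, size=n)
    Y = 2.0 + true_beta * X + rng.normal(0, 1, size=n)
    estimates.append(sm.OLS(Y, sm.add_constant(X)).fit().params[1])
print(np.mean(estimates))  # ~0.5
```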
What Happens When Assumptions Are Violated?
1. If E(ui) ≠ 0:
The regression model is misspecified. A constant nonzero mean is absorbed into the intercept estimate, biasing it; if the mean of the errors varies with X, the slope estimates are biased as well.
2. If Var(ui) is not constant (Heteroscedasticity):
The OLS estimators remain unbiased but are no longer efficient. Standard errors become incorrect, leading to unreliable hypothesis testing.
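A common remedy, sketched below with deliberately heteroscedastic simulated errors (illustrative values), is to keep the OLS point estimates but report heteroscedasticity-robust ("White") standard errors, which statsmodels exposes through its cov_type option:

```python
# Sketch with heteroscedastic errors: the error variance grows with X,
# so conventional standard errors are unreliable; robust ("HC1")
# standard errors correct the inference while leaving the point
# estimates unchanged.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
X = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n) * X        # variance increases with X
Y = 2.0 + 0.5 * X + u

fit = sm.OLS(Y, sm.add_constant(X)).fit()
robust = sm.OLS(Y, sm.add_constant(X)).fit(cov_type="HC1")
print(fit.bse)     # conventional standard errors (unreliable here)
print(robust.bse)  # heteroscedasticity-robust standard errors
```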
3. If ui are autocorrelated:
OLS estimates are still unbiased but not efficient. Standard errors are typically underestimated when the autocorrelation is positive, inflating t-statistics and increasing the chance of falsely declaring coefficients significant.
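The analogous repair for autocorrelation, sketched here with simulated AR(1) errors (illustrative values), is Newey-West (HAC) standard errors; the maxlags option sets how many lags of correlation to allow for:

```python
# Sketch with AR(1) errors: Newey-West (HAC) standard errors remain
# valid under autocorrelation (and heteroscedasticity).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
X = rng.uniform(0, 10, size=n)
e = rng.normal(0, 1, size=n)
u = np.zeros(n)
for t in range(1, n):                 # build autocorrelated errors
    u[t] = 0.7 * u[t - 1] + e[t]
Y = 2.0 + 0.5 * X + u

hac = sm.OLS(Y, sm.add_constant(X)).fit(
    cov_type="HAC", cov_kwds={"maxlags": 4})
print(hac.bse)  # autocorrelation-consistent standard errors
```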
4. If ui is correlated with Xi (Endogeneity):
This is the most serious violation. OLS estimators become biased and inconsistent. Common causes include omitted variables or simultaneity.
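A minimal sketch of the instrumental-variables remedy (two-stage least squares), using a hypothetical instrument Z and illustrative values: Z must be correlated with X but uncorrelated with u. Stage 1 regresses X on Z; stage 2 regresses Y on the fitted values of X:

```python
# Manual 2SLS sketch with a hypothetical instrument Z. The point
# estimate is consistent; note that the second-stage printed standard
# errors are NOT the correct 2SLS ones (dedicated IV estimators adjust
# for the generated regressor).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
u = rng.normal(0, 1, size=n)
Z = rng.normal(0, 1, size=n)                   # instrument (illustrative)
X = 1.0 * Z + 0.8 * u + rng.normal(0, 1, n)    # endogenous regressor
Y = 2.0 + 0.5 * X + u

stage1 = sm.OLS(X, sm.add_constant(Z)).fit()   # stage 1: X on Z
X_hat = stage1.fittedvalues
stage2 = sm.OLS(Y, sm.add_constant(X_hat)).fit()  # stage 2: Y on X_hat
print(stage2.params)  # slope back near the true 0.5
```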
5. If ui is not normally distributed:
OLS estimators are still unbiased and consistent, but the t-tests and F-tests may not be valid in small samples. In large samples, the central limit theorem makes these tests approximately valid even without normality.
Conclusion
The error term is a crucial component of any regression model, capturing the effects of all omitted or unknown influences on the dependent variable. The assumptions about the error term ensure that the OLS estimates are unbiased, efficient, and suitable for reliable inference. Violating these assumptions can lead to misleading interpretations, invalid statistical inference, and poor forecasts. Therefore, testing for violations and applying corrective measures, such as variable transformations or alternative estimators (e.g., Generalized Least Squares or instrumental variables), is necessary to improve model performance.