Linear regression Part

Prove that the mean of the predicted values in OLS regression equals the mean of the original values

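In OLS estimation, each response \(y_i\) can be decomposed as \[ y_i=\hat{y}_i+e_i, \] where \(e_i=y_i-\hat{y}_i\) is the residual.

When the model includes an intercept \(\beta_0\), setting the derivative of \(\sum_{i=1}^n\bigl(y_i-\beta_0-\sum_{j=1}^p\beta_jx_{ij}\bigr)^2\) with respect to \(\beta_0\) to zero at the OLS estimates gives the normal equation \[ \sum_{i=1}^n\Bigl(y_i-\hat{\beta}_0-\sum_{j=1}^p\hat{\beta}_jx_{ij}\Bigr)=\sum_{i=1}^n e_i=0, \] so \(\overline{e}=\frac{1}{n}\sum_{i=1}^n e_i=0\). This property comes from the least-squares fit itself; it does not rely on the errors being normally distributed as \(N(0,\sigma^2)\).

Therefore \[ \sum_{i=1}^ny_i=\sum_{i=1}^n(\hat{y}_i+e_i)=\sum_{i=1}^n\hat{y}_i+\sum_{i=1}^n e_i=\sum_{i=1}^n\hat{y}_i, \] and dividing both sides by \(n\) gives \(\overline{y}=\overline{\hat{y}}\).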
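The normal-equation argument can also be checked directly, without calling `lm()`, by solving \(X^\top X\hat{\beta}=X^\top y\) by hand. This is a minimal sketch, assuming the same `iris` regression used in the R session; `X`, `y`, `beta_hat`, `y_hat` and `e` are illustrative names, and the design matrix built by `model.matrix()` includes a leading column of ones for the intercept.

```r
# Design matrix with an intercept column (model.matrix adds "(Intercept)")
X <- model.matrix(~ Sepal.Length + Sepal.Width + Petal.Length, data = iris)
y <- iris$Petal.Width

# Solve the normal equations X'X beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Fitted values and residuals
y_hat <- drop(X %*% beta_hat)
e <- y - y_hat

# Every column of X is orthogonal to e; the first entry
# (the column of ones) is exactly sum(e), which is ~0
drop(crossprod(X, e))

# Hence the means of the fitted and observed values agree
all.equal(mean(y_hat), mean(y))
```

Solving with `solve()` on \(X^\top X\) is only for illustration; `lm()` itself fits via a QR decomposition, which is numerically more stable.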

A numerical check in R with the built-in `iris` data:

```r
> iris_model <- lm(Petal.Width~Sepal.Length+Sepal.Width+Petal.Length,data = iris)
> summary(iris_model)

Call:
lm(formula = Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, 
    data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.60959 -0.10134 -0.01089  0.09825  0.60685 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.24031    0.17837  -1.347     0.18    
Sepal.Length -0.20727    0.04751  -4.363 2.41e-05 ***
Sepal.Width   0.22283    0.04894   4.553 1.10e-05 ***
Petal.Length  0.52408    0.02449  21.399  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.192 on 146 degrees of freedom
Multiple R-squared:  0.9379,	Adjusted R-squared:  0.9366 
F-statistic: 734.4 on 3 and 146 DF,  p-value: < 2.2e-16

> mean(iris_model$fitted.values)
[1] 1.199333
> mean(iris$Petal.Width)
[1] 1.199333
```
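The equality relies on the intercept. As a contrast, here is a minimal sketch that refits the same predictors through the origin (the `0 +` term drops the intercept); `iris_model_0` is a hypothetical name. The residuals are still orthogonal to each predictor column, but nothing forces them to sum to zero, so the two means generally differ.

```r
# Same predictors, but no intercept: the column of ones is removed
iris_model_0 <- lm(Petal.Width ~ 0 + Sepal.Length + Sepal.Width + Petal.Length,
                   data = iris)

# Residuals remain orthogonal to every predictor column ...
crossprod(model.matrix(iris_model_0), residuals(iris_model_0))

# ... but their sum is no longer constrained to be zero
sum(residuals(iris_model_0))

# so the mean of the fitted values drifts away from mean(iris$Petal.Width)
mean(fitted(iris_model_0)) - mean(iris$Petal.Width)
```

Because the full model's estimated intercept (-0.24031) is not zero, the no-intercept residuals cannot also be orthogonal to the column of ones, which is why their sum is nonzero here.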