C Smoothing parameter \(\lambda\) for smoothing splines
A suitable criterion for choosing \(\lambda\) is the mean squared error,
\[MSE = \frac{1}{n}\sum_{i=1}^n(\hat{s}_i - s_i)^2,\]
where \(s_i\) is the true function value at the \(i\)th observation and \(\hat{s}_i\) its spline estimate.
However, \(s\) is unknown, so the \(MSE\) cannot be used directly. It is possible, though, to derive an estimate of \(\mathbb{E}(MSE) + \sigma^2\), which is the expected squared error in predicting a new observation. We define the ordinary cross-validation score as
\[CV_o = \frac{1}{n}\sum_{i=1}^n(\hat{s}_i^{[-i]} - y_i)^2,\] where \(\hat{s}_i^{[-i]}\) is the estimate at the \(i\)th design point from the model fitted to all the data except \(y_i\).
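To make the definition concrete, here is a minimal brute-force sketch, assuming SciPy's make_smoothing_spline (which fits a cubic smoothing spline for a given \(\lambda\)); the helper name ocv_score and the data layout are illustrative, not part of the original text:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

def ocv_score(x, y, lam):
    """Ordinary cross-validation score: refit with each point left out in turn."""
    n = len(x)
    resid = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # fit to all data except (x_i, y_i), then predict at x_i
        fit = make_smoothing_spline(x[keep], y[keep], lam=lam)
        resid[i] = fit(x[i]) - y[i]
    return np.mean(resid ** 2)
```

Note the cost: \(n\) spline refits per candidate \(\lambda\), which is what the influence-matrix identity further below avoids.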
Substituting \(y_i = s_i + \epsilon_i\), \[\begin{aligned} CV_o &= \frac{1}{n}\sum_{i=1}^n(\hat{s}_i^{[-i]} - s_i - \epsilon_i)^2 \\ &= \frac{1}{n}\sum_{i=1}^n\left[(\hat{s}_i^{[-i]} - s_i)^2 - 2(\hat{s}_i^{[-i]} - s_i)\epsilon_i + \epsilon_i^2\right].\end{aligned}\]
Since \(\mathbb{E}(\epsilon_i) = 0\), and since \(\epsilon_i\) and \(\hat{s}_i^{[-i]}\) are independent (the \(i\)th datum plays no part in the fit \(\hat{s}^{[-i]}\)), the middle term in the summation vanishes when expectations are taken: \[\mathbb{E}(CV_o) = \frac{1}{n}\mathbb{E}\left( \sum_{i=1}^n(\hat{s}_i^{[-i]} - s_i)^2\right) + \sigma^2.\]
Now, \(\hat{s}_i^{[-i]} \approx \hat{s}_i\), with equality in the large-sample limit, so \(\mathbb{E}(CV_o) \approx \mathbb{E}(MSE) + \sigma^2\), again with equality in the large-sample limit. Choosing \(\lambda\) to minimize \(CV_o\) is known as ordinary cross-validation.
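As a quick plausibility check (not from the original text), one can simulate from a known smooth function and verify that the average of \(CV_o\) over replicates tracks \(\mathbb{E}(MSE) + \sigma^2\). The signal, sample size, noise level, and \(\lambda\) below are arbitrary choices, and ocv_score is the sketch above:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
s = np.sin(2 * np.pi * x)            # known true signal (illustrative choice)
sigma, lam = 0.3, 1e-4

cv, mse = [], []
for _ in range(50):                  # replicate datasets
    y = s + rng.normal(0.0, sigma, x.size)
    fit = make_smoothing_spline(x, y, lam=lam)
    mse.append(np.mean((fit(x) - s) ** 2))
    cv.append(ocv_score(x, y, lam))  # brute-force score from the sketch above

# the two printed numbers should be close
print(np.mean(cv), np.mean(mse) + sigma ** 2)
```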
It can be shown that the ordinary (leave-one-out) cross-validation score can be computed from a single fit to all the data, \[CV_o = \frac{1}{n}\sum_{i=1}^{n}\frac{(y_i-\hat{s}_i)^2}{(1-A_{ii})^2},\] where \(\hat{s}\) is the estimate from fitting all the data and \(\mathbf{A}\) is the corresponding influence matrix, so that \(\hat{\mathbf{s}} = \mathbf{A}\mathbf{y}\). In practice the weights \(1 - A_{ii}\) are often replaced by the mean weight, \(\text{trace}(\mathbf{I} - \mathbf{A})/n\), giving the generalized cross-validation score \[GCV = \frac{n\sum_{i=1}^n(y_i-\hat{s}_i)^2}{\left[\text{trace}(\mathbf{I}-\mathbf{A})\right]^2}.\]
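Both shortcut scores are easy to sketch for a linear smoother: since \(\hat{\mathbf{s}} = \mathbf{A}\mathbf{y}\), the influence matrix can be recovered column by column by smoothing unit vectors. Building \(\mathbf{A}\) this way costs \(n\) fits and is for illustration only; make_smoothing_spline is again assumed, and the function names are ours:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

def influence_matrix(x, lam):
    """Column j of A is the fit to the j-th unit vector, since s_hat = A @ y."""
    n = len(x)
    A = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        A[:, j] = make_smoothing_spline(x, e, lam=lam)(x)
    return A

def loocv_score(x, y, lam):
    """Leave-one-out CV from one full fit: residuals scaled by 1/(1 - A_ii)."""
    A = influence_matrix(x, lam)
    r = y - A @ y
    return np.mean((r / (1.0 - np.diag(A))) ** 2)

def gcv_score(x, y, lam):
    """GCV: each weight 1 - A_ii replaced by the mean weight trace(I - A)/n."""
    A = influence_matrix(x, lam)
    r = y - A @ y
    n = len(x)
    return n * np.sum(r ** 2) / np.trace(np.eye(n) - A) ** 2
```

In use, \(\lambda\) is chosen by minimizing either score over a grid, e.g. min(np.logspace(-6, 1, 30), key=lambda l: gcv_score(x, y, l)); loocv_score should agree closely with the brute-force ocv_score sketched earlier.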