exercise:Cdbf78bb07

Jun 12'23

Exercise

Consider a linear regression problem with data points [math](\feature,\truelabel)[/math] characterized by a scalar feature [math]\feature[/math] and a numeric label [math]\truelabel[/math].

Assume that data points are realizations of independent and identically distributed (iid) random variables (RVs) whose common probability distribution is a zero-mean multivariate normal distribution with covariance matrix [math]\mathbf{C} = \begin{pmatrix} \sigma^2_{\feature} & \sigma_{\feature,\truelabel} \\ \sigma_{\feature,\truelabel} & \sigma^{2}_{\truelabel} \end{pmatrix}[/math].

The entries of this covariance matrix are the variance [math]\sigma^2_{\feature}[/math] of the (zero-mean) feature, the variance [math]\sigma^2_{\truelabel}[/math] of the (zero-mean) label, and the covariance [math]\sigma_{\feature,\truelabel}[/math] between the feature and the label of a random data point.
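To make the probabilistic model concrete, here is a minimal sketch that draws iid data points from the zero-mean bivariate normal distribution with covariance matrix [math]\mathbf{C}[/math], using the Cholesky factor of the 2×2 matrix. The numeric values of the variances and the covariance, and the function name `sample_data_points`, are hypothetical placeholders that are not part of the exercise. The only requirement is that [math]\mathbf{C}[/math] is positive definite, i.e., [math]\sigma^2_{\feature} > 0[/math] and [math]\sigma^2_{\feature,\truelabel} < \sigma^2_{\feature}\sigma^2_{\truelabel}[/math].

```python
import math
import random


def sample_data_points(m, var_x, var_y, cov_xy, seed=0):
    """Draw m iid pairs (x, y) from N(0, C) with C = [[var_x, cov_xy], [cov_xy, var_y]].

    Uses the Cholesky factor L of C: (x, y) = L (z1, z2) with z1, z2 iid N(0, 1).
    """
    if var_x <= 0 or cov_xy ** 2 >= var_x * var_y:
        raise ValueError("C must be positive definite")
    rng = random.Random(seed)
    l11 = math.sqrt(var_x)               # L[0][0]
    l21 = cov_xy / l11                   # L[1][0]
    l22 = math.sqrt(var_y - l21 ** 2)    # L[1][1]
    points = []
    for _ in range(m):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        points.append((l11 * z1, l21 * z1 + l22 * z2))
    return points


# Hypothetical parameters: var_x = 1, var_y = 2, cov_xy = 0.5.
data = sample_data_points(200_000, 1.0, 2.0, 0.5)
n = len(data)
# The mean is known to be zero, so the second moments estimate the covariance entries.
emp_var_x = sum(x * x for x, _ in data) / n
emp_var_y = sum(y * y for _, y in data) / n
emp_cov = sum(x * y for x, y in data) / n
print(round(emp_var_x, 2), round(emp_var_y, 2), round(emp_cov, 2))
```

The empirical second moments should be close to the entries of [math]\mathbf{C}[/math], since [math]L L^{T} = \mathbf{C}[/math] by construction.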

How many data points do we need to include in a validation set such that with probability of at least [math]0.8[/math] the validation error of a given hypothesis [math]h[/math] does not deviate by more than [math]20[/math] percent from its expected loss?
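A minimal sketch of one way to approach the question, under two assumptions the exercise leaves open: the loss is the squared error loss, and [math]h[/math] is a fixed linear hypothesis [math]h(\feature) = w\feature[/math] (the weight [math]w[/math] is a hypothetical name). The residual [math]\truelabel - w\feature[/math] is then a zero-mean Gaussian RV with variance [math]s^2 = \sigma^2_{\truelabel} - 2w\sigma_{\feature,\truelabel} + w^2\sigma^2_{\feature}[/math], which is also the expected loss, so each loss value equals [math]s^2[/math] times the square of a standard normal RV. For a validation set of [math]m[/math] data points, the ratio [math]R[/math] of validation error to expected loss is therefore distributed as [math]\chi^2_m/m[/math], with mean 1 and variance [math]2/m[/math], regardless of [math]\mathbf{C}[/math] and [math]w[/math]. Chebyshev's inequality gives a guaranteed but conservative size; the central limit theorem gives a smaller, approximate one. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def chebyshev_size(rel_tol: float, delta: float) -> int:
    """Smallest m with Chebyshev bound Var(R) / rel_tol**2 = 2 / (m * rel_tol**2) <= delta."""
    return math.ceil(2.0 / (delta * rel_tol ** 2))


def clt_size(rel_tol: float, delta: float) -> int:
    """Smallest m under the normal approximation R ~ N(1, 2/m)."""
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)  # two-sided standard normal quantile
    return math.ceil(2.0 * (z / rel_tol) ** 2)


def coverage(m: int, rel_tol: float, trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|R - 1| <= rel_tol) with R = chi2_m / m."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # R is the mean of m squared standard normal RVs (loss divided by s^2).
        r = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m)) / m
        if abs(r - 1.0) <= rel_tol:
            hits += 1
    return hits / trials


# Exercise values: 20 percent relative deviation, failure probability 1 - 0.8 = 0.2.
print(chebyshev_size(0.2, 0.2))  # 250
print(clt_size(0.2, 0.2))        # 83
print(coverage(83, 0.2))         # Monte Carlo check of the approximate size
```

Under these assumptions, the required size does not depend on the entries of [math]\mathbf{C}[/math]. The Chebyshev size of 250 is guaranteed, whereas 83 relies on a normal approximation of the skewed [math]\chi^2_m[/math] distribution; the Monte Carlo estimate offers a rough check of that approximation.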
