exercise:Dde5163422

Jun 12'23

Exercise

Consider the linear hypothesis space consisting of linear maps parameterized by weights [math]\weights[/math]. We try to find the best linear map by minimizing the regularized average squared error loss (empirical risk) incurred on a training set

[[math]]\dataset \defeq \big \{ (\featurevec^{(1)},\truelabel^{(1)}),(\featurevec^{(2)},\truelabel^{(2)}),\ldots,(\featurevec^{(\samplesize)},\truelabel^{(\samplesize)}) \big \}.[[/math]]

Ridge regression augments the average squared error loss on [math]\dataset[/math] by the regularizer [math]\| \weights \|^{2}_{2}[/math], scaled by a regularization parameter [math]\regparam[/math], yielding the following learning problem:

[[math]] \min_{\weights \in \mathbb{R}^{\featurelen}} f(\weights) = (1/\samplesize)\sum_{\sampleidx=1}^{\samplesize}\big( \truelabel^{(\sampleidx)} - \weights^{T} \featurevec^{(\sampleidx)} \big)^{2} + \regparam \| \weights \|^{2}_{2}.[[/math]]

Is it possible to rewrite the objective function [math]f(\weights)[/math] as a convex quadratic function [math]f(\weights) = \weights^{T} \mathbf{C} \weights + \vb^{T} \weights + c[/math]?

If this is possible, how are the matrix [math]\mathbf{C}[/math], vector [math]\vb[/math] and constant [math]c[/math] related to the feature vectors and labels of the training data?
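
One way to check a candidate answer numerically is sketched below. It assumes NumPy, synthetic random data, and a positive regularization parameter. The helper names `ridge_objective` and `quadratic_form_parts` are hypothetical, as is the convention of stacking the feature vectors as rows of a matrix `X` and the labels into a vector `y`. The candidate expressions for the matrix, vector and constant come from expanding the squared error term. The sketch compares them against the directly evaluated objective and is not a definitive solution.

```python
import numpy as np

def ridge_objective(w, X, y, lam):
    """Average squared error on (X, y) plus lam * ||w||_2^2."""
    residuals = y - X @ w
    return np.mean(residuals ** 2) + lam * np.dot(w, w)

def quadratic_form_parts(X, y, lam):
    """Candidate C, b, c obtained by expanding the squared error term.

    Assumption: rows of X are the feature vectors, y holds the labels.
    """
    m, n = X.shape
    C = X.T @ X / m + lam * np.eye(n)   # symmetric matrix of the quadratic term
    b = -2.0 * X.T @ y / m              # vector of the linear term
    c = np.dot(y, y) / m                # constant term
    return C, b, c

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
lam = 0.1  # assumed positive regularization parameter

C, b, c = quadratic_form_parts(X, y, lam)
w = rng.normal(size=3)

direct = ridge_objective(w, X, y, lam)
quadratic = w @ C @ w + b @ w + c

# The two evaluations should agree up to floating-point error.
print(np.isclose(direct, quadratic))
# Convexity check: the eigenvalues of the symmetric matrix C should be non-negative.
print(np.all(np.linalg.eigvalsh(C) >= -1e-12))
```

Repeating the comparison for several random weight vectors gives more confidence than a single point, but only the algebraic expansion itself proves that the two expressions are equal.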
