By Admin
Jun 12'23

Discuss the computational complexity of linear regression. How much computation is required to determine the linear predictor that minimizes the average squared error on a training set?
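
As a rough illustration (the problem sizes and the random data below are made up), the least-squares predictor can be computed in closed form from the normal equations, and forming and solving them dominates the computation:

```python
import numpy as np

# Made-up problem sizes: m data points, each with n features.
m, n = 1000, 20
X = np.random.randn(m, n)   # feature matrix
y = np.random.randn(m)      # label vector

# Closed-form minimizer of the average squared error (1/m) * ||y - X w||^2:
# forming X^T X costs O(m n^2) operations and solving the resulting n x n
# linear system costs O(n^3), so the total cost scales as O(m n^2 + n^3).
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically more robust alternative based on an SVD of X.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_hat, w_lstsq)
```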

By Admin
Jun 12'23

[math] % Generic syms \newcommand\defeq{:=} \newcommand{\Tt}[0]{\boldsymbol{\theta}} \newcommand{\XX}[0]{{\cal X}} \newcommand{\ZZ}[0]{{\cal Z}} \newcommand{\vx}[0]{{\bf x}} \newcommand{\vv}[0]{{\bf v}} \newcommand{\vu}[0]{{\bf u}} \newcommand{\vs}[0]{{\bf s}} \newcommand{\vm}[0]{{\bf m}} \newcommand{\vq}[0]{{\bf q}} \newcommand{\mX}[0]{{\bf X}} \newcommand{\mC}[0]{{\bf C}} \newcommand{\mA}[0]{{\bf A}} \newcommand{\mL}[0]{{\bf L}} \newcommand{\fscore}[0]{F_{1}} \newcommand{\sparsity}{s} \newcommand{\mW}[0]{{\bf W}} \newcommand{\mD}[0]{{\bf D}} \newcommand{\mZ}[0]{{\bf Z}} \newcommand{\vw}[0]{{\bf w}} \newcommand{\D}[0]{{\mathcal{D}}} \newcommand{\mP}{\mathbf{P}} \newcommand{\mQ}{\mathbf{Q}} \newcommand{\E}[0]{{\mathbb{E}}} \newcommand{\vy}[0]{{\bf y}} \newcommand{\va}[0]{{\bf a}} \newcommand{\vn}[0]{{\bf n}} \newcommand{\vb}[0]{{\bf b}} \newcommand{\vr}[0]{{\bf r}} \newcommand{\vz}[0]{{\bf z}} \newcommand{\N}[0]{{\mathcal{N}}} \newcommand{\vc}[0]{{\bf c}} \newcommand{\bm}{\boldsymbol} % Statistics and Probability Theory \newcommand{\errprob}{p_{\rm err}} \newcommand{\prob}[1]{p({#1})} \newcommand{\pdf}[1]{p({#1})} \def \expect {\mathbb{E} } % Machine Learning Symbols \newcommand{\biasterm}{B} \newcommand{\varianceterm}{V} \newcommand{\neighbourhood}[1]{\mathcal{N}^{(#1)}} \newcommand{\nrfolds}{k} \newcommand{\mseesterr}{E_{\rm est}} \newcommand{\bootstrapidx}{b} %\newcommand{\modeldim}{r} \newcommand{\modelidx}{l} \newcommand{\nrbootstraps}{B} \newcommand{\sampleweight}[1]{q^{(#1)}} \newcommand{\nrcategories}{K} \newcommand{\splitratio}[0]{{\rho}} \newcommand{\norm}[1]{\Vert {#1} \Vert} \newcommand{\sqeuclnorm}[1]{\big\Vert {#1} \big\Vert^{2}_{2}} \newcommand{\bmx}[0]{\begin{bmatrix}} \newcommand{\emx}[0]{\end{bmatrix}} \newcommand{\T}[0]{\text{T}} \DeclareMathOperator*{\rank}{rank} %\newcommand\defeq{:=} \newcommand\eigvecS{\hat{\mathbf{u}}} \newcommand\eigvecCov{\mathbf{u}} \newcommand\eigvecCoventry{u} \newcommand{\featuredim}{n} \newcommand{\featurelenraw}{\featuredim'} \newcommand{\featurelen}{\featuredim} \newcommand{\samplingset}{\mathcal{M}} \newcommand{\samplesize}{m} \newcommand{\sampleidx}{i} \newcommand{\nractions}{A} \newcommand{\datapoint}{\vz} \newcommand{\actionidx}{a} \newcommand{\clusteridx}{c} \newcommand{\sizehypospace}{D} \newcommand{\nrcluster}{k} \newcommand{\nrseeds}{s} \newcommand{\featureidx}{j} \newcommand{\clustermean}{{\bm \mu}} \newcommand{\clustercov}{{\bm \Sigma}} \newcommand{\target}{y} \newcommand{\error}{E} \newcommand{\augidx}{b} \newcommand{\task}{\mathcal{T}} \newcommand{\nrtasks}{T} \newcommand{\taskidx}{t} \newcommand\truelabel{y} \newcommand{\polydegree}{r} \newcommand\labelvec{\vy} \newcommand\featurevec{\vx} \newcommand\feature{x} \newcommand\predictedlabel{\hat{y}} \newcommand\dataset{\mathcal{D}} \newcommand\trainset{\dataset^{(\rm train)}} \newcommand\valset{\dataset^{(\rm val)}} \newcommand\realcoorspace[1]{\mathbb{R}^{\text{#1}}} \newcommand\effdim[1]{d_{\rm eff} \left( #1 \right)} \newcommand{\inspace}{\mathcal{X}} \newcommand{\sigmoid}{\sigma} \newcommand{\outspace}{\mathcal{Y}} \newcommand{\hypospace}{\mathcal{H}} \newcommand{\emperror}{\widehat{L}} \newcommand\risk[1]{\expect \big \{ \loss{(\featurevec,\truelabel)}{#1} \big\}} \newcommand{\featurespace}{\mathcal{X}} \newcommand{\labelspace}{\mathcal{Y}} \newcommand{\rawfeaturevec}{\mathbf{z}} \newcommand{\rawfeature}{z} \newcommand{\condent}{H} \newcommand{\explanation}{e} \newcommand{\explainset}{\mathcal{E}} \newcommand{\user}{u} \newcommand{\actfun}{\sigma} \newcommand{\noisygrad}{g} 
\newcommand{\reconstrmap}{r} \newcommand{\predictor}{h} \newcommand{\eigval}[1]{\lambda_{#1}} \newcommand{\regparam}{\lambda} \newcommand{\lrate}{\alpha} \newcommand{\edges}{\mathcal{E}} \newcommand{\generror}{E} \DeclareMathOperator{\supp}{supp} %\newcommand{\loss}[3]{L({#1},{#2},{#3})} \newcommand{\loss}[2]{L\big({#1},{#2}\big)} \newcommand{\clusterspread}[2]{L^{2}_{\clusteridx}\big({#1},{#2}\big)} \newcommand{\determinant}[1]{{\rm det}({#1})} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\itercntr}{r} \newcommand{\state}{s} \newcommand{\statespace}{\mathcal{S}} \newcommand{\timeidx}{t} \newcommand{\optpolicy}{\pi_{*}} \newcommand{\appoptpolicy}{\hat{\pi}} \newcommand{\dummyidx}{j} \newcommand{\gridsizex}{K} \newcommand{\gridsizey}{L} \newcommand{\localdataset}{\mathcal{X}} \newcommand{\reward}{r} \newcommand{\cumreward}{G} \newcommand{\return}{\cumreward} \newcommand{\action}{a} \newcommand\actionset{\mathcal{A}} \newcommand{\obstacles}{\mathcal{B}} \newcommand{\valuefunc}[1]{v_{#1}} \newcommand{\gridcell}[2]{\langle #1, #2 \rangle} \newcommand{\pair}[2]{\langle #1, #2 \rangle} \newcommand{\mdp}[5]{\langle #1, #2, #3, #4, #5 \rangle} \newcommand{\actionvalue}[1]{q_{#1}} \newcommand{\transition}{\mathcal{T}} \newcommand{\policy}{\pi} \newcommand{\charger}{c} \newcommand{\itervar}{k} \newcommand{\discount}{\gamma} \newcommand{\rumba}{Rumba} \newcommand{\actionnorth}{\rm N} \newcommand{\actionsouth}{\rm S} \newcommand{\actioneast}{\rm E} \newcommand{\actionwest}{\rm W} \newcommand{\chargingstations}{\mathcal{C}} \newcommand{\basisfunc}{\phi} \newcommand{\augparam}{B} \newcommand{\valerror}{E_{v}} \newcommand{\trainerror}{E_{t}} \newcommand{\foldidx}{b} \newcommand{\testset}{\dataset^{(\rm test)} } \newcommand{\testerror}{E^{(\rm test)}} \newcommand{\nrmodels}{M} \newcommand{\benchmarkerror}{E^{(\rm ref)}} \newcommand{\lossfun}{L} \newcommand{\datacluster}[1]{\mathcal{C}^{(#1)}} \newcommand{\cluster}{\mathcal{C}} \newcommand{\bayeshypothesis}{h^{*}} \newcommand{\featuremtx}{\mX} \newcommand{\weight}{w} \newcommand{\weights}{\vw} \newcommand{\regularizer}{\mathcal{R}} \newcommand{\decreg}[1]{\mathcal{R}_{#1}} \newcommand{\naturalnumbers}{\mathbb{N}} \newcommand{\featuremapvec}{{\bf \Phi}} \newcommand{\featuremap}{\phi} \newcommand{\batchsize}{B} \newcommand{\batch}{\mathcal{B}} \newcommand{\foldsize}{B} \newcommand{\nriter}{R} % Machine Learning from Networked Data \newcommand{\nodesigvec}{\mathbf{u}} \newcommand{\edgesigvec}{\mathbf{f}} \newcommand{\edgesig}{f} \newcommand{\edgeweight}{A} \newcommand{\edgeweights}{\mA} \newcommand{\edgeidx}{e} \newcommand{\edgesigs}{\mathbb{R}^{\edges \times \featuredim}} \newcommand{\graph}{\mathcal{G}} \newcommand{\nodes}{\mathcal{V}} \newcommand{\degreemtx}{\mathbf{D}} \newcommand{\incidencemtx}{\mathbf{B}} \newcommand{\nodedegree}[1]{d^{(#1)}} \newcommand{\nodeidx}{i} \newcommand{\nrnodes}{n} \newcommand{\nodesigs}{\mathbb{R}^{\nodes \times \featuredim }} \newcommand{\naturalspace}{\mathcal{N}} \newcommand{\gindex}[1][i]{^{(#1)}} \newcommand{\gsignal}{\vw} \newcommand{\trueweights}{\overline{\vw}} \newcommand{\vt}{\mathbf{t}} \newcommand{\FIM}{\mathbf{F}} \newcommand{\FIMentry}{F} \newcommand{\edge}[2]{\{#1,#2\}} \newcommand{\directededge}[2]{\left(#1,#2\right)} [/math]

The key computational step of principal component analysis (PCA) amounts to an eigenvalue decomposition (EVD) of a positive semi-definite (psd) matrix [math]\mQ[/math], such as the sample covariance matrix of the data points. Consider an arbitrary initial vector [math]\eigvecCov^{(0)}[/math] and the sequence obtained by iterating

[[math]] \begin{equation} \eigvecCov^{(\itercntr+1)} \defeq \mQ \eigvecCov^{(\itercntr)} / \big\| \mQ \eigvecCov^{(\itercntr)} \big\|. \end{equation} [[/math]]

What (if any) conditions on the initialization [math]\eigvecCov^{(0)}[/math] ensure that the sequence [math]\eigvecCov^{(\itercntr)}[/math] converges to the eigenvector [math]\eigvecCov^{(1)}[/math] of [math]\mQ[/math] that corresponds to its largest eigenvalue [math]\eigval{1}[/math]?
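
For a quick numerical experiment (the psd matrix below is made up and stands in for [math]\mQ[/math]), a minimal NumPy sketch of the iteration is:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up psd matrix standing in for Q (e.g., a sample covariance matrix).
A = rng.standard_normal((5, 5))
Q = A @ A.T

def power_iteration(Q, u0, num_iter=500):
    """Iterate u <- Q u / ||Q u||, starting from the initialization u0."""
    u = u0 / np.linalg.norm(u0)
    for _ in range(num_iter):
        v = Q @ u
        u = v / np.linalg.norm(v)
    return u

# Reference: eigenvector of Q with the largest eigenvalue, via a full EVD.
_, eigvecs = np.linalg.eigh(Q)   # eigenvalues in ascending order
u_top = eigvecs[:, -1]

# A random initialization almost surely has a non-zero component along u_top;
# in that case the iterates align with u_top (up to sign).
u_hat = power_iteration(Q, rng.standard_normal(5))
print(np.abs(u_hat @ u_top))     # close to 1
```

Repeating the experiment with an initialization that has no component along the top eigenvector is a useful way to probe the condition asked for above.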

By Admin
Jun 12'23

Consider a training set [math]\dataset[/math] consisting of [math]\samplesize=10^{10}[/math] labeled data points [math]\big( \rawfeaturevec^{(1)}, \truelabel^{(1)} \big), \ldots, \big( \rawfeaturevec^{(\samplesize)}, \truelabel^{(\samplesize)} \big)[/math] with raw feature vectors [math]\rawfeaturevec^{(\sampleidx)} \in \mathbb{R}^{4000}[/math] and binary labels [math]\truelabel^{(\sampleidx)} \in \{-1,1\}[/math].

Assume we have used a feature learning method to obtain new features [math]\featurevec^{(\sampleidx)} \in \{0,1\}^{\featurelen}[/math] with [math]\featurelen=\samplesize[/math], such that the only non-zero entry of [math]\featurevec^{(\sampleidx)}[/math] is [math]\feature^{(\sampleidx)}_{\sampleidx} = 1[/math], for [math]\sampleidx=1,\ldots,\samplesize[/math]. In other words, each feature vector is a one-hot encoding of the sample index.

Can you find a linear classifier that perfectly classifies the training set?
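
As a small sanity check (with [math]\samplesize[/math] scaled down to a toy value and randomly drawn labels), the following sketch tests one candidate weight vector on such one-hot features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the exercise: the sample size is scaled down from 10^10.
m = 8
y = rng.choice([-1, 1], size=m)   # binary labels in {-1, 1}

# One-hot features: the i-th feature vector has its only non-zero entry at i.
X = np.eye(m)

# Candidate linear classifier with weight vector w whose i-th entry is y^(i).
w = y.astype(float)

# Then w^T x^(i) = y^(i), so sign(w^T x^(i)) reproduces every training label.
y_hat = np.sign(X @ w)
print(np.all(y_hat == y))   # True
```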