By Admin
May 25'23

Exercise

You apply 2-means clustering to a set of five observations with two features. You are given the following initial cluster assignments:

Observation   X1   X2   Initial cluster
1             1    3    1
2             0    4    1
3             6    2    1
4             5    2    2
5             1    6    2


Calculate the total within-cluster variation of the initial cluster assignments, based on the Euclidean distance measure.

  • 32.0
  • 70.3
  • 77.3
  • 118.3
  • 141.0

Copyright 2023. The Society of Actuaries, Schaumburg, Illinois. Reproduced with permission.

By Admin
May 26'23

Key: C

The means for cluster 1 are (1 + 0 + 6)/3 = 2.3333 for X1 and (3 + 4 + 2)/3 = 3 for X2 and the variation is

$(1 - 2.3333)^2 + (3 - 3)^2 + (0 - 2.3333)^2 + (4 - 3)^2 + (6 - 2.3333)^2 + (2 - 3)^2 = 22.6667.$

The means for cluster 2 are (5 + 1)/2 = 3 for X1 and (2 + 6)/2 = 4 for X2 and the variation is

$(5 - 3)^2 + (2 - 4)^2 + (1 - 3)^2 + (6 - 4)^2 = 16.$

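ISLR defines the within-cluster variation as the sum of squared Euclidean distances over all pairs of observations in a cluster, divided by the cluster size. Per equation (10.12) in the first edition of ISLR, this equals twice the sum of squared deviations from the cluster means, so the total within-cluster variation is

$2(22.6667 + 16) = 77.33,$

which matches answer choice C (77.3).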
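The arithmetic can be checked with a short script. The following is a minimal sketch in plain Python, not part of the original solution; the helper names (`sq_dist`, `pairwise_variation`, `centroid_sum_of_squares`) are illustrative assumptions. It computes the total both from the pairwise definition and from twice the sum of squared deviations from the cluster means, which should agree.

```python
# Observations from the exercise: (X1, X2) and the initial cluster label.
points = [(1, 3), (0, 4), (6, 2), (5, 2), (1, 6)]
labels = [1, 1, 1, 2, 2]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pairwise_variation(members):
    """Sum of squared distances over all ordered pairs, divided by cluster size."""
    n = len(members)
    return sum(sq_dist(p, q) for p in members for q in members) / n


def centroid_sum_of_squares(members):
    """Sum of squared distances from each point to the cluster mean."""
    n = len(members)
    centroid = tuple(sum(coord) / n for coord in zip(*members))
    return sum(sq_dist(p, centroid) for p in members)


# Group the observations by their initial cluster label.
clusters = {}
for point, label in zip(points, labels):
    clusters.setdefault(label, []).append(point)

# Total within-cluster variation, computed two equivalent ways.
total_pairwise = sum(pairwise_variation(m) for m in clusters.values())
total_centroid = 2 * sum(centroid_sum_of_squares(m) for m in clusters.values())

print(round(total_pairwise, 2), round(total_centroid, 2))
```

Both printed values should come out at about 77.33, consistent with the key.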

