May 25'23

Exercise

Determine which of the following statements is applicable to K-means clustering and is not applicable to hierarchical clustering.

  (A) If two different people are given the same data and perform one iteration of the algorithm, their results at that point will be the same.
  (B) At each iteration of the algorithm, the number of clusters will be greater than the number of clusters in the previous iteration of the algorithm.
  (C) The algorithm needs to be run only once, regardless of how many clusters one ultimately decides to use.
  (D) The algorithm must be initialized with an assignment of the data points to a cluster.
  (E) None of (A), (B), (C), or (D) meets the stated criterion.

Copyright 2023. The Society of Actuaries, Schaumburg, Illinois. Reproduced with permission.

May 26'23

Key: D

(A) For K-means the initial cluster assignments are random. Thus different people can have different clusters, so the statement is not true for K-means clustering. It is true for hierarchical clustering.

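(B) For K-means the number of clusters is set in advance and does not change as the algorithm is run. For agglomerative hierarchical clustering the number of clusters decreases at each iteration as clusters are merged, and the number of clusters to use is determined after the algorithm is completed. The statement is therefore not true for either method.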

(C) For K-means the algorithm needs to be re-run if the number of clusters is changed. This is not the case for hierarchical clustering.

(D) This is true for K-means clustering. Agglomerative hierarchical clustering starts with each data point being its own cluster.
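
The contrasts in (A) through (D) can be illustrated with a minimal sketch in plain Python. It is not a reference implementation: the function names `kmeans_1d` and `agglomerate_1d`, the one-dimensional toy data, and the choice of single linkage are all assumptions made for illustration. The K-means function begins from a random assignment of points to clusters and needs K up front; the agglomerative function begins with every point as its own cluster, merges one pair per iteration, and records the partition at every level in a single run.

```python
import random

def kmeans_1d(xs, k, seed, iters=20):
    """Toy K-means on 1-D data (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    # Required initialization: randomly assign every point to one of k clusters.
    labels = [rng.randrange(k) for _ in xs]
    for _ in range(iters):
        # Compute the centroid of each non-empty cluster.
        cents = {}
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:
                cents[c] = sum(members) / len(members)
        # Reassign each point to its nearest centroid.
        new = [min(cents, key=lambda c: abs(x - cents[c])) for x in xs]
        if new == labels:
            break
        labels = new
    return labels

def agglomerate_1d(xs):
    """Toy single-linkage agglomerative clustering on 1-D data.

    Returns a dict mapping each number of clusters to the partition
    obtained at that level, so one run covers every choice of K.
    """
    clusters = [[x] for x in sorted(xs)]  # start: each point is its own cluster
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # With sorted 1-D data, the closest pair under single linkage is
        # always a pair of neighbouring clusters.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]  # merge the pair
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels

xs = [1.0, 1.2, 0.8, 5.0, 5.3, 9.0]  # made-up data

# K-means: the result depends on the random start, and K is fixed in advance.
print(kmeans_1d(xs, k=3, seed=0))
print(kmeans_1d(xs, k=3, seed=1))

# Hierarchical: no random start; one run yields every level of the hierarchy.
levels = agglomerate_1d(xs)
print(levels[3])
print(levels[2])
```

In this sketch, changing K for K-means means calling `kmeans_1d` again, whereas `levels` from a single call to `agglomerate_1d` already contains the partition for every number of clusters from six down to one.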

