May 25'23

Exercise

You are given a set of n observations, each with p features. Determine which of the following statements is/are true with respect to clustering methods.

  I. The n observations can be clustered on the basis of the p features to identify subgroups among the observations.
  II. The p features can be clustered on the basis of the n observations to identify subgroups among the features.
  III. Clustering is an unsupervised learning method and is often performed as part of an exploratory data analysis.

  (A) None
  (B) I and II only
  (C) I and III only
  (D) II and III only
  (E) The correct answer is not given by (A), (B), (C), or (D).

Copyright 2023. The Society of Actuaries, Schaumburg, Illinois. Reproduced with permission.

May 26'23

Key: E

I and II are both true because a clustering algorithm can be applied to either the rows (observations) or the columns (features) of the data matrix; transposing the matrix reverses their roles. (See Section 10.3 of An Introduction to Statistical Learning.)

III is true. Clustering is unsupervised learning because there is no dependent (target) variable. It can be used in exploratory data analysis to learn about relationships between observations or features. Since I, II, and III are all true, none of (A) through (D) applies, which leaves (E).
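
The row and column reversal behind statements I and II can be illustrated with a short sketch. Nothing here comes from the original solution: the `kmeans` helper is a hypothetical, minimal version of Lloyd's algorithm written only for illustration, and the 4-by-3 matrix `X` is made-up toy data. The only point is that the same routine clusters observations when given the rows and clusters features when given the transpose.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Hypothetical minimal k-means (Lloyd's algorithm).

    points: list of equal-length numeric lists.
    Returns one cluster label (0..k-1) per point.
    """
    rng = random.Random(seed)
    # Start from k distinct points chosen at random as initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (assumed): n = 4 observations (rows), p = 3 features (columns).
X = [
    [1.0, 1.1, 9.0],
    [1.2, 0.9, 9.5],
    [8.0, 8.2, 1.0],
    [8.1, 7.9, 0.5],
]

# Statement I: cluster the n observations using the p features.
row_labels = kmeans(X, k=2)

# Statement II: transpose so each feature becomes a "point" described
# by its n observed values, then run the same routine.
X_T = [list(col) for col in zip(*X)]
col_labels = kmeans(X_T, k=2)

print("observation clusters:", row_labels)
print("feature clusters:", col_labels)
```

In this toy data the first two observations sit close together, as do the last two, and the first two features move together while the third does not, so both calls recover an obvious grouping. No target variable appears anywhere, which is what makes the procedure unsupervised.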

