Revision as of 22:58, 25 May 2023 by Admin
By Admin
May 25'23

Exercise

A classification tree is being constructed to predict if an insurance policy will lapse. A random sample of 100 policies contains 30 that lapsed. You are considering two splits:

Split 1: One node has 20 observations with 12 lapses and one node has 80 observations with 18 lapses.

Split 2: One node has 10 observations with 8 lapses and one node has 90 observations with 22 lapses.

The total Gini index after a split is the weighted average of the Gini index at each node, with the weights proportional to the number of observations in each node. The total entropy after a split is the weighted average of the entropy at each node, with the weights proportional to the number of observations in each node.

Determine which of the following statements is/are true.

  I. Split 1 is preferred based on the total Gini index.
  II. Split 1 is preferred based on the total entropy.
  III. Split 1 is preferred based on having fewer classification errors.

  (A) I only
  (B) II only
  (C) III only
  (D) I, II, and III
  (E) The correct answer is not given by (A), (B), (C), or (D).

Copyright 2023. The Society of Actuaries, Schaumburg, Illinois. Reproduced with permission.

By Admin
May 26'23

Key: E

The total Gini index for Split 1 is

2[20(12/20)(8/20) + 80(18/80)(62/80)]/100 = 0.375 

and for Split 2 is

2[10(8/10)(2/10) + 90(22/90)(68/90)]/100 = 0.3644. 

A smaller total Gini index is better, so Split 2 is preferred and statement I is false. The factor of 2 arises because, with only two classes, the Gini index at a node is p(1 – p) + (1 – p)p, which sums two identical terms.
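The Gini arithmetic above can be checked with a short Python sketch. The function name total_gini and the (observations, lapses) pair layout are illustrative choices, not part of the original solution.

```python
def total_gini(nodes):
    """Weighted-average Gini index for a two-class split.

    nodes: list of (n_observations, n_lapses) pairs, one per node
    (a hypothetical layout chosen for this sketch).
    """
    total = sum(n for n, _ in nodes)
    weighted = 0.0
    for n, lapses in nodes:
        p = lapses / n
        # Two-class Gini index: p(1 - p) + (1 - p)p = 2p(1 - p)
        weighted += n * 2 * p * (1 - p)
    return weighted / total


split_1 = [(20, 12), (80, 18)]
split_2 = [(10, 8), (90, 22)]

gini_1 = total_gini(split_1)
gini_2 = total_gini(split_2)
print(round(gini_1, 4), round(gini_2, 4))  # 0.375 0.3644
```

Because gini_2 is smaller than gini_1, the sketch agrees that Split 2 is preferred under the total Gini index.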

The total entropy for Split 1 is

 – [20(12/20)ln(12/20) + 20(8/20)ln(8/20) + 80(18/80)ln(18/80) + 80(62/80)ln(62/80)]/100 = 0.5611

and for Split 2 is

 – [10(8/10)ln(8/10) + 10(2/10)ln(2/10) + 90(22/90)ln(22/90) + 90(68/90)ln(68/90)]/100 = 0.5506.

A smaller total entropy is better, so Split 2 is preferred and statement II is false.
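The entropy totals can be reproduced the same way. The following is a minimal sketch that assumes natural logarithms, as in the solution; the function name total_entropy and the (observations, lapses) pair layout are hypothetical.

```python
import math


def total_entropy(nodes):
    """Weighted-average entropy (natural log) for a two-class split.

    nodes: list of (n_observations, n_lapses) pairs, one per node
    (a hypothetical layout chosen for this sketch).
    """
    total = sum(n for n, _ in nodes)
    weighted = 0.0
    for n, lapses in nodes:
        for count in (lapses, n - lapses):
            p = count / n
            if p > 0:  # treat 0 * ln(0) as 0
                weighted -= n * p * math.log(p)
    return weighted / total


entropy_1 = total_entropy([(20, 12), (80, 18)])
entropy_2 = total_entropy([(10, 8), (90, 22)])
print(round(entropy_1, 4), round(entropy_2, 4))  # 0.5611 0.5506
```

Each node contributes one term per class, so the lapsed and non-lapsed proportions within a node each appear in their own logarithm.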

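Each node predicts its majority class, so the minority-class observations in that node are misclassified. For Split 1, there are 8 + 18 = 26 errors and for Split 2 there are 2 + 22 = 24 errors. With fewer errors, Split 2 is preferred, so statement III is false. Since none of the three statements is true, the answer is (E).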
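The error counts can be sketched in Python under the assumption that each node predicts its majority class. The function name misclassified and the (observations, lapses) pair layout are illustrative only.

```python
def misclassified(nodes):
    """Count classification errors when each node predicts its majority class.

    nodes: list of (n_observations, n_lapses) pairs, one per node
    (a hypothetical layout chosen for this sketch).
    """
    # The minority class in each node is the misclassified group.
    return sum(min(lapses, n - lapses) for n, lapses in nodes)


errors_1 = misclassified([(20, 12), (80, 18)])
errors_2 = misclassified([(10, 8), (90, 22)])
print(errors_1, errors_2)  # 26 24
```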

