#### 您當前位置：首頁 >> Algorithm 算法作業Algorithm 算法作業

###### 日期：2019-11-19 09:38

B365 Homework 5

1. Consider the Fisher iris data, as plotted in the program “iris.r” studied in class.

(a) Reasoning from this plot, choose a variable and a split point for the variable so that the resulting two

regions have one that contains only Setosa flowers, while the ther mixes the other two classes.

(b) For the “mixed” region resulting from the first part, choose a variable and split point that separates the

Versicolor and Virginica flowers as well as possible.

(c) Sketch the resulting three regions over the scatterplot of the two relevant variables, clearly labeling each

region with the resulting class.

2. Consider the table for a 2-class tree classifier with classes {+, ?} below giving the number of +’s and -’s

reaching each node.

The nodes are numbered so that 1 is the root, while the children of node k are 2k and 2k + 1. The “terminal”

column says if a node is terminal or not.

(a) Construct the tree as a graph (the usual depiction of a tree) labeling the nodes with numbers and giving

the classification each node and the number of errors it would produce if it were a terminal node.

(b) Compute R(T) where T is the tree given in the example and R(T) is our probabability of error for the

tree when tested on the training set.

(c) Explain why you do or do not believe this is an accurate representation of the tree’s performance on new

data.

(d) Compute the optimal penalized risk, R?α for each node of T where α = .03. Give the corresponding optimal

tree Tα.

(e) How much do you need to increase α before a different Tα appears? Same question for decreasing α.

3. The following code shows two things. First we show how to create a matrix from the file “tree data.dat,”

available from Canvas, which stores the data above. In the resulting matrix, the columns contain the number

of +’s, the number of -’s arriving at each node, and the boolean variable describing the node as terminal or

not. The rows 10 through 17 are all zeros and are unused. This way X[i, ] gives the data associated with tree

node i.

The factorial function, shown below, gives an example of a simple recursive function in R, which you would

call by, e.g. factorial(5).

X = matrix(scan("tree_data.dat"),byrow=T,ncol = 3)

factorial <- function(i) {

if (i == 1) { return(1); }

else return(i*factorial(i-1))}

(a) Write a recursive function in R that takes as input the number of a node and returns the optimal risk

associated with that node, with a split penalty of α = .03. When you run your function with input 1 (the

root node) it should return the optimal risk for the entire tree.

(b) Let Tα=.03 denote the associated optimal tree, as computed in the previous problem. Construct this tree,

explicitly giving the classifications associated with each terminal tree node.

(c) Explain clearly what problem has Tα=.03 as its optimal solution.

4. Consider the following table of cross validation on tree induction for a two-class classification problem,as

discussed in class:

Root node error: 1524/3100 = 0.49161

n= 3100

CP nsplit rel error xerror xstd

1 0.54757282 0 1.0000000 1.025890 0.0180141

2 0.11909385 1 0.4524272 0.462136 0.0151731

3 0.06601942 2 0.3333333 0.368932 0.0139601

4 0.05372168 3 0.2673139 0.293204 0.0127297

5 0.05242718 4 0.2135922 0.238835 0.0116698

6 0.03430421 5 0.1611650 0.186408 0.0104615

7 0.01326861 6 0.1268608 0.150809 0.0095013

8 0.01165049 8 0.1003236 0.127508 0.0087912

9 0.01035599 9 0.0886731 0.119741 0.0085368

10 0.00776699 10 0.0783172 0.100324 0.0078541

11 0.00550162 12 0.0627832 0.083495 0.0071968

12 0.00517799 14 0.0517799 0.077023 0.0069238

13 0.00453074 15 0.0466019 0.073786 0.0067825

14 0.00291262 17 0.0375405 0.066019 0.0064285

15 0.00258900 19 0.0317152 0.062783 0.0062741

16 0.00226537 20 0.0291262 0.057605 0.0060178

17 0.00194175 22 0.0245955 0.055663 0.0059185

18 0.00161812 24 0.0207120 0.054369 0.0058512

19 0.00129450 26 0.0174757 0.054369 0.0058512

20 0.00097087 30 0.0122977 0.050485 0.0056440

21 0.00064725 34 0.0084142 0.045955 0.0053910

22 0.00032362 42 0.0032362 0.048544 0.0055371

23 0.00000000 52 0.0000000 0.046602 0.0054279

(a) In the “rel error” column we get a value of 0. for the 23rd row. Explain what this number means.

(b) In terms of error rate, how well do you think the tree associated with line 23 will perform on different

data from the sample population.

(c) Consider the tree that makes no splits — i.e. the one that simply classifies according to the most likely

class. How well will this tree classify new data from the same population.

(d) Judging from the table, what appears to be your best choice of complexity parameter α? In what sense

is your α value best?

5. As a result of a recent exam, an instructor of a class believes that 70% of the students do yet not understand a

topic sufficiently well. The instructor wishes to implement a Naive Bayes classifier to estimate each student’s

probability of understanding. Students are asked a sequence of 7 true or false questions. The instructor assumes

that the responses to these questions are conditionally independent given the student’s state of knowledge —

understands or does not understand. Of course, understanding is not really a binary attribute in real life as

there are degrees of understanding and various aspects to understanding, though we regard it as binary here.

2

The following code fragment creates a 2x7 matrix, x, where x[i, j] is the probability that a student will answer

the jth question correctly when her state of knowledge is i. Here i = 1 corresponds to “not understanding” and

i = 2 corresponds to understanding. For ease of computation we compute the 2x7x2 array, z, where z[i, j, k]

gives the probability that a student will give answer k (k=1 means wrong and k=2 means right) to question j,

given her state of knowledge is i.

x = matrix(c(.7, .6, .5, .5, .5, .5, .7, .8, .7, .6, .7, .9, .8, .9 ),byrow=T,nrow=2);

z = array(0,c(2,7,2));

(a) Write R code to fill in the matrix z to be as described in the problem.

(b) Using your z matrix create an R function that receives a vector of 7 test answers which are either wrong

or right. For instance, if the answers are c(0,0,0,0,1,1,1), that would mean the student answered only the

last three questions correctly. The function should return the probability that the student has understood

the subject, using a Naive Bayes classifier.

3