3 The Rulearner Algorithm

The Rulearner algorithm takes as input (i) a lattice, L, (ii) a set of instance classification labelings, C, which correspond to the instance nodes in L, and (iii) a noise parameter, N, indicating the percentage of the training instances to which an induced rule applies that the rule is allowed to misclassify. The algorithm produces a set of symbolic classification rules as output. Furthermore, the user can configure the system to induce either a decision-list or an unordered set of rules, and can also decide whether the induced rules should classify only one labeling (i.e. positive) or all given labelings. While the algorithm is general enough to deal with more than two classification labelings, we present it here as a binary classifier for ease of understanding.

The algorithm first labels all instance nodes (whose labelings are given in C) and then filters these labelings up the lattice. A node u is given a particular label, say POSITIVE, if the instance nodes in DC(u) are all of the class POSITIVE (allowing for the percentage of mislabelings given by the noise parameter N). If there are insufficient instance nodes of any one class in DC(u) to give u a particular label, then it is labeled MIXED.

Each node is initially marked active, and each time a rule is induced using a node u, all the nodes in DC(u) are marked inactive. As long as active nodes remain in the lattice, there are still candidate nodes for rule induction, and hence new rules are induced. The rule induction process simply finds a non-MIXED active node in the lattice which covers the most previously uncovered instance nodes and forms a rule from it. If several nodes cover the same number of instances in their downward closures, we prefer the node with the fewest features in its upward closure, an application of Occam's Razor.
Intuitively, this corresponds to finding a minimal set of features that covers a large portion of the instance space with a given labeling. An important property of this minimal feature set is that it is derived directly from commonalities in the underlying data, and hence the antecedent of the induced rule is driven directly by the data in the training set.

PROCEDURE Rulearner(L, C, N)
  Initialize all nodes in the lattice to be active
  Label all nodes in the lattice (using the noise parameter N)
  WHILE (there are still active nodes in the lattice)
    u ← active node with greatest cover and non-MIXED labeling
    Output rule: feature nodes in UC(u) imply the label of node u
    FORALL (v ∈ DC(u) where v is an instance node)
      FORALL (w ∈ UC(v))
        Decrement the cover of w
        IF (cover of w ≤ 0) mark w inactive
      Mark v as inactive
    IF (Decision-List) re-label active nodes (using the noise parameter N)
  END

Fig. 1. Pseudo-code for the Rulearner algorithm.
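To make the procedure in Fig. 1 concrete, the following Python sketch shows one possible realization of the labeling step and the main induction loop. It is only an illustration under stated assumptions: the Node structure, the explicit DC/UC lists, and the helper names (label_node, rulearner) are ours, not the original implementation's data structures.

```python
from collections import Counter
from dataclasses import dataclass, field

POSITIVE, NEGATIVE, MIXED = "POSITIVE", "NEGATIVE", "MIXED"

@dataclass
class Node:
    name: str
    is_instance: bool = False                 # instance node vs. feature node
    label: str = MIXED
    active: bool = True
    cover: int = 0                            # still-uncovered instances in DC
    DC: list = field(default_factory=list)    # downward closure (incl. self)
    UC: list = field(default_factory=list)    # upward closure (incl. self)

def label_node(u, noise_percent):
    """Label u from the active instance nodes in DC(u), tolerating up to
    noise_percent mislabeled instances; otherwise return MIXED."""
    labels = [v.label for v in u.DC if v.is_instance and v.active]
    if not labels:
        return MIXED
    majority, count = Counter(labels).most_common(1)[0]
    return majority if (1 - count / len(labels)) * 100 <= noise_percent else MIXED

def rulearner(nodes, noise_percent, decision_list=False):
    """Sketch of Fig. 1: repeatedly pick the non-MIXED active node with the
    greatest cover, emit a rule, and retire the instance nodes it covers."""
    for u in nodes:                           # initial labeling and cover counts
        u.cover = sum(1 for v in u.DC if v.is_instance)
        if not u.is_instance:
            u.label = label_node(u, noise_percent)
    rules = []
    while True:
        candidates = [u for u in nodes
                      if u.active and u.label != MIXED and u.cover > 0]
        if not candidates:
            break
        # Greatest cover; ties broken by the fewest features in UC(u).
        u = max(candidates, key=lambda n: (n.cover, -len(n.UC)))
        antecedent = frozenset(w.name for w in u.UC if not w.is_instance)
        rules.append((antecedent, u.label))
        for v in [x for x in u.DC if x.is_instance and x.active]:
            for w in v.UC:                    # every node above v loses one instance
                w.cover -= 1
                if w.cover <= 0:
                    w.active = False
            v.active = False
        if decision_list:                     # decision-lists re-label after each rule
            for w in nodes:
                if w.active and not w.is_instance:
                    w.label = label_node(w, noise_percent)
    return rules
```

A real implementation over a Galois lattice would compute the downward and upward closures from the lattice structure rather than store them as explicit lists; the sketch stores them only to keep the loop short.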
4 Experimental Results

The Rulearner system was first tested on the Monk's Problems [TH, 1991].
As a comparison, we tried several configurations of other induction systems in order to capture the best performance of those systems relative to Rulearner. For the first two Monk problems, we test three basic configurations of the Rulearner system: (i) decision-lists, (ii) unordered rule sets, and (iii) rule sets that classify only the positive instances, predicting negative as a default rule (how these rule sets differ when applied to new instances is sketched after Table 1). The noise parameter, N, was set at 0%.

Algorithm                      Monk 1    Monk 2
CN2 (decision-list)            100.0%    72.9%
CN2 (unordered rules)           98.6%    75.7%
C4.5 (unpruned tree)            76.6%    65.3%
C4.5 (pruned tree)              75.7%    65.0%
C4.5 -s (unpruned tree)         94.4%    69.0%
C4.5 -s (pruned tree)          100.0%    70.4%
Rulearner (decision-list)      100.0%    74.5%
Rulearner (unordered rules)    100.0%    74.8%
Rulearner (pos rules only)     100.0%    70.4%

Table 1. Accuracy of induction algorithms on MONK's problems.
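As a rough illustration of how these configurations differ at classification time (our own sketch, with invented rules and attribute names, not code from the paper): a decision-list returns the label of the first matching rule, an unordered rule set lets every matching rule vote, and a positive-only rule set falls back to the default NEGATIVE prediction when no rule fires.

```python
from collections import Counter

# Rules are (antecedent, label) pairs, where the antecedent is a set of
# features that must all be present in the instance. The rules and feature
# names below are invented for the example.
def classify_decision_list(rules, instance, default="NEGATIVE"):
    """Return the label of the first rule whose antecedent is satisfied."""
    for antecedent, label in rules:
        if antecedent <= instance:
            return label
    return default

def classify_unordered(rules, instance, default="NEGATIVE"):
    """Let every matching rule vote; fall back to the default if none match."""
    votes = Counter(label for antecedent, label in rules if antecedent <= instance)
    return votes.most_common(1)[0][0] if votes else default

rules = [({"head_round", "body_round"}, "POSITIVE"),
         ({"jacket_red"}, "POSITIVE")]
print(classify_decision_list(rules, {"head_round", "body_round", "jacket_blue"}))  # POSITIVE
print(classify_unordered(rules, {"head_square", "jacket_blue"}))                   # NEGATIVE (default)
```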