Donnell, including joint work with: Nader Bshouty (Technion), Elchanan Mossel (UC Berkeley), Rocco Servedio (Columbia).

Learning theory

Computational learning theory deals with an algorithmic problem: given random labeled examples from an unknown function f : {0,1}^n → {0,1}, try to find a hypothesis h : {0,1}^n → {0,1} which is good at predicting the labels of future examples.
Valiant's PAC model [Val84]

PAC = Probably Approximately Correct:
- A learning problem is identified with a concept class C, which is a set of functions ("concepts") f : {0,1}^n → {0,1}.
- Nature/adversary chooses one particular f ∈ C and a probability distribution D on inputs.
- The learning algorithm takes as inputs ε and δ, and also gets random examples ⟨x, f(x)⟩, with x drawn from D.
- Goal: with probability 1-δ, output a hypothesis h which satisfies Pr_{x←D}[h(x) ≠ f(x)] < ε.
- Efficiency: running time of the algorithm, counting time 1 for each example; hopefully poly(n, 1/ε, 1/δ).
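To make the model concrete, here is a minimal Python sketch of the PAC setup: an example oracle that draws x from D and labels it with f, plus an empirical estimate of the error Pr_{x←D}[h(x) ≠ f(x)]. The names (ex_oracle, estimate_error), the uniform choice of D, and the toy target and hypothesis are illustrative assumptions, not part of the model's definition.

```python
import random

def ex_oracle(f, draw_x):
    """One call to the example oracle: draw x from D and label it with the target f."""
    x = draw_x()
    return x, f(x)

def estimate_error(h, f, draw_x, m=10000):
    """Empirical estimate of Pr_{x<-D}[h(x) != f(x)] from m fresh draws."""
    return sum(h(x) != f(x) for x in (draw_x() for _ in range(m))) / m

# Illustrative instantiation (assumptions: D uniform on {0,1}^n, n = 8, a toy target).
n = 8
draw_x = lambda: tuple(random.randint(0, 1) for _ in range(n))
f = lambda x: x[0] & x[2]      # unknown target concept (here: x1 AND x3)
h = lambda x: x[0]             # a candidate hypothesis
print(estimate_error(h, f, draw_x))   # close to Pr[x1 = 1, x3 = 0] = 1/4
```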
Example: learning conjunctions

As an example, we present an algorithm ([Val84]) for learning the concept class of conjunctions, i.e., C is the set of all AND functions.
- Start with the hypothesis h = x1 ∧ x2 ∧ … ∧ xn.
- Draw O((n/ε) log(1/δ)) examples. Whenever you see a positive example, e.g., ⟨11010110, 1⟩, you know that the zero coordinates (in this case, x3, x5, x8) can't be in the target AND; delete them from the hypothesis.
It takes a little reasoning to show this works, but it does.
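A minimal Python sketch of this elimination procedure, assuming examples arrive as (bit-tuple, label) pairs; the function name learn_conjunction and the toy example generator are illustrative, not from the original.

```python
import math, random

def learn_conjunction(examples, n):
    """Valiant-style conjunction learner: start with all n variables in the AND,
    and drop every variable that is 0 in some positive example."""
    hypothesis = set(range(n))            # h = x1 AND x2 AND ... AND xn
    for x, label in examples:
        if label == 1:
            hypothesis -= {i for i in range(n) if x[i] == 0}
    return hypothesis                     # indices of the variables kept in the AND

# Illustrative run (assumptions: uniform examples, target = x1 AND x3).
n, eps, delta = 8, 0.1, 0.05
target = {0, 2}
m = math.ceil((n / eps) * math.log(1 / delta))     # O((n/eps) log(1/delta)) examples
examples = []
for _ in range(m):
    x = tuple(random.randint(0, 1) for _ in range(n))
    examples.append((x, int(all(x[i] == 1 for i in target))))
print(learn_conjunction(examples, n))              # typically recovers {0, 2}
```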
Learning DNF formulas

Probably the most important concept class we would like to learn is DNF formulas, e.g., the set of all functions like f = (x1 ∧ x2 ∧ x6) ∨ (x1 ∧ x3) ∨ (x4 ∧ x5 ∧ x7 ∧ x8). (We actually mean poly-sized DNF: the number of terms should be n^O(1), where n is the number of variables.)

Why so important?
- A natural form of knowledge representation for people.
- Historical reasons: considered by Valiant, who called the problem tantalizing, apparently [simple] yet one that has proved a great challenge over the last 20 years.
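For concreteness, here is a small Python sketch of one way to represent and evaluate a DNF: a formula as a list of terms, each term a list of (variable index, sign) literals. This encoding is an assumption made for illustration, not something fixed by the slides.

```python
def eval_dnf(dnf, x):
    """A DNF is a list of terms; each term is a list of (variable index, sign) pairs,
    where sign=True means the literal x_i and sign=False means its negation.
    The formula is true iff at least one term has all of its literals satisfied."""
    return int(any(all((x[i] == 1) == sign for i, sign in term) for term in dnf))

# Roughly the example from the text (no negated literals assumed here):
# (x1 AND x2 AND x6) OR (x1 AND x3) OR (x4 AND x5 AND x7 AND x8)
f = [[(0, True), (1, True), (5, True)],
     [(0, True), (2, True)],
     [(3, True), (4, True), (6, True), (7, True)]]
print(eval_dnf(f, (1, 0, 1, 0, 0, 0, 0, 0)))   # the second term is satisfied, so this prints 1
```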
Talk overview

In this talk I will:
1. Describe variants of the PAC model, and the fastest known algorithms for learning DNF in each of them.
2. Point out roadblocks to improving these results, and present some open problems which are simple, concrete, and lucrative.

I will not:
- discuss every known model;
- consider the important problem of learning restricted DNF, e.g., monotone DNF, O(1)-term DNF, etc.

The original PAC model

The trouble with this model is that, despite Valiant's initial optimism, PAC-learning DNF formulas appears to be very hard. The fastest known algorithm is due to Klivans and Servedio [KS01], and runs in time exp(n^{1/3} log^2 n).

Technique: They show that for any DNF formula there is a polynomial in x1, …, xn of degree at most n^{1/3} log n which is positive whenever the DNF is true and negative whenever the DNF is false. Linear programming can then be used to find a hypothesis consistent with every example in time exp(n^{1/3} log^2 n).

Note: Consider the model, more difficult than PAC, in which the learner is forced to output a hypothesis which is itself a DNF. In this case, the problem is NP-hard.
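The linear-programming step of that technique (fitting a sign-consistent polynomial to the sample) can be sketched as follows. This is a toy illustration of the LP step only, using scipy's linprog and a naive feature map over all monomials of degree at most d; the names fit_sign_consistent_poly and monomial_features are hypothetical, and the Klivans-Servedio analysis of why degree n^{1/3} log n always suffices is not reproduced here.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def monomial_features(x, d):
    """All monomials of degree <= d over the 0/1 input x (including the constant 1)."""
    feats = [1.0]
    for k in range(1, d + 1):
        for S in combinations(range(len(x)), k):
            feats.append(float(np.prod([x[i] for i in S])))
    return np.array(feats)

def fit_sign_consistent_poly(examples, d):
    """LP feasibility: find weights w so that sign(w . phi(x)) matches every label.
    We ask for margin y' * (w . phi(x)) >= 1, where y' in {-1, +1} encodes the label."""
    Phi = np.array([monomial_features(x, d) for x, _ in examples])
    y = np.array([1.0 if label == 1 else -1.0 for _, label in examples])
    A_ub = -(y[:, None] * Phi)            # -y' * (w . phi(x)) <= -1
    b_ub = -np.ones(len(examples))
    res = linprog(c=np.zeros(Phi.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=(None, None), method="highs")
    return res.x if res.success else None  # None if no degree-d sign-representation is found
```

The learned hypothesis is then h(x) = 1 iff w · φ(x) > 0. Since the polynomial-representation result guarantees feasibility once d reaches roughly n^{1/3} log n, and the LP has n^{O(d)} monomial variables, this is what drives the exp(n^{1/3} log^2 n) running time.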
Distributional issues

One aspect of PAC learning that makes it very difficult is that the adversary gets to pick a different probability distribution for the examples for every concept. Thus the adversary can pick a distribution which puts all the probability weight on the most difficult examples.

A very commonly studied easier model of learning is called Uniform Distribution learning. Here, the adversary must always use the uniform distribution on {0,1}^n. Under the uniform distribution, DNF can be learned in quasipolynomial time…

Uniform Distribution learning

In 1990, Verbeurgt [Ver90] observed that, under the uniform distribution, any term in the target DNF which is longer than log(n/ε) is essentially always false, and thus irrelevant. This fairly easily leads to an algorithm for learning DNF under the uniform distribution in quasipolynomial time: roughly n^{log n}.

In 1993, Linial, Mansour, and Nisan [LMN93] introduced a powerful and sophisticated method of learning under the uniform distribution based on Fourier analysis. In particular, their algorithm could learn depth-d circuits in time roughly n^{log^d n}. Fourier analysis proved very important for subsequent learning of DNF under the uniform distribution.

Membership queries

Still, DNF are not known to be learnable in polynomial time even under the uniform distribution. Another common way to make learning easier is to allow the learner membership queries. By this we mean that, in addition to getting random examples, the learner is allowed to ask for the value of f on any input it wants. In some sense this begins to stray away from the traditional model of learning in which the learner is passive. However, it'