Editor: hgtbkwd | 2013-06-21
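... experiments were conducted on the 2006 data set. Analysis of the experimental results and comparison with the evaluation results verified the effectiveness of the proposed method.
Keywords: Chinese parsing;
dependency grammar;
noun compounds;
dynamic local optimization;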
multilingual dependency parsing

Harbin Institute of Technology, Doctoral Dissertation in Engineering

Abstract

The goal of parsing is to derive the syntactic structure of a sentence according to a given grammar. Improvements in parsing give a strong impetus to natural language processing applications such as information retrieval, information extraction and machine translation. The statistical parsing community has begun to turn to dependency grammar, which is easy to understand, annotate and use. In recent years, dependency grammar has been adopted by many researchers in natural language processing and applied to many languages. For Chinese, however, dependency grammar has not been studied fully, because of the shortage of treebank resources and unresolved technical problems. To address this, this paper investigates techniques for Chinese dependency parsing based on statistical learning methods. The work falls into five parts:

1. A lexical analysis system covering word segmentation and POS tagging is implemented, and verb subclassification is added to it. Distinguishing the different properties of verbs aims to reduce the syntactic ambiguities caused by verbs and to decrease the complexity of syntactic structures. This paper defines a scheme that divides verbs into eight subclasses, and uses the maximum entropy method to identify the subclasses in order to improve the performance of dependency parsing.

2. Noun compounds are common grammatical structures in many languages, and they have a great influence on applications such as information extraction and machine translation. Because traditional parsing methods handle noun compounds poorly, this paper treats them separately to reduce the difficulty of syntactic analysis. Based on the characteristics of Chinese noun compounds, a method built on the hidden Markov tree model is presented to mitigate the effect of such phrases on parsing.

3. Syntactic analysis is very sensitive to sentence length: both the efficiency of the search algorithm and the parsing accuracy suffer as sentences grow longer. This paper presents a segment-based method to address the length problem. First, a sentence is divided into segments, whose types are identified by an SVM classifier. Then the sentence is parsed segment by segment. Finally, all the segments are linked through dependency relations to form a complete dependency tree.

4. Based on the characteristics of the language, an efficient Chinese dependency parsing algorithm is proposed. To cope with the flexible syntactic structures of Chinese and the lack of Chinese treebank data, a divide-and-conquer strategy is used to handle the specific grammatical structures ...
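
The segment-based strategy of part 3 (divide, parse each segment, link the segments) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: segments are cut at punctuation marks, the hypothetical `parse_segment` simply attaches every token to the first token of its segment (standing in for the statistical parser), and the root of each earlier segment is attached to the root of the last segment. The SVM-based identification of segment types is omitted.

```python
from typing import Dict, List, Tuple

# Assumed segment delimiters; the thesis does not specify them in the abstract.
PUNCT = {",", ";", ":", "，", "；", "："}


def split_segments(tokens: List[str]) -> List[List[int]]:
    """Split a sentence into segments of token indices; punctuation closes a segment."""
    segments: List[List[int]] = []
    current: List[int] = []
    for i, tok in enumerate(tokens):
        current.append(i)
        if tok in PUNCT:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def parse_segment(indices: List[int]) -> Tuple[int, Dict[int, int]]:
    """Placeholder parser (hypothetical): the first token is the segment root,
    and every other token in the segment depends on it."""
    root = indices[0]
    heads = {i: root for i in indices[1:]}
    return root, heads


def parse_by_segments(tokens: List[str]) -> List[int]:
    """Return one head index per token; -1 marks the root of the sentence."""
    if not tokens:
        return []
    heads: Dict[int, int] = {}
    roots: List[int] = []
    # Step 1 and 2: divide the sentence, then parse each segment independently.
    for segment in split_segments(tokens):
        root, local_heads = parse_segment(segment)
        heads.update(local_heads)
        roots.append(root)
    # Step 3: link the segments. Assumed rule: the last segment's root is the
    # sentence root, and every other segment root depends on it.
    sentence_root = roots[-1]
    heads[sentence_root] = -1
    for root in roots[:-1]:
        heads[root] = sentence_root
    return [heads[i] for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["我", "来", "，", "他", "走"]
    print(parse_by_segments(sentence))
```

Because each segment is parsed on its own, the search space for any single parsing step is bounded by the segment length rather than the sentence length, which is the motivation the abstract gives for the method.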