Editor: 达达恰西瓜 | 2018-08-26
ac.cn Journal of Software, 2016, 27(2): 309-328 [doi: 10.13328/j.cnki.jos.004860]
http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

Negation and Uncertainty Information Extraction Oriented to Natural Language Text *

ZOU Bo-Wei, QIAN Zhong, CHEN Zhan-Cheng, ZHU Qiao-Ming, ZHOU Guo-Dong
(School of Computer Science and Technology, Soochow University, Suzhou 215006, China)
Corresponding author: ZHOU Guo-Dong, E-mail: [email protected], http://nlp.suda.edu.cn/~gdzhou/

CLC number: TP391

Citation: Zou BW, Qian Z, Chen ZC, Zhu QM, Zhou GD. Negation and uncertainty information extraction oriented to natural language text. Ruan Jian Xue Bao/Journal of Software, 2016, 27(2): 309-328 (in Chinese). http://www.jos.org.cn/1000-9825/4860.htm

Abstract: Current research on information extraction focuses mainly on affirmative information. However, natural language texts contain a large amount of negation and uncertainty information. To separate such information from affirmative information, an in-depth study of negation and uncertainty information extraction is necessary. For this task, this study constructs, for the first time, a Chinese corpus of
16 841 sentences. Using a sequence labeling model and a convolution tree kernel model, it systematically explores the effectiveness of various serialized dependency features and structured parse tree features. Finally, it proposes a meta-decision tree model to integrate the two models. Experimental results show that the proposed method achieves accuracies of 69.84% and 58.57% on negation and uncertainty information extraction, respectively, providing a solid foundation for future related studies.

Key words: information extraction; negation information; uncertainty information; cue detection; scope resolution

With the development of information extraction technology, a growing number of applications can now obtain the required information fairly accurately from massive natural language text data. These applications, however, do not check whether that information is expressed with negation or uncertainty. If the extracted information comes from non-factual expressions such as negation, speculation, or hypothesis, the value of that information or knowledge drops sharply, and the result may even be the exact opposite of the truth. The task of negation and uncertainty information extraction oriented to natural language text emerged for this reason: it aims to extract such negative or uncertain information and expressions and to separate them from factual information.

* Foundation item: National Natural Science Foundation of China (61272260, 61331011, 61273320)
Received: 2015-01-30;