Editor: 迷音桑 | 2018-04-12
8, No. 2, August 2003, pp. 29-60
© The Association for Computational Linguistics and Chinese Language Processing

Chinese Named Entity Recognition Using Role Model¹

Hua-Ping ZHANG*, Qun LIU*+, Hong-Kui YU*, Xue-Qi CHENG*, Shuo BAI*

* Software Division, Institute of Computing Technology, The Chinese Academy of Sciences, Beijing, P.R. China, 100080. Email: zhanghp@software.ict.ac.cn
+ Institute of Computational Linguistics, Peking University, Beijing, P.R. China, 100871

¹ This research is supported by the national 973 fundamental research program under grant numbers G1998030507-4 and G1998030510 and by the ICT Youth Fund under contract number 20026180-23.
Hua-Ping Zhang (Kevin Zhang): born in February 1978, a PhD candidate in the Institute of Computing Technology (ICT), Chinese Academy of Sciences. His research interests include computational linguistics, Chinese natural language processing and information extraction.
Qun Liu: born in October 1966, an associate professor at ICT and a PhD candidate at Peking University. His research interests include machine translation, computational linguistics and Chinese natural language processing.
Hong-Kui Yu: born in November 1978, a visiting student at ICT from Beijing University of Chemical Technology. His research interests include natural language processing and named entity extraction.
Xue-Qi Cheng: born in 1971, an associate professor and director of the software division of ICT. His research fields include computational linguistics, network and information security.
Shuo Bai: born in March 1956, a professor, PhD supervisor and principal scientist of the software division of ICT. His research fields include computational linguistics, network and information security.

Abstract

This paper presents a stochastic model to tackle the problem of Chinese named entity recognition. In this research, we unify the component tokens of named entities and their contexts into a generalized role set, which is similar to a part-of-speech (POS) tag set. The probabilities of role emission and transition are acquired through machine learning on a role-labeled data set, which is transformed from a hand-corrected corpus after word segmentation and POS tagging are performed. Given an original string, role Viterbi tagging is employed on the tokens segmented in the initial process. Named entities are then identified and classified through maximum matching on the best role sequence. In addition, named entity recognition using the role model is incorporated along with the unified class-based bigram model for word segmentation. Thus, named entity candidates can be further selected in the final process of Chinese lexical analysis. Various evaluations conducted using one month of news from the People's Daily and the MET-2 data set demonstrate that the role model can achieve competitive performance in Chinese named entity recognition. We then survey the relationship between named entity recognition and Chinese lexical analysis via experiments on a 1,105,611-word corpus using comparative cases. It was found that: on one hand, Chinese named entity recognition substantially contributes to the performance of lexical analysis;
on the other hand, the subsequent process of word segmentation greatly improves the precision of Chinese named entity recognition. We have applied the role model to named entity identification in our Chinese lexical analysis system, ICTCLAS, which is free software and available at the Open Platform of Chinese NLP (www.nlp.org.cn). ICTCLAS ranked first with 97.58% in word segmentation precision in a recent official evaluation, which was held by the National