Editor: 人间点评 | 2019-07-12
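21009299
Supervisor: Prof. Lin Hongfei (林鸿飞)
Completion date: May 9, 2014
Dalian University of Technology

Dalian University of Technology Declaration of Originality for the Degree Thesis

The author solemnly declares that the thesis submitted is the result of research carried out by the author under the supervisor's guidance.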
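To the best of my knowledge, except where cited and acknowledged in the text, this thesis contains no research results already published by any other individual or group, and no results that have been used to apply for another degree or for any other purpose. The contributions made to this research by colleagues who worked with me are clearly stated and acknowledged in the thesis.

Should any of the above be untrue, I am willing to bear the corresponding legal responsibility.

Thesis title:
Author's signature:
Date: (year) (month) (day)

Dalian University of Technology Master's Thesis

摘要 (Abstract)

With the rise of social networks, typified by microblogs, people enjoy an unprecedented information experience, but the rumors that keep emerging and intensifying on these networks have become an increasingly prominent problem. Automatic rumor detection, as a prerequisite for studying, monitoring, responding to, and governing rumors on social networks, is gradually attracting attention.

This thesis takes Sina Weibo, a popular Chinese social platform, as its setting and microblog rumors as its object of study. Within the framework of earlier work, which treats detection as a classification problem, it focuses on the emotional feedback in microblog comments and proposes the overall positive or negative sentiment orientation of the comments as a new feature for the rumor detection classification task. In general, the work of this thesis is reflected mainly in the following two aspects:

(1) For the experiments, high-quality rumor samples were crawled from official public notices, along with ordinary microblog data with relatively broad coverage, to build a microblog corpus. Because data crawled from the real microblog environment contains a large amount of spam-comment noise that would considerably interfere with the experimental results, the data preprocessing stage concentrates on classifying and filtering out spam comments. This preprocessing greatly reduces the noise in the corpus and lays a good foundation for effective experimental validation.

(2) The deliberate intent of rumormongers in fabricating rumors, together with the doubt and refutation that rumors inevitably attract in public discussion, makes the comments on rumor microblogs more inclined overall toward negative sentiment than those on ordinary microblogs. This thesis therefore proposes a feature based on the overall sentiment orientation of a microblog's comments, and uses a classifier based on term-frequency features to identify the sentiment orientation of each individual comment, from which the value of the overall sentiment feature is obtained. Finally, experiments on the microblog corpus show that the new feature yields a considerable improvement in classification results over the existing features.

Keywords: text mining;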
machine learning;
affective computing;
rumor detection

基于评论情感的微博谣言检测研究

The Research of Microblog Rumors Detection Based on Comments Sentiment

Abstract

With the rise of social networks, represented by microblogs, people benefit from an unprecedented information experience. At the same time, however, the emergence and intensifying growth of rumors on social networks has become an increasingly prominent problem. As a prerequisite for researching, monitoring, responding to, and governing rumors, research on automatic rumor detection in social networks is gradually gaining attention.

Taking Sina Weibo, the biggest Chinese social service, as the background, this thesis studies automatic microblog rumor detection. It treats the detection task as a classification problem, as previous related work did, and focuses on the emotional feedback in microblog comments: the overall positive or negative sentiment polarity of the comments is proposed as a new feature for the classification task, that is, rumor detection. Generally speaking, the work of this thesis is mainly reflected in the following two aspects:

(1) This thesis built a Weibo microblog corpus for the experiments, composed of high-quality rumor samples from official public notices and of ordinary microblogs that broadly cover the live Weibo data stream. Because spam comments in the corpus would affect the experimental results, a filtering procedure removes this noise during the data preprocessing stage. By greatly reducing the noise in the corpus, this procedure lays a good foundation for effective experiments.

(2) The rumormongers'