Edited by: 过于眷恋, 2019-07-04
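3rd Teddy Cup (泰迪杯) National College Student Data Mining Competition: Outstanding Entry
Entry title: Data Mining Analysis of Consumer Reviews of Home Appliances on E-commerce Platforms
Award: Second Prize
Institution: Jinan University (暨南大学)
Team members: 邓伟雄 (Deng Weixiong), 童雪玉 (Tong Xueyu), 黄国南 (Huang Guonan)
Advisor: 张元标 (Zhang Yuanbiao)
Teddy Cup College Student Data Mining Competition Paper Report, www.tipdm.org

Data Mining of Consumer Demand and Products for Home Appliances on E-commerce Platforms

Abstract: By processing and analyzing e-commerce review data, this paper builds three text-mining models: a spam-review identification model, an SVM text sentiment-polarity model based on RAE (recursive autoencoder) encoding of word vectors, and a product strengths-and-weaknesses analysis model. Finally, by extracting and analyzing the Taobao Index and the Baidu Index, it builds a model for mining users' purchasing behavior.

For spam-review identification, spam reviews are divided into three types: irrelevant content, reviews posted by paid posters (the so-called "water army"), and system-generated default positive reviews. Rules based on the distinguishing features of each type are then used to remove them.

For sentiment analysis, this paper first tried the semi-supervised deep-learning RAE model: word vectors were trained on more than 80,000 reviews with the word2vec tool, the reviews were then classified by sentiment polarity, and the products' strengths were to be extracted from the positive reviews and their weaknesses from the negative ones. However, the model's interface wrappers were hard to port between different software packages, its many parameters were hard to set, and its bias function could not be obtained. The paper therefore switched to a supervised SVM model built on the RAE recursive autoencoder for sentiment-polarity identification. The sentiment polarity of 400 reviews was labeled by hand to train the SVM, which then classified the remaining reviews; sentiment-classification accuracy reached 85%.

For the product strengths-and-weaknesses analysis, negative reviews account for only 0.28% of all reviews, a sample too small for extracting product weaknesses from the negative side. Instead, a user-attention analysis is used to compute statistics on user satisfaction for each product attribute, and word-frequency counts are used to identify the products' strengths and weaknesses.

For mining purchasing behavior, a set of search keywords is chosen first. The daily search volume for each keyword, together with the age, gender, and spending-power distributions of the people searching, is then crawled to determine each product's main consumer groups and what they focus on when buying.

Keywords: word vectors; recursive autoencoder; SVM model; sentiment polarity analysis

Data Mining of Consumer Demands and Product Characteristics on E-commerce Platforms

Abstract: To mine the comments on e-commerce products in depth, this paper builds an invalid-comment recognition model and an SVM text sentiment-polarity analysis model based on RAE autoencoding, and then identifies the products' advantages and disadvantages through text analysis. Finally, it crawls and analyzes the Taobao Index and the Baidu Index to build a purchase-behavior mining model. For invalid-comment recognition, it first defines three kinds of invalid information: irrelevant comments, comments from paid posters, and system-generated default comments, and then filters out each kind by its own characteristics. For sentiment-polarity analysis, the paper first tried the semi-supervised deep-learning RAE model, using the word2vec tool to train word vectors on more than eighty thousand comments from the comment list. It then classified the comments with RAE on top of these vectors, intending to obtain the advantages from the positive comments and the disadvantages from the negative ones. However, because the interface wrappers were hard to port between software packages, the numerous parameters were hard to set, and the bias (offset) function could not be obtained, it switched to a supervised SVM model based on RAE autoencoding. Four hundred comments were labeled by hand with their sentiment polarity to train the SVM, and the trained model was then used to classify the remaining comments, reaching 85% accuracy. In the advantages-and-disadvantages analysis, negative comments account for only 0.28% of all comments, too small a sample, which makes extracting disadvantages from negative comments infeasible. Hence it'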
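The abstract says spam reviews are removed by per-category rules but does not list those rules. The Python sketch below shows one possible way to organize such a filter around the three categories; the default-review strings, the on-topic keyword set, and the duplicate threshold (`DEFAULT_TEXTS`, `PRODUCT_WORDS`, `DUP_THRESHOLD`) are hypothetical placeholders, not the authors' rules.

```python
from collections import Counter

# Hypothetical rule inputs; the paper's actual rules are not given in the excerpt.
DEFAULT_TEXTS = {"系统默认好评", "好评"}                 # system-generated default praise
PRODUCT_WORDS = {"热水器", "加热", "安装", "温度", "价格"}  # on-topic vocabulary
DUP_THRESHOLD = 3                                        # identical copies suggesting paid posting


def classify_review(text, counts):
    """Return the spam category of one review, or None if the review is kept."""
    t = text.strip()
    if t in DEFAULT_TEXTS:
        return "system_default"
    if counts[t] >= DUP_THRESHOLD:           # copy-pasted text posted many times
        return "paid_poster"
    if not any(word in t for word in PRODUCT_WORDS):
        return "irrelevant"                  # says nothing about the product
    return None


def filter_reviews(reviews):
    """Split reviews into kept ones and a per-category list of removed ones."""
    counts = Counter(r.strip() for r in reviews)
    kept = []
    removed = {"system_default": [], "paid_poster": [], "irrelevant": []}
    for r in reviews:
        category = classify_review(r, counts)
        if category is None:
            kept.append(r)
        else:
            removed[category].append(r)
    return kept, removed
```

The default-text check runs first so that a frequently repeated default review is reported as system-generated rather than as paid posting.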
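The abstract names RAE without describing it. For reference, the standard recursive autoencoder step (Socher et al., 2011) merges two child vectors $c_1, c_2 \in \mathbb{R}^n$ into a parent $p$, with $W^{(1)} \in \mathbb{R}^{n \times 2n}$ and $W^{(2)} \in \mathbb{R}^{2n \times n}$, and scores the merge by how well $p$ reconstructs its children. The excerpt does not state whether the authors used exactly this variant.

```latex
\begin{aligned}
p &= \tanh\left(W^{(1)}[c_1; c_2] + b^{(1)}\right) \\
[c_1'; c_2'] &= W^{(2)} p + b^{(2)} \\
E_{\mathrm{rec}} &= \tfrac{1}{2}\left\lVert [c_1; c_2] - [c_1'; c_2'] \right\rVert^2
\end{aligned}
```

Applied greedily, merging at each step the adjacent pair with the lowest $E_{\mathrm{rec}}$ (often with $p$ normalized to unit length), this builds a tree over a review's word vectors. In the supervised setup the abstract describes, the resulting review representation would presumably serve as the SVM's input; how the node vectors are pooled is not stated.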
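The supervised step (train an SVM on 400 hand-labeled reviews, then label the rest) can be sketched in dependency-free Python. This is a swapped-in minimal linear SVM trained with the Pegasos stochastic subgradient method, not the authors' implementation; the input vectors stand in for the RAE review representations, and `lam`, `epochs`, and the function names are illustrative assumptions.

```python
import random


def _augment(x):
    """Append a constant 1.0 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Train a linear SVM with the Pegasos stochastic subgradient method.

    X: list of equal-length feature vectors (e.g. review embeddings);
    y: labels in {+1, -1} (positive / negative sentiment).
    Returns the weight vector; its last entry acts as a (regularized) bias.
    """
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                          # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]       # L2 regularization shrink
            if margin < 1.0:                               # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (positive) or -1 (negative) for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, _augment(x)))
    return 1 if score >= 0 else -1


def accuracy(w, X, y):
    """Fraction of vectors whose predicted label matches the given label."""
    hits = sum(predict(w, x) == label for x, label in zip(X, y))
    return hits / len(y)
```

In the paper's setting, `X` would hold the labeled review vectors and `predict` would then be applied to each unlabeled review. An accuracy figure like the reported 85% is typically measured on labeled reviews held out from training; the excerpt does not say how the authors measured it.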
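The user-attention analysis counts how often each product attribute is mentioned and how satisfied reviewers are with it. A minimal sketch, assuming reviews have already been word-segmented and that attribute keywords and positive/negative word lists are supplied by hand (all hypothetical here). It also simplifies by giving every attribute mentioned in a review that review's overall tone.

```python
from collections import Counter


def attribute_attention(reviews, attribute_words, positive_words, negative_words):
    """Count, per product attribute, how often it is mentioned and with what tone.

    reviews: list of token lists (reviews already word-segmented).
    attribute_words: attribute name -> set of keywords signalling that attribute.
    Returns attribute -> (mention_count, positive_count, negative_count).
    """
    stats = {attr: Counter() for attr in attribute_words}
    for tokens in reviews:
        # Review-level tone: compare counts of positive and negative words.
        pos = sum(tok in positive_words for tok in tokens)
        neg = sum(tok in negative_words for tok in tokens)
        for attr, keywords in attribute_words.items():
            if any(tok in keywords for tok in tokens):
                stats[attr]["mentions"] += 1
                if pos > neg:
                    stats[attr]["positive"] += 1
                elif neg > pos:
                    stats[attr]["negative"] += 1
    return {a: (c["mentions"], c["positive"], c["negative"]) for a, c in stats.items()}
```

Attributes with many mentions and a high positive share would read as strengths; frequently mentioned attributes with a low positive share would read as weaknesses.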
