Editor: sunny爹 2014-06-07

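which makes the model results easier to understand (for example, a car's color generally does not distinguish its performance, but its model, number of cylinders, and manufacturer often do).

What?
  Use statistical methods to preprocess the data:
    filter out irrelevant or weakly relevant fields
    retain the relevant attributes and rank them
  Relevance depends on both dimensions and levels
  Analytical characterization, analytical comparison

Attribute relevance analysis
Steps?
  Data collection
  Analytical generalization
    Use information gain analysis (e.g., entropy or other measures) to identify highly relevant dimensions and levels.
  Relevance analysis
    Sort and select the most relevant dimensions and levels.
  Attribute-oriented induction (AOI) for class description
    On selected dimension/level

Relevance measures
  A relevance measure determines how the relevance of an attribute is assessed.
  Methods:
    information gain (ID3)
    gain ratio (C4.5)
    gini index
    χ2 contingency table statistics
    uncertainty coefficient

Entropy and Information Gain
  In a set S, the number of records belonging to class Ci is si, for i = 1, …, m, and s = s1 + … + sm.
  Expected information:
    I(s1, …, sm) = - sum over i = 1..m of (si / s) * log2(si / s)
  Entropy of attribute A, where A has v distinct values and sij is the number of records of class Ci that take the j-th value of A:
    E(A) = sum over j = 1..v of ((s1j + … + smj) / s) * I(s1j, …, smj)
  Information gain:
    Gain(A) = I(s1, …, sm) - E(A)

An example
  Task: use analytical characterization to mine the general characteristics of graduate students
  Attributes: name, gender, major, birth_place, birth_date, phone#, and gpa
  Gen(ai) = concept hierarchies on ai
  Ui = attribute analytical thresholds for ai
  Ti = attribute generalization thresholds for ai
  R = attribute relevance threshold

Example: analytical characterization (cont.)
  1. Data collection
    target class: graduate student
    contrasting class: undergraduate student
  2. Analytical generalization using Ui
    attribute removal: remove name and phone#
    attribute generalization: generalize major, birth_place, birth_date and gpa; accumulate counts
    candidate relation: gender, major, birth_country, age_range and gpa

Example: analytical characterization (2)
  Candidate relation for Target class: Graduate students (Σ = 120)
  Candidate relation for Contrasting class: Undergraduate students (Σ = 130)

Example: analytical characterization (3)
  3. Relevance analysis
    Calculate the expected information
    Calculate the entropy of each attribute
      (terms labeled on the slide: number of grad students in Science; number of undergrad students in Science)

Example: analytical characterization (4)
  Obtain the entropy of each attribute
  Calculate the information gain of each attribute
  Information gain for all attributes

Example: analytical characterization (5)
  4. Initial working relation (W0) derivation
    R = 0.1
    remove irrelevant or weakly relevant attributes =>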

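drop gender, birth_country
    remove the candidate relation of the contrasting class
  5. Perform AOI on W0
  Initial target class working relation W0: Graduate students

Characterization and comparison
  What is concept description?
  Data generalization and summarization-based characterization
  Analytical characterization: analysis of attribute relevance
  Mining class comparisons: finding the differences between classes
  Mining descriptive statistical measures in large databases

Mining class comparisons
  Comparison: compare two or more classes.
  Method:
    Partition the relevant data into a target class and one or more contrasting classes.
    Generalize both classes to the same level.
    Compare tuples that have the same high-level description.
    For each tuple, present its description and two measures:
      support - distribution within single class
      comparison - distribution between classes
    Highlight the tuples that show strong differences.
  Relevance analysis: find the attributes that best discriminate between the classes.

Example: analytical comparison
  Task: compare graduate and undergraduate students using discriminant rules
  DMQL query:
    use Big_University_DB
    mine comparison as grad_vs_undergrad_students
    in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa
    for graduate_students
    where status in graduate
    versus undergraduate_students
    where status in undergraduate
    analyze count%
    from student

Example: analytical comparison (2)
  Given:
    attributes name, gender, major, birth_place, birth_date, residence, phone# and gpa
    Gen(ai) = concept hierarchies on attributes ai
    Ui = attribute analytical thresholds for attributes ai
    Ti = attribute generalization thresholds for attributes ai
    R = attribute relevance threshold

Example: analytical comparison (3)
  1. Data collection
    target and contrasting classes
  2. Attribute relevance analysis
    remove attributes name, gender, major, phone#
  3. Synchronous generalization
    controlled by user-specified dimension thresholds
    prime target and contrasting class(es) relations/cuboids

Example: analytical comparison (4)
  Prime generalized relation for the target class: Graduate students
  Prime generalized relation for the contrasting class: Undergraduate students
  Comparison items

Example: analytical comparison (5)
  4. On ........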
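
As a rough illustration of the relevance-analysis step used in the examples above, the Python sketch below computes the expected information, the entropy of an attribute, and the information gain, then keeps only the attributes whose gain reaches the relevance threshold R. All function names are hypothetical, and the per-value counts are made up for illustration because the candidate-relation tables are not reproduced in this text; only the class totals (120 graduate, 130 undergraduate) and R = 0.1 come from the example.

```python
from math import log2


def expected_info(class_counts):
    """I(s1, ..., sm): expected information for a list of per-class counts."""
    total = sum(class_counts)
    # Classes with zero records contribute nothing (0 * log2(0) is taken as 0).
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


def attribute_entropy(partition):
    """E(A): weighted expected information over the distinct values of A.

    `partition` maps each value of A to a tuple of per-class counts.
    """
    total = sum(sum(counts) for counts in partition.values())
    return sum(
        (sum(counts) / total) * expected_info(counts)
        for counts in partition.values()
    )


def information_gain(partition):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    # Column sums give the overall per-class counts s1, ..., sm.
    class_totals = [sum(column) for column in zip(*partition.values())]
    return expected_info(class_totals) - attribute_entropy(partition)


def select_relevant(partitions, threshold):
    """Rank attributes by gain and keep those with gain >= threshold."""
    gains = {name: information_gain(p) for name, p in partitions.items()}
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return [(name, gain) for name, gain in ranked if gain >= threshold]


# Hypothetical counts as (graduate, undergraduate) per attribute value.
# They are NOT the figures from the slides; they only respect the
# class totals of 120 and 130 given in the example.
toy_partitions = {
    "attr_separating": {"x": (120, 0), "y": (0, 130)},
    "attr_uninformative": {"x": (60, 65), "y": (60, 65)},
}

R = 0.1  # attribute relevance threshold from the example
relevant = select_relevant(toy_partitions, R)
```

With these toy counts, `expected_info([120, 130])` is about 0.9988, the perfectly separating attribute has a gain equal to that value, and the uninformative attribute has a gain of essentially zero, so only the first one survives the threshold. Other measures from the list of relevance measures (gain ratio, gini index, and so on) could be swapped in for `information_gain` without changing the selection step.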
