E-mail: {achtert,boehm,kriegel,kroegerp,zimek}@dbs.ifi.lmu.de

Abstract

In high-dimensional data, clusters often exist only in arbitrarily oriented subspaces of the feature space. In addition, these so-called correlation clusters may have complex relationships between each other. For example, a correlation cluster in a 1-D subspace (forming a line) may be enclosed within one or even several correlation clusters in 2-D superspaces (forming planes). In general, such relationships can be seen as a complex hierarchy that allows multiple inclusions, i.e. clusters may be embedded in several super-clusters rather than only in one. Obviously, uncovering the hierarchical relationships between the detected correlation clusters is an important information gain. Since existing approaches cannot detect such complex hierarchical relationships among correlation clusters, we propose the algorithm ERiC to tackle this problem and to visualize the result by means of a graph-based representation. In our experimental evaluation, we show that ERiC finds more information than state-of-the-art correlation clustering methods and outperforms existing competitors in terms of efficiency.

1. Introduction

In high-dimensional data, meaningful clusters are usually based only on a subset of all dimensions. Subspace clustering (or projected clustering) is a well-known approach to find λ-dimensional subspaces of a d-dimensional data space (λ < d) in which certain sets of points cluster well. Despite great efforts in developing subspace clustering methods, all existing approaches suffer from certain drawbacks. Each approach is based on certain heuristics, because the optimal solution would require a time complexity at least exponential in the number of dimensions d. An even more challenging problem is to find clusters in arbitrarily oriented subspaces. Such clusters appear as sets of points located near a common hyperplane (of arbitrary dimensionality λ) in a d-dimensional data space. Since these hyperplanes correspond to linear dependencies among several attributes (and thus the corresponding attributes are correlated), the concept of knowledge discovery in databases addressing this problem is known as correlation clustering [10]. Correlation clustering groups the data set into subsets called correlation clusters such that the objects in the same correlation cluster are all associated to a common hyperplane of arbitrary dimensionality. A correlation cluster associated to a λ-dimensional hyperplane is referred to as a λ-dimensional correlation cluster. The dimensionality of the hyperplane associated to a correlation cluster is called the correlation dimensionality.

A good example of a successful application of correlation clustering is recommendation systems. In target marketing, it is important to find homogeneous groups of users with similar ratings in subsets of the attributes. In addition, it is interesting to find groups of users with correlated affinities. This knowledge can help companies to predict customer behavior and thus develop future marketing plans. A second application of correlation clustering is metabolic screening. The collected data usually contain the concentrations of certain metabolites in blood samples of thousands of patients.
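The notion of correlation dimensionality can be illustrated with a small PCA sketch: points lying near a λ-dimensional hyperplane yield λ dominant eigenvalues of their covariance matrix. The function name, the synthetic data, and the 85% explained-variance threshold `alpha` below are illustrative choices for this sketch, not details of any specific algorithm discussed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D correlation cluster in 3-D space: points near the line
# x2 = 2*x1, x3 = -x1, plus small Gaussian noise.
t = rng.uniform(-5, 5, size=200)
points = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.05, size=(200, 3))

def correlation_dimensionality(X, alpha=0.85):
    """Estimate the correlation dimensionality of a point set:
    the smallest number of principal components whose eigenvalues
    explain at least a fraction `alpha` of the total variance."""
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / len(X)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order
    explained = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(explained, alpha) + 1)

print(correlation_dimensionality(points))  # → 1 (points lie near a line)
```

With the same function, points scattered near a 2-D plane in 3-D space would yield two dominant eigenvalues and hence a correlation dimensionality of 2.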
In such data sets, it is important to find homogeneous groups of patients with correlated metabolite concentrations indicating a common metabolic disease. This is an important step towards understanding metabolic or genetic disorders and designing individual drugs. Another prominent application for correlation clustering is the analysis of gene expression data. Gene expression data contain the expression levels of thousands of genes, indicating how active the genes are, across a set of samples. A common task is to find clusters of co-regulated genes, i.e. clusters of genes that share a common linear dependency within a subset of their features.

The first approach that can detect correlation clusters is ORCLUS [6], which integrates PCA into k-means clustering. The algorithm 4C [10] integrates PCA into a density-based clustering algorithm. These approaches can be seen as flat approaches in the following sense: a correlation cluster C1 with dimensionality λ1 may be embedded in (and therefore

In Proc. 19th International Conference on Scientific and Statistical Database Management (SSDBM 2007), Banff, Canada