Editor: ACcyL 2019-08-29

1 We encourage the interested reader to visit [5] to view full color examples of all figures in this work.

representation of a sequence allows the investigation of the patterns in sequences, giving the human eye a chance to recognize hidden structures.

Figure 2. The quad-tree representation of a sequence over the alphabet {A,C,G,T} at different levels of resolution.

We can get a hint of the potential utility of the approach if, for example, we take the first 5,000 symbols of the mitochondrial DNA sequences of four familiar species and use them to create their own file icons. Figure 3 below illustrates this. Note that Pan troglodytes is the familiar chimpanzee, and Loxodonta africana and Elephas maximus are the African and Indian elephants, respectively. Even if we did not know these particular animals, we would have no problem recognizing that there are two pairs of highly related species being considered.

Figure 3. The bitmap representation of the gene sequences of four animals.

With respect to non-genetic sequences, Joel Jeffrey noted, "The CGR algorithm produces a CGR for any sequence of letters" [4]. However, it is only defined for discrete sequences, and most time series are real-valued. The results in Figure 3 encouraged us to try a similar technique on real-valued time series data and to investigate the utility of such a representation for the data mining task of anomaly detection. Since CGR treats its input as an abstract string of symbols, a discretization method is necessary to transform continuous time series data into a discrete domain. For this purpose, we used the Symbolic Aggregate approXimation (SAX) [8], which we review below.

2.2 Symbolic Time Series Representations

While there are at least 200 techniques in the literature for converting real-valued time series into discrete symbols, the SAX technique of Lin et al. [8] is unique and ideally suited for data mining. SAX is the only symbolic representation that allows the lower bounding of the distances in the original space. The SAX representation is created by taking a real-valued signal and dividing it into equal-sized sections. The mean value of each section is then calculated. By substituting each section with its mean, a reduced-dimensionality piecewise constant approximation of the data is obtained. This representation is then discretized in such a manner as to produce a word with approximately equiprobable symbols. Figure 4 shows a short time series being converted into the SAX word baabccbc.

Figure 4. A real valu........
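
The SAX steps described above (equal-sized sections, section means, then discretization into roughly equiprobable symbols) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes the series is z-normalized first and that the breakpoints are taken from the quantiles of a standard normal distribution, as in the usual SAX formulation. The function names paa and sax_word, the default three-letter alphabet, and the requirement that the series length be a multiple of the number of sections are assumptions made here for brevity.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def paa(series, n_segments):
    """Replace each equal-sized section of the series with its mean.

    Simplifying assumption: len(series) is a multiple of n_segments.
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    width = len(series) // n_segments
    return [fmean(series[i:i + width]) for i in range(0, len(series), width)]


def sax_word(series, n_segments, alphabet="abc"):
    """Turn a real-valued series into a SAX-style word (hypothetical helper)."""
    mu, sigma = fmean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series cannot be z-normalized")
    normalized = [(x - mu) / sigma for x in series]

    # Assumed breakpoints: quantiles that cut the standard normal curve
    # into len(alphabet) regions of equal probability.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # Each section mean is mapped to the letter of the region it falls in.
    return "".join(
        alphabet[bisect_right(breakpoints, mean)]
        for mean in paa(normalized, n_segments)
    )


if __name__ == "__main__":
    # A steadily rising series of 8 points, one point per section.
    print(sax_word(list(range(8)), 8))  # -> aaabbccc
```

With a three-letter alphabet the two breakpoints sit at about -0.43 and 0.43, so a section mean below -0.43 becomes a, one above 0.43 becomes c, and anything in between becomes b. A larger alphabet gives a finer discretization at the cost of longer bitmaps downstream.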
