Edited by: hys520855, 2019-07-01

2 to 3 minutes for users viewing the result pages. Only a few users viewed historic web pages (also called cached pages). The numbers of distinct queries, distinct users, and distinct URLs follow Heaps' law.

3. Several key trends in Chinese Web search behavior were examined, based on Tianwang user logs from the last five years. The results show that the mean number of terms per query is increasing significantly, while the mean session length, the mean number of result pages viewed, and the mean viewing duration are all decreasing. The mean number of Chinese characters per query changed little over the five years. The correlation between a term's frequency in the query log and its frequency in the click log is weakening, and users' search topics drift quickly.

4. Web queries submitted to search engines may involve a single topic or several topics (multitasking). The characteristics of multitasking Chinese Web searches are investigated and analyzed based on the query logs of the Tianwang system. The results show that more than one third of users often perform multitasking Web searches, and more than half of multitasking sessions involve two topics, with two to seven queries per session. The mean duration of multitasking sessions is twice that of regular sessions. Most multitasking searches in the Tianwang system fall under three topics: computer/network, entertainment, and education. Nearly one fourth of multitasking sessions contain inexplicit information.

5. When a user submits a Web query to a search engine, a list of related Web queries returned by the system helps the user refine the query and find the needed information. A new method for determining related Web queries is presented. For a given query, several statistics of each candidate query are extracted from the log files: the number of distinct users who submitted the candidate query, the total number of times it was submitted, the total number of clicks on its returned results, the number of terms it shares with the given query, and the number of clicked URLs it shares with the given query. The candidate queries are then ranked by support vector regression models learned from human-labeled training data. Experimental results show that the method achieves higher prediction precision.

Keywords: Search Engine; User Log; Web Mining; Multitasking Web Search; Related Query

Contents

Chinese Abstract
Abstract
Contents

Chapter 1 A Research Framework for Search Engine Log Mining

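1.1 Introduction
1.2 Research Content of Log Mining
1.2.1 Specific Terminology
1.2.2 Main Mining Tasks
1.3 Dataset Selection and Data Preprocessing
1.3.1 Dataset Selection
1.3.2 Data Preprocessing
1.4 Technical Methods
1.4.1 Statistical Analysis Methods
1.4.2 Modeling, Analysis, and Prediction
1.4.3 Sequential Pattern Discovery
1.4.4 Association Rule Mining
1.4.5 Cluster Analysis
1.5 Comparison of Query Behavior of Users in Different Regions
1.6 Applications to Improving Search Engine Systems
1.6.1 Improving the Quality of Result Ranking
1.6.2 Cache Usage and Replacement Strategies
1.6.3 Discovering Similar Queries

Chapter 2 Analysis of Chinese Search Engine User Logs

2.1 Introduction
2.2 Data Preparation
2.3 Analysis of Users' Querying and Clicking Behavior
2.3.1 Query Types and Volume
2.3.2 Chinese Characters in Query Strings
2.3.3 Number of Terms in Query Strings
2.3.4 Result-Page Viewing and Time Intervals
2.3.5 Distribution of User Arrival Times
2.3.6 User URL Clicks and Viewing of Historic Web Pages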
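The abstract reports that the numbers of distinct queries, users, and URLs follow Heaps' law, V(n) = K * n^beta, where n is the number of log records scanned and V(n) the number of distinct items seen so far. One common way to check such a claim is sketched below in Python: track V(n) while scanning the log, then estimate K and beta by least squares on log V = log K + beta * log n. The function names and the plain log-log regression are illustrative assumptions, not the thesis's own procedure.

```python
import math
from typing import Hashable, Iterable, List, Sequence, Tuple


def distinct_growth(items: Iterable[Hashable]) -> List[Tuple[int, int]]:
    """Return (n, V(n)) pairs: records scanned so far vs. distinct items seen so far."""
    seen = set()
    points = []
    for n, item in enumerate(items, start=1):
        seen.add(item)
        points.append((n, len(seen)))
    return points


def fit_heaps(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit V(n) = K * n**beta by least squares on log V = log K + beta * log n.

    Returns (K, beta). Needs at least two points with different n.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope in log-log space
    k = math.exp(mean_y - beta * mean_x)  # intercept mapped back from log space
    return k, beta


# Usage: synthetic data that follows V(n) = 3 * n**0.6 exactly.
k, beta = fit_heaps([(n, 3.0 * n ** 0.6) for n in range(1, 1001)])
print(round(k, 6), round(beta, 6))  # 3.0 0.6
```

On a real log, `fit_heaps(distinct_growth(queries))` would be applied separately to the query strings, user IDs, and clicked URLs; an estimated beta between 0 and 1 indicates sublinear growth of distinct items.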
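Trend 3 in the abstract concerns the correlation between a term's frequency in the query log and its frequency in the click log. The sketch below shows one way to measure it, under stated assumptions: both logs are given as lists of query strings already segmented into space-separated terms (Chinese queries would first need word segmentation), a term absent from one log has frequency 0 there, and Pearson correlation is used. The thesis may define the click log or the coefficient differently; the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def term_counts(queries: Iterable[str]) -> Counter:
    """Count term occurrences; assumes terms are already separated by spaces."""
    counts: Counter = Counter()
    for query in queries:
        counts.update(query.split())
    return counts


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def query_click_correlation(query_log: Iterable[str], click_log: Iterable[str]) -> float:
    """Correlate term frequencies over the union of terms seen in either log."""
    q_counts = term_counts(query_log)
    c_counts = term_counts(click_log)
    vocab = sorted(set(q_counts) | set(c_counts))
    # Counter returns 0 for missing terms, so both vectors align on vocab.
    return pearson([q_counts[t] for t in vocab], [c_counts[t] for t in vocab])


# Usage with toy, pre-segmented logs (hypothetical data).
r = query_click_correlation(["a b", "a", "c"], ["a b", "a"])
print(round(r, 3))  # 0.866
```

Computing this value separately for each year's logs would show whether the correlation is shrinking over time, which is the trend the abstract describes.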
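The multitasking findings in item 4 of the abstract rest on classifying each session by how many topics its queries cover. The sketch below assumes every query already carries a topic label and a timestamp in seconds (the `Query` and `Session` types are hypothetical), treats a session as multitasking when it spans more than one topic, and counts a user as a multitasking searcher if at least one of their sessions is multitasking. These definitions are assumptions for illustration; the thesis's exact criteria may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Query:
    timestamp: float  # seconds; hypothetical field
    topic: str        # topic label, assumed to be assigned beforehand


@dataclass
class Session:
    user_id: str
    queries: List[Query]

    def topic_count(self) -> int:
        return len({q.topic for q in self.queries})

    def is_multitasking(self) -> bool:
        # Assumed definition: more than one distinct topic in the session.
        return self.topic_count() > 1

    def duration(self) -> float:
        times = [q.timestamp for q in self.queries]
        return max(times) - min(times)


def summarize(sessions: List[Session]) -> Dict[str, float]:
    """Session- and user-level multitasking statistics for a non-empty session list."""
    multi = [s for s in sessions if s.is_multitasking()]
    regular = [s for s in sessions if not s.is_multitasking()]
    users = {s.user_id for s in sessions}
    multi_users = {s.user_id for s in multi}
    return {
        # Share of users with at least one multitasking session.
        "multitasking_user_share": len(multi_users) / len(users),
        # Share of multitasking sessions that cover exactly two topics.
        "two_topic_share": (sum(s.topic_count() == 2 for s in multi) / len(multi)) if multi else 0.0,
        "mean_duration_multitasking": mean(s.duration() for s in multi) if multi else 0.0,
        "mean_duration_regular": mean(s.duration() for s in regular) if regular else 0.0,
    }


# Usage with toy sessions (hypothetical data).
stats = summarize([
    Session("u1", [Query(0.0, "computer"), Query(60.0, "entertainment")]),
    Session("u1", [Query(0.0, "education"), Query(30.0, "education")]),
    Session("u2", [Query(0.0, "education")]),
    Session("u3", [Query(0.0, "computer"), Query(50.0, "education"), Query(120.0, "entertainment")]),
])
```

Comparing `mean_duration_multitasking` with `mean_duration_regular` gives the kind of ratio the abstract reports (roughly two to one on the Tianwang logs).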
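Item 5 of the abstract lists the per-candidate statistics used to rank related queries. A sketch of that feature extraction follows, assuming each log record is one query submission with at most one clicked URL and that queries are pre-segmented into space-separated terms. The thesis ranks candidates with support vector regression trained on human-labeled data; here a fixed, hypothetical weight vector (`WEIGHTS`) stands in for that trained model, so the scores are illustrative only. All type and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass
class LogRecord:
    user_id: str
    query: str
    clicked_url: Optional[str]  # None when the submission produced no click


@dataclass
class LogIndex:
    users: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    submissions: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    clicks: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    urls: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))


def index_log(records: Sequence[LogRecord]) -> LogIndex:
    """Aggregate per-query statistics; each record is assumed to be one submission."""
    idx = LogIndex()
    for r in records:
        idx.users[r.query].add(r.user_id)
        idx.submissions[r.query] += 1
        if r.clicked_url is not None:
            idx.clicks[r.query] += 1
            idx.urls[r.query].add(r.clicked_url)
    return idx


def features(candidate: str, given: str, idx: LogIndex) -> List[float]:
    """The five statistics named in the abstract, for one (candidate, given) pair."""
    common_terms = set(candidate.split()) & set(given.split())
    common_urls = idx.urls.get(candidate, set()) & idx.urls.get(given, set())
    return [
        len(idx.users.get(candidate, set())),  # distinct users who submitted it
        idx.submissions.get(candidate, 0),     # total submissions
        idx.clicks.get(candidate, 0),          # total clicks on its results
        len(common_terms),                     # terms shared with the given query
        len(common_urls),                      # clicked URLs shared with the given query
    ]


# Hypothetical weights standing in for a trained SVR model.
WEIGHTS = (0.2, 0.1, 0.1, 1.0, 1.5)


def rank(given: str, candidates: Sequence[str], idx: LogIndex,
         weights: Sequence[float] = WEIGHTS) -> List[Tuple[str, float]]:
    """Score each candidate with a linear function of its features, highest first."""
    scored = [(c, sum(w * f for w, f in zip(weights, features(c, given, idx))))
              for c in candidates if c != given]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with a toy log (hypothetical data).
records = [
    LogRecord("u1", "cheap flights", "a.com"),
    LogRecord("u2", "cheap flights", "b.com"),
    LogRecord("u1", "flight deals", "a.com"),
    LogRecord("u3", "flight deals", None),
    LogRecord("u2", "cheap hotels", "c.com"),
]
ranked = rank("cheap flights", ["flight deals", "cheap hotels"], index_log(records))
```

To follow the approach described in the abstract more closely, the fixed weights would be replaced by the predictions of a regressor (for example scikit-learn's `SVR`) fitted to feature vectors of human-labeled query pairs.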
