Editor: 旋风 | 2017-12-26
c-s-a.org.cn | 2016, Vol. 25, No. 10 | 102 | System Construction

Monitoring System for Container-Based Virtualization Platforms①

GU Ze-Yu1,2, WANG Tao1, SONG Yun-Kui1, YANG Hong-Tao3, ZHANG Wen-Bo1
1 (Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
2 (University of Chinese Academy of Sciences, Beijing 100049, China)
3 (Taiyuan RocKontrol Industrial Corporation Limited, Taiyuan 030032, China)

Abstract: In container-based virtualization platforms, many kinds of applications compete for shared physical resources, so detecting anomalies with fixed thresholds is impractical. A fixed monitoring period also forces a tradeoff between monitoring timeliness and monitoring overhead. To address these issues, this paper proposes an online anomaly detection method based on Principal Component Analysis (PCA) that dynamically adjusts the monitoring period according to the anomaly degree in order to reduce monitoring overhead. First, the method non-intrusively collects the monitoring data of each container into a data matrix. Then, it uses PCA to compute the principal direction of the matrix. Finally, it computes the cosine similarity between the current and the previous principal directions and uses it as the anomaly degree of the current system state. For a container whose anomaly degree exceeds the user-defined tolerance, the monitoring system sends an alert message and increases or decreases the monitoring period according to the anomaly degree. Experimental results show that, for typical faults injected into a distributed storage system (HDFS), the method achieves a detection accuracy above 80%, keeps the alert delay within 5 seconds, and incurs lower monitoring overhead than monitoring with a fixed period.

Key words: Docker container; virtualization; runtime monitoring; principal component analysis; monitoring period
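The pipeline summarized in the abstract (data matrix, principal direction, cosine similarity, period adjustment) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the principal direction is obtained by power iteration on the sample covariance matrix, the anomaly degree is taken as `1 - |cos|` so that larger values mean a larger change, and the period is halved on an alert and doubled otherwise within hypothetical bounds. The function names, the tolerance value, and the sample metrics are likewise illustrative.

```python
import math


def principal_direction(rows, iterations=200):
    """Unit-length first principal direction of a samples-by-metrics matrix.

    Uses power iteration on the sample covariance matrix (needs >= 2 rows).
    Assumption: the start vector is not orthogonal to the dominant direction.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Asymmetric start vector (1, 2, ..., d), normalized to unit length.
    start = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance in the data: keep the current vector
            break
        v = [x / norm for x in w]
    return v


def anomaly_degree(prev_dir, curr_dir):
    """Assumed mapping: 1 - |cosine similarity| of two unit directions.

    The absolute value removes the sign ambiguity of principal directions.
    """
    dot = sum(a * b for a, b in zip(prev_dir, curr_dir))
    return 1.0 - min(1.0, abs(dot))


def next_period(period, degree, tolerance, min_period=1.0, max_period=60.0):
    """Hypothetical rule: halve the period on an alert, otherwise double it."""
    if degree > tolerance:
        return max(min_period, period / 2.0)
    return min(max_period, period * 2.0)


# Illustrative windows of two metrics (e.g. CPU %, memory %) per sample.
normal = [[10, 20], [20, 41], [30, 59], [40, 80]]   # metrics rise together
shifted = [[10, 80], [20, 61], [30, 39], [40, 20]]  # relationship inverted

degree = anomaly_degree(principal_direction(normal),
                        principal_direction(shifted))
tolerance = 0.2  # hypothetical user tolerance
if degree > tolerance:
    print("alert: anomaly degree %.2f exceeds tolerance" % degree)
print("next monitoring period: %.1f s" % next_period(10.0, degree, tolerance))
```

With these sample windows the two principal directions differ markedly (the anomaly degree is roughly 0.4), so the sketch raises an alert and shortens the period. A production system would more likely compute the principal direction with a linear algebra library; power iteration is used here only to keep the example dependency-free.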
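1 Introduction

A container is essentially a virtualization technology that reuses the host kernel and provides an isolation mechanism for application processes. Unlike hypervisor-based virtualization, container virtualization offers low resource overhead, strong isolation, and fast deployment, so Internet companies tend to use large-scale container clusters to handle sudden peaks in traffic. For example, JD.com runs the world's largest Docker cluster (60,000); all online applications in its new data centers will be released through Docker, supporting simultaneously online Docker

① Foundation item: National Key Technology R&D Program of China (2013BAH45F01);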