Edited by: 捷安特680, 2017-10-07
www.dell.com/powersolutions
Reprinted from Dell Power Solutions, October 2004. Copyright © 2004 Dell Inc. All rights reserved.

SYSTEMS MANAGEMENT

Using Log Messages and Alert Actions in Dell OpenManage Server Administrator

System administrators can effectively monitor Dell™ PowerEdge™ servers by using error messages and alert actions provided by Dell OpenManage™ Server Administrator software. This article analyzes several critical event logs and alert actions that are configurable in the Dell OpenManage Server Administrator software.

BY SENTHIL KUMARAN OR

Dell OpenManage Server Administrator has an excellent logging feature that stores the event messages and logs of Dell PowerEdge servers. The logs are categorized into four types: hardware log, alert log, power-on self-test (POST) log, and command log.

• Hardware log: The hardware log reports potential problems in a PowerEdge server's hardware components. This log, which is also referred to as the Embedded Server Management (ESM) log or, on some systems, the system event log (SEL), comprises a set of embedded instructions that can send hardware status messages to Dell OpenManage systems management software.

• Alert log: The alert log is the list of all events generated by the Server Administrator instrumentation service in response to sensor status changes such as temperature, voltage, and specified threshold values. The alert log also records other monitored parameters such as chassis intrusion, log capacity, and so forth.

• POST log: The POST log consists of a list of the POST codes and their corresponding descriptions. The POST operation tests various system components such as RAM, hard drives, and the keyboard before the operating system (OS) loads, and then provides details of these tests in the POST log.

• Command log: The command log provides the details of the commands executed from the Server Administrator graphical user interface (GUI) and from the Server Administrator command-line interface (CLI).

Both the hardware log and the alert log provide details of system events, categorizing each event by its corresponding severity status: normal, warning, or critical. Each event description has an associated date and time, for efficient tracking. The Dell OpenManage Server Administrator logging feature is designed such that, when a heading is selected, the columns are sorted in ascending or descending order.

Each log page has a status indicator at the top, which changes from a green check mark to a yellow triangle containing an exclamation point when the log file reaches 80 percent capacity. At this point, the administrator is advised to export the current logs to a different location on the hard drive and clear the logs present in the Server Administrator software.
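The severity categories, column sorting, and 80 percent capacity indicator described above can be illustrated with a minimal Python sketch. It is not the Server Administrator API: the names `Severity`, `LogEntry`, `status_indicator`, `sort_entries`, and `critical_entries` are hypothetical, the sample timestamps are invented, and measuring log capacity in bytes is an assumption made only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum


class Severity(IntEnum):
    """The three severity statuses named in the article."""
    NORMAL = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical model of one hardware-log or alert-log row."""
    severity: Severity
    timestamp: datetime
    description: str


def status_indicator(used_bytes: int, capacity_bytes: int) -> str:
    """Return the indicator shown at the top of a log page.

    The indicator switches once the log reaches 80 percent of capacity.
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    if used_bytes * 100 >= capacity_bytes * 80:
        return "yellow triangle"
    return "green check mark"


def sort_entries(entries, column, descending=False):
    """Sort rows by one column, as selecting a column heading would."""
    return sorted(entries, key=lambda e: getattr(e, column), reverse=descending)


def critical_entries(entries):
    """Keep only the rows whose severity status is critical."""
    return [e for e in entries if e.severity == Severity.CRITICAL]


# Invented sample rows; only the critical description is quoted from the article.
SAMPLE_LOG = [
    LogEntry(Severity.NORMAL, datetime(2004, 10, 1, 9, 0), "sample normal event"),
    LogEntry(Severity.CRITICAL, datetime(2004, 10, 1, 9, 5),
             "Thermal shutdown protection has been initiated."),
    LogEntry(Severity.WARNING, datetime(2004, 10, 1, 8, 55), "sample warning event"),
]
```

Under these assumptions, `status_indicator(80, 100)` reports the yellow triangle, which is the point at which the article advises exporting and then clearing the log, and `critical_entries(SAMPLE_LOG)` isolates the events that call for immediate remedial action.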

Responding to critical error messages

Critical error messages appear in both the hardware and alert logs when system events occur. Error messages related to temperature, fan, voltage, and current sensors are reported as nonrecoverable failures when the specified system detects an error from which it cannot recover.

Following are examples of critical event messages that can appear in the Dell OpenManage Server Administrator hardware and alert logs. If Dell OpenManage Server Administrator displays any of these critical error messages, the system administrator should take remedial action immediately and contact Dell for technical support if necessary.

Thermal shutdown protection has been initiated. The hardware and alert logs display this message when a system is configured for thermal shutdown because of an error event. If a temperature sensor reading exceeds the error threshold for which the system is configured, the OS shuts down and the system powers off. This event may also be initiated when a fan enclosure is removed from a system for an extended period of time.

Automatic System Recovery (ASR) action was performed. The alert log displays this message when an automatic system recovery action is performed because of a hung operating system. The action performed and the time the action occurred are provided.

Memory device status is . The hardware and alert logs display this message when a memory device correction rate exceeds an acceptable value, a memory spare bank is activated, or a multibit error-correcting code (ECC) error occurs. The system continues to function normally except in the case of a multibit ECC error.

AC power has been lost. The hardware and alert logs display this message if an AC power cord loses power, because the resulting lack of redundancy requires this event to be classified as an error. The sensor location and chassis location information are provided.

Log size is near or at capacity. The hardware and alert logs display this message when the size of a log is near or at full capacity. The message will indicate which log (hardware, alert, POST, or command) is near capacity.

Temperature sensor detected a failure value. The hardware and alert logs display this message, which indicates that a temperature sensor on the backplane board, system board, or drive carrier in the specified system has exceeded its failure threshold. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Fan sensor detected a failure value. The hardware and alert logs display this message if a fan sensor in the specified system has detected the failure of one or more fans in the server. The sensor location, chassis location, previous state, and fan sensor value are provided.

Voltage sensor detected a failure value. The hardware and alert logs display this message if a voltage sensor in the specified system has exceeded its failure threshold. The sensor location, chassis location, previous state, and voltage sensor value are provided.

Current sensor detected a failure value. The hardware and alert logs display this message if a current sensor on the power supply for the specified system has exceeded its failure threshold. The sensor location, chassis location, previous state, and current sensor value are provided.

Chassis intrusion detected. The hardware and alert logs display this message when a chassis intrusion sensor in the specified system detects that the system cover was opened while the system was operating. The sensor location, chassis location, previous state, and chassis intrusion state are provided.

Power supply detected a failure. The hardware and alert logs display this message when a power supply is disconnected or fails. The sensor location, chassis location, previous state, and additional power supply status information are provided.

Fan enclosure removed from system for an extended amount of time. The hardware and alert logs display this message when a fan enclosure has been removed from the specified system for a user-definable length of time. The sensor location and chassis location are provided.

Configuring alerts using the Alert Actions feature

Dell OpenManage Server Administrator also provides an Alert Actions feature that allows administrators to configure the type of alert they want to receive when the system has encountered any of the critical error messages previously discussed.

Alert actions for Windows systems

Alert actions for servers running Microsoft® Windows® operating systems include beeping the speakers on the affected server, displaying an alert on the Server Administrator console, broadcasting a me........
