Monitor disks behind Dell RAID controllers without OpenManage

Posted: August 9th, 2013 | Filed under: Linux, Storage

When running a non-RHEL Linux distribution on modern Dell servers, disk monitoring is a bit of a challenge. Dell’s OpenManage is a fine tool, but it cannot be run on non-RHEL versions of Linux. So how can one monitor the health of hard drives when OpenManage is not an option?

The right approach depends on the storage controller in use on the server.

LSI Fusion-MPT controllers

The LSI SAS1068E is a controller of this type that I’ve seen in use in some 1U Dell servers.

The mpt-status tool is designed to work with this type of LSI controller:

mpt-status -i 2
ioc0 vol_id 2 type IM, 2 phy, 1862 GB, state OPTIMAL, flags ENABLED
ioc0 phy 3 scsi_id 8 ATA      Hitachi HUA72202 A25C, 1863 GB, state ONLINE, flags NONE
ioc0 phy 2 scsi_id 3 ATA      Hitachi HUA72202 A25C, 1863 GB, state ONLINE, flags NONE
mpt-status -s -i 2 
log_id 2 OPTIMAL
phys_id 3 ONLINE
phys_id 2 ONLINE
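
For unattended monitoring (cron, Nagios and the like), the short status output is easy to check automatically. What follows is only a minimal sketch: the function name check_mpt_status is made up for illustration, and it assumes the "log_id N STATE" / "phys_id N STATE" layout shown above.

```shell
# Sketch: report anything in `mpt-status -s` output that is not healthy.
# Reads the status text on stdin, prints the offending lines, and returns 1
# if a logical volume is not OPTIMAL or a physical disk is not ONLINE.
# (check_mpt_status is a hypothetical helper name, not part of mpt-status.)
check_mpt_status() {
    awk '
        $1 == "log_id"  && $3 != "OPTIMAL" { print; bad = 1 }
        $1 == "phys_id" && $3 != "ONLINE"  { print; bad = 1 }
        END { exit bad + 0 }
    '
}

# Typical use, with the same controller id as in the example above:
#   mpt-status -s -i 2 | check_mpt_status || echo "RAID problem on ioc0"
```

Reading from stdin keeps the parsing separate from the hardware call, so the check can be tried against saved output before it goes into a cron job.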

One can also use smartmontools. In this type of environment smartctl will not work with the /dev/sdX block devices, but it works fine with the /dev/sgX device files. In the example above, the Hitachi disks in the RAID 1 set are presented as the /dev/sg3 and /dev/sg4 SCSI devices, and smartctl will happily communicate via these devices:

smartctl -a /dev/sg3
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HUA722020ALA330
Serial Number:    JK11D1B8G4M2RZ
Firmware Version: JKAOA25C
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Aug  8 21:06:55 2013 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
..... 
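
When only the verdict matters, the health line can be pulled out of the smartctl output. The sketch below is an illustration, not a complete check: smart_health is a made-up helper name, and it only looks for the two verdict lines that smartctl prints for ATA disks (as above) and for SAS/SCSI disks ("SMART Health Status: OK").

```shell
# Sketch: extract the overall health verdict from smartctl output on stdin.
# ATA disks print "SMART overall-health self-assessment test result: PASSED";
# SAS/SCSI disks print "SMART Health Status: OK" instead.
# Returns 1 if neither line is found.
# (smart_health is a hypothetical helper name.)
smart_health() {
    awk -F': *' '
        /^SMART overall-health self-assessment test result:/ { print $2; found = 1 }
        /^SMART Health Status:/                               { print $2; found = 1 }
        END { exit !found }
    '
}

# Typical use with the two RAID 1 members from the example above:
#   for dev in /dev/sg3 /dev/sg4; do
#       printf '%s: %s\n' "$dev" "$(smartctl -H "$dev" | smart_health)"
#   done
```

A passing verdict is only the drive's own summary. Attributes such as reallocated or pending sector counts in the full "smartctl -a" output are still worth watching.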

Perc H700 and H800 controllers

Both the Dell Perc H700 and H800 storage controllers are re-branded LSI controllers and are common in Dell’s 11th generation servers. (The information presented here also applies to the Dell Perc H710P controller in their 12th generation servers.)

Let’s take a look at a Dell server with both controllers. This is a Dell PowerEdge R415 with 2x 146GB and 12x 2TB hard drives inside the main server enclosure, connected to a Perc H700 controller. There is also a Dell MD1200 storage array with 12x 4TB drives, SAS-attached via a Perc H800 controller.

There are two virtual disks, and thus two block devices, on the H700, and one virtual disk / block device on the H800:

lsscsi -g
[0:0:32:0]   enclosu DP       BACKPLANE        1.10  -         /dev/sg0
[0:2:0:0]    disk    DELL     PERC H700        2.30  /dev/sda   /dev/sg1
[0:2:1:0]    disk    DELL     PERC H700        2.30  /dev/sdb   /dev/sg2
[1:2:0:0]    disk    DELL     PERC H800        2.10  /dev/sdc   /dev/sg4
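
Since later commands need the right block device for each controller, it can help to reduce this listing to just the disks. The following is a rough sketch under the assumption that the output has the column layout shown above; list_raid_disks is a made-up name. The vendor and model columns can contain spaces ("PERC H700"), so only the first two and the last two fields are used.

```shell
# Sketch: for every "disk" row of `lsscsi -g` output (read from stdin),
# print the SCSI host number, the block device and the sg device.
# (list_raid_disks is a hypothetical helper name.)
list_raid_disks() {
    awk '$2 == "disk" {
        # $1 looks like "[0:2:1:0]"; the host is the text between "[" and the first ":"
        h = substr($1, 2, index($1, ":") - 2)
        print "host" h, $(NF-1), $NF
    }'
}

# Typical use:
#   lsscsi -g | list_raid_disks
```

Block devices that share a host number sit behind the same controller, which is the detail that matters when choosing a device for smartctl.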

The main tool for working with Perc H700 and H800 controllers is MegaCLI (part of the MegaRAID software package), which can be downloaded from LSI. MegaRAID is usually installed under /opt/MegaRAID. For convenience, one can set up a symbolic link:

ln -s /opt/MegaRAID/MegaCli/MegaCli64 /usr/sbin/megacli

The following command will display detailed information from both controllers in our server:

megacli -AdpAllinfo -aALL

The H700 is controller “0”, while the H800 is controller “1”. Thus, replacing “-aALL” with “-a0” limits the display to the H700 controller only.
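
The full adapter report is long, and for monitoring only a few counters are of interest. Below is a sketch that filters them out. It rests on an assumption that should be checked against your MegaCLI version: that the report contains lines of the form "Degraded : 0", "Offline : 0", "Critical Disks : 0" and "Failed Disks : 0". The name check_adp_counters is made up.

```shell
# Sketch: print the degraded/offline/critical/failed counters found in
# `megacli -AdpAllInfo` output (read from stdin); return 1 if any is non-zero.
# ASSUMPTION: counters are printed as "<name> : <number>" lines; verify the
# labels against the output of your own MegaCLI version.
# (check_adp_counters is a hypothetical helper name.)
check_adp_counters() {
    awk -F':' '
        /^[ \t]*(Degraded|Offline|Critical Disks|Failed Disks)[ \t]*:/ {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding whitespace
            count = $2 + 0
            print name "=" count
            if (count > 0) bad = 1
        }
        END { exit bad + 0 }
    '
}

# Typical use:
#   megacli -AdpAllInfo -a0 -NoLog | check_adp_counters || echo "check controller 0"
```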

To save the event log from the H700 controller to a file, the following syntax will work:

megacli -adpeventlog -getevents -f h700-events.log -a0 -nolog

The MegaCLI tool is very powerful. It allows full configuration and management of the controller and its virtual disks. Unfortunately, the documentation is not easy to follow. I often use the document posted on Cisco’s support forum as a handy reference: “MegaCli Common Commands and Procedures”.

Smartmontools can also be used to check disks behind H700 and H800 controllers; the device type to pass to smartctl is megaraid.

In our example, the following command displays SMART information for drive 0 on the H700 controller:

smartctl -a -d megaraid,0 /dev/sda

** Note that in our case /dev/sda can be replaced with /dev/sdb in the above command. Identical information is displayed in both cases, as both block devices are on the same controller.

Device numbering can be quite tricky when using smartctl with Perc controllers. In the case of the H800 controller in our server, the first disk is device number 4:

smartctl -a -d megaraid,4 /dev/sdc

** Why sdc in the above command? Because this is the block device on the H800 controller; see the output of the “lsscsi -g” command at the beginning of this section.

I suggest finding out which disks are present, and at which device numbers, with something like this simple bash loop:

# Probe device numbers 0..60 on the H800 and record which ones answer.
for i in {0..60}; do
    echo "=========== $i ============"
    smartctl -a -d megaraid,"$i" /dev/sdc | grep -E "Device:|Serial"
    echo ""
done >> h800_disks.txt
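
As an alternative to probing every number, MegaCLI's physical drive list ("megacli -PDList -aN") prints a "Device Id" field for each disk, and that is generally the number "-d megaraid,N" expects. The sketch below condenses the list to one line per disk. The helper name pdlist_summary is made up, and the field labels are assumed to be the ones MegaCLI typically prints ("Enclosure Device ID", "Slot Number", "Device Id", "Firmware state"), so compare against your own output first.

```shell
# Sketch: condense `megacli -PDList -aN` output (read from stdin) into
#   <Device Id> <Enclosure:Slot> <Firmware state>
# ASSUMPTION: field labels as typically printed by MegaCLI; "Firmware state"
# is expected to come after "Device Id" within each drive's block.
# (pdlist_summary is a hypothetical helper name.)
pdlist_summary() {
    awk -F': *' '
        /^Enclosure Device ID:/ { encl = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Device Id:/           { id = $2 }
        /^Firmware state:/      { print id, encl ":" slot, $2 }
    '
}

# Typical use for the H800 (controller 1) in this server:
#   megacli -PDList -a1 -NoLog | pdlist_summary
#   smartctl -a -d megaraid,<Device Id> /dev/sdc
```

The enclosure and slot columns also make it easier to match a failing disk to a physical bay.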

