Once in a while my home server has disk problems. Thanks to Linux Software RAID, I have not lost data yet (but I was close this summer :). But once a disk starts to behave funny, a practical problem presents itself: how do I get from the Linux device name (like /dev/sdd) to something that can be used to identify the disk when the computer is turned off? In my case I have SATA disks with a unique ID printed on the label. All I need is a way to query the disk to get the ID out.
After fumbling a bit, I found that hdparm -I will report the disk serial number, which is printed on the disk label. The following (almost) one-liner can be used to look up the ID of all the failed disks:
for d in $(cat /proc/mdstat |grep '(F)'|tr ' ' "\n"|grep '(F)'|cut -d\[ -f1|sort -u); do
    printf 'Failed disk %s: ' "$d"
    hdparm -I "/dev/$d" |grep 'Serial Num'
done
Putting it here to make sure I do not have to search for it the next time, and in case others find it useful.
At the moment I have two failing disks. :(
Failed disk sdd1: Serial Number: WD-WCASJ1860823
Failed disk sdd2: Serial Number: WD-WCASJ1860823
Failed disk sde2: Serial Number: WD-WCASJ1840589
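The output lists sdd twice because both of its partitions are RAID members. A possible variant that reports each physical disk only once is sketched below. The function names failed_disks and report_failed_serials are made up for illustration, and the sketch assumes sdX-style device names, where stripping the trailing digits from a partition name gives the whole disk (this would not hold for names like nvme0n1p1).

```shell
#!/bin/sh
# List whole-disk names (e.g. sdd) for every RAID member flagged (F).
# Reads mdstat-formatted text from the file given as $1
# (default: /proc/mdstat).
# Assumption: sdX-style names, so stripping trailing digits from a
# partition name (sdd1) yields the disk name (sdd).
failed_disks() {
    grep '(F)' "${1:-/proc/mdstat}" |
        tr ' ' '\n' |
        grep '(F)' |
        cut -d'[' -f1 |
        sed 's/[0-9]*$//' |
        sort -u
}

# Print one serial number line per failed disk (hdparm needs root).
report_failed_serials() {
    for d in $(failed_disks "$@"); do
        printf 'Failed disk %s: ' "$d"
        hdparm -I "/dev/$d" | grep 'Serial Num'
    done
}
```

Running report_failed_serials as root should then print one line per failing disk instead of one per failing partition.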
The last time I had failing disks, I added the serial number on labels I printed and stuck on the short sides of each disk, to be able to figure out which disk to take out of the box without having to remove each disk to look at the physical vendor label. The vendor label is at the top of the disk, which is hidden when the disks are mounted inside my box.
I really wish the check_linux_raid Nagios plugin for checking Linux Software RAID in the nagios-plugins-standard Debian package would look up this value automatically, as it would make the plugin a lot more useful when my disks fail. At the moment it only reports a failure when there are no more spares left (it really should warn as soon as a disk is failing), and it does not tell me which disk(s) are failing when the RAID is running short on disks.
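A minimal sketch of the kind of check wished for here could look like the following. It is not a patch against check_linux_raid and does not reflect how that plugin works internally. The function name check_md_failed is hypothetical, and the only thing borrowed from Nagios is the usual plugin exit code convention (0 for OK, 1 for WARNING, 3 for UNKNOWN).

```shell
#!/bin/sh
# Sketch of a Nagios-style check that warns as soon as any RAID member
# is flagged (F), and names the members in the status line.
# check_md_failed is a hypothetical name; $1 is an mdstat-formatted
# file (default: /proc/mdstat).
check_md_failed() {
    mdstat=${1:-/proc/mdstat}
    if [ ! -r "$mdstat" ]; then
        echo "UNKNOWN: cannot read $mdstat"
        return 3
    fi
    # Same pipeline as the one-liner, joined into a space-separated list.
    failed=$(grep '(F)' "$mdstat" | tr ' ' '\n' | grep '(F)' |
        cut -d'[' -f1 | sort -u | tr '\n' ' ')
    if [ -n "$failed" ]; then
        # Strip the trailing space left by the final tr.
        echo "WARNING: failed RAID members: ${failed% }"
        return 1
    fi
    echo "OK: no failed RAID members"
    return 0
}
```

The status line could be extended with the hdparm -I serial number lookup, so that the alert itself says which physical disk to pull.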