I have a server with mdadm raid0:
    # mdadm --version
    mdadm - v3.1.4 - 31st August 2010
    # uname -a
    Linux orkan 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 GNU/Linux
One of the disks has failed:
    # grep sdf /var/log/kern.log | head
    Jan 30 19:08:06 orkan kernel: [163492.873861] sd 2:0:9:0: [sdf] Unhandled error code
    Jan 30 19:08:06 orkan kernel: [163492.873869] sd 2:0:9:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    Jan 30 19:08:06 orkan kernel: [163492.873874] sd 2:0:9:0: [sdf] Sense Key : Hardware Error [deferred]
Right now in dmesg I can see:
    Jan 31 15:59:49 orkan kernel: [238587.307760] sd 2:0:9:0: rejecting I/O to offline device
    Jan 31 15:59:49 orkan kernel: [238587.307859] sd 2:0:9:0: rejecting I/O to offline device
    Jan 31 16:03:58 orkan kernel: [238836.627865] __ratelimit: 10 callbacks suppressed
    Jan 31 16:03:58 orkan kernel: [238836.627872] mdadm: sending ioctl 1261 to a partition!
    Jan 31 16:03:58 orkan kernel: [238836.627878] mdadm: sending ioctl 1261 to a partition!
    Jan 31 16:04:09 orkan kernel: [238847.215187] mdadm: sending ioctl 1261 to a partition!
    Jan 31 16:04:09 orkan kernel: [238847.215195] mdadm: sending ioctl 1261 to a partition!
But mdadm did not notice that the drive has failed:
    # mdadm -D /dev/md0
    /dev/md0:
            Version : 0.90
      Creation Time : Thu Jan 13 15:19:05 2011
         Raid Level : raid0
         Array Size : 71682176 (68.36 GiB 73.40 GB)
       Raid Devices : 3
      Total Devices : 3
    Preferred Minor : 0
        Persistence : Superblock is persistent

        Update Time : Thu Sep 22 14:37:24 2011
              State : clean
     Active Devices : 3
    Working Devices : 3
     Failed Devices : 0
      Spare Devices : 0

         Chunk Size : 64K

               UUID : 7e018643:d6173e01:17ab5d05:f75b494e
             Events : 0.9

        Number   Major   Minor   RaidDevice State
           0       8       17        0      active sync   /dev/sdb1
           1       8       65        1      active sync   /dev/sde1
           2       8       81        2      active sync   /dev/sdf1
Also, forcing a read from /dev/md0 supports the theory that /dev/sdf has failed, and yet mdadm does not mark the drive as failed:
    # dd if=/dev/md0 of=/root/md.data bs=512 skip=255 count=1
    1+0 records in
    1+0 records out
    512 bytes (512 B) copied, 0.00367142 s, 139 kB/s

    # dd if=/dev/md0 of=/root/md.data bs=512 skip=256 count=1
    dd: reading `/dev/md0': Input/output error
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 0.000359543 s, 0.0 kB/s

    # dd if=/dev/md0 of=/root/md.data bs=512 skip=383 count=1
    dd: reading `/dev/md0': Input/output error
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 0.000422959 s, 0.0 kB/s

    # dd if=/dev/md0 of=/root/md.data bs=512 skip=384 count=1
    1+0 records in
    1+0 records out
    512 bytes (512 B) copied, 0.000314845 s, 1.6 MB/s
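Those offsets are exactly what the RAID-0 geometry predicts: with a 64K chunk (128 sectors of 512 bytes) striped over the three members in the order shown by -D, sectors 256-383 form a chunk that lives on /dev/sdf1. A quick sketch of the mapping (illustrative arithmetic based on the -D output, not an mdadm API):

```python
# Map a 512-byte sector of /dev/md0 to the RAID-0 member holding it.
# Geometry from `mdadm -D`: chunk size 64K, members in RaidDevice order.
CHUNK_SECTORS = 64 * 1024 // 512          # 128 sectors per 64 KiB chunk
MEMBERS = ["/dev/sdb1", "/dev/sde1", "/dev/sdf1"]

def member_for_sector(sector):
    """Return the member device a given md0 sector lives on."""
    chunk = sector // CHUNK_SECTORS       # which chunk of the striped device
    return MEMBERS[chunk % len(MEMBERS)]  # chunks rotate across the members

# The dd probes above: 255 and 384 succeed, 256 and 383 fail.
for s in (255, 256, 383, 384):
    print(s, "->", member_for_sector(s))
# 255 -> /dev/sde1 (ok), 256 and 383 -> /dev/sdf1 (I/O error),
# 384 -> /dev/sdb1 (ok)
```

Both failing reads land on /dev/sdf1, and both successful reads land on healthy members, which matches the kern.log errors for sdf.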
However, trying to access the /dev/sdf disk directly fails:
    # dd if=/dev/sdf of=/root/sdf.data bs=512 count=1
    dd: opening `/dev/sdf': No such device or address
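"No such device or address" is the strerror text for ENXIO: by this point the kernel has taken sdf offline entirely, so even open(2) fails before dd can issue a single read. For illustration:

```python
import errno, os

# dd prints strerror(errno) for the failed open(2); "No such device
# or address" corresponds to ENXIO, which the kernel returns once it
# has marked the underlying device offline.
print(errno.ENXIO, os.strerror(errno.ENXIO))  # -> 6 No such device or address
```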
The data is not that important to me; I just want to understand why mdadm insists that the array is "State : clean".
Apart from the obvious point that only people who don't value their data run RAID-0, mdadm doesn't alert you to anything unless you run the monitor daemon:
    mdadm --monitor /dev/md0
You can examine the problematic device explicitly using:
    mdadm -E /dev/sdf
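For ongoing alerts, a minimal setup might look like this (MAILADDR and the daemon flags are standard mdadm options; "root" is just a placeholder recipient):

```
# /etc/mdadm/mdadm.conf -- where "mdadm --monitor" sends alert mail
MAILADDR root
```

With that in place, mdadm --monitor --scan --daemonise --delay=120 runs in the background and polls all configured arrays. Bear in mind that mdadm(8) describes monitor mode as only meaningful for arrays with redundancy, so for a RAID-0 it has little to report anyway.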
Of course, detecting that a RAID-0 array has failed is pretty meaningless: the array is lost, so recover from backups.
The md(4) man page sheds some light on how the word "clean" is used; the crucial bit is that "clean" describes the array's shutdown state, not the health of its devices:
When changes are made to a RAID1, RAID4, RAID5, RAID6, or RAID10 array there is a possibility of inconsistency for short periods of time as each update requires at least two blocks to be written to different devices, and these writes probably won't happen at exactly the same time. Thus if a system with one of these arrays is shut down in the middle of a write operation (e.g. due to power failure), the array may not be consistent.
To handle this situation, the md driver marks an array as "dirty" before writing any data to it, and marks it as "clean" when the array is being disabled, e.g. at shutdown. If the md driver finds an array to be dirty at startup, it proceeds to correct any possible inconsistency. For RAID1, this involves copying the contents of the first drive onto all other drives. For RAID4, RAID5 and RAID6 this involves recalculating the parity for each stripe and making sure that the parity block has the correct data. For RAID10 it involves copying one of the replicas of each block onto all the others. This process, known as "resynchronising" or "resync", is performed in the background. The array can still be used, though possibly with reduced performance.
If a RAID4, RAID5 or RAID6 array is degraded (missing at least one drive, two for RAID6) when it is restarted after an unclean shutdown, it cannot recalculate parity, and so it is possible that data might be undetectably corrupted. The 2.4 md driver does not alert the operator to this condition. The 2.6 md driver will fail to start an array in this condition without manual intervention, though this behaviour can be overridden by a kernel parameter.
It's plausible that the disk failed after the RAID was safely and normally disabled by the system, e.g. at a shutdown. In other words, the disk failure happened with the RAID in a consistent, synchronized state: the array was flagged "clean" when it was stopped, and since RAID-0 has no redundancy and no resync, nothing ever rewrites that flag when the array is next started with a dying member. The Update Time in the -D output above (Sep 22 2011, long before the failure) is consistent with this: the superblock state simply hasn't been rewritten since.
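The lifecycle described above can be sketched as a toy state machine (a deliberate simplification for illustration, not md's actual code):

```python
class Superblock:
    """Toy model of md's dirty/clean flag (illustrative only)."""

    def __init__(self):
        self.state = "clean"     # as written at creation or a clean stop

    def write_data(self):
        self.state = "dirty"     # marked dirty before any write hits disk
        # ... data blocks written to the member devices here ...

    def stop_array(self):
        self.state = "clean"     # marked clean when the array is disabled

sb = Superblock()
sb.write_data()                  # normal use: array is "dirty" mid-write
sb.stop_array()                  # clean shutdown: superblock says "clean"
# A member disk dying *after* this point changes nothing in the surviving
# superblocks, so `mdadm -D` keeps reporting State : clean.
print(sb.state)                  # -> clean
```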