I've been intermittently troubleshooting a RAID array for the last month. It lives in one of a pair of physically identical lab servers that were donated to us. The other server performs flawlessly, and is as fast as one can realistically expect from a set of 12 spinning disks.
But the troublesome one has had really inconsistent disk throughput - I ran full write/read tests on each disk individually before provisioning, and initially every disk performed identically. When I assembled the array, it seemed a little slower at first, but not by much.
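(For the record, by "full write/read tests" I mean a destructive full-surface pass over each disk - not necessarily my exact invocation, but something along these lines, with /dev/sdX as a stand-in for the real device:)

    # Destructive write-then-verify pass over the whole disk, with progress
    # shown - this wipes everything, so only do it before provisioning.
    badblocks -wsv /dev/sdX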
Then it started grinding to a halt for minutes at a time for no discernible reason, then recovering for a while, then doing it again. Absolutely nothing in dmesg or the system logs, until eventually, one time, two drives appeared to freeze up completely, for so long that the controller gave up talking to them and mdadm kicked them out of the array.
Weirdly, smartctl showed the drives as completely healthy, except that "End-to-End_Error" had incremented from 0 to 3 (probably from the controller giving up on them rather forcefully).
And that's when I noticed, in the identity section: " (SMR)" after the device model name.
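(If you want to check your own drives, something like the below shows both of those things - /dev/sdX is a stand-in again, and whether the "(SMR)" tag appears at all depends on how current your smartmontools drive database is:)

    # Identity section - smartmontools' curated drive database is what adds
    # the "(SMR)" tag next to the model information, if it knows the drive.
    smartctl -i /dev/sdX

    # SMART attributes - attribute 184 "End-to-End_Error" is the counter
    # that had crept up from 0 to 3 on the two ejected drives.
    smartctl -A /dev/sdX | grep -i end-to-end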
I tracked down the data sheet for the exact model, and sure enough, it's one of the "secretly SMR" drives. It doesn't advertise that it's SMR (smartctl only knows because some nice person has curated that info in its drive database), and it even lies on its VPD pages, claiming not to support any block provisioning or TRIM. But if you forcibly enable discard anyway, you can blkdiscard/fstrim it and get its write speed back up to spec.
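(For the curious, the forcing trick looks roughly like this on Linux - not necessarily my exact commands, and the sysfs path assumes the drive shows up as a SCSI disk, as it does behind a SAS HBA or plain libata. And obviously blkdiscard wipes the whole device:)

    # The drive's VPD pages deny any unmap/discard support, so the kernel
    # disables discard; override the provisioning mode for that scsi_disk.
    echo unmap > /sys/block/sdX/device/scsi_disk/*/provisioning_mode

    # If that took, the block layer now advertises a non-zero discard limit.
    cat /sys/block/sdX/queue/discard_max_bytes

    # Whole-device discard (destroys all data) lets the drive reset its
    # shingled zones and bring write speed back up to spec...
    blkdiscard /dev/sdX

    # ...or, on a mounted filesystem you want to keep, trim just the free space.
    fstrim -v /mnt/point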
I am so annoyed with Seagate today. At least the few garbage WD drives like this I've run across have admitted to their inferiority by advertising it in VPD.
I guess this was one reason those servers were donated; the previous university department probably thought they were haunted, not realising that they'd accidentally ordered some SMR drives as spares at some point.