May '10 15

New WD green drives and 4k (4096) block sizes

Rossco pointed out to me that the new 4k (4096-byte) sectors on WD Green drives can cause performance issues when partitions are not aligned to 4k boundaries. I also found a reference to it in the Ubuntu 10.04 release notes (search for "Partition alignment"), which refer to this bug. There is also some good info on it here.

Just to make it more complicated, this applies to every partition, which raises the question of what to do with LVM. That is another complication, as LVM volumes need to be aligned on 4k boundaries too.
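The alignment rule itself is simple arithmetic, so here is a minimal sketch of a check. It assumes the start sector is counted in 512-byte units, and the function name is_4k_aligned is just something I made up for illustration.

```shell
#!/bin/sh
# Report whether a partition start sector sits on a 4 KiB boundary.
# Assumption: the start sector is expressed in 512-byte units.
is_4k_aligned() {
    start_sector=$1
    if [ $(( (start_sector * 512) % 4096 )) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

is_4k_aligned 63     # old DOS-style first partition start
is_4k_aligned 2048   # partition starting at 1 MiB
```

On Linux the start sector of a partition can usually be read from sysfs (for example /sys/block/sdb/sdb1/start) and passed to the function.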

Since I’m trying to set up my new 4×1.5TB RAID5 array, I want to make sure that I get it all right the first time, and I see people recommending 128k chunks.

I found a good RAID calculator, and a post that recommends the following, although it’s a bit out of date.

mdadm --create /dev/md2 --metadata=1.0 --level=5 --verbose --chunk=64 --raid-devices=4 /dev/sd[bcde]1
pvcreate --verbose --dataalignment 4096 /dev/md2
lvcreate -n LV_media -L +3T VG_storage /dev/md2
mke2fs -t ext4 -b 4096 -j -E stride=16,stripe-width=48 -v -L media -m 0.1 /dev/VG_storage/LV_media
mdadm --examine --scan >> /etc/mdadm/mdadm.conf

While I’d like to use Btrfs, its fsck currently doesn’t do anything except tell you there is a problem.

“btrfsck is currently very limited; it only detects a limited number of problems, and it can’t fix anything. Btrfs focuses on handling problems when they are discovered while using the FS; generally, it should handle corruption relatively gracefully. However, if anything really crucial was overwritten and the FS can’t be mounted, there aren’t any tools to repair it.”

Update: I found that you cannot use md metadata format 1.0 with Grub2, so I am running a /boot partition on RAID1 (no LVM) with the 0.90 format. Make sure mdadm.conf is updated correctly before running the following. If you get errors about missing md devices, then add them manually to the /boot/grub/

grub-install --no-floppy --modules=raid /dev/sda
grub-install --no-floppy --modules=raid /dev/sdb

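Update-2: Newer info I have found indicates that a chunk size of 512k is now the default. There also seems to be merit in using a larger bytes-per-inode ratio, such as 1MB or 4MB, applied by the mke2fs -T largefile or largefile4 switches. These set how much space is allotted per inode rather than the filesystem block size, which stays at 4096.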
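To get a feel for what those usage types change, here is a rough sketch of the inode arithmetic. As far as I know, largefile and largefile4 set a bytes-per-inode ratio of 1 MiB and 4 MiB, and 16384 is the usual default; treat those numbers as assumptions and check your own /etc/mke2fs.conf. The function name estimate_inodes is made up, and mke2fs rounds the real count, so this is only an estimate.

```shell
#!/bin/sh
# Rough inode-count estimate: filesystem size divided by bytes-per-inode.
# mke2fs rounds per block group, so real numbers will differ slightly.
estimate_inodes() {
    fs_bytes=$1
    bytes_per_inode=$2
    echo $(( fs_bytes / bytes_per_inode ))
}

fs_bytes=$(( 3 * 1024 * 1024 * 1024 * 1024 ))   # the 3T logical volume, as 3 TiB

estimate_inodes "$fs_bytes" 16384     # assumed default ratio
estimate_inodes "$fs_bytes" 1048576   # assumed ratio for -T largefile
estimate_inodes "$fs_bytes" 4194304   # assumed ratio for -T largefile4
```

Fewer inodes means less space spent on inode tables, which suits a volume full of large media files, but it also caps how many files the filesystem can ever hold.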

Stride tells ext4 how many filesystem blocks (4096 bytes each) fit into one RAID chunk. So it is chunksizeKB/4=stride. For a 128k chunk it is 128/4=32.
The stripe-width tells ext4 how many blocks make up one full stripe across the array, that is, how many blocks ext4 needs to write to put one chunk on every data-bearing disk. So in a RAID5 array, we need to multiply the stride value by the number of data disks, which is the number of disks in the array minus 1 (one disk's worth of space holds parity). So it is 3 in our example here. The stripe-width then is 32*3=96.

For a 512k chunk: stride=128,stripe-width=384
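The arithmetic above is easy to get wrong by hand, so here is a small sketch that prints the -E string for a given chunk size. It assumes 4 KiB ext4 blocks; the function name ext4_raid_geometry and its arguments are my own invention, not part of any tool.

```shell
#!/bin/sh
# Usage: ext4_raid_geometry CHUNK_KIB TOTAL_DISKS PARITY_DISKS
# Assumption: ext4 block size is 4 KiB (mke2fs -b 4096).
ext4_raid_geometry() {
    chunk_kib=$1
    total_disks=$2
    parity_disks=$3
    block_kib=4
    # stride: filesystem blocks per RAID chunk
    stride=$(( chunk_kib / block_kib ))
    # stripe-width: filesystem blocks per full stripe across the data disks
    stripe_width=$(( stride * (total_disks - parity_disks) ))
    echo "stride=${stride},stripe-width=${stripe_width}"
}

ext4_raid_geometry 64 4 1    # the mke2fs command earlier in this post
ext4_raid_geometry 128 4 1   # the 128k worked example
ext4_raid_geometry 512 4 1   # the 512k default
```

For RAID6 the same sketch should work with 2 as the last argument, since two disks' worth of space goes to parity.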

The mount option relatime looks like a better alternative to noatime for optimising performance.

For LVM, it’s easiest if your RAID chunk size is a multiple of the LVM extent size. That way if you add more drives to the RAID, the LVM extents still divide evenly over the stripe.