For years extfs has been a reliable, stable, and well-performing filesystem. But there has always been a weak point in its design for long-running servers with lots of data. It manifests as the mount count, or more precisely, as the inability to check the filesystem online.
This is a real problem for servers with high uptime demands. The usual way is to reboot the machine (if the filesystem is vital for system operation) or to remount the filesystem from time to time; once the mount count reaches its limit, the filesystem is checked. But reboots and remounts not only interrupt the server's operation and add to the service downtime. The filesystem check must also complete before the filesystem can be mounted again. With terabytes of data, a check can take multiple hours or even days, depending on your disk layout. And if this catches you at the wrong moment, e.g. when the server has crashed and you need it back online fast, it can be a real pain. The alternative is to disable this behavior by setting the mount count to 0. In this case no automatic checks are done, but which honest system administrator would claim to regularly check all filesystems on all servers by hand?
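To see how close a filesystem is to its next forced check, you can read the two counters from the superblock. The following is a minimal sketch, assuming the `Mount count` and `Maximum mount count` fields as printed by `tune2fs -l`; the function names are my own and purely illustrative.

```python
import subprocess


def parse_mount_counts(tune2fs_output: str) -> tuple[int, int]:
    """Return (mount_count, max_mount_count) parsed from `tune2fs -l` output."""
    fields = {}
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "key: value" shape
            fields[key.strip()] = value.strip()
    return int(fields["Mount count"]), int(fields["Maximum mount count"])


def mounts_until_check(mount_count: int, max_mount_count: int):
    """Remaining mounts before the limit is reached.

    Returns None when count-based checking is disabled (limit of 0 or -1).
    """
    if max_mount_count <= 0:
        return None
    return max(max_mount_count - mount_count, 0)


def read_mount_counts(device: str) -> tuple[int, int]:
    """Run `tune2fs -l` on a device (needs root) and parse the counters."""
    out = subprocess.run(
        ["tune2fs", "-l", device],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_mount_counts(out)
```

Calling `mounts_until_check(*read_mount_counts("/dev/sda1"))` would then tell you how many remounts are left; the device path is only a placeholder.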
With the growing use of ext4 as the standard filesystem in Linux distributions, I asked myself whether this shortcoming still exists today. Luckily things have changed for the better, and not only since the introduction of ext4 but already since the inclusion of the VFS lock patches in the 2.6 series of kernels. Ext4 still cannot do it on its own, but with the help of LVM there is a way. What is a simple fsck_ufs -B for the BSD user, and where the ZFS user only smiles at you and asks "What, you only check once a month? I do it all the time, it is called checksums.", is not that simple for the Linux user. The trick is to put the extfs on top of an LVM logical volume and use snapshots. The extfs and LVM code in the kernel need to play together, so that when you make a snapshot everything is properly locked and consistent. Then you can run fsck on the snapshot while the original copy of the filesystem continues its operation. If fsck reports success, you simply discard the snapshot again and know everything is fine.
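The snapshot procedure described above can be sketched as a small script. This is only an illustration under assumptions: the volume group, logical volume, snapshot name, and the 5G snapshot size are placeholders I made up, and the whole thing needs root privileges and enough free extents in the volume group.

```python
import subprocess


def snapshot_check_commands(vg, lv, snap_name="fsck_snap", snap_size="5G"):
    """Build the three commands: create snapshot, check it, remove it.

    vg, lv, snap_name and snap_size are placeholders for illustration.
    """
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"
    return {
        # Snapshot of the (still mounted) origin volume
        "create": ["lvcreate", "--snapshot", "--size", snap_size,
                   "--name", snap_name, origin],
        # -f forces a full check, -n opens read-only and answers "no"
        "check": ["e2fsck", "-f", "-n", snapshot],
        # Discard the snapshot again without prompting
        "remove": ["lvremove", "-f", snapshot],
    }


def check_online(vg, lv, **kwargs) -> bool:
    """Return True if e2fsck found no errors on the snapshot."""
    cmds = snapshot_check_commands(vg, lv, **kwargs)
    subprocess.run(cmds["create"], check=True)
    try:
        result = subprocess.run(cmds["check"])
        return result.returncode == 0  # e2fsck exits with 0 when clean
    finally:
        # Always remove the snapshot, even if the check failed
        subprocess.run(cmds["remove"], check=True)
```

If `check_online("vg0", "data")` returns False, the origin filesystem should be scheduled for a real offline fsck; the snapshot check itself repairs nothing.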
The ugly thing is that LVM snapshots are a bit fragile. When you create the snapshot you must specify its size. This is the amount of space reserved for recording the changes made since the snapshot was taken. You have to estimate it properly, depending on how long you will need the snapshot and how much data will be modified in the meantime. This might not be an easy task, even for a sysadmin. If the snapshot runs out of space you risk losing data.
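A rough back-of-the-envelope estimate for the snapshot size could look like the sketch below. The write rate, the check duration, and the safety factor are assumptions you have to measure or guess for your own system. Since a snapshot copies each chunk only the first time it is modified, the simple product tends to overestimate when the same blocks are rewritten repeatedly.

```python
import math


def estimate_snapshot_size_mib(write_rate_mib_per_s: float,
                               check_duration_s: float,
                               safety_factor: float = 2.0) -> int:
    """Upper-bound estimate of the snapshot space needed, in MiB.

    Assumes every written MiB touches a block not modified before,
    then multiplies by a safety factor (an arbitrary default of 2).
    """
    return math.ceil(write_rate_mib_per_s * check_duration_s * safety_factor)


# Example: 2 MiB/s of writes during a 3 hour check
size = estimate_snapshot_size_mib(2, 3 * 3600)
print(f"{size} MiB")  # prints "43200 MiB"
```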
So, it is possible, but truly not a very solid solution. I have never seen a distribution that implements this by default, though it seems to me to be an essential task. So, how do you manage to check all the ext filesystems on your enterprise Linux servers that have hundreds of days of uptime?