Anselm's Blog: 06/2010

At some point in your daily life with the ubiquitous development environment VIM, you might face an advanced data structure called text file, an integral part of an enterprise database system called filesystem. And as you start to play with newlines you might wonder why the following is:

$ vim vim-file
$ echo -n "echo-line" > echo-file
$ cat echo-file vim-file
echo-linevim-line
$ cat vim-file echo-file
vim-line
echo-line$

echo produces no newline as expected, but VIM does. Although the file just contains the content

vim-line
~

with no explicit newline, cat still prints one. And indeed there is one as the hex dump shows:

$ xxd vim 
0000000: 7669 6d0a                                vim.

The final dot after the characters vim shows it. In my case it was not that obvious what files originated from VIM and which did not. Unfortunately you do not see a difference between the two files in less, and VIM will only give a hint when the final newline is missing and automatically correct in when writing the file. But in the main display it is hidden.

To avoid the final newline, VIM knows the binary and eol options that can be used together to prevent VIM from writing the final newline. But be advised that in general you are safer when using it in Unix text files.

Maybe one day we might get past these teething troubles and start spending our hacking time on some user-friendly systems. But since the computer revolution has not happened yet I guess we just have to wait a little longer.

P.S. It is a shame I could not make any link to ZFS this time. So be it! For this post ...

Recently, I faced the following storage setup for a virtual server, and instantly I got a headache from it:

An enterprise storage system that abstracts almost everything presents a simple SCSI device with a virtual LUN on the SAN.
VMWare binds the LUN and formats it with VMFS, its proprietary cluster filesystem.
Inside the VMFS one VMDK file is created that uses all the space.
The VMDK file is attached as virtual disk to the guest system.
Inside Linux, the disk is partitioned into one primary partition, which is formatted as an LVM physical volume. It belongs to one volume group, from which multiple logical volumes are created.
The logical volumes are formatted with ext3.

To summarize this setup: The data from the application goes over 2 filesystem layers (ext3, VMFS) and 3 virtualization layers (LVM, VMWare, storage system) until it finally reaches the disk.

As if this wasn't enough to cause a headache, I further discovered some special setups:

Multiple LUNs are aggregated to one device in VMWare to work around the maximum size limit of the storage system.
The Linux LVM layer stripes a logical volume over multiple LUNs, which are directly passed through by VMWare from the SAN. Apparently the intention was to use LUNs from different RAID groups in the storage system for performance increase.

Consider a simple question like "This filesystem is too slow, which volume in the storage system do we have to move to a faster tier?". To answer it you might get yourself into quite some typing: Look up the device of the filesystem's mount point, map the logical volume to the right volume group, look up the physical volume for it, remember its LUN, in VMWare map the LUN to the right virtual disk, identify the VMFS where the disk's VMDK file lies, and finally look up the LUN for the VMFS. Oh my!

The different layers in this setup surely have their reasons. The storage system hides the fact that you actually have a lot of small disks instead of a huge pool. It gives you reliability and flexibility when migrating and expanding your data volumes. Without the VMWare layer you loose snapshot functionality. But then again, snapshots are also something you find in the LVM layer. As you do with logical volume management. It is somehow present in both, the storage system and LVM. This redundancy between layers and the missing integration between them leads to really bad understanding and probably also bad performance, but I would not want to have to debug the latter.

Another fundamental problem in this respect is the block layer. There are such things like adding a SCSI device on a running system. But all dynamic functions that go beyond this are implemented in layers on top of that. Wouldn't it be nice to e.g. implement dynamic resizing in the block layer? You just resize the volume in the SAN and instantly the system sees more space and adds it to the filesystem. No manual work is needed like adding a new disk in VMWare or extending the volume group with a new physical volume and growing the logical volume.

ZFS got the integration of layers right. Inside a pool you do not have a static partitioning that allocates blocks to a filesystem in the first place. Resizing volumes is merely a matter of setting quotas and reservations that are totally dynamic. But besides working in the direction of applications and integrating network filesystems like Lustre and databases into ZFS, it is also interesting to go into the other direction of integrating and enhancing the block layer. One way would be to integrate it better with existing SCSI and Fibrechannel protocols to better cooperate with existing SAN solutions. Yet a more interesting way would be to eliminate the classic SAN and let ZFS do all the work. What ZFS currently lacks is fine grained control over how data is arranged on disks. E.g. the definition of different tiers with different speeds and moving filesystems dynamically between them. And then there are all the pretty things that come with storage networks: multipathing, replication, server based RAIDs, clustering, et cetera, et cetera.

Anselm's Blog

June 29, 2010

VIM's Hidden Newline

June 24, 2010

A Storage Nightmare