December 31, 2010

Tags used the wrong way

In general I do not regret my switch from Mac OS X to FreeBSD/Linux. But when it comes to managing my pictures I have to say that I miss iPhoto and Aperture. And not for fancy features like face recognition or all the beautiful GUI stuff, but for a very basic one: organizing pictures.

Low-level

On the Mac I started with iPhoto and then switched to Aperture. Now I have settled on F-Spot; I also looked at Digikam and Shotwell, but I do not see any real advantages over F-Spot right now.

All programs allow you to either copy your pictures into an internal database or just reference them in your own filesystem layout. While the library format of iPhoto and Aperture is rather complicated and somewhat proprietary, the open-source programs keep it a lot simpler. F-Spot and Shotwell just create a fixed folder structure by date, with one level each for year, month and day. In F-Spot you have a low-level folder view, while in Shotwell you never see that structure directly. Digikam allows you to manage the folders from within the program and change the structure.

When it comes to metadata, recent versions of all programs let you choose between storing tags, captions, and other data inside the original pictures or in their internal database. When switching between different photo management programs, storing the metadata inside the pictures is the only way to migrate, because the internal formats are not compatible between programs.

Tags are not enough

Where iPhoto and Aperture allow you to create arbitrary folder structures with virtual albums and assign each picture to any number of albums, the other programs clearly lack some basic functionality. Shotwell, like iPhoto, organizes pictures primarily by events, but each picture can only belong to exactly one event, and the folder hierarchy has a fixed structure. Beyond that, all open-source programs rely on the concept of tags to do further grouping of pictures.

So what is the problem with tags? Well, they are not intended for building hierarchies, but all open-source programs use them for exactly that. The tag field in the metadata is just a flat set of strings with no relations between them. For example, if you create the tags /Subject/Animals and /Testing/Animals, the picture just gets the tag Animals assigned, or maybe Subject, Animals and Testing, since it is not even clear whether parent tags should be assigned as well. Parent and child tags can both be assigned, whereas filesystems, for example, distinguish between folders that build the hierarchy and files that carry the actual content. With tags, the hierarchy cannot be rebuilt from the tags alone, so it is stored in the internal data of the software and lost when only the picture metadata is taken along. Moreover, you can get a conflict when using the same tag name in two different hierarchies, because a tag without hierarchy can only exist once.

One could try to use a special encoding to store the full hierarchy of a tag, but since there is no standard for this I doubt it would really help. Shotwell has currently abandoned hierarchical tags completely, while F-Spot and Digikam try to implement them, each with its own issues.
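Adobe Lightroom, for example, writes an XMP field called HierarchicalSubject that encodes the full tag path with a | separator. A hypothetical exiftool dump of a picture tagged as above could look something like this (file name and values made up):

$ exiftool -Subject -HierarchicalSubject cat.jpg
Subject                 : Animals
Hierarchical Subject    : Subject|Animals, Testing|Animals

But as long as no two programs agree on such an encoding, the tags are only portable in their flat form.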

Conclusion

While this is an old hat for filesystems with symbolic/hard links, where everybody agrees that multiple different folder structures are needed to navigate to the same file, the problem does not seem to be solved in the world of photo management software. While Apple has a solid solution in my eyes, it is of course (again) their very own way, and not exportable. But part of the blame also goes to the lack of a metadata standard for this. The open-source world tries to rely on open formats and uses tags. But since tags are not really intended for the job, some drawbacks and issues result.

The discussion on hierarchical tags is ongoing, may there be a solution soon!

October 28, 2010

About those bash profiles

Which profile?

It seems to be a common experience that most people (including myself) get stuck when having to choose the right profile file in bash to add customizations. While there is actually a comprehensive manual that explains everything, I never found it clear enough: I kept re-opening it and reading the same sections over and over again. So once upon a time I gathered the courage to solve this quest. I started to hack, and what came out is indeed not the simplest way of initializing a shell ...

My wish was actually simple. For each setting (like an environment variable or function definition) I wanted to be able to specify exactly for what type or types of shells it should be set. And additionally I wanted to set those settings either system-wide or per user.

Shell types

The type of a bash shell is determined by 3 boolean properties. A bash shell can be interactive, it can be a login shell, and it can be a subshell of an existing one. The last property is not particularly interesting for now; I will use it later on when deciding whether a setting should be initialized only once or repeatedly for every subshell. This leaves us with 2 properties that, combined, give 4 different shell types:

  1. Interactive login shell
  2. Non-interactive login shell
  3. Interactive shell
  4. Non-interactive shell
To make this clearer, an example for each type:
  1. ssh localhost
  2. ssh localhost <command>
  3. bash
  4. bash -c <command>
An interactive login shell is what you get when you log in with SSH on a remote host. From there you will mostly start subshells that are either interactive or not. If you give SSH a shell command as argument, it will log in, execute the command, and exit again, without giving you an interactive prompt. So that is a non-interactive login shell. Each type sources certain profile files upon initialization:
  1. /etc/profile, ~/.bash_profile
  2. (same)
  3. ~/.bashrc
  4. File defined in $BASH_ENV
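If you are unsure which type of shell you are currently in, you can check for yourself; a small sketch (the login_shell shopt requires a reasonably recent bash):

case $- in *i*) echo interactive ;; *) echo non-interactive ;; esac
shopt -q login_shell && echo login || echo non-login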

Initializing the shell

I want to have a global system folder /usr/local/etc/bash/profiles/ and a per-user folder ~/.bash_profiles/ where I can put profile files that bash will source automatically. Per-user settings should overwrite global ones, and within each profile I can choose what shell types it applies to. To do this I first configure .bashrc and .bash_profile: they simply set the variables $BASH_LOGIN and $BASH_INTERACTIVE to either true or false depending on the shell type.

For .bashrc this is easy since it is only sourced by interactive shells that are not login shells:

BASH_INTERACTIVE=true
BASH_LOGIN=false

. ~/.bash/init
In .bash_profile I have to distinguish between interactive and non-interactive login shells. This can be done by looking at $-.
BASH_LOGIN=true

if [[ $- = *i* ]]; then
  BASH_INTERACTIVE=true
else
  BASH_INTERACTIVE=false
fi

. ~/.bash/init

From both files I call a global initialization file that will look for profile files and source them. Inside each profile I can then choose if I want to apply the settings depending on $BASH_LOGIN and $BASH_INTERACTIVE. The common init looks like this:

for profile in ~/.bash/profiles/* ~/.bash_profiles/*; do
  if [ -f "$profile" ]; then
    # each sourced profile is recorded with a trailing space,
    # so the unquoted pattern below matches exactly one entry
    if [[ "$BASH_PROFILES" = *"$profile "* ]]; then
      BASH_SOURCED=true
    else
      BASH_SOURCED=false
    fi
    . "$profile" && \
      export BASH_PROFILES="${BASH_PROFILES}${profile} "
  fi
done
unset profile BASH_SOURCED

There are two more things to explain here. First, I record successfully sourced profiles in $BASH_PROFILES and set $BASH_SOURCED to true if the profile has already been sourced before. So a profile can optionally refuse to be sourced again in a subshell. This is handy for exported variables that you do not want to overwrite if the user changed the value and exported it from an initial shell. Second, I set $BASH_ENV globally in init, so it is set for every shell:

export BASH_ENV="$HOME/.bash/bash_env"

This is for non-interactive non-login shells. Unfortunately bash does not source any fixed file for this type, but only the file named by this variable. In my case bash_env simply looks like this:
BASH_LOGIN=false
BASH_INTERACTIVE=false

. ~/.bash/init

Logging out

Here the process is much simpler. Bash only sources one file, .bash_logout, and only when a login shell exits. Also, it does not make sense to set up anything on logout, as the settings would not be passed on anywhere. So I simply check if it is an interactive shell and source profiles from a separate logout directory:

if [[ $- = *i* ]]; then
  BASH_INTERACTIVE=true
else
  BASH_INTERACTIVE=false
fi

for profile in ~/.bash/profiles/logout/* \
    ~/.bash_profiles/logout/*; do
  if [ -f "$profile" ]; then
    . "$profile"
  fi
done
unset profile

Skeleton

Finally I want to put these files into a global folder and only link to them from my user home. On my FreeBSD boxes I put them in /usr/local/etc/bash/ and create a set of skeleton symbolic links in /usr/local/etc/bash/skel/ that every user can copy to his home:

/usr/local/etc/bash/:

bash_env
bash_logout
bash_profile
bashrc
init
profiles
skel

/usr/local/etc/bash/skel/:

.bash -> /usr/local/etc/bash
.bash_logout -> .bash/bash_logout
.bash_profile -> .bash/bash_profile
.bashrc -> .bash/bashrc
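A new user can then either copy the skeleton files or recreate the links by hand:

ln -s /usr/local/etc/bash ~/.bash
ln -s .bash/bash_logout ~/.bash_logout
ln -s .bash/bash_profile ~/.bash_profile
ln -s .bash/bashrc ~/.bashrc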

Examples

tmux is a terminal multiplexer similar to screen. I like to see any open sessions upon login, but I do not want to get the list in every subshell I start or when the shell is not interactive. /usr/local/etc/bash/profiles/tmux-list:

$BASH_INTERACTIVE || return 1
$BASH_LOGIN || return 1

if which -s tmux; then
  echo "Open sessions:"
  tmux list-sessions 2> /dev/null || true
fi

Environment variables usually only need to be set once at login when they are exported. /usr/local/etc/bash/profiles/default-login:

$BASH_LOGIN || return 1
$BASH_SOURCED && return 1

export PATH="$PATH:$HOME/bin"

With functions and aliases it is different. Bash does not pass them on to subshells, so they need to be sourced again and again. /usr/local/etc/bash/profiles/alias-functions:

$BASH_INTERACTIVE || return 1

function savealias() {
  echo "!!! This command is aliased for safety reasons." >&2
  echo "!!! If you are serious use `type -P $1`." >&2
  return 1
}

function ask() {
  echo -n "Enter 'y' to proceed: "
  read -r answer
  [ "$answer" = 'y' ] && return 0
  return 1
}

alias shutdown="savealias shutdown"
alias halt="savealias halt"
alias reboot="ask && /sbin/shutdown -r now"
alias init="savealias init"

Whining comment

Maybe I should write a patch for bash entitled "Proper profile initialization", I just cannot decide whether I should file it as a feature request or a bug report ...

September 24, 2010

ZFS Wishlist

Create a new data set from existing data online

Often I start with one data set per pool, and as the system evolves I create more and more data sets as needed. First I mount the new data set on a temporary path, move all data, then remove the old directory and remount the new data set on its designated path. The downside is that I can not really do this online, since a move mostly does not work for open files. Also the data must effectively be copied, which can take some time. At first thought it would be nice if there was one command that directly converts an existing folder into a new data set. On second thought, why does ZFS actually need to copy anything? Could it not just relink all znodes to the new data set? But I guess this also has something to do with the concept of mounting. How do you move an open file from one mount point to another ...
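For reference, here is that manual procedure sketched as commands (pool and path names are made up):

zfs create -o mountpoint=/var/www.tmp tank/www
mv /var/www/* /var/www.tmp/    # the data is physically copied between data sets
rmdir /var/www
zfs set mountpoint=/var/www tank/www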

Control over data allocation on vdevs

This is something that I always liked in LVM, though it also made LVM more complicated and maybe more error-prone compared to ZFS. In LVM you can specify exactly which physical extents are allocated to which logical volume. ZFS lets you, for example, replace single disks through the RAID algorithm or add vdevs to a pool, but it allocates extents to data sets automatically, and this can not be influenced manually. And why should you have to make a new pool just to be able to control which data lies on which vdev? Having more than one pool always costs you some flexibility, and moving or splitting up data between pools always causes problems. A related topic is dynamically changing a RAID. A not so far-fetched example is extending a mirror with additional disks into a RAIDZ for more capacity. Or shrinking a pool. Another interesting idea is the use of different tiers within a pool. There could, for example, be one vdev with large but slow SATA disks, and a second with small but fast SSDs. The database should lie on the fast tier, while the software repository mirror is better suited for the large-capacity tier.
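For comparison, this is roughly what ZFS lets you do today; you can swap disks and grow a pool, but not steer where the data goes (pool and device names made up):

zpool replace tank da0 da6        # swap a single disk within a vdev
zpool add tank mirror da7 da8     # add a whole new vdev to the pool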

Integration of additional layers

One functionality that has always been present is the possibility to create a block device instead of a data set. It is basically a data set without the ZFS filesystem layer. The volume can then be formatted with an arbitrary filesystem type; that is what is mostly done for swap. But there is no need to stop here, and there are two examples where work is already underway. One is the integration of a database layer, so the database server uses ZFS directly. After all, is it really ideal to create a file in the filesystem (which itself is a simple form of a database) and then create a MySQL database inside that file, effectively making a database within a database? The other example is the network: in Lustre 3.0 it should be possible to use ZFS as a backend. Put more clearly, the data you put on the network mount is distributed with Lustre to multiple servers that store it directly in their ZFS pools.
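The swap case as commands (FreeBSD-style device path, pool name made up):

zfs create -V 4G tank/swap       # a 4GB volume without the ZFS filesystem layer
swapon /dev/zvol/tank/swap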

Update data on properties change

Like a defragmentation or RAID rebuild that runs in the background, it would be nice to have the option to apply property changes to existing data. I am thinking of the case where a pool runs out of space: you enable compression for a data set, let it run in the background, and with time some space gets freed again.
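For illustration: today a change like the following only affects blocks written from now on, while existing data stays uncompressed (pool name made up):

zfs set compression=on tank/data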

June 29, 2010

VIM's Hidden Newline

At some point in your daily life with the ubiquitous development environment VIM, you might face an advanced data structure called a text file, an integral part of an enterprise database system called a filesystem. And as you start to play with newlines you might wonder why the following happens:

$ vim vim-file
$ echo -n "echo-line" > echo-file
$ cat echo-file vim-file
echo-linevim-line
$ cat vim-file echo-file
vim-line
echo-line$
echo produces no newline as expected, but VIM does. Although in VIM the file just shows the content

vim-line
~

with no explicit newline, cat still prints one. And indeed there is one, as the hex dump shows:

$ xxd vim-file
0000000: 7669 6d2d 6c69 6e65 0a                   vim-line.

The final 0a byte, the dot after the characters vim-line in the ASCII column, is the newline. In my case it was not that obvious which files originated from VIM and which did not. Unfortunately you do not see a difference between the two files in less, and VIM only gives a hint when the final newline is missing, and automatically corrects it when writing the file. But in the main display it is hidden.

To avoid the final newline, VIM knows the binary and eol options, which can be used together to prevent VIM from writing it. But be advised that in general you are safer keeping the final newline in Unix text files.
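For example, started like this, VIM will write the file without appending the final newline (using the example file from above):

$ vim -c 'set binary noeol' vim-file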

Maybe one day we might get past these teething troubles and start spending our hacking time on some user-friendly systems. But since the computer revolution has not happened yet I guess we just have to wait a little longer.

P.S. It is a shame I could not make any link to ZFS this time. So be it! For this post ...

June 24, 2010

A Storage Nightmare

Recently, I faced the following storage setup for a virtual server, and instantly I got a headache from it:

  1. An enterprise storage system that abstracts almost everything presents a simple SCSI device with a virtual LUN on the SAN.
  2. VMWare binds the LUN and formats it with VMFS, its proprietary cluster filesystem.
  3. Inside the VMFS one VMDK file is created that uses all the space.
  4. The VMDK file is attached as virtual disk to the guest system.
  5. Inside Linux, the disk is partitioned into one primary partition, which is formatted as an LVM physical volume. It belongs to one volume group, from which multiple logical volumes are created.
  6. The logical volumes are formatted with ext3.
To summarize this setup: the data from the application passes through 2 filesystem layers (ext3, VMFS) and 3 virtualization layers (LVM, VMWare, storage system) until it finally reaches the disk.

As if this wasn't enough to cause a headache, I further discovered some special setups:

  • Multiple LUNs are aggregated to one device in VMWare to work around the maximum size limit of the storage system.
  • The Linux LVM layer stripes a logical volume over multiple LUNs, which are directly passed through by VMWare from the SAN. Apparently the intention was to use LUNs from different RAID groups in the storage system to increase performance.

Consider a simple question like "This filesystem is too slow, which volume in the storage system do we have to move to a faster tier?". To answer it you might get yourself into quite some typing: look up the device of the filesystem's mount point, map the logical volume to the right volume group, look up the physical volume behind it, remember its LUN, in VMWare map the LUN to the right virtual disk, identify the VMFS where the disk's VMDK file lies, and finally look up the LUN of that VMFS. Oh my!
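Just the Linux part of that chain already means something like this (mount point and volume group names made up):

df /data                                # which logical volume is mounted here?
lvs -o lv_name,vg_name,devices vg01     # which physical volumes back it?
pvs                                     # which LUN is behind which physical volume?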

The different layers in this setup surely have their reasons. The storage system hides the fact that you actually have a lot of small disks instead of one huge pool, and it gives you reliability and flexibility when migrating and expanding your data volumes. Without the VMWare layer you lose snapshot functionality. But then again, snapshots are also something you find in the LVM layer, as is logical volume management itself: it is somehow present in both the storage system and LVM. This redundancy between layers and the missing integration between them makes the setup really hard to understand, and probably also hurts performance, but I would not want to have to debug the latter.

Another fundamental problem in this respect is the block layer. There are such things as adding a SCSI device to a running system, but all dynamic functions that go beyond this are implemented in layers on top of it. Wouldn't it be nice to, e.g., implement dynamic resizing in the block layer? You would just resize the volume in the SAN, and instantly the system would see more space and add it to the filesystem. No manual work would be needed, like adding a new disk in VMWare or extending the volume group with a new physical volume and growing the logical volume.

ZFS got the integration of layers right. Inside a pool there is no static partitioning that allocates blocks to a filesystem up front; resizing volumes is merely a matter of setting quotas and reservations, which are totally dynamic. But besides working in the direction of applications, integrating network filesystems like Lustre and databases into ZFS, it is also interesting to go in the other direction and integrate and enhance the block layer. One way would be to integrate it better with the existing SCSI and Fibre Channel protocols, to cooperate with existing SAN solutions. A more interesting way yet would be to eliminate the classic SAN and let ZFS do all the work. What ZFS currently lacks is fine-grained control over how data is arranged on disks, e.g. the definition of different tiers with different speeds, and moving filesystems dynamically between them. And then there are all the pretty things that come with storage networks: multipathing, replication, server-based RAIDs, clustering, et cetera, et cetera.
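To illustrate the contrast: in ZFS, "resizing" comes down to adjusting two properties on the fly (data set names made up):

zfs set quota=50G tank/home
zfs set reservation=10G tank/db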

May 12, 2010

Checking Extfs Online

For years extfs has been a reliable, stable and well-performing filesystem. But there has always been a weak point in its design for operation on long-running servers with lots of data. It is what manifests as the mount count, or more precisely, the inability to check the filesystem online.

This is a real problem for servers with high uptime demands. The usual way is to reboot the machine (in case the filesystem is vital for system operation) or remount the filesystem from time to time, and when the mount count reaches its limit the filesystem is checked. But not only do the reboots and remounts interrupt the server's operation and increase the service downtime. The filesystem check must also complete before the filesystem can be mounted again. With terabytes of data a check can take multiple hours or even days, depending on your disk layout. And if this catches you at the wrong moment, e.g. when the server crashed and you need it back online fast, it can be a real pain. The alternative is to disable this behavior by setting the maximum mount count to 0. In that case no automatic checks are done at all, but which honest system administrator would claim to be able to regularly check all filesystems on all servers by hand?
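For reference, this is the knob in question; a maximum mount count of 0 disables the automatic checks (device name made up):

tune2fs -c 0 /dev/vg0/data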

With the recent adoption of ext4 as the standard filesystem in Linux distributions I asked myself if this shortcoming still exists today. Luckily things have changed for the better, and not first with the introduction of ext4, but with the inclusion of the VFS lock patches in the 2.6 kernel series. Ext4 still can not do it on its own, but with the help of LVM there is a way. What is a simple fsck_ufs -B for the BSD user, and where the ZFS user only smiles at you and asks "What, you only check once a month? I do it all the time, it is called checksums.", is not that simple for the Linux user. The trick is to put the extfs on top of LVM and use snapshots. The extfs and LVM code in the kernel play together, so when you make a snapshot everything is properly locked and consistent. Then you can run fsck on the snapshot while the original filesystem continues its operation. If fsck reports success you simply discard the snapshot again and know everything is fine.
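A sketch of the procedure (volume names and snapshot size made up):

lvcreate -s -L 10G -n datasnap /dev/vg0/data    # freeze a consistent copy
e2fsck -f /dev/vg0/datasnap                     # check it while the original stays mounted
lvremove -f /dev/vg0/datasnap                   # on success, discard the snapshot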

The ugly thing is that LVM snapshots are a bit fragile. When you create a snapshot you must specify its size: the amount of space that is reserved to record the changes that happen after you made the snapshot. You have to estimate it properly, depending on how long you will need the snapshot and how much data will be modified meanwhile. This might not be an easy task, even for a sysadmin. If the snapshot runs out of space it becomes invalid, and your check was all for nothing.

So, it is possible, but truly not a very solid solution. I have never seen a distribution that implements this by default, though it seems to me to be a very essential task. So, how do you manage to check all the ext filesystems on your enterprise Linux servers that have hundreds of days of uptime ...?

May 2, 2010

Zenoss

I've spent quite some time now working with Zenoss, and looking at the shiny Zenoss homepage with all its praising catch phrases I felt like I had to offer some criticism here. Sadly, most of my criticism is negative, which makes me think once more how poorly integrated monitoring still is today. I apologize for not pointing out the good things about Zenoss, but I'm sure you'll find plenty of that elsewhere on the net. The information here is based on Zenoss Enterprise version 2.4.

Appliance

Zenoss provides a stack installer that includes everything you need, from OpenSSL to MySQL. This saves you from dependency problems and version mismatches, but has some odd consequences. Integrating Zenoss with the rest of your software can be a pain. It always wants to access or start its internal MySQL server, but you may already have your perfect MySQL cluster where you want to put the Zenoss event data. Also, you have to su to the zenoss user whenever you do some work on Zenoss, otherwise you don't have the right profile for all the executables and libraries. When debugging you have to take care to always test with the Zenoss versions of the programs; in some cases they do not exactly match the behavior of the versions included in your system.

The stack installer does its work, but lacks real integration into the system. Upgrades and de-installation clearly need a lot of manual work that could be avoided with a real package like an RPM. The fact that Zenoss still doesn't put any effort into proper packaging raises the question whether Zenoss' design fundamentally prevents proper system integration, and if not, why they insist so much on this strategy.

Distribution

Zenoss supports distributed collectors, so you can split the load over multiple servers. But it involves a lot of network setup: it demands multiple open ports for connections in different directions, and encrypting the traffic has to be done on one's own with SSL. This is especially a problem in large and insecure networks, as is often the case in environments that need this type of monitoring. Unfortunately, computation load is not the only thing that is distributed in Zenoss. Data is split over 3 different databases: the configuration is stored in ZEO (a Zope Object Database), events go to a MySQL database, and performance data is stored in RRD files on the filesystem. Only the first two are centralized and need to be accessible by every collector. RRD files are distributed (yes, again!) over all collectors; each one holds the files for the hosts it monitors. And of course, if you want to see nice graphs in the web interface, you need access to all collectors, more specifically a way to access the RRD files on all collectors from the server that serves the web interface.

This architecture not only makes Zenoss very hard to setup, but also requires a lot of work when you migrate something or are looking for errors.

Memory

Zenoss starts various daemons for a collector. Each one runs in its own Python VM, so at startup it can easily consume 50-100MB of memory without doing any work yet. Additionally, some processes tend to consume much more memory under load; 300-500MB for a single process is no exception. Finally, Zenoss seems to be no stranger to memory leaks. I often caught processes using up to 2GB of memory, and after a restart they were happy with a fourth of that, before leaking again after another week of duty. Be prepared that restarting the Zenoss stack can take quite some time and result in some alert events or gaps in the graphs. The choice of a dynamic, interpreted language really shows its downside in this respect.

Extensibility

This is one of Zenoss' strengths. Unfortunately the ways in which you can extend it are diverse and can lead to a small chaos. Zenpacks give you so much freedom that you can pack any file or database object into them. But there is absolutely no control over whether the installed data still matches what was in the zenpack. Also, it gets very difficult to track which things were installed by which package. Installing, upgrading and versioning is completely up to you. And of course you need to install the zenpacks on each distributed collector individually, or sync them in the web interface, which copies files over SSH and restarts everything, but still has to be done for each collector.

There are a lot of methods to gather data in Zenoss; it even has its own syslog daemon you can stream to. But the integration of the various methods lacks care for details in the implementation. SSH commands very often report an error even when the check did not really fail but the command merely timed out. This happens because there is no clean distinction between a failed state and an unknown state, like Nagios has. Also, I have seen hundreds of OIDs reported as errors just because there was a problem reading data from an SNMP agent.

Behind the scenes

Rather than having to define every item you want to monitor, Zenoss brings a lot of mechanisms that gather data automatically and find out things all by themselves. The consequence can be bad performance, until you notice that periodic port scans are not such a good idea in a network where the majority of ports are silently blocked by default. The concept of modeling devices, so that you don't have to gather certain data every time you do a check, is in some cases overused. E.g. the size of a filesystem is modeled, but its usage is not. So if you grow a filesystem you can get an error of e.g. 110% full, or -10% free, until some hours later the device is remodeled and the filesystem size is adapted to the grown value.

Security

A monitoring server can easily become a security concern. It is a central device that can access data on all other devices. Whereas for certain monitoring methods the risk might be within an acceptable range, there are others like SSH, where shell access leaves a lot of room for possible exploits and is difficult to restrict. The most important thing here is to handle stored passwords and keys in a secure way. But in Zenoss I found various ways to retrieve passwords as non-root and even as non-Zenoss-admin. One way is to execute predefined commands on devices, which gives you e.g. the cleartext SNMP password in the output log. Another way is to browse the Zope management interface behind Zenoss, which gives you low-level access to the data. A lot of this is also an issue of access management.

As already mentioned, other weaknesses include Zenoss' need for the root SSH password (although this seems to have changed in newer versions). It is needed to automatically update software and distribute configuration to all collectors. Another problem is operation in untrusted networks: most daemons make unencrypted connections, leaving it up to you to wrap everything in SSL tunnels.

April 11, 2010

The Why of Smalltalk

I began to learn Smalltalk in an academic way at university. So I first valued its technical qualities like the easy syntax, homogeneous design, dynamic behavior, etc. The first questions I faced came when somebody who didn't know Smalltalk yet asked me why it is so good and what is so special about it. Of course I couldn't resist and had to praise Smalltalk to the skies. It was not until such discussions that I was also asked why Smalltalk isn't really popular if it is that good, and I started to take an interest in the history of Smalltalk and where its real applications are. Today there are tons of new languages, most of them stating that they were, among others, influenced by Smalltalk, so the usual versus debates keep coming up.

Over at stackoverflow they posed some of these very essential questions and collected a lot of interesting answers:

  1. What is so special about Smalltalk?
    • The Smalltalk way isn't to crash out on unexpected behaviour - it's to adapt.
    • Today, Smalltalk isn't particularly important on its own - there are some people still using it to write stuff, but it's definitely not mainstream. Learning it will give you some insight in how and why OOP evolved, however.
  2. Where do you use Smalltalk?
    • Smalltalk's strengths play into it's weaknesses. It's not the language (dynamic vs static typing will always be a discussion point) as much as the development/deployment environment.
    • ... Smalltalk allowed me to develop and debug this system as people used it. I did not need to restart the system to install new features and bug fixes. If a user process crashed, the debugger would open on my monitor and I could debug it live. The user's terminal would restart automatically.
  3. Why isn’t Smalltalk popular?
    • When Smalltalk was introduced, it was too far ahead of its time in terms of what kind of hardware it really needed.
    • ... IBM took a look at it and figured two things. First they didn't want to enter a marketing war with Sun that was clearly planning to spend a fortune on the Java brand. ... Anyhow, I don't mind that Smalltalk isn't popular. That makes it a secret weapon for me and I am really encouraged to see all the new development projects. Smalltalk is growing and advancing again and this is good because a lot of the best ideas in software (XP, unit testing, refactoring editors, coding assistants) all were developed in Smalltalk first and then filtered out to the rest of the world (generally in diluted forms).
    • Most of the Smalltalkers that I've run into on the net try to say how Smalltalk is basically the best thing since sliced bread, and how awful every other language is.
    • Any language where the expression 2 + 3 * 4 has the value 20 isn't going to be a commercial success.
    • There was freedom. Smalltalk was about "take it all or leave it all". Java is a language, that can be used in different environments.
    • Lack of Free implementations. At the time, it hadn't been demonstrated that a language needed a solid community-driven implementation to pick up mindshare; people still thought that selling software was a good way to make money.
  4. Why use Ruby instead of Smalltalk?
    • I think your question is somewhat missing the point. You shouldn't choose, you should learn them both!
    • You answered the question in your first line: "Ruby is becoming popular".
    • Over a year ago IBM dumped the language (their term is sunset). Also, take a look at the History. ParkPlace and Digitalk where the first major commercial players in the Smalltalk arena, they merged and then went out of business.

March 11, 2010

When ZFS runs out of space

I'm one of the, hopefully no longer so few, people who proudly run ZFS on their servers. I even discovered the value of regular snapshots, which I take with freebsd-snapshot. A good thing to save you from some of your accidental rm commands and evil destructive updates.

But there are also some issues with the whole snapshot story. Once you run out of space on a pool, things can get a bit more complicated with all those snapshots around. There are two defining points to understand in such a situation:

  • Data in a pool can only be freed if it's not referenced any more by any snapshot, clone or data set
  • Snapshots are read-only
Thus, cleaning up space by removing files only helps if they are not referenced by a snapshot, and that is likely not the case if you are making multiple snapshots per day. And removing parts of a snapshot is not possible; you can only destroy a snapshot as a whole. So this comes down to painfully searching for snapshots and files you don't need anymore and deleting them manually. Alternatively you could convert snapshots into clones, which are writable again, and delete single files from them. But I don't think this is a very clean way, and it would surely break your snapshot cycle, leaving you with a bunch of clones to clean up later.
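At least ZFS helps with the searching; a couple of commands for the manual cleanup (data set names made up):

zfs list -t snapshot -o name,used,referenced    # find the snapshots hogging space
zfs destroy tank/home@2010-02-01                # ... and drop them as a whole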

So, isn't there any better solution? Well, I currently don't see one with ZFS, but maybe you know of one ... I can imagine something that would help a lot in such situations though: writable snapshots, as BTRFS has them on its feature list. And on top of that, how cool would it be to have some snapshot-aware file utilities. I can already imagine the line on my command prompt: rm --snapshot all ~/firefox*.core *flip* and there is the free space again ...

March 7, 2010

While installing security/nmap

Recently, when I did a portinstall nmap, the build process surprised me with some pleasing ASCII art:

            .       .                                                                           
            \`-"'"-'/                                                                           
             } 6 6 {                                                                            
            ==. Y ,==                                                                           
              /^^^\  .                                                                          
             /     \  )  Ncat: A modern interpretation of classic Netcat                        
            (  )-(  )/                                                                          
            -""---""---   /                                                                     
           /   Ncat    \_/                                                                      
          (     ____                                                                            
           \_.=|____E                                                                           
Configuration complete.                                                                         
   (  )   /\   _                 (                                                              
    \ |  (  \ ( \.(               )                      _____                                  
  \  \ \  `  `   ) \             (  ___                 / _   \                                 
 (_`    \+   . x  ( .\            \/   \____-----------/ (o)   \_                               
- .-               \+  ;          (  O                           \____                          
(__                +- .( -'.- <.   \_____________  `              \  /                          
(_____            ._._: <_ - <- _- _  VVVVVVV VV V\                \/                           
  .    /./.+-  . .- /  +--  - .    (--_AAAAAAA__A_/                |                            
  (__ ' /x  / x _/ (                \______________//_              \_______                    
 , x / ( '  . / .  /                                  \___'          \     /                    
    /  /  _/ /    +                                       |           \   /                     
   '  (__/                                               /              \/                      
                                                       /                  \                     
             NMAP IS A POWERFUL TOOL -- USE CAREFULLY AND RESPONSIBLY                           

It surely makes all that compiling more fun.

March 3, 2010

Rails versus Seaside

To start this blog, I'll simply make some propaganda for Smalltalk and Seaside by referencing an older but still very good article from the On Smalltalk blog. Are you still writing HTML? Oh, you are using embedded Ruby code inside your HTML. So you are still doing it the old way then ...

The abstraction and consistency of the code you get with Seaside and Smalltalk are still the very best I've seen so far. Simply turn your HTML into Smalltalk, keep it in the same environment as the remaining parts of your application, and profit from all the other strengths Smalltalk offers you.

To be fair, nobody is perfect. Ever looked at how Seaside handles CSS?

style
  ^ '
  #navigation {
    padding: 5px;
  }

  #body {
    margin: 5px;
    padding: 5px;
  }'

Well, surely that's some kind of abstraction, transforming CSS code into a Smalltalk string literal. At least you can now use string composition and maybe regex replacement ... ;-)