As you might have noticed, there were some issues with the server yesterday. The website was down most of the day, and some people had problems logging in via SSH. This was related to an issue with the fileserver. There have also been some performance issues lately (a freeze of a couple of minutes at 01.30 CET). I was planning to schedule a stop to fix this performance issue at some point, but because of the problems yesterday I had to fix it right away. All services were down between 19.00 and 00.30 CET yesterday because of this maintenance. The maintenance went well, and everything should be up and running as usual now.
I did expect there might be some problems with the new server after the migration, but this problem showed up a bit later than I would have thought. The server had been running well for about three weeks before this happened, but I guess that’s how it is sometimes 🙂
The performance issues were related to a new feature in ZFS called deduplication. It’s been around in the enterprise storage space for quite a while, but it’s very new in ZFS. I had deduplication enabled for the VM datastore for a while when testing the server this summer, but disabled it after a couple of days. However, I did not realize that data written while deduplication was enabled would stay deduplicated after I disabled it on the filesystem. This caused the kernel to keep a huge deduplication table in memory, which in turn caused other problems when the load on the server increased. I had to move all data off the filesystem, destroy the entire ZFS pool, create a new one and copy the data back. That’s why the stop was quite long 🙂
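To make the “disabling dedup doesn’t undo it” part a bit more concrete, here is a toy model in Python. It is only a sketch under simplified assumptions, not how ZFS actually implements deduplication: the class name ToyDedupPool, the SHA-256 keying and the plain dict standing in for the dedup table are made up for illustration. What it shows is that the dedup flag only affects new writes, while existing table entries stay around until every block referencing them has been freed.

```python
import hashlib


class ToyDedupPool:
    """Hypothetical toy model of block-level deduplication (not ZFS code).

    Each unique block checksum gets one entry in a dedup table (DDT)
    holding a reference count. The dedup flag only controls how *new*
    writes are handled; existing entries remain until their blocks are freed.
    """

    def __init__(self) -> None:
        self.dedup_enabled = False
        self.ddt: dict[str, int] = {}        # checksum -> reference count
        self.plain_blocks: list[bytes] = []  # blocks written without dedup

    def write(self, block: bytes) -> None:
        if self.dedup_enabled:
            key = hashlib.sha256(block).hexdigest()
            # A duplicate block only bumps the refcount; a unique one adds an entry.
            self.ddt[key] = self.ddt.get(key, 0) + 1
        else:
            # With dedup off, the block is stored as-is and the DDT is untouched.
            self.plain_blocks.append(block)

    def free(self, block: bytes) -> None:
        key = hashlib.sha256(block).hexdigest()
        if key in self.ddt:
            self.ddt[key] -= 1
            if self.ddt[key] == 0:
                # The entry disappears only when no references remain.
                del self.ddt[key]
        else:
            self.plain_blocks.remove(block)


if __name__ == "__main__":
    pool = ToyDedupPool()
    pool.dedup_enabled = True
    # 1000 writes, but only 100 distinct blocks (think cloned VM images).
    blocks = [f"vm-block-{i % 100}".encode() for i in range(1000)]
    for b in blocks:
        pool.write(b)
    print(len(pool.ddt))  # 100

    pool.dedup_enabled = False
    pool.write(b"written after dedup was turned off")
    print(len(pool.ddt))  # still 100: old entries are kept

    for b in blocks:
        pool.free(b)
    print(len(pool.ddt))  # 0: the table empties only once the old blocks are gone
```

In this model, the only way to shrink the table after turning dedup off is to free (or rewrite) the deduplicated blocks, which is roughly what moving the data off, recreating the pool and copying it back achieves.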