Server overview 2021

I haven’t really talked about the backend infrastructure for Blinkenshell in a long time, so I thought I would give an update on the servers/VMs that are currently running. Most people probably think it’s just a shell server and that’s it, but in reality there are almost 25 different VMs involved in running Blinkenshell!

  • Triton (SSH)
  • Buildserver
  • Web server backend
  • Web server front/cache
  • Database replication slave
  • Mail server
  • /home storage server
  • Blinkenbot and signup utils
  • 2x IRC servers
  • ACME
  • Nagios monitoring server
  • Telegraf/Grafana monitoring server
  • Log server
  • Off-site backup
  • 3x LDAP
  • Tunnel server IPv4
  • Tunnel server IPv6
  • 3x Firewall

Why so many? The most important reason is security again: to isolate the different parts from each other as much as possible by running them on separate VMs with firewalls in between. It’s also a lot more flexible when making upgrades/changes to only take down one part at a time. But still, are 25 VMs really required? Probably not, but I like labbing and testing out different things! There are actually even more VMs than the ones listed above, but they’re not required to run the service and are more lab/test things.

If you want to help support the running costs of Blinkenshell please consider supporting via Paypal or Patreon 🙂 Anything you want to see/know more about? Let me know!

Posted in hardware | Leave a comment

KVM storage live migration

One of the features I’ve been most excited about with this new server setup is KVM live migration using virsh migrate with copy-storage-all. This is like a regular VM live migration, but it works even if you don’t have shared storage for your KVM hosts (shared storage is usually a dedicated NAS for VM disk images). I don’t want to be dependent on too much hardware, and I feel like a NAS would add a lot of gear, both in terms of the actual NAS, which you might want to add some redundancy to, and in terms of network equipment (do you want redundant switches?).

My goal with this setup is to be able to do some hardware maintenance on hosts, as well as upgrading of the KVM host software and firewall software without having to take everything offline. If there is some unexpected hardware failure everything will still go offline and there is no fast way to recover from that but I’m OK with that for now. I also want the setup to be reasonably simple, so I wanted to stay away from any kind of clustering file systems that probably will not work well on just two hosts.

This is a simplified view of the setup I went for:

Blinkenshell 2021 KVM host setup

Both KVM hosts were actually installed around the same time and have been running since the server migration. The second internet connection was added later, however, and I wasn’t able to test the internet failover part before we went live. I did do some tests migrating VMs between the hosts, but it was not possible to completely power off one of the KVM hosts since that would result in losing internet connectivity. A couple of weeks ago I added the second internet connection and have been working on the firewall setup to make failover possible; I might do a separate post about this if someone is interested.

So, how do you actually migrate a VM between two KVM hosts without shared storage? First we should probably discuss the actual software setup in a bit more detail. I’m running Ubuntu hosts with KVM and manage my VMs using libvirt, either with virsh on the command line or preferably via the GUI in Virtual Machine Manager (virt-manager). I’ve also enabled the libvirtd TCP socket on a dedicated VLAN on the 10G trunk between the hosts to make the copying faster. I did this by overriding a few settings in the Ubuntu-included systemd unit libvirtd-tcp.socket: systemctl edit libvirtd-tcp.socket
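A minimal sketch of such an override could look like the following; the listen address here is a placeholder for the host’s IP on the copy VLAN, not the real value:

```
# /etc/systemd/system/libvirtd-tcp.socket.d/override.conf
# The empty ListenStream= first clears the default listen list,
# then we bind only to the (hypothetical) copy-VLAN address.
[Socket]
ListenStream=
ListenStream=192.0.2.10:16509
```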


And enable the socket: systemctl enable libvirtd-tcp.socket.

The magic is then accomplished by adding the option --copy-storage-all to the virsh migrate command, something like this:

virsh migrate --p2p testvm1 --undefinesource --persistent \
 --copy-storage-all qemu+ssh://<other-host-ip>/system tcp://<other-host-ip>

There are lots of options to virsh migrate depending on what you want to do. In my case I want to migrate the VM testvm1 to a new host, and I want the VM running permanently on the new host until I make some other change. --undefinesource and --persistent make it so that the VM configuration is removed from the old host and afterwards only exists on the new host, where it is kept in a persistent state (instead of being just a temporary move). --copy-storage-all makes sure to copy the contents of all local disks attached to the VM, but you have to create empty qcow2/raw disk files on the new host with the correct sizes before you can start the migration! It’s important that they have the same names and sizes as on the original host. I also include some options to run the actual data transfer over the dedicated copy VLAN.
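Pre-creating the empty destination disks can be sketched like this with qemu-img (paths and the 20G size are example values; check the real virtual size on the source first):

```shell
# On the source host: look up the disk's virtual size
qemu-img info /var/lib/libvirt/images/testvm1.qcow2

# On the destination host: pre-create an empty image with the
# same name, format and virtual size (example: 20G)
qemu-img create -f qcow2 /var/lib/libvirt/images/testvm1.qcow2 20G
```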

In my case most VMs are very small, and a copy including storage of a very small VM can complete in something like 30 seconds. A large VM with a few hundred gigabytes of storage takes almost 20 minutes to copy. The storage I’m using is NVMe SSDs a few generations old; in a RAID1 setup they manage to write at around 1GB/s, which fits pretty nicely with a 1x10G NIC on the network side. I actually connected 2x10G and was hoping to get 2GB/s, but the write speed after RAID was a bit slower, still good enough for my scenario.

The actual news in this post is that yesterday was the first time I completely powered off the KVM1 host to connect a UPS, and I managed to use this storage live migration to move triton (the SSH server) off of the KVM host, perform my maintenance, and move triton back without disconnecting any IRC sessions! So both KVM migration and firewall failover worked, yay! It’s still kind of a tricky maneuver so I fully expect I will mess it up next time 😀 Stay tuned!

Posted in hardware, maintenance, network | Leave a comment

Fail2ban routing actions

I said I should be doing some more technical posts, so here we go! Blinkenshell runs the SSH server on a non-standard port (mainly port 2222, but also port 443 for people trying to get around some firewalls). The reason for this is to avoid the automated bots that go around the internet scanning for open SSH servers and trying different brute-force attacks to log in. I still think the non-standard port helps a bit, but there are always some more persistent attackers out there who find SSH servers running on other ports and start hammering away there as well, and for this reason we have fail2ban. Fail2ban scans through any logfiles you specify and applies filters/patterns to find failed login attempts; if there are repeated failed attempts it performs an action, like blocking the source IP using iptables. For Blinkenshell I try to avoid running too many programs on the main SSH server triton for security reasons, so our setup is a little bit different.

The first part of the puzzle is rsyslog running on triton sending syslog messages over UDP to a separate log collection server. This is useful for several reasons like trying to figure out why a server crashed or to get forensic data after a host was compromised. In this case we’re going to use it to be able to run fail2ban on a separate host from the one we’re trying to protect. So we install fail2ban on the log collection server and set up a new “jail” that will listen for logs coming from triton. Something like this:

port = ssh,2222
logpath = /var/log/remote/triton.log
enabled = true
filter = sshd[mode=aggressive]
banaction = route
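For reference, the rsyslog forwarding on triton’s side can be as simple as a one-line rule; the file path, hostname and port below are placeholders, not the real values:

```
# /etc/rsyslog.d/50-forward.conf on triton (sketch)
# Forward auth logs over UDP (single @) to the log collection server
auth,authpriv.* @logserver.example.net:514
```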

The second part of the puzzle is hinted at by the last line, “banaction = route”. Since fail2ban does not run on the same server as the one we’re trying to protect, any iptables rules installed locally on the fail2ban server will not stop the attackers. The idea here is to add any banned IPs to the local routing table, and then send these routes to the firewall and drop the traffic there. This requires a routing protocol daemon on the fail2ban server that talks to a routing daemon on the firewall.
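Conceptually, the route banaction just manipulates the kernel routing table, something like this (the IP is an example from a documentation range):

```shell
# Ban: add a null route for the offending IP
ip route add blackhole 203.0.113.45
# Unban: remove it again
ip route del blackhole 203.0.113.45
# These kernel routes are what FRR's "redistribute kernel" picks up
ip route show type blackhole
```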

In my case the fail2ban server runs on Linux, so I’ll choose FRRouting (FRR) for the routing daemon. My preferred routing protocol for cases where you want to specify some routing policy is BGP, so I’ll enable bgpd in /etc/frr/daemons and systemctl restart frr. You can then enter a Cisco-style CLI using the command “vtysh”, and go into configure mode using “configure”. Sample config:

ip prefix-list BLACKHOLE seq 5 deny <local-prefix-1> le 32
ip prefix-list BLACKHOLE seq 10 deny <local-prefix-2> le 32
ip prefix-list BLACKHOLE seq 15 permit ge 32
route-map BLACKHOLE permit 10
 match ip address prefix-list BLACKHOLE
router bgp <myasn>
 neighbor <firewallip> remote-as <firewallas>
  address-family ipv4 unicast
   redistribute kernel route-map BLACKHOLE

To try and avoid any accidental dropping of legitimate traffic I’ll add a little route-map to deny my local prefixes first, and then allow any /32 routes. The “redistribute kernel” line is what will actually take the routes that fail2ban added to the kernel routing table using “banaction = route” and add them to the BGP table.

On the firewall end I’m using OpenBSD so there we will be using openbgd instead of FRR, so it’s a completely different syntax! I’m actually already running bgpd for other things on the firewall (maybe more updates on this later!), but the relevant parts to this fail2ban blackhole thing are:

prefix-set accept-blackhole-in {
    0.0.0.0/0 prefixlen = 32
}

neighbor <fail2ban ip> {
    remote-as <fail2ban as>
    descr "fail2ban"
}

allow from <fail2ban ip> prefix-set accept-blackhole-in set pftable "blackhole"

Again, another filter to only accept /32 routes so we don’t ruin other routing by some misconfiguration. The other key part here is: set pftable “blackhole”. This will add any routes received from this neighbor to a table in pf. We can then refer to this table when writing firewall rules in pf like so:

table <blackhole> persist
block in quick log on $if from <blackhole> to any
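To check that bans actually make it all the way to the firewall, the pf table can be inspected and manipulated with pfctl (a sketch; the IP is an example):

```shell
# List the IPs currently in the blackhole table
pfctl -t blackhole -T show
# Remove a single entry manually if needed
pfctl -t blackhole -T delete 203.0.113.45
```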

Or if you want, you can re-route the traffic to some honeypot etc. with the “route-to” option in pf. This setup blocks all traffic from the banned IP in the firewall, so that IP address will not be able to reach any other services hosted behind the firewall either. This could be seen as extra protection, but it could also cause even more confusion for users who accidentally type the wrong password too many times and then suddenly can’t reach any other services either 🙂 We’ll see how it goes!

Let me know if you’re interested in more technical stuff like this!

Posted in Uncategorized | Leave a comment

New server up and running

I’m happy to report that the big maintenance window yesterday was successful and we are now running live on the new server hardware! I’ve spent a lot of time earlier this week doing final preparations and planning the exact steps to take during the migration, because I knew there was going to be a lot of work to do and many things that could go wrong. I felt a bit nervous going to bed on Friday evening before the big day, but I also knew I had done a lot of preparation, so I still slept very well 🙂

Saturday morning started with some final preparations relating to the network setup. I had to disconnect both the internet and my management connection to the new server to be able to change it over to the final network configuration. I didn’t want to spend lots of time just trying to get back in to the server if anything went wrong, so I set up an out-of-band connection to the server via a separate laptop for emergencies. Then I rebooted the firewall on the new server to apply all the final changes and crossed my fingers that I would still be able to log in after moving some network cables over to their new ports, and it all worked out on the first try!

The next step was to copy over a lot of data to the new server. I had tested this out with some less important virtual machines earlier, so I knew roughly what to expect. I shut down the web server at 09:43 (according to Twitter) and started the copy, which took around 20 minutes. Here I also had to convert the disks from the vmdk format used by ESX into the qcow2 format used by the KVM hypervisor. Things progressed pretty much as expected and I continued with the mail services, directory services, ircd and lastly shut down the SSH/triton server.
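The vmdk-to-qcow2 conversion itself is basically a one-liner with qemu-img (filenames here are examples, not the actual VM names):

```shell
# Convert an ESX vmdk disk image to qcow2 for KVM, with a progress bar
qemu-img convert -p -f vmdk -O qcow2 webserver.vmdk webserver.qcow2
```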

Once all the services were down I could start copying all of the user home directories. I had prepared this by doing a ZFS snapshot and transferring it to the new server the day before. I did a final snapshot and then started an incremental “zfs send” operation to sync over any files that had changed in the last day. This seemed to work well, but then there was an error message for just one directory. I went to investigate and found that some of the directories did not have the snapshot that I thought I had transferred the day before. This is exactly the kind of problem I did not want to run into at this point 🙂 I knew doing a full copy of all the directories would take somewhere around two hours and I did not feel like sitting around waiting that long, so I devised a little script that would transfer just the missing directories, which was much faster. Crisis averted! 🙂 At this point it’s around 12:31.
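The snapshot-plus-incremental flow can be sketched like this; pool/dataset names and the target host are placeholders:

```shell
# Day before: baseline snapshot, full send to the new server
zfs snapshot tank/home@pre
zfs send tank/home@pre | ssh newserver zfs recv -F tank/home

# Migration day: final snapshot, then send only the changed blocks
zfs snapshot tank/home@final
zfs send -i tank/home@pre tank/home@final | ssh newserver zfs recv tank/home
```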

The next step was to disconnect the old server from the internet entirely and move over to the new server and firewall setup. This involved some more network patch cabling, lots of firewall rules and some routing. It went pretty well, but there are always some firewall rules that you miss, which sometimes requires some tcpdump work to figure out what’s actually going on. Anyway, around 14:01 things were starting to look pretty good in this area as well.

Next was lots of messing around with NFS exports; since I moved from a Solaris-based OS called Nexenta to Linux, the options for sharenfs had changed somewhat. Also more firewall rules.

Mail was the first service I wanted to get up and running, so that’s what I started working on next. I know mail servers should try to resend mail for something like two days before giving up, but if I ran into any problems I wanted to have as much time as possible to figure them out before any emails would get lost. This went well and I only had minor configuration updates to get it running. I also started up directory services and ircd, which went pretty well, just some regular OS updates etc. Now it’s around 16:22.

Web services were next. Here I had to do some more troubleshooting, and I had actually forgotten to copy some data for the wiki, so I had to go back and move that over. At 18:23 web was back up and I was starting to feel pretty confident 🙂

Lastly I started up the new SSH shell server that I have been preparing for about a month. It has the same hostname, ssh-keys etc. as the old triton server, and I tried to replicate the environment as well, so hopefully it’s not totally strange. It’s running Ubuntu Linux instead of Gentoo, as I mentioned in the previous post. Here I had actually misconfigured the primary IP address, and when I went to change it I messed up the NFS mounts, which made things very weird; I had a hard time even shutting the server off because of hanging processes. Eventually I got back on the correct IP, and I sent a message on Twitter at 20:48 letting people know it was possible to log back in again. I took a well deserved break, had a fancy hipster beer and watched some Netflix to relax 🙂

As far as I know things are mostly working fine on triton now, but there have been reports of some weird color garbage on the terminal after detaching from GNU screen (time to change to tmux?). I haven’t been able to figure this one out, so if you know what’s causing it please message me. Also, two users had problems this morning where their /home got unmounted, so I had to manually remount it. I’m not sure what was causing this but I’ll keep an eye on it. Other than that it’s mostly been some missing packages etc. that I have been installing as we go.

There’s still a lot more to do before the move is complete, but so far I’m very happy with things and I’m a bit less worried about the old server breaking down. Next on the agenda for me is getting blinkenbot and signup back up and running. There’s also work to get IPv6 back, and lots of work on the back end infrastructure. Please let me know if you want to read more about the new setup; I’m thinking I should write more about how the final setup looks now (or when it’s more finished).

Posted in Uncategorized | 2 Comments

Server updates 2021

Some of you might have caught a teaser picture I pasted on IRC a couple of weeks back of some server internals. This is actually what’s going to become the new main server for Blinkenshell in 2021! The current server is doing a fine job even though it’s very old, but I think it’s finally time to get a replacement to extend the life for hopefully many years. I’ve been thinking of new server hardware for a very long time, probably a year at least, and finally decided to start ordering some parts at the end of last year.

The last parts for the new server(s) arrived around the end of January and so I’ve been working very hard for the last month to get things up and running, performing different tests and experimenting with new ideas. Things are going to stay mostly the same but a lot of back end stuff is getting reworked. Everything is not decided yet but I want to post more build posts after I’ve “launched” things.

One change that you will probably notice in some way is that the SSH server “triton” is going to switch from Gentoo Linux to Ubuntu, which means software package management is very different. I’m also trying my best to use more standard components like systemd here, and even though it was a lot of pain in the beginning getting it to work the way I wanted, I’m actually pretty pleased with it now. We’re also replacing SELinux with AppArmor, but I hope users will not notice too much change there. A lot of things are still very much custom though, like the kernel, AppArmor profiles, shell setup and so on, and security is still top of mind!

I’ve been working extra hard these last few weeks because I don’t want this move to drag on forever. My plan is actually to switch the SSH server and all of the underlying infrastructure next weekend on Saturday the 27th of February. This is going to be a huge maintenance window and I expect most services to be offline for the entire weekend. Hopefully mail services can get back online on the same day, and SSH services by evening Sunday the 28th (no promises though!). I don’t expect all services to be back online that soon however, things like blinkenbot, signup and even IPv6 might take quite a while longer before they are fully functional again.

I would also like to recommend everyone to make an extra backup of your files before the 27th just in case, I have no reason to believe there would be any data loss from the move but it’s always good to keep backups and you might want to access something while the server is down.

Posted in hardware, maintenance | Leave a comment

HTTPS for User Websites

User websites have been a part of Blinkenshell since the very early days, and the format has stayed pretty much the same. However, the web has evolved a lot and nowadays many visitors expect HTTPS by default. Because of this I’ve set up a Let’s Encrypt wildcard certificate for * and I hope to migrate user websites over there.

For now, any new users signing up will automatically get the new domain name <username> and HTTPS. Existing accounts still keep their <username> and no HTTPS for the time being, but contact me if you want to migrate over to the new domain name and HTTPS. I can also set up a redirect from the old domain name so external links will not break when making the move.

On another note, Blinkenshell is still running PHP 5.x but will have to migrate over to PHP 7.x very shortly so if you have some old code running make sure it’s compatible as soon as possible!

Posted in Uncategorized | 1 Comment

Server maintenance 14th January

The server maintenance of Triton (SSH) is done for now. If you want updates about upcoming maintenance (and when it’s done), make sure to follow Blinkenshell on Twitter and hang around in the IRC channel.

As you might have guessed this is in response to the Meltdown and Spectre attacks. I was not able to successfully run any Meltdown attacks on Triton before the patch because of some other hardening, but I’m sure it was theoretically vulnerable anyway so we definitely needed to patch.

Patching was delayed a bit because I also needed to rip out the old hardening/RBAC system based on Grsecurity and replace it with SELinux. Grsecurity has decided not to provide any free versions of their software and to only provide updates to their paying enterprise customers. They’ve previously talked about still providing an option for non-commercial use, but they failed to get anything out even though it’s almost a year since they announced this. It doesn’t even seem like they will provide the community with patches for the very serious Meltdown and Spectre bugs, and they also removed all the old software archives. Basically they have abandoned their old community users, which is a pity, but fortunately there are other alternatives out there. These mailing list messages might shed some light on the “conflict”.

Some of the kernel-hardening work has been included in mainline, and more will hopefully come via some kernel hardening projects. As for the RBAC, Blinkenshell will move to SELinux, which is also included in mainline Linux and fully supported. This might result in a lot of weird problems and errors in the beginning, but we’re starting out pretty light on the policy. Please report any issues to independence.

I also want to say this is probably not the last patch for Meltdown/Spectre, and we will probably have to patch again in the not too distant future so expect more downtimes coming up. In the meantime enjoy updated versions of irssi (1.0.6), weechat (2.0.1) and a lot of other updates!

Posted in Uncategorized | 1 Comment

More changes to the vouching system

I’ve evaluated the changes to the vouching system in the signup system that were made a few months back, and I’m mostly happy with them, but as you might have guessed from the title there are a few things I felt had to change. It seems it’s gotten a bit too hard to pass the vouching step since we started requiring three vouches, so I’m going to change it back to two vouches to pass the step. I still want the community to be able to grow with new members. It should be hard to get an account, but not too hard 🙂

I’ve also changed negative vouches so they give -1 point instead of -2. Some people get very discouraged by a negative vouch, and it required a lot to get past a -2 vouch. Also it seems fair that each vouch is worth the same, negative or positive.

Lastly, you can no longer vouch for users directly from IRC. You have to make an effort and log in to the signup program and give a short comment/reason for your vouch. This is to make sure that you actually considered the vouch before giving it.

I hope you think the changes are good; feedback is always welcome! Comment on the blog, send an email or discuss on IRC (main chat or PM is fine).

Posted in Uncategorized | Leave a comment

Shirt design contest

Okay, it’s finally time to have a vote on the best shirt motif design. There were only two valid submissions emailed to me, so I’ve included the original design that I made as well. Go and check out the designs!

When you have made up your mind, go and vote for your favorite design (Google account required; if you don’t have a Google account, e-mail me your vote).

The winning design will be put up at the Blinkenshell T-shirt Shop at Spreadshirt.


Posted in fun, merchandise | Leave a comment