I had to reboot the server this morning. One Apache process got stuck in “uninterruptible sleep”, which is pretty bad because such a process can’t be killed without rebooting. This has happened once before, so I’m starting to think we have some odd kernel bug that shows up after a month or so of uptime. I have recompiled the kernel to use the old SLAB allocator again instead of SLUB (which may not be perfectly stable in 2.6.22).
Edit 2008-09-19: It seems there was a bug in Grsec/PaX that could cause this problem.
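For anyone who wants to check for processes stuck like this: on Linux, uninterruptible sleep shows up as state `D` in `/proc/<pid>/stat` (and in the STAT column of `ps`). Below is a minimal sketch, assuming a Linux system with `/proc` mounted. The function names `parse_stat` and `find_uninterruptible` are made up for illustration and are not from any existing tool. Note that a `D` state is often brief and normal (for example during disk I/O); it only indicates a problem when a process stays in it.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for processes currently in state 'D'."""
    stuck = []
    if not os.path.isdir(proc_root):
        return stuck
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the entry was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

Running it a few times some seconds apart and looking for the same PID in every run is a rough way to tell a stuck process from one that is just waiting on I/O.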
The downtime was a bit longer than expected: since I couldn’t kill the process, it was impossible to shut down the system properly, and I then wanted to make sure the filesystem was OK before starting everything up again.
I do not think this problem is related to the connectivity issues that have been reported lately. Those are more likely caused by a problem at my ISP, perhaps a router being rebooted or similar (each outage usually lasts 2-3 minutes, and it has happened a couple of times during the last few weeks).