When MythTV Servers Go Belly Up
Oh, the irony. A little over a week ago I found myself thinking that I hadn't written a blog post in quite a while and I realized that it was because my setup was running so smoothly. There was just nothing to write about because I hadn't needed to do anything with it for quite a while. Little did I know, that was about to change.
It all started off like any other night. Things were running smoothly and it was time to sit down to a bit of TV watching. After thinking to myself how well things were going, I thought that I would log into my MythTV server and see how long it had been running between reboots. This is when things got a little funky. First, I tried VNC to get a remote desktop: Connection Refused! This happens every now and then, no big deal. I figured that I'd just log in over SSH and relaunch the X server. Unfortunately, that didn't work either. Uh oh. That's not normal. At this point, I was starting to feel a bit concerned.
As casually as I could, I headed off to the server to see what was up. Normally, I would just hit the reset button and connect to it remotely again, but this time I decided to pull out a keyboard and mouse for some direct access. Anyone want to guess what I saw? That's right, garbage: major screen corruption. OK, whatever, I could just reboot it and everything would be fine. That's what I thought at first. After I hit the reset button, everything appeared to be running fine. The system booted right up without breaking a sweat. But as soon as it got to the login screen, it shut off. What the...?! So I hit the power button again, and this time it lasted for about two seconds before shutting off. I hit it again and it lasted about one second. No good! Something was obviously wrong here.
So now it was time to put on my computer debugging hat. What does it mean if something works for a bit and then stops, and each time the duration of workingness (yes, I will assume that's a word) gets shorter and shorter? I figured it was a thermal issue and let the system cool for a while. After about half an hour, I tried again. I hit the power button and the system ran for about ten seconds. I hit it again and it was back to one. Hmm... I felt I was on to something, so I let it sit overnight. At this point I returned upstairs and was forced to utter the words that all of us dread: "Honey, the backend is down. And it might not be back up for a bit." As you can imagine, this did not go over well, and I was inspired to get things working again as quickly as possible. I need not worry about sleeping...
I tried turning things on again the next morning, after letting it sit all night thinking about what a bad computer it was being, but it did not help. It still would only power up for a few seconds and then shut down again. So now it was time to start removing hardware to see if I could find the problem. The first things to go were the hard drives because they were easy to disconnect. That didn't help. Next went the tuners: two PVR150s and a PVR350. No difference there either. Next, it was time to start pulling memory. That did not have any effect either. At this point, all I had left was the motherboard, processor, power supply and a couple of fans. I took a power supply from another system and plugged it in, with the same result: power for a second and then death. So I knew it was either the motherboard or the processor. Since I had suspected a thermal issue from the start, I removed the CPU heatsink and reapplied the thermal grease. This may have helped cool things a bit, because the grease that was there had gotten a bit dried out, but it had no effect on the overall problem that I was seeing.
At this point, I was stumped, so I decided to try another tactic. I pulled out my development system, plugged in all of the tuners, hard drives and the power supply from my MythTV server and powered it up. Needless to say, everything started right up. Luckily, I was able to get this far just in time to start recording shows for the evening. It was a hack, but it was running. And this is where things are sitting now. I'm thinking about buying some new components to replace what died on me, so if you have suggestions, leave a comment below.