posted on Saturday, April 09, 2005 10:57 AM by amachanic
Windows XP Crash: Lessons Learned
Yesterday morning I had to deal with a non-bootable Windows XP machine. Every time it turned on, it would get to the Windows XP splash screen, sit there for a while, then flash a BSOD and restart. The BSOD flashed just long enough to see that the screen was blue, and maybe the words "dump" or "kernel" if you looked fast enough, but not long enough to get any real data.
Nothing new had been installed on the machine, and it had booted fine the night before. Typical bit-rot situation. Very annoying.
This wasn't the first time I've ever had to deal with XP spontaneously deciding not to boot... I've had this happen on numerous occasions. And here's what usually happens: I throw in the XP CD ROM, boot it, and try to get it to launch auto-repair mode from the install screen. But nine times out of ten, that option doesn't show up. I'm not sure what makes that option show up or not, but apparently on my computers it just doesn't.
So at this point, I usually just shrug and re-install XP. I specify a new computer name and a new default user name so that none of the documents will be overwritten, and resign myself to a few days of re-installing all of the apps I use. One side benefit of this is that I now use very few apps!
But yesterday, that wasn't an option. It wasn't my computer, and documents on the computer needed to be completed by mid-afternoon for a major deadline. Ugh...
I booted off the CD ROM, and as usual the repair option didn't bother coming up. So I started Googling for some solutions... And found a few web pages with advice on how to get that option to come up.
Turns out, you need to use the Recovery Console for that. I'd booted into it a few times in the past, but never bothered learning how to use it...
Lesson 1: To boot into the recovery console, you need the administrator password. Oops. I hadn't written it down when I installed XP on this machine. I have about 10 different passwords I generally use, but you're only allowed 3 tries per boot. And each boot cycle takes a LOT longer than it needs to. Who knew there were so many different disk drivers required to start up Windows? (Anyone who's recently booted off the CD ROM knows exactly what I'm talking about.) On the fourth try, 30 minutes or so into this exercise, I finally figured out what the password was.
Into the recovery console I went. After trying several different "tricks" from various web pages and rebooting a bunch of times, the repair option still didn't appear, so I kept searching. Finally, when I was about ready to just re-install and try to very quickly get the needed documents back in shape for the deadline, I found this excellent, utterly lifesaving article by Charlie White. Following his sage advice, I went back into the Recovery Console, backed up the registry, and restored a recovery version. I then booted into XP (strangely, after rebooting I had to use a different administrator password! I'm not sure why), recovered to a restore point from a few days ago, and rebooted the machine back into XP, good as it had been before the crash.
Lesson 2: If you search the Web, you will probably find someone who knows more about the subject than you do, and who can save you a lot of time. Re-installing was the only way I knew to fix the problem, and that is due solely to the fact that I'd never bothered actually searching for a better solution before!
Great stuff. But why did it crash to begin with? I was unable to find any log entries or other diagnostic data, but I figured I should run a disk scan to check for issues (CHKDSK /R). And that made the problem instantly apparent. The scan reported a single bad cluster in the SOFTWARE registry file.
Lesson 3: If your hard disk starts making clicking noises, that means that bad things are about to happen! Turns out, this disk had been clicking for about a week before the crash.
... And were there any data backups? Of course not, this is a home computer! I still don't know what to do about that; another computer decided to crash this morning, so it's becoming painfully apparent that my house is cursed -- and I need a backup solution.
Lesson 4: As if I haven't learned this one about a million times before... Backup is key! But I need a real-time solution of some sort. I believe Microsoft is working on some sort of "data integrity server" (?) -- I'm not sure if that will be suitable for home networks or only targeted at enterprise users, though.
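In the meantime, even a crude scheduled copy would beat having no backup at all. Here is a minimal sketch in Python of what I have in mind. It is not real-time and it is not what any particular backup product does; it just copies files that are missing from, or look newer than, the copy in a backup folder. The function name `backup_changed_files` and the folder paths in the usage note are made up for illustration.

```python
import shutil
from pathlib import Path

# Assumption: some file systems (FAT, for example) store modification
# times with 2-second resolution, so small differences are treated as equal.
MTIME_TOLERANCE_SECONDS = 2.0


def backup_changed_files(source: Path, destination: Path) -> list[Path]:
    """Copy files from source to destination if they are missing or changed.

    A file counts as changed when its size differs from the backup copy or
    its modification time is newer (beyond the tolerance above).
    Returns the relative paths of the files that were copied.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        if dst_file.exists():
            src_stat = src_file.stat()
            dst_stat = dst_file.stat()
            same_size = src_stat.st_size == dst_stat.st_size
            not_newer = (
                src_stat.st_mtime - dst_stat.st_mtime <= MTIME_TOLERANCE_SECONDS
            )
            if same_size and not_newer:
                continue  # backup copy looks up to date
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves the modification time, which the check
        # above relies on during the next run.
        shutil.copy2(src_file, dst_file)
        copied.append(relative)
    return copied
```

Calling something like `backup_changed_files(Path("C:/Documents and Settings"), Path("E:/backup"))` from a scheduled task (both paths hypothetical) would keep a second copy reasonably fresh. The destination needs to be a different physical disk, of course, or a single clicking drive takes the backup down with it.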
Finally, I'd like to whine about the registry a bit. After this disaster, it's clear to me that a monolithic solution like the registry is just begging for problems. A single faulty cluster brought down the entire machine! I'm hoping we'll see some kind of solution for these types of issues in Longhorn.
Anyway, I'm now typing on the fixed computer, waiting for the other computer to finish its scan (two faulty clusters already found)... I hope everyone else is having a more productive weekend!