Sunday, 23 November 2008

BIOS LBA / Hard disk, apparent failure problem "Blinking cursor of doom"

A rather interesting low-level problem: if you ever get a machine inexplicably stuck after POST with a blinking cursor in the top left and nothing more, and want to know why or how to fix it, please read on!

Whilst there are a number of possible causes (like incorrect BIOS settings), I'm assuming all of those have been checked and ruled out. If so, and the problem still persists, then the following might be an explanation:

A while back I upgraded my ageing Dell Inspiron 6000 laptop, replacing its 80GB hard drive with a 160GB drive. I cloned the original drive to retain everything but made the partition larger to make the extra space available. The machine ran XP Pro SP2 and had been stable for a couple of years, since I first got it (after a battle with a nasty virus that I eventually managed to win a few months previously).

An easy enough job and all was good, for a while. One day the machine needed a reboot; after shutting down, it came back with the Blinking Cursor of Doom!

This is where, straight after POST (Power On Self Test), the machine totally hangs (before Windows starts up). My first thoughts (after "oh crap") were along the lines of: has the drive died, has the MBR gone wrong, has a virus trashed the drive, is the partition table messed up, is the file system corrupt, etc?

After many hours I'd done a full chkdsk /r and fixmbr, checked the MBR (Master Boot Record) and partition table information, made sure the partition was active and so on. I even moved and resized the partition slightly, swapped the drive into another machine (it was fine) and tried a different drive in the laptop (also fine). I checked my BIOS (Basic Input Output System) for updates and so on. Nothing seemed to be wrong - weird!

The MBR is the sector at cylinder 0, head 0, sector 1. It's loaded according to the BIOS boot sequence: if the BIOS finds a valid MBR, it's copied to memory and executed as part of the boot sequence of the machine.
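As a rough illustration of what "a valid MBR" means, the sketch below parses a 512-byte sector: it checks for the 0x55 0xAA signature in the last two bytes and lists the used partition table entries. The names `parse_mbr` and `PartitionEntry` and the sample sector are made up for this example; the offsets follow the classic MBR layout.

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # four 16-byte entries start here
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR


class PartitionEntry(NamedTuple):
    active: bool        # status byte 0x80 marks the bootable partition
    type_id: int        # e.g. 0x07 for NTFS
    first_lba: int      # first sector of the partition
    sector_count: int   # length of the partition in sectors


def parse_mbr(sector: bytes) -> List[PartitionEntry]:
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * i
        raw = sector[start:start + 16]
        # Layout: status, 3 CHS bytes, type, 3 CHS bytes, first LBA, sector count
        status, type_id, first_lba, count = struct.unpack("<B3xB3xII", raw)
        if type_id != 0:  # type 0 means the slot is unused
            entries.append(PartitionEntry(status == 0x80, type_id, first_lba, count))
    return entries


# Hypothetical sample: one active NTFS partition starting at sector 63.
sample = bytearray(SECTOR_SIZE)
sample[446:462] = struct.pack("<B3xB3xII", 0x80, 0x07, 63, 312_576_642)
sample[510:512] = BOOT_SIGNATURE
print(parse_mbr(bytes(sample)))
```

On a real machine you would feed it the first 512 bytes of the disk (or of a disk image) instead of the sample. In my case a check like this would have passed, which is exactly why the problem was so confusing.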

I also looked at the Windows MBR code (disassembly) from here http://mirror.href.com/thestarman/asm/mbr/STDMBR.htm to check all paths through the boot loader - all likely paths result in a message of some sort, like the archaic "Please insert boot disk" type messages. I just got a blank screen with a flashing cursor - how come?

I could see all the files on the disk, and the disk and its contents appeared fine - so the question was what on earth was stopping the disk from being bootable!? It defied any logic: surely if the file system was intact and had no errors, the partition table was fine and everything was set for the system to boot from it, why didn't it?

After much time was spent on it (I really didn't want to have to set everything up again), eventually I decided it would be quicker to re-ghost from the original 80GB drive and get back to where I was 2 or 3 months previously, then copy over any new data from the fine but un-bootable 160GB drive.

I did this and all was well again (but in the back of my mind I was never happy I'd not managed to get to the bottom of it).

My machine was working but I was puzzled. The machine starts up in the system BIOS, which runs various checks and initialises the hardware and subsystems before trying to find a bootable volume. Assuming a bootable hard disk is found, the MBR is executed, the partition table is examined and the active partition is bootstrapped, the first "file" launched on a Windows 2000/XP/NT platform being the ntldr program responsible for starting Windows (and capable of multiboot).

Whilst dissatisfied I'd not understood the problem, as days and weeks passed, the problem faded into a dim and distant memory...


--


Then, about 3 - 4 months later, after updating iTunes and needing to reboot, the same thing struck again! Now this time, whilst rather disappointed, I knew there had to be an explanation. Maybe it was hardware related, maybe the machine was unreliable - how could I trust it again if it did this every so often?

So, this time around I thought I needed to get to the bottom of this - there must be a reason. It had happened twice in a row, and suspiciously in roughly similar time spans (3 months or so) from the original disk clone - this had to be a clue.

So, after chkdsk /r etc just to check everything was ok, I got to work.

The disk was un-bootable, but the file system was fine. It hung immediately after POST, but with the briefest of flashes of the hard disk light. This had to be something low level, before Windows and after POST, where not much happens - especially knowing the MBR and partition table were intact and fine too.

I renamed ntldr, IO.SYS and NTDETECT.COM in C:\, the low-level bootstrap files for Windows, to see whether the boot process was even getting far enough to notice they were missing.

After rebooting, the machine did exactly the same. Aha, this was a clue of sorts: if the disk was bootable but it wasn't getting as far as running ntldr, then the fault was prior to that. If the MBR and partition table were OK, then the fault was prior to that also. So, the BIOS?

I checked again for BIOS updates, nothing released matched the problem.

Then, after much Googling came the clue. Some BIOSes have an intrinsic 137GB limit.

That's 137GB of addressable disk, which might translate to less once you take into account the overhead of the file system, block sizes etc.

Whilst Windows is happy to grow the file system beyond this limit, eventually the BIOS will reach this maximum and barf. So, it could explain it...
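For what it's worth, the 137GB figure is what you get from 28-bit LBA addressing, assuming the traditional 512-byte sector. A quick sketch of the arithmetic (the variable names are just for illustration):

```python
SECTOR_BYTES = 512          # traditional sector size, assumed here
LBA28_SECTORS = 2 ** 28     # number of sectors addressable with 28 bits

limit_bytes = LBA28_SECTORS * SECTOR_BYTES
print(limit_bytes)              # 137438953472 bytes
print(limit_bytes / 10 ** 9)    # decimal gigabytes, roughly 137.4
print(limit_bytes // 2 ** 30)   # binary gigabytes (GiB): 128
```

So "137GB" and "128GB" are the same limit, counted in decimal and binary units respectively.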

There was only one way to find out. I resized the partition (made it smaller) and created a new one in the freed-up space. I copied files over from the original partition and then resized it smaller again, growing the new one until they were about the same size (2 x 80GB partitions rather than 1 x 160GB).
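A minimal sketch of why the split should help, assuming the BIOS limit really is the 28-bit LBA one and 512-byte sectors: the boot partition has to end below sector 2^28. The helper names and the starting sector of 63 are assumptions for illustration, not values read from my disk.

```python
LBA28_SECTORS = 2 ** 28   # sectors reachable with 28-bit addressing
SECTOR_BYTES = 512        # assumed sector size


def fits_below_lba28(first_lba: int, sector_count: int) -> bool:
    """True if the partition's last sector is addressable with 28-bit LBA."""
    last_lba = first_lba + sector_count - 1
    return last_lba < LBA28_SECTORS


def sectors_for(size_bytes: int) -> int:
    """Number of whole sectors needed to hold size_bytes."""
    return -(-size_bytes // SECTOR_BYTES)  # ceiling division


one_big = fits_below_lba28(63, sectors_for(160 * 10 ** 9))
first_half = fits_below_lba28(63, sectors_for(80 * 10 ** 9))
print(one_big, first_half)   # False True
```

A single 160GB partition runs past the limit, whereas an 80GB partition at the start of the disk stays comfortably inside it.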

I restarted with the downsized partition and this time the machine came up with a more helpful "missing ntldr" error message - aha! It had tried to boot from the disk!

Booting up for a final time from the boot disk, I renamed the files (ntldr etc) back. I rebooted one last time and hey presto, it was working fine. Nothing lost, and this time I only lost a few hours (3 or 4) from my life. Most importantly, I got some closure on the problem and an understanding of what happened and how to avoid it forever.

Greatly satisfied and relieved, I thought I'd share this experience in case it helps someone else in a similar position. Like anything, when you know the answer it's easy, but I've worked with computers for years and for a while there I was completely stumped by a problem that seemed, in a rather un-computer-like way, to defy all logic!

Whilst I'm entirely happy with two smaller partitions, the longer term solution is known as 48-bit LBA, AKA Big LBA - http://en.wikipedia.org/wiki/Advanced_Technology_Attachment
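To give a sense of how much headroom 48-bit LBA adds, here is the same back-of-envelope arithmetic as for the 28-bit limit, again assuming 512-byte sectors:

```python
SECTOR_BYTES = 512                       # assumed sector size

lba28_bytes = 2 ** 28 * SECTOR_BYTES     # the old limit
lba48_bytes = 2 ** 48 * SECTOR_BYTES     # the 48-bit limit

print(lba48_bytes // 2 ** 50)            # capacity in PiB: 128
print(lba48_bytes // lba28_bytes)        # times larger than LBA28: 1048576
```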


References:

http://en.wikipedia.org/wiki/Master_boot_record
http://en.wikipedia.org/wiki/Booting#Boot_loader
http://en.wikipedia.org/wiki/Windows_NT_Startup_Process

http://mirror.href.com/thestarman/asm/mbr/STDMBR.htm
http://www.ata-atapi.com/hiwmbr.html

http://en.wikipedia.org/wiki/Advanced_Technology_Attachment

5 comments:

mystevico said...

man.. I would like to thank you for saving a lot of my time on this dumb problem. I've been trying to solve this issue for a long time but couldn't find a solution. Will try your method out later and tell you the outcome.
Thanks again... dude.

LouisB said...

Happy to help! I know it caused me much pain - I hope it helps you out!

jwhiteheadcc said...

What's funny is that it's happening to me and it doesn't seem to matter if the partition (or even the drive) is small enough.
We're talking about anything from 2GB up to 80GB drives, well below the 128GB (137GB in marketing speak) limit.

I'll let you know how it turns out. Last time this happened I just used a Windows XP setup disk, and then once it rebooted I deleted all the files and put the ones I wanted on there. This is just plain annoying!

I think the 6 sectors right after the boot record are blank when you partition in Disk Management, but are properly copied when using a boot CD.

jwhiteheadcc said...

Confirmed - I've been running several days with no problems. I just used the recovery console and used the FixBoot command to replace the boot record and NTLDR module (not the same as the NTLDR file - this is 6 sectors).

The cursor stuck in the upper left-hand corner can be caused by an infinite loop when 'empty' machine code is loaded, as happened to me. It literally tried to run "00 00 00 00 00 ..." which of course entered an infinite loop on this CPU family.

LouisB said...

Thanks for the details jwhiteheadcc, all good information.

The problem you describe is different to the one I experienced, and a rather more straightforward one. In my case, fixmbr, fixboot, replacing ntldr and so on were some of the very first things I tried, hence the confusion when these sorts of corrective actions made no difference whatsoever.

A corrupt bootsector is a more common failure for which the system can usually be easily recovered using the command line.

Interestingly, I did run through the boot loader (the MBR code) and very few paths result in no output message at all!

Most result in arcane messages like "insert boot disk" etc.

However, this is good to know: there are possibly several underlying causes that can lead to this symptom.

I'd always recommend checking the more obvious failures like bootsector corruption before resorting to fixes for more obscure problems.