As electronic devices with no moving parts, memory modules seldom malfunction if they are installed properly. When problems do occur, they may be as obvious as a failed RAM check at boot or as subtle as a few corrupted bits in a data file. The usual symptom of memory problems is that Windows displays the Blue Screen of Death (BSOD). Sadly, there are so many other possible causes of a BSOD that it's of little use as a diagnostic aid.
When Bad Memory Turns Good
As odd as it sounds, faulty memory is seldom the cause of memory problems. When you experience memory errors, the most likely cause is a marginal, failing, or overloaded power supply. The next most likely cause is system overheating. In particular, if the system works normally when first turned on but develops problems after it's been running for a while, power supply or heat problems are the most likely cause. Only after you have eliminated these possibilities should you consider the possibility that the memory itself is defective.
As a first step in diagnosing memory problems, run Memtest86 (http://www.memtest86.com). Memtest86 is available as executables for DOS, Windows, and Linux, but the most useful form is the bootable ISO image, which can load even on a system with memory problems so severe that Windows or Linux cannot load and run. If you have a Knoppix disk handy, insert that, power up the system, type memtest at the boot prompt, and press Enter. However you get it running, configure Memtest86 to do deep testing and multiple loops. Let it run overnight, and log the results to disk.
When you examine the log, note the addresses where errors occurred. If errors occur reproducibly at the same address or nearby addresses, it's likely that the memory module is defective. If the errors occur at seemingly random addresses, it's more likely that the problem is the power supply or a system temperature that's too high. One possibility, of course, is that the system temperature spikes only when you're gaming or doing graphics work (running the CPU and video card flat out). This effect can make temperature-related component problems difficult to isolate.
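The address check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the failing addresses are taken to be already extracted from the log as plain integers (no particular Memtest86 log format is assumed), and the function names, the 1 MiB "nearby" window, and the 80% threshold are hypothetical choices, not values from any tool.

```python
from typing import List


def cluster_addresses(addresses: List[int], window: int = 1 << 20) -> List[List[int]]:
    """Group addresses so that neighbours within `window` bytes share a cluster."""
    clusters: List[List[int]] = []
    for addr in sorted(addresses):
        if clusters and addr - clusters[-1][-1] <= window:
            clusters[-1].append(addr)   # close to the previous error: same cluster
        else:
            clusters.append([addr])     # far from the previous error: new cluster
    return clusters


def likely_cause(addresses: List[int], window: int = 1 << 20, share: float = 0.8) -> str:
    """Rough guess at the cause from where the errors fall (thresholds are assumptions)."""
    if not addresses:
        return "no errors logged"
    if len(addresses) < 2:
        return "inconclusive: too few errors"
    largest = max(len(c) for c in cluster_addresses(addresses, window))
    # Most errors packed into one small region suggests a defective module;
    # errors scattered across the address space suggest power or heat.
    if largest / len(addresses) >= share:
        return "suspect memory module"
    return "suspect power supply or heat"


if __name__ == "__main__":
    print(likely_cause([0x1000, 0x1004, 0x1010, 0x2000]))
    print(likely_cause([0x100000, 0x9000000, 0x30000000, 0x7FFF0000]))
```

Treat the result as a hint about where to look first, not a verdict. A heat problem that appears only under load can still produce misleading patterns.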
The POST Check
During POST (Power-On Self Test), most systems test the memory. Although the POST memory test is not nearly as exhaustive as running a memory diagnostic utility, it is useful as a "tripwire" test to warn you if severe memory problems occur. Many system BIOSs allow you to disable or abbreviate the POST memory test. We recommend leaving it enabled unless you have so much memory installed that the time required to test it at boot-up is excessive.
If the errors that Memtest86 reports are random, take steps to eliminate the power or heat problem. If the errors occur at reproducible addresses, it's time to start pulling DIMMs. When troubleshooting memory problems, always:
- Use standard antistatic precautions. Ground yourself by touching the case frame or power supply before you touch a memory module.
- Remove and reinstall all memory modules to ensure they are seated properly. While you're doing that, it's a good idea to clean the contacts on the memory module. Some people gently rub the contacts with a pencil eraser. We've done that ourselves, but memory manufacturers recommend against it because of possible damage to the contacts. Also, there is always the risk of a fragment from the eraser finding its way into the memory slot, where it can block one or more contacts. Better practice is to use a fresh dollar bill, which has just the right amount of abrasiveness to clean the contacts without damaging them, as shown in Figure 6-7.
Figure 6-7: Use a new dollar bill to polish the DIMM contacts
The next steps you should take depend on whether you have made any changes to memory recently.
If you suspect memory problems but have not added or reconfigured memory (or been inside the case), it's unlikely that the memory itself is causing the problem. Memory does simply die sometimes, and may be killed by electrical surges, but this is uncommon, because the PC power supply itself does a good job of isolating memory and other system components from electrical damage. The most likely problem is a failing power supply. Try one or both of the following:
- If you have another system, install the suspect memory in it. If it runs there, the problem is almost certainly not the memory, but either an inadequate power supply or high temperatures inside the case.
- If you have other memory, install it in the problem system. If it works, you can safely assume that the original memory is defective. More likely is that it will also fail, which strongly indicates power supply or heat problems.
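The two swap tests above amount to a small decision table, sketched here in Python. The function name and return strings are hypothetical. One branch is an inference the text does not state outright: if the suspect memory also fails in a second system, that counts as evidence against the memory.

```python
from typing import Optional


def interpret_swap_tests(
    suspect_ok_in_other_system: Optional[bool] = None,
    other_memory_ok_in_problem_system: Optional[bool] = None,
) -> str:
    """Combine the results of the two swap tests; None means the test was not run."""
    memory_votes = 0
    power_heat_votes = 0

    if suspect_ok_in_other_system is True:
        power_heat_votes += 1      # memory runs fine elsewhere
    elif suspect_ok_in_other_system is False:
        memory_votes += 1          # assumption: failure elsewhere implicates the memory

    if other_memory_ok_in_problem_system is True:
        memory_votes += 1          # different memory cures the problem system
    elif other_memory_ok_in_problem_system is False:
        power_heat_votes += 1      # different memory fails too

    if memory_votes > power_heat_votes:
        return "memory"
    if power_heat_votes > memory_votes:
        return "power or heat"
    return "inconclusive"


if __name__ == "__main__":
    print(interpret_swap_tests(suspect_ok_in_other_system=True))
    print(interpret_swap_tests(other_memory_ok_in_problem_system=True))
```

When the two tests disagree, the sketch reports "inconclusive" rather than guessing.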
If you have neither another system nor additional memory, and if your system has more than one memory module installed, use binary elimination to determine which module is bad. For example, if you have two modules installed, simply remove one module to see if that cures the problem. If you have four identical modules installed, designate them A, B, C, and D. Install only A and B, restart the system, and run the memory tests again. If no problems occur, A and B are known good and the problem must lie with C and/or D. Remove B and substitute C. If no problems occur, you know that D is bad. If the system fails with A and C, you know that C is bad, but you don't know whether D is bad. Substitute D for C and restart the system to determine if D is good. If problems do occur with only A and B installed, at least one of those two is bad; install C and D instead to find a known-good module, and then test A and B against it in the same way.
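The elimination procedure generalizes to any number of modules. The Python sketch below uses recursive halving, a variation on the A/B/C/D substitution described above rather than a literal transcription of it. It assumes the system can boot and be tested with any subset of modules installed, which your motherboard's supported configurations may not allow. The name `find_bad_modules` and the `passes` callback are hypothetical; `passes` stands in for "install only these modules, run the memory test, and report whether it came back clean."

```python
from typing import Callable, List, Sequence


def find_bad_modules(
    modules: Sequence[str],
    passes: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return the modules that fail, testing progressively smaller groups."""
    if not modules or passes(modules):
        return []                      # the whole group tested clean
    if len(modules) == 1:
        return [modules[0]]            # a lone module that fails is bad
    mid = len(modules) // 2
    # The group failed, so test each half separately.
    return (find_bad_modules(modules[:mid], passes)
            + find_bad_modules(modules[mid:], passes))


if __name__ == "__main__":
    # Simulated test bench: pretend module C is the defective one.
    bad = {"C"}

    def simulated_test(installed: Sequence[str]) -> bool:
        return not (set(installed) & bad)

    print(find_bad_modules(["A", "B", "C", "D"], simulated_test))
```

In practice each call to `passes` is a long memory-test run, so the point of halving is to keep the number of runs small when only one module is bad.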
WINDOWS XP IS UNFORGIVING
Windows 95, 98, 98SE, and ME do not stress memory. If you upgrade to Windows XP or Linux, memory errors may appear on a PC that seemed stable. People often assume that they did something while installing the new OS to cause the errors, but that is seldom true. Such errors almost always indicate a real problem: a marginal power supply, overheating, or defective memory. The problem was there all along, but Windows 9X simply ignored it.
If you experience problems when adding memory, note the following:
- If a DIMM appears not to fit, there's good reason. DIMMs are available in many different and mutually incompatible types. Every DIMM has one or more keying notches whose placement corresponds to protrusions in the memory slot. If the keying notches in the DIMM match the slot protrusions, the DIMM is compatible with that slot and can be seated. If the DIMM keying notches don't match the socket protrusions, the DIMM is the wrong type and is prevented physically from seating in that slot.
- Make sure that the DIMM seats fully in the memory slot and that the retaining arms snap into place to secure the DIMM. A partially seated DIMM may appear to be fully seated, and may even appear to work. Sooner or later (probably sooner), problems will develop with that module.
- Verify that the modules are installed in the proper slots to match one of the supported memory configurations listed in your motherboard manual.
- If the system displays a memory mismatch error the first time you restart, that usually indicates no real problem. Follow the prompts to enter Setup, select Save and Exit, and restart the system. The system should then recognize the new memory. Some systems require these extra steps to update CMOS.
- If the system recognizes a newly installed module as half its actual size and that module has chips on both sides, the system may recognize only single-banked or single-sided modules. Some systems limit the total number of "sides" that are recognized, so if you have some existing smaller modules installed, try removing them. The system may then recognize the double-sided modules. If not, return those modules and replace them with single-sided modules.