Microsoft Research has published a very interesting peer-reviewed paper that looks at the causes of crashes in over a million consumer PCs. This is the most comprehensive investigation into crashes on home PCs that anyone has ever undertaken. The paper has a bunch of cool findings, including what you can do to reduce the chances your PC will crash or die completely.
Apart from hardware completely killing itself, we have all been victims of software crashes. It is easy to blame these on the programmer, but sometimes the hardware itself has done something to pull the rug out from under your program. Sometimes you only lose an essay you have been working on, but every now and then the crash comes right before your game gets saved.
Microsoft found, after studying the crash reports from more than a million PCs, that there are a few things we can do to make our machines live longer. Some people will have guessed some of these conclusions intuitively, but this is the first time we have had some facts to back them up.
When Intel or AMD make a CPU they perform a bunch of tests at the factory to choose which clock speed the CPU can reliably run at. You do not have to run your CPU at this manufacturer-rated speed though. For years people have been overclocking their CPUs to run faster. When overclocking a CPU you ramp up the speed iteratively until you find the highest speed at which the system still seems stable. Often you will upgrade cooling or buy a bigger power supply to keep the CPU running at the higher speed. Microsoft found that over an 8-month period a CPU that is overclocked is up to 20 times more likely to have a failure than one running at its vendor-rated speed. Microsoft does not name names, but one CPU vendor does better when overclocked (figure it out).
People have been wondering if reducing the clock speed of a CPU will make it more reliable. The idea is that at a lower speed the CPU does not get as hot and requires less electricity, the reverse of overclocking. Microsoft found that machines running their CPU at a lower speed were up to 80% less likely to crash in the 8-month period.
The paper shows that reducing the CPU speed reduces the failure rate and suggests that you can minimize the likelihood of a CPU crash by operating at the slowest CPU speed that achieves the desired performance.
Matching CPU Speed to System Demand
As Microsoft points out, we PC folks already have a technology in our computers that can control the CPU speed, called DVFS (dynamic voltage and frequency scaling). The difficult thing is matching the speed of the CPU to the demand. If you set the speed too low you will be slowing the machine down. If you set it too high you are needlessly taxing the CPU as well as wasting energy.
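To make the matching problem concrete, here is a minimal Python sketch of picking the slowest CPU speed that still covers the current demand. It is only an illustration: it is not the method from the Microsoft paper and not how Granola works internally. The function name `pick_frequency`, the `headroom` parameter and the example frequency list are all made up for this example.

```python
def pick_frequency(available_mhz, current_mhz, utilization, headroom=0.8):
    """Return the lowest available speed expected to cover current demand.

    Hypothetical helper for illustration only.
    available_mhz: speeds the CPU supports (assumed example values).
    current_mhz:   speed the CPU is running at right now.
    utilization:   fraction of time the CPU was busy at current_mhz (0.0 to 1.0).
    headroom:      target busy fraction at the new speed (assumed default 0.8).
    """
    if not available_mhz:
        raise ValueError("available_mhz must not be empty")
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    if not 0.0 < headroom <= 1.0:
        raise ValueError("headroom must be in (0.0, 1.0]")

    # Work the CPU is actually doing, expressed as "busy MHz".
    demand_mhz = current_mhz * utilization
    # Speed needed so that this work keeps the CPU only `headroom` busy.
    required_mhz = demand_mhz / headroom

    # Walk the speeds from slowest to fastest and take the first that fits.
    for freq in sorted(available_mhz):
        if freq >= required_mhz:
            return freq
    # Demand exceeds every option: fall back to the fastest speed.
    return max(available_mhz)


if __name__ == "__main__":
    speeds = [800, 1600, 2400, 3200]  # assumed example speeds in MHz
    print(pick_frequency(speeds, current_mhz=3200, utilization=0.10))
    print(pick_frequency(speeds, current_mhz=1600, utilization=0.90))
```

With a mostly idle machine the sketch drops to the slowest speed, and as the CPU gets busier it steps up only as far as needed. The `headroom` value is the trade-off knob: a lower value keeps more spare capacity at the cost of running faster than strictly necessary.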
MiserWare has a consumer version of its power management software called Granola Personal, which was designed to match CPU speed to system demand. We made it primarily to reduce your energy footprint, but it will do exactly what Microsoft suggests: set your CPU to the lowest speed it needs to run at.
Granola is available for PCs running Windows (XP through 7) and Linux (Ubuntu, Fedora, Debian and RHEL). By default, once installed, it will manage your CPU in “MiserWare mode”, which changes the CPU speed to match demand. You can also force your CPU to its lowest speed if you wish, using the settings window which looks like the screenshot. Matching your CPU speed to demand has the bonus of saving you some money on your power bill.
If you have a large number of PCs, like a university lab, and do not want to have to install Granola Personal on every machine, there is a simpler version for you to use. Granola Enterprise (free to try) is designed to be rolled out on large numbers of machines and be centrally managed.