How to work with Windows 10’s Reliability Monitor

29.10.2015

Reliability Monitor is a built-in part of Windows that's been around since the introduction of Windows Vista back in January 2007. It's always been a somewhat hidden feature of the Windows operating system, and therefore easy for users and admins alike to overlook. Nevertheless, it's a great tool that provides all kinds of interesting insight into system history and stability (see Figure 1). Reliability Monitor is particularly useful when troubleshooting glitchy systems, where it can point to possible causes and offer important clues to fixing things.

Reliability Monitor is part and parcel of the Reliability & Performance Monitor snap-in for the Microsoft Management Console (MMC). That said, Reliability Monitor comes predefined in all modern Windows versions, so there's no need to launch MMC and then add and configure snap-ins to make it work.

Instead, Reliability Monitor taps into the Windows event logs to gather data about your system, with a focus on events that affect reliability, as well as performance counters and configuration data. Reliability Monitor tracks five different categories of information, namely:

Monitoring results are compiled over time: trouble-free operation increases the stability index, while errors or failures decrease it as they occur. A value of 10 is as high as Reliability Monitor goes, and a value of 1 is as low as things get. In actual practice, values of 10 are common on stable, lightly exercised systems, while heavily exercised and somewhat abused test systems will show readings of about 1.7 or thereabouts.

Interestingly, although Reliability Monitor visually tracks events in all five categories already mentioned, the text-form details at the bottom of its console window are grouped into only three categories, where you can view details or look up a solution item by item. Those three categories are:

Reliability Monitor stores reliability history in its own internal file format, but you can use the "Save reliability history…" button at the lower left of the console window at any time to save a snapshot of that data in XML format. This saves only the hourly values of the Reliability Index that the program calculates while a PC is running (not all of the event data from which the index is calculated), in a highly human-readable form, as the brief snippet in Figure 2 shows:
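If you'd rather pull stability index readings into a script than export them by hand, Windows also exposes them through the WMI class Win32_ReliabilityStabilityMetrics (properties SystemStabilityIndex and TimeGenerated). The Python sketch below is a minimal example under stated assumptions: it runs only on Windows, it assumes Windows PowerShell is available as powershell.exe, and it assumes the reliability WMI provider is enabled (on some Windows Server editions it may have to be switched on first). The helper names PS_QUERY, parse_metrics and fetch_metrics are hypothetical, not part of any Windows API, and this data is not the same file the "Save reliability history…" button writes.

```python
import json
import subprocess
import sys

# Hypothetical PowerShell pipeline: read stability readings from WMI and emit JSON.
# TimeGenerated is converted to a UTC ISO 8601 string so the JSON does not depend
# on how ConvertTo-Json serializes DateTime objects, and so the strings sort in time order.
PS_QUERY = (
    "Get-CimInstance -ClassName Win32_ReliabilityStabilityMetrics | "
    "Select-Object @{Name='Time';Expression={$_.TimeGenerated.ToUniversalTime().ToString('o')}}, "
    "SystemStabilityIndex | ConvertTo-Json"
)


def parse_metrics(json_text):
    """Turn ConvertTo-Json output into a time-sorted list of (time, index) tuples."""
    if not json_text.strip():  # an empty result produces no JSON at all
        return []
    data = json.loads(json_text)
    if isinstance(data, dict):  # a single record is emitted as an object, not a list
        data = [data]
    rows = [(item["Time"], float(item["SystemStabilityIndex"])) for item in data]
    return sorted(rows)


def fetch_metrics():
    """Run the query through Windows PowerShell (Windows only) and parse the result."""
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", PS_QUERY],
        capture_output=True, text=True, check=True,
    )
    return parse_metrics(result.stdout)


if __name__ == "__main__" and sys.platform == "win32":
    for stamp, index in fetch_metrics()[-24:]:  # the 24 most recent readings
        print(stamp, index)
```

Keeping the parsing separate from the PowerShell call makes it easy to check the parser against saved output on any machine, even one that isn't running Windows.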

As you can see, for each hour of trouble-free operation, the Reliability Index gains 0.03 in value. Losses for errors vary by severity, but typically fall within a range of -0.2 to -1.0.
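Those numbers make it easy to estimate how long a system takes to climb back after an error. At 0.03 per hour, a 1.0-point drop needs 1.0 / 0.03 ≈ 33.3, so about 34 trouble-free hours, while a 0.2-point drop needs about 7. The Python sketch below is a simplified linear model of the values observed above (a fixed hourly gain, per-error penalties, clamped to the 1 to 10 scale). It is an assumption for illustration, not Microsoft's actual stability-index algorithm, and the function names are hypothetical.

```python
import math

GAIN_PER_HOUR = 0.03          # hourly gain observed in the saved history
INDEX_MIN, INDEX_MAX = 1.0, 10.0  # the stability index scale


def hours_to_recover(loss, gain_per_hour=GAIN_PER_HOUR):
    """Trouble-free hours needed to win back a one-time drop of `loss` points."""
    # Round before ceil() so floating-point noise (e.g. 10.000000000000002) isn't rounded up.
    return math.ceil(round(loss / gain_per_hour, 9))


def simulate(start, events):
    """Apply hourly events to a starting index (simplified model, not Microsoft's formula).

    Each event is None for a trouble-free hour, or a negative penalty for an error.
    Returns the index after each hour, rounded to two decimals.
    """
    index = start
    history = []
    for event in events:
        change = GAIN_PER_HOUR if event is None else event
        index = min(INDEX_MAX, max(INDEX_MIN, index + change))  # stay within 1..10
        history.append(round(index, 2))
    return history


print(hours_to_recover(1.0))                       # 34
print(simulate(5.0, [None, None, -1.0, None]))     # [5.03, 5.06, 4.06, 4.09]
```

A single severe error thus costs roughly a day and a half of clean running, which is why a machine that fails every day or so can sit near the bottom of the scale indefinitely.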

As is the case with many Windows tools and utilities, there are several ways to launch Reliability Monitor on a PC. My favorite is simply to type "reli" in the search box and click the "View reliability history" result that Windows returns, which launches the console. The explicit, step-by-step way to get to this program is as follows:

Either way, you'll be presented with the Reliability Monitor interface for the local PC. For access to remote PCs, you can establish an RDP session with the target PC, then run Reliability Monitor within that window. It works just as well through RDP (or other remote access tools) as it does locally.
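Whether you're at the console or inside an RDP session, Reliability Monitor can also be opened directly with the command perfmon /rel, typed into the Run dialog or a command prompt. For admins who script their troubleshooting routines, a minimal Python sketch follows. It assumes a Windows PC with perfmon.exe on the PATH (it normally lives in System32); the function name open_reliability_monitor is hypothetical.

```python
import subprocess
import sys

# perfmon.exe's /rel switch opens the Reliability Monitor view directly.
RELIABILITY_MONITOR_CMD = ["perfmon.exe", "/rel"]


def open_reliability_monitor():
    """Launch Reliability Monitor on the local Windows PC without waiting for it to close."""
    if sys.platform != "win32":
        raise OSError("Reliability Monitor is only available on Windows")
    return subprocess.Popen(RELIABILITY_MONITOR_CMD)


if __name__ == "__main__" and sys.platform == "win32":
    open_reliability_monitor()
```

Using Popen rather than run() returns control to the script immediately instead of blocking until the console window is closed.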

For a proper demonstration of what Reliability Monitor can tell you, and how it points to causes and possible cures, let's take a look at Reliability Monitor data for one of my most heavily used and abused test machines:

As the figure shows, on September 24 this machine had an unusual miscellaneous failure. Clicking the arrow at the left side of the graph moved the timeline back to include that day in the view. I then double-clicked the item in the detail list below that read "Disk failure," which produced the following Description text:

This event signaled a hardware problem with a SATA device on my test PC serious enough to cause an immediate loss of access to the drive's contents. Looking further back in the history, this failure was not foreshadowed by the less severe SMART errors that might signal imminent failure of a conventional hard disk (this device was a synthetic SSD consisting of a RAID 0 array of two identical mSATA SSDs, and it was the controller card itself that failed). Failing conventional drives typically give warning through more frequent (and increasingly severe) SMART errors before failing outright, and you'd be able to pick this up in Reliability Monitor.

You can also see that this PC has serious stability problems with the built-in Photos app. Because of ongoing errors when using that app, I've switched to a different application for viewing photos and images on that machine (IrfanView, which I also made the default app for viewing images).

Although you can't always fix the problems that Reliability Monitor will catch, you can apply the punchline of the well-known joke to guiding (or channeling) user behavior: "Patient says to doctor: 'It hurts when I do this' (demonstrates by action). Doctor says to patient: 'Don't do that!'"

Sometimes, managing reliability boils down to managing the behavior of system users, especially when fixing somebody else's unstable software is not a viable option. In such cases, counseling avoidance and providing alternatives (along with proper defaults) steers users clear of the problematic software.

In general, working with Reliability Monitor requires looking at the causes of errors and deciding what might be done to address them. When fixes are possible, they will usually be fairly easy to figure out. Often, though, one must simply steer clear of programs or features that don't work the way they should to avoid unnecessary errors. As is so often the case with Windows, "If you can't fix it, avoid it" is a watchword to live by.

(www.cio.com)

Ed Tittel, Kim Lindros
