How To Determine 'System Reliability'

Jan. 1, 2003
by John S. Usher, Ph.D., PE, Professor, Department of Industrial Engineering, University of Louisville, [email protected]

There is a wide variety of issues that should be addressed when talking about material handling system reliability. Some of the most important are:

1. Define "failure." This is the key issue, and for many systems it is also very difficult. Consider a light bulb: we know when it fails, and we could even (fairly easily) measure the time to failure. But what about a complex material handling system? When has it "failed"? For example, consider a 5-vehicle AGVS: if a single vehicle fails, has the system failed? Or a 5-aisle AS/RS: if one crane goes down, is the system even affected? Probably not, if the repair takes only 16 minutes, but what if the repair takes 16 weeks? Generally, for complex systems, failures should be stated in terms of specific component failures or be related to system performance. For example, if a vehicle fails to operate, that's a system failure. Or if throughput drops below 200 packages per hour (for any reason), that's a failure. The point is, both customer and supplier must agree on the definition of "failure."
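One way to make the agreed definition unambiguous is to write it down as an explicit, testable rule. The sketch below is only an illustration: the class name, field names and default thresholds are hypothetical, and it assumes the two example criteria above (every vehicle operating, throughput of at least 200 packages per hour) are combined so that violating either one counts as a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureDefinition:
    """Hypothetical contract terms; names and defaults are illustrative only."""

    min_vehicles_operating: int = 5          # assumed: all 5 AGVS vehicles must run
    min_throughput_per_hour: float = 200.0   # packages per hour, from the example

    def is_failed(self, vehicles_operating: int, throughput_per_hour: float) -> bool:
        # The system counts as failed if either agreed criterion is violated.
        return (
            vehicles_operating < self.min_vehicles_operating
            or throughput_per_hour < self.min_throughput_per_hour
        )


definition = FailureDefinition()
print(definition.is_failed(5, 240.0))  # all vehicles up, throughput above threshold
print(definition.is_failed(4, 240.0))  # one vehicle down
print(definition.is_failed(5, 180.0))  # throughput below threshold
```

A customer who can tolerate one vehicle being down would simply agree on a different threshold, for example `FailureDefinition(min_vehicles_operating=4)`.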

2. Customers should demand that the supplier generate accurate predictions about the likelihood of system failure, the effect of those failures, and the time (and cost) to repair those failures. These predictions can be expressed in a number of ways; reliability, availability and maintainability are the commonly used measures. However, what generally happens is the customer says "I want 99 percent availability," and the supplier says "Yeah, we can do that," but neither party ever really analyzes it.
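To see what a figure like "99 percent availability" commits both parties to, it can help to run the arithmetic. The sketch below assumes the common steady-state definition, availability = MTBF / (MTBF + MTTR), and round-the-clock operation (8,760 hours per year). Neither assumption comes from the article, and the 4-hour repair time is an illustrative value.

```python
HOURS_PER_YEAR = 8760  # 24 h x 365 days; assumes round-the-clock operation


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up (assumed definition)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def allowed_downtime_hours(availability: float,
                           scheduled_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime permitted over the scheduled hours at a given availability."""
    return (1.0 - availability) * scheduled_hours


def required_mtbf_hours(availability: float, mttr_hours: float) -> float:
    """Solve A = MTBF / (MTBF + MTTR) for MTBF."""
    return availability * mttr_hours / (1.0 - availability)


target = 0.99
print(f"Allowed downtime per year: {allowed_downtime_hours(target):.1f} h")
# With an assumed 4-hour mean repair time, how far apart must failures be?
print(f"Required MTBF: {required_mtbf_hours(target, 4.0):.0f} h")
```

Under these assumptions, 99 percent availability allows roughly 87.6 hours of downtime per year, which is the kind of concrete number the two parties can actually discuss.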

The supplier simply uses intuition, experience, etc., and hopes the system is designed well enough to meet the goal. Most of the time the supplier is right, but occasionally he is not. That's when the system does not live up to expectations and there is trouble. A variety of techniques can be used to correct this problem and get everything on the table in plain sight, including block diagrams, fault trees, FMEAs and computer simulation. Whichever methods are used, particular emphasis needs to be directed at characterizing:

a) The "frequency of failure;"

b) The "effects of failure;"

c) The "time to repair the system."
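As a rough illustration of how these three quantities interact, the sketch below treats the 5-aisle AS/RS mentioned earlier as a "k-out-of-n" block: each crane has its own availability (frequency of failure and time to repair), and the system counts as up when at least k cranes are working (the effect of a failure). It assumes identical, independent cranes with no shortage of repair crews, and the 1,000-hour MTBF is a made-up value, not data from any real system.

```python
import math


def unit_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one crane (assumed definition)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def k_out_of_n_availability(k: int, n: int, a: float) -> float:
    """Probability that at least k of n independent, identical units are up."""
    return sum(
        math.comb(n, i) * a**i * (1.0 - a) ** (n - i)
        for i in range(k, n + 1)
    )


MTBF_HOURS = 1000.0  # hypothetical mean time between crane failures

# Compare the two repair times from the example: 16 minutes versus 16 weeks.
for label, mttr_hours in [("16 minutes", 16 / 60), ("16 weeks", 16 * 7 * 24)]:
    a = unit_availability(MTBF_HOURS, mttr_hours)
    for k in (5, 4):
        system = k_out_of_n_availability(k, 5, a)
        print(f"repair {label}, need {k} of 5 cranes: {system:.4f}")
```

The same crane failure rate produces very different system availability depending on the repair time and on how many cranes the operation truly needs, which is why all three items have to be characterized together.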

Frequency of failure, effects of failure and time to repair are all functions of system design and operation. In addition, customers need to ask questions like: Is the system robust to variation in operating conditions? Is it modular and, thus, easy to repair? If one component fails, does the whole system fail or can other parts still operate? Knowing the answers to these questions helps the customer determine answers to other questions like: How many repair technicians should I staff? How many spare parts do I keep on hand? How often do I do preventive maintenance? These are critical because they ultimately affect the customer's return on investment.
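The spare-parts question, for instance, can be approached with a simple textbook model. The sketch below assumes a constant failure rate, so that demand for a part during the resupply period follows a Poisson distribution, and it finds the smallest stock level that covers demand with a chosen probability. The function names and all input values are hypothetical, and a real analysis would need to check whether the constant-rate assumption holds.

```python
import math


def poisson_cdf(s: int, mean: float) -> float:
    """P(demand <= s) when demand is Poisson with the given mean."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(s + 1))


def spares_needed(units_in_service: int, mtbf_hours: float,
                  resupply_hours: float, confidence: float = 0.95) -> int:
    """Smallest stock level that covers resupply-period demand with the
    requested probability, assuming a constant failure rate."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Expected number of failures across all units during one resupply period.
    mean_demand = units_in_service * resupply_hours / mtbf_hours
    spares = 0
    while poisson_cdf(spares, mean_demand) < confidence:
        spares += 1
    return spares


# Hypothetical inputs: 5 units, 1,000 h MTBF, 200 h to receive a replacement part.
print(spares_needed(5, 1000.0, 200.0, confidence=0.95))
```

Raising the confidence level or lengthening the resupply time increases the recommended stock, which makes the cost of each choice visible before the system is installed.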

3. Both customer and supplier should utilize experienced reliability engineers to define goals for reliability and availability, test programs, etc. Many companies get themselves into trouble because they assume that their design engineers can do the reliability work. Unfortunately, most design engineers have never studied reliability theory or probabilistic modeling (most of that is taught in industrial engineering programs). As a result, many contracts I have reviewed are seriously flawed when they are analyzed carefully, for a number of reasons: incorrect terminology, non-standard methodologies, incorrect calculations, etc. This is disturbing to me because there is a wide array of well-known reliability standards and textbooks that could help the situation.