Failure Rate
From Wikipedia, the free encyclopedia.
Failure rate describes how frequently something fails. Failure rate, often denoted by the symbol $\lambda$, is important in the fields of reliability theory and product warranties. The failure rate depends on the failure distribution, which describes the probability of failure prior to a specified time. Another way of expressing the failure rate is the mean time between failures (MTBF), which is the "average" time between failures. The failure rate is not always constant, so the hazard function is used to describe the instantaneous failure rate at any point in time. The bathtub curve, a particular form of the hazard function, is a typical representation of the failure rate of a system during its product life cycle.
Failure rate definitions
Failure rate can be defined as:
- The total number of failures within an item population, divided by the total time expended by that population, during a particular measurement interval under stated conditions. (MacDiarmid, et al.)
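Failure rate can also be defined mathematically as the probability per unit time that a failure occurs in a specified interval, which is often written in terms of the reliability function, $R(t)$, as,
- $\lambda = \frac{R(t_1) - R(t_2)}{(t_2 - t_1)\, R(t_1)}$,
or,
- $\lambda = \frac{R(t) - R(t + \Delta t)}{\Delta t\, R(t)}$,
where,
- $t_1$ (or $t$) and $t_2$ (or $t + \Delta t$) are the beginning and ending of a specified interval of time, and
- $R(t)$ is the reliability function, i.e. the probability of no failure before time $t$.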
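As a rough illustration of the interval definition, the following Python sketch evaluates the failure rate over an interval from a reliability function. The function name `interval_failure_rate` and the exponential reliability function with a rate of 0.001 failures per hour are assumptions chosen for the example, not part of the cited definitions.

```python
import math


def interval_failure_rate(reliability, t1, t2):
    """Failure rate over [t1, t2]: (R(t1) - R(t2)) / ((t2 - t1) * R(t1))."""
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    r1 = reliability(t1)
    if r1 <= 0:
        raise ValueError("R(t1) must be positive")
    return (r1 - reliability(t2)) / ((t2 - t1) * r1)


def exp_reliability(t, rate=0.001):
    """Assumed example: exponential reliability, R(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


# The same interval length is evaluated early and late in life.
rate_0_100 = interval_failure_rate(exp_reliability, 0.0, 100.0)
rate_500_600 = interval_failure_rate(exp_reliability, 500.0, 600.0)
print(rate_0_100, rate_500_600)
```

For this assumed exponential case the two intervals give the same value, since the interval failure rate does not depend on where the interval starts.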
Failure distributions
Failure rate depends on the failure distribution, $F(t)$, which is a cumulative distribution function that mathematically describes the probability of failure prior to time $t$,
- $P(T \le t) = F(t) = 1 - R(t)$,
where $T$ is the time to failure. The failure distribution function is the integral of the failure density function, $f(x)$,
- $F(t) = \int_0^t f(x)\, dx$.
There are many failure distributions (see List of important probability distributions). A common failure distribution is the exponential failure distribution,
- $F(t) = \int_0^t \lambda e^{-\lambda x}\, dx = 1 - e^{-\lambda t}$,
which is based on the exponential density function.
In this case, the parameter $\lambda$ in the exponential distribution is the failure rate.
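The relationship between the failure density and the failure distribution can be checked numerically. The sketch below integrates the exponential density with a simple midpoint rule and compares the result with the closed form $1 - e^{-\lambda t}$. The helper names, the rate of 0.002 per hour, and the step count are assumptions made only for illustration.

```python
import math


def exponential_density(x, rate):
    """Exponential failure density: f(x) = rate * exp(-rate * x)."""
    return rate * math.exp(-rate * x)


def exponential_cdf(t, rate):
    """Exponential failure distribution: F(t) = 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t)


def integrate_density(density, t, steps=10_000):
    """Midpoint-rule approximation of the integral of density over [0, t]."""
    width = t / steps
    return sum(density((i + 0.5) * width) for i in range(steps)) * width


rate = 0.002  # assumed failure rate, failures per hour
t = 500.0     # hours
closed_form = exponential_cdf(t, rate)
numeric = integrate_density(lambda x: exponential_density(x, rate), t)
print(closed_form, numeric)
```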
Mean-time-between-failure (MTBF)
Main article: Mean time between failure
A common use of the failure rate is to determine the mean-time-between-failure (MTBF), which may be thought of as the "average" time between failures. The MTBF is simply the reciprocal of the failure rate,
- $\mathrm{MTBF} = \frac{1}{\lambda}$.
The MTBF is often denoted by the symbol, $\theta$, or,
- $\theta = \mathrm{MTBF} = \frac{1}{\lambda}$.
Since failure rate and MTBF are simply reciprocals, both notations are found in the literature depending on which notation is most convenient for the application.
The MTBF can be defined in terms of the expected value of the failure density function,
- $\mathrm{MTBF} = \int_0^\infty t\, f(t)\, dt$.
A common misconception about the MTBF is that it specifies the time (on average) when half of the items will fail. This is only true for certain symmetrical distributions. In many cases, such as the non-symmetrical exponential distribution, this is not true. For example, for an exponential failure distribution, the probability that an item will fail by the MTBF is approximately 0.63.
MTTF (mean time to failure) is sometimes used instead of MTBF in cases where a system is replaced after a failure, whereas MTBF denotes the time between failures where the system is repaired.
There are many variations of MTBF, such as mean-time-between-system-abort (MTBSA) or mean-time-between-critical-failure (MTBCF). Such nomenclature is used when it is desirable to differentiate among types of failures. For example, in an automobile, the failure of the FM radio does not prevent the primary operation of the vehicle, so it may be desirable to differentiate the failure rates of critical versus non-critical failures.
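The 0.63 figure can be checked directly. The short derivation below is a worked sketch that assumes the exponential form $F(t) = 1 - e^{-\lambda t}$ with the MTBF equal to $1/\lambda$; the symbol $t_{0.5}$ for the median time to failure is introduced here only for illustration.

```latex
\begin{align*}
F(\mathrm{MTBF}) &= 1 - e^{-\lambda \cdot (1/\lambda)} = 1 - e^{-1} \approx 0.632, \\
F(t_{0.5}) = 0.5 \;&\Rightarrow\; t_{0.5} = \frac{\ln 2}{\lambda} \approx 0.693 \times \mathrm{MTBF}.
\end{align*}
```

Under this assumption, half of the items have failed at roughly 69% of the MTBF, not at the MTBF itself.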
Hazard function
One drawback of the MTBF is that it assumes that the failure rate is constant for all intervals. However, the failure rate of a system may vary with time, such that a single number does not accurately describe the failure rate during all intervals of time. For example, as an automobile grows older, the failure rate in its 5th year of service may be greater than its failure rate during its 1st year of service, so that it is not desirable to use the same failure rate to describe both of these intervals of time.
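By calculating the failure rate for smaller and smaller intervals of time, $\Delta t$, the interval becomes infinitesimally small. This results in the hazard function, $h(t)$, which is the instantaneous failure rate at any point in time,
- $h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{\Delta t\, R(t)}$, or,
- $h(t) = \frac{f(t)}{R(t)}$.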
If the hazard function is constant, then the failure rate is the same for any equal period of time. This implies that failures occur with equal frequency during any equal period of time. The exponential failure distribution has a constant hazard rate.
For other distributions, such as the log-normal distribution, the hazard function is not constant, which means that the failure rate varies with time.
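The contrast between a constant and a time-varying hazard can be sketched numerically using the ratio of failure density to reliability. The parameter values below (a rate of 0.001 for the exponential case, and $\mu = 0$, $\sigma = 1$ for the log-normal case) and the function names are assumptions chosen only to make the behaviour visible.

```python
import math


def exponential_hazard(t, rate):
    """h(t) = f(t) / R(t) for an exponential failure distribution."""
    density = rate * math.exp(-rate * t)
    reliability = math.exp(-rate * t)
    return density / reliability


def lognormal_hazard(t, mu, sigma):
    """h(t) = f(t) / R(t) for a log-normal failure distribution (t > 0)."""
    z = (math.log(t) - mu) / sigma
    density = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    reliability = 0.5 * math.erfc(z / math.sqrt(2.0))  # R(t) = 1 - F(t)
    return density / reliability


# Evaluate each hazard at a few points in time.
exp_values = [exponential_hazard(t, rate=0.001) for t in (10.0, 100.0, 1000.0)]
lognormal_values = [lognormal_hazard(t, mu=0.0, sigma=1.0) for t in (0.5, 1.0, 5.0)]
print(exp_values)
print(lognormal_values)
```

The exponential values all equal the assumed rate, while the log-normal values change with time.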
Bathtub curve
Main article: Bathtub curve
The bathtub curve describes a particular form of the hazard function that comprises three parts:
- The first part is a decreasing failure rate, known as early failures or infant mortality.
- The second part is a constant failure rate, known as random failures.
- The third part is an increasing failure rate, known as wearout failures.
The bathtub curve is often employed to represent the failure rate of a product during its lifecycle, namely, the product experiences early "infant mortality" failures when first introduced, then exhibits random failures with constant failure rate during its "useful life", and finally experiences "wear out" failures as the product exceeds its design lifetime.
The bathtub curve is often modeled by a piecewise set of three hazard functions, one for each of the three parts.
While the bathtub curve is useful, not every product or system follows a bathtub curve hazard function (see Bathtub curve issues by ASQC, listed under References).
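One simple way to picture a piecewise bathtub model is with three linear segments: a falling line, a flat line, and a rising line. The sketch below is only an illustrative form under that assumption; the function name, the parameters `base_rate`, `c0`, `c1`, `c2`, and `t_wear`, and the numeric values are all hypothetical and do not come from the source.

```python
def bathtub_hazard(t, base_rate, c0, c1, c2, t_wear):
    """Piecewise-linear bathtub hazard (illustrative form, an assumption).

    0 <= t <= c0/c1     : c0 - c1*t + base_rate        (decreasing, early failures)
    c0/c1 < t <= t_wear : base_rate                    (constant, random failures)
    t > t_wear          : c2*(t - t_wear) + base_rate  (increasing, wear-out)
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    t_early = c0 / c1  # time at which the early-failure term reaches zero
    if t_early > t_wear:
        raise ValueError("early-failure period must end before wear-out begins")
    if t <= t_early:
        return c0 - c1 * t + base_rate
    if t <= t_wear:
        return base_rate
    return c2 * (t - t_wear) + base_rate


# Hypothetical parameters, in failures per hour and hours.
params = dict(base_rate=0.001, c0=0.004, c1=0.00004, c2=0.00001, t_wear=5000.0)
for hours in (0.0, 50.0, 1000.0, 6000.0):
    print(hours, bathtub_hazard(hours, **params))
```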
Failure rate data
Failure rate data can be obtained in several ways. The most common means are:
- Historical data about the device or system under consideration.
  - Many organizations maintain internal databases of failure information on the devices or systems that they produce, which can be used to calculate failure rates for those devices or systems. For new devices or systems, the historical data for similar devices or systems can serve as a useful estimate.
- Government and commercial failure rate data.
  - Handbooks of failure rate data for various components are available from government and commercial sources. MIL-HDBK-217, Reliability Prediction of Electronic Equipment, is a military standard that provides failure rate data for many military electronic components. Several failure rate data sources are available commercially that focus on commercial components, including some non-electronic components.
- Testing.
  - Testing samples of the actual devices or systems generates the most accurate failure data. This is often prohibitively expensive or impractical, so the previous data sources are often used instead.
Units
Failure rates can be expressed using any measure of time, but hours is the most common unit in practice. Other units, such as miles, revolutions, etc., can also be used in place of "time" units.
Failure rates are often expressed in engineering notation as failures per million (for example, failures per million hours), or $10^{-6}$, especially for individual components, since their failure rates are often very low.
Additivity
Under certain engineering assumptions, the failure rate for a complex system is simply the sum of the individual failure rates of its components, as long as the units are consistent, e.g. failures per million hours. This permits testing of individual components or subsystems, whose failure rates are then added to obtain the total system failure rate.
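A minimal sketch of this addition is shown below. It assumes that the components are in series (any single component failure fails the system), that they fail independently, and that each has a constant failure rate; these are one common reading of the engineering assumptions, not a statement from the source. The component names and rates are hypothetical.

```python
# Hypothetical component failure rates, all in failures per million hours.
component_rates_fpmh = {
    "power_supply": 12.0,
    "controller": 4.5,
    "fan": 30.0,
}


def system_failure_rate(rates):
    """Sum component failure rates that share the same units."""
    return sum(rates)


system_rate_fpmh = system_failure_rate(component_rates_fpmh.values())

# With a constant failure rate, the MTBF is the reciprocal of the rate.
system_mtbf_hours = 1_000_000.0 / system_rate_fpmh

print(system_rate_fpmh, system_mtbf_hours)
```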
Example
A certain component is tested to estimate its failure rate. Ten components are tested until they either fail or reach 1000 hours, at which time the test is terminated for that component. The results are as follows:
Component | Hours | Failure |
Component 1 | 1000 | No failure |
Component 2 | 1000 | No failure |
Component 3 | 467 | Failed |
Component 4 | 1000 | No failure |
Component 5 | 630 | Failed |
Component 6 | 590 | Failed |
Component 7 | 1000 | No failure |
Component 8 | 285 | Failed |
Component 9 | 648 | Failed |
Component 10 | 882 | Failed |
Totals | 7502 | 6 |
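The estimated failure rate is,
- $\hat{\lambda} = \frac{6 \text{ failures}}{7502 \text{ hours}} \approx 0.0007998 \text{ failures per hour} = 799.8 \times 10^{-6} \text{ failures per hour}$,
or 799.8 failures for every million hours of operation.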
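The arithmetic in this example can be reproduced with a few lines of Python. The data are taken from the table; the variable names are chosen here for illustration, and the MTBF line assumes a constant failure rate.

```python
# Test results from the table: hours on test and whether the unit failed.
hours = [1000, 1000, 467, 1000, 630, 590, 1000, 285, 648, 882]
failed = [False, False, True, False, True, True, False, True, True, True]

total_hours = sum(hours)   # total time expended by the population
failures = sum(failed)     # total number of failures observed

failure_rate = failures / total_hours           # failures per hour
failures_per_million = failure_rate * 1_000_000  # failures per million hours
mtbf_hours = total_hours / failures              # reciprocal of the failure rate

print(total_hours, failures)
print(round(failures_per_million, 1))
print(round(mtbf_hours, 1))
```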
See also
- Failure
- Failure mode
- Bathtub curve
- Infant mortality
- Reliability
- Reliability theory
- Survival analysis
References
- Blanchard, Benjamin S. (1992), Logistics Engineering and Management, Fourth Ed., pp 26-32, Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
- Ebeling, Charles E., (1997), An Introduction to Reliability and Maintainability Engineering, pp 23-32, McGraw-Hill Companies, Inc., Boston.
- Kapur, K.C., and Lamberson, L.R., (1977), Reliability in Engineering Design, pp 8-30, John Wiley & Sons, New York.
- MacDiarmid, Preston; Morris, Seymour; et al., (no date), Reliability Toolkit: Commercial Practices Edition, pp 35-39, Reliability Analysis Center and Rome Laboratory, Rome, New York.
Online
- Mondro, Mitchell J, (June 2002), Approximation of Mean Time Between Failure When a System has Periodic Maintenance, IEEE Transactions on Reliability, v 51, no 2. (available from MITRE Corp.)
- Reliability Prediction of Electronic Equipment, MIL-HDBK-217F(2), (DOD download site.)
- Bathtub curve issues by ASQC.
External links
- MTBF and Reliability FAQ and forum
- Google Answers (TM) question on MTBF.
- PowerPoint (TM) presentation tutorial on failure rate by Cree, Inc.
- Usenet FAQ about MTBF.