Good estimates of the reliability of a system make use of test data and expert knowledge at all available levels. Furthermore, by integrating all these information sources, one can determine how best to allocate scarce testing resources to reduce uncertainty. Both of these goals are facilitated by modern Bayesian computational methods. We demonstrate these tools using examples that were previously solvable only through the use of ingenious approximations, and employ genetic algorithms to guide resource allocation.

Assessing the reliability of systems represented by reliability block diagrams remains important; U.S. military weapon systems and nuclear power plants are two examples. Information and data are often available at every level of such systems: component, subsystem, and full system. For example, there may be data from component and subsystem tests as well as from expensive full-system tests. In this paper, we assess the reliability of a system by combining all available information and data at whatever level they are available; we consider the case of success/failure test data.

Much of the reliability literature ([

In the next section, we introduce the statistical model that combines all available multilevel data and briefly present MCMC for analyzing such data. Then, we illustrate this methodology by making reliability assessments for an air-to-air heat-seeking missile system and a low-pressure coolant injection system in a nuclear power plant first considered by [

Once multilevel data and information can be analyzed, the question arises of what additional tests should be done when new funding becomes available. That is, what tests will reduce the system reliability uncertainty the most? In this paper, we show how a genetic algorithm using a preposterior-based criterion can address this resource allocation question. Reference [

To combine multilevel data for system reliability assessment, we use the framework in [

Series-parallel system reliability block diagram.

We begin by considering the binomial data model when data are available at a node. At the

Next, we consider prior distributions for node reliabilities. For components, we use beta prior distributions in terms of an estimated reliability

We also allow the possibility that information (expert knowledge) is available on the reliabilities of subsystems and/or the full system; we assume that this information is independent of the test data and any information used to build the prior distributions for the component reliabilities. (Frequently, we will not use any such information: in particular, expert opinion about upper-level nodes will often be based on the same information that led to the prior distributions for component reliabilities. This information should not be used twice, so a simple solution is to exclude the upper-level expert opinion.) Assume that the information takes the form of an estimated reliability

A variety of models might be employed for the

Data for series-parallel system.

Node | Data | Estimated reliability |
---|---|---|
0 | 15/20 | 0.8 |
1 | | 0.9 |
2 | 10/10 | 0.9 |
3 | 34/40 | 0.9 |
4 | 47/50 | 0.9 |
5 | 3/5 | 0.95 |
6 | 8/8 | 0.95 |
7 | 16/17 | 0.95 |

To combine the data with the expert knowledge represented as above, we use Bayes' theorem

A fully Bayesian analysis of the model described above, which simultaneously combines all available multilevel data and information, is nontrivial. The posterior distribution is not analytically tractable: up to a normalizing constant, it is

The same MCMC algorithm just described for making draws from the joint posterior distribution can be used for making draws from the joint prior distribution
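To make the idea of sampling from an intractable multilevel posterior concrete, the following is a minimal sketch of a random-walk Metropolis sampler. It is not the algorithm used in this paper: the two-component series structure, the test counts, the uniform beta priors, the proposal step size, and all function names are assumptions chosen for illustration.

```python
import math
import random

def log_posterior(p, comp_data, sys_data, priors):
    """Unnormalized log posterior for a series system of independent components."""
    if any(pi <= 0.0 or pi >= 1.0 for pi in p):
        return -math.inf
    lp = 0.0
    for pi, (x, n), (a, b) in zip(p, comp_data, priors):
        # beta prior times binomial likelihood at the component node
        lp += (a - 1 + x) * math.log(pi) + (b - 1 + n - x) * math.log(1.0 - pi)
    p_sys = math.prod(p)  # series structure: the system works only if every component works
    x0, n0 = sys_data
    # binomial likelihood at the system node, expressed through the component reliabilities
    lp += x0 * math.log(p_sys) + (n0 - x0) * math.log(1.0 - p_sys)
    return lp

def metropolis(comp_data, sys_data, priors, n_iter=20000, burn_in=2000, step=0.05, seed=1):
    """Random-walk Metropolis, updating one component reliability at a time."""
    rng = random.Random(seed)
    p = [0.5] * len(comp_data)
    lp = log_posterior(p, comp_data, sys_data, priors)
    draws = []
    for it in range(n_iter):
        for j in range(len(p)):
            prop = list(p)
            prop[j] = p[j] + rng.gauss(0.0, step)
            lp_prop = log_posterior(prop, comp_data, sys_data, priors)
            # accept with probability min(1, posterior ratio); proposals outside (0, 1) are rejected
            if rng.random() < math.exp(min(0.0, lp_prop - lp)):
                p, lp = prop, lp_prop
        if it >= burn_in:
            draws.append(list(p))
    return draws

# Hypothetical data: (successes, trials) at each component and at the system.
comp_data = [(18, 20), (27, 30)]
sys_data = (15, 20)
priors = [(1.0, 1.0), (1.0, 1.0)]  # assumed uniform beta priors

draws = metropolis(comp_data, sys_data, priors)
sys_draws = sorted(math.prod(p) for p in draws)
sys_mean = sum(sys_draws) / len(sys_draws)
sys_q05 = sys_draws[int(0.05 * (len(sys_draws) - 1))]
sys_q95 = sys_draws[int(0.95 * (len(sys_draws) - 1))]
```

Because the system-level likelihood depends on the product of the component reliabilities, the component posteriors are no longer independent betas, which is why a sampler is needed even in this small case. Setting the data counts to zero in the same code yields draws from the joint prior.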

Plot of series-parallel system reliability priors (dashed lines) and posteriors (solid lines) for nodes 0–7.

Plot of series-parallel system

In assessing the system reliability for the series-parallel system of Figure

Next, we consider two substantive applications from the literature [

Reference [

Data for series system example.

Node | Data | Estimated reliability | Precision |
---|---|---|---|
0 | | 115/265 | 265 |
1 | 8/8 | | |
2 | 7/8 | | |
3 | 191/205 | 257/269 | 269 |
4 | | 55/66 | 66 |
5 | | | |
11 | 30/30 | 0.5 | 1 |
12 | 80/80 | 0.5 | 1 |
13 | 39/40 | 0.5 | 1 |
14 | 30/30 | 0.5 | 1 |
15 | 90/90 | 846/848 | 848 |
16 | 10/10 | 0.5 | 1 |
17 | 29/30 | 0.5 | 1 |
18 | 20/20 | 0.5 | 1 |
19 | 5/5 | 0.5 | 1 |
21 | 50/50 | 399/402 | 402 |
22 | 50/50 | 278/302 | 302 |
23 | 99/100 | 1098/1102 | 1102 |
24 | 23/25 | 654/690 | 690 |
25 | 50/50 | 299/301 | 302 |
26 | 55/55 | 348/352 | 352 |
31 | 129/130 | 246/250 | 250 |
32 | 130/130 | 245/250 | 250 |
33 | 129/130 | 247/250 | 250 |
34 | 129/130 | 272/276 | 276 |
35 | 130/130 | 357/360 | 360 |
36 | 247/250 | 254/257 | 257 |
37 | 129/130 | 250/252 | 252 |
38 | 249/250 | 250/252 | 252 |
39 | 330/330 | 341/352 | 352 |
41 | | 797/802 | 802 |
42 | | 796/802 | 802 |
43 | | 794/802 | 802 |
44 | | 791/802 | 802 |
45 | | 386/402 | 402 |
51 | | 1026/1122 | 1122 |
52 | | 1087/1092 | 1092 |
53 | | 1084/1092 | 1092 |
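As a rough illustration of how a single row of this table combines prior information with test data, the sketch below updates one component in isolation. It assumes that the prior is a beta distribution with parameters equal to precision × estimate and precision × (1 − estimate); this parameterization, the function names, and the choice of node 31 are assumptions made for the example. The full model also conditions on data at higher-level nodes, so these values are not the posteriors reported in this paper.

```python
import random

def component_posterior(successes, trials, estimate, precision):
    """Conjugate beta-binomial update for one component viewed in isolation.

    Assumes a Beta(precision * estimate, precision * (1 - estimate)) prior.
    """
    a0 = precision * estimate
    b0 = precision * (1.0 - estimate)
    return a0 + successes, b0 + (trials - successes)

def beta_quantiles(a, b, probs=(0.05, 0.5, 0.95), n_draws=20000, seed=0):
    """Monte Carlo quantiles of a Beta(a, b) distribution."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    return [draws[int(p * (n_draws - 1))] for p in probs]

# Node 31: 129 successes in 130 tests, estimated reliability 246/250, precision 250.
a, b = component_posterior(129, 130, 246 / 250, 250)
q05, q50, q95 = beta_quantiles(a, b)
```

Under this assumed parameterization, the prior for node 31 behaves like 246 successes in 250 earlier trials, so the isolated posterior is approximately Beta(375, 5).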

Series system example reliability block diagram.

To compare with [

Comparison of posteriors for series system example (0.05, 0.5, 0.95 quantiles).

Node | Fully Bayesian | Reference [ |
---|---|---|
0 | (0.393, 0.436, 0.479) | (0.403, 0.463, 0.525) |
1 | (0.588, 0.655, 0.723) | (0.701, 0.851, 0.947) |
2 | (0.820, 0.848, 0.873) | (0.830, 0.858, 0.883) |
3 | (0.901, 0.917, 0.931) | (0.908, 0.927, 0.944) |
4 | (0.886, 0.908, 0.925) | (0.858, 0.898, 0.931) |
5 | (0.926, 0.945, 0.961) | (0.889, 0.904, 0.918) |

Plot of series system example reliability posteriors. (Dashed lines are from [

Note that there is quite a difference for the subsystem 1 results. The difference in location is due to the fact that the approximations used in [

Reference [

Data for complex series-parallel system example.

Node | Data | Estimated reliability | Precision |
---|---|---|---|
0 | | | |
1 | | | |
2 | | | |
11 | | | |
12 | | 242.87/244.66 | 244.66 |
111 | | 1.55/1.58 | 1.58 |
112 | | 1.55/1.58 | 1.58 |
121 | 240/240 | 470.13/471.90 | 471.90 |
122 | 240/240 | 14232.34/14234.12 | 14234.12 |
1111 | 236/240 | 191.17/191.79 | 191.79 |
1112 | 240/240 | 14232.34/14234.12 | 14234.12 |
1121 | 238/240 | 191.17/191.79 | 191.79 |
1122 | 240/240 | 14232.34/14234.12 | 14234.12 |
21 | | | |
22 | | 242.87/244.66 | 244.66 |
211 | | 1.55/1.58 | 1.58 |
212 | | 1.55/1.58 | 1.58 |
221 | 240/240 | 470.13/471.90 | 471.90 |
222 | 240/240 | 14232.34/14234.12 | 14234.12 |
2111 | 240/240 | 191.17/191.79 | 191.79 |
2112 | 240/240 | 14232.34/14234.12 | 14234.12 |
2121 | 238/240 | 191.17/191.79 | 191.79 |
2122 | 240/240 | 14232.34/14234.12 | 14234.12 |

Complex series-parallel system example reliability block diagram.

Martz and Waller [

We treat the precisions as constants as in [

Posterior summaries for complex series-parallel system example (0.05, 0.5, 0.95 quantiles).

Node | Posterior quantiles (0.05, 0.5, 0.95) |
---|---|
0 | (0.99990, 0.99996, 0.99998) |
1 | (0.98835, 0.99354, 0.99678) |
2 | (0.98812, 0.99342, 0.99680) |
11 | (0.99962, 0.99985, 0.99995) |
12 | (0.98835, 0.99354, 0.99678) |
111 | (0.97371, 0.98505, 0.99311) |
112 | (0.97990, 0.98990, 0.99568) |
121 | (0.98853, 0.99373, 0.99696) |
122 | (0.99962, 0.99985, 0.99996) |
1111 | (0.97389, 0.98519, 0.99329) |
1112 | (0.99962, 0.99986, 0.99996) |
1121 | (0.98010, 0.99008, 0.99584) |
1122 | (0.99961, 0.99985, 0.99996) |
21 | (0.99983, 0.99995, 0.99999) |
22 | (0.98812, 0.99342, 0.99680) |
211 | (0.98681, 0.99468, 0.99847) |
212 | (0.98040, 0.98994, 0.99599) |
221 | (0.98826, 0.99359, 0.99697) |
222 | (0.99962, 0.99985, 0.99996) |
2111 | (0.98697, 0.99485, 0.99865) |
2112 | (0.99961, 0.99986, 0.99996) |
2121 | (0.98054, 0.99011, 0.99614) |
2122 | (0.99962, 0.99985, 0.99996) |

Plot of complex series-parallel system example reliability posteriors for subsystems and system.

In Section

Thus, we assume that collecting additional data has a cost, with higher-level data being more costly than lower-level data. Consider the following example costs for testing at each node. Recall that node 0 is the system, nodes 1 and 2 are subsystems, and nodes 3–7 are components:

We evaluate a candidate allocation (i.e., a specified number of tests for each of the eight nodes) using a preposterior-based criterion as follows. We take a draw from the current joint posterior distribution (based on the current data) of the node reliabilities and draw binomial data according to the candidate allocation. Then we combine these new data with the current data using the same prior distributions to obtain an updated posterior distribution of the node reliabilities; again we use MCMC to obtain
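The simulate-then-update loop of a preposterior evaluation can be sketched as follows. This is a simplified stand-in rather than the procedure used in this paper: it assumes a series system with tests at the component level only, so that each update is a conjugate beta update instead of an MCMC run, and it assumes that uncertainty is summarized by the width of the central 90% interval of system reliability. The current posteriors, the allocations, and the function names are hypothetical.

```python
import random

def interval_width(post, rng, n_draws=500):
    """Width of the central 90% interval of series-system reliability."""
    sys_draws = []
    for _ in range(n_draws):
        r = 1.0
        for a, b in post:
            r *= rng.betavariate(a, b)  # series system: product of component reliabilities
        sys_draws.append(r)
    sys_draws.sort()
    return sys_draws[int(0.95 * (n_draws - 1))] - sys_draws[int(0.05 * (n_draws - 1))]

def preposterior_criterion(post, allocation, n_outer=100, seed=0):
    """Average post-test interval width over data sets simulated from the current posterior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        updated = []
        for (a, b), n_new in zip(post, allocation):
            p = rng.betavariate(a, b)  # a plausible reliability under current knowledge
            x_new = sum(rng.random() < p for _ in range(n_new))  # simulated successes
            updated.append((a + x_new, b + n_new - x_new))  # conjugate update with new data
        total += interval_width(updated, rng)
    return total / n_outer

# Hypothetical current beta posteriors for two components.
current = [(19.0, 3.0), (28.0, 4.0)]
width_no_tests = preposterior_criterion(current, [0, 0])
width_more_tests = preposterior_criterion(current, [50, 50])
```

Comparing `width_more_tests` with `width_no_tests` gives the expected reduction in uncertainty that a candidate allocation buys, which can then be weighed against its cost.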

Briefly, we describe how a GA can be used to find a nearly optimal allocation. A GA operates on a “population” of candidate allocations, where a candidate allocation is a vector of node test sizes. The GA begins by constructing an initial population or generation of

In the implementation, there are a number of issues regarding the choice of
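A generic GA over vectors of node test sizes might look like the sketch below. The selection scheme (keep the better half), uniform crossover, integer mutation, the repair step that enforces the budget, and every numerical setting are assumptions made for illustration, not the choices made in this paper. The `criterion` function is a cheap hypothetical stand-in for a preposterior-based uncertainty measure.

```python
import random

COSTS = [1.0, 2.0, 5.0]   # hypothetical cost per test at each node
PRIOR_N = [20, 30, 10]    # hypothetical amount of information already in hand
BUDGET = 100.0            # hypothetical total budget

def cost(alloc):
    return sum(c * n for c, n in zip(COSTS, alloc))

def criterion(alloc):
    """Stand-in uncertainty measure; smaller is better."""
    return sum(1.0 / (m + n) for m, n in zip(PRIOR_N, alloc))

def repair(alloc, rng):
    """Remove tests at random nodes until the allocation fits the budget."""
    alloc = list(alloc)
    while cost(alloc) > BUDGET:
        j = rng.choice([i for i, n in enumerate(alloc) if n > 0])
        alloc[j] -= 1
    return alloc

def run_ga(pop_size=30, generations=60, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    max_n = [int(BUDGET // c) for c in COSTS]
    pop = [repair([rng.randint(0, m) for m in max_n], rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=criterion)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            for j in range(len(child)):
                if rng.random() < mutation_rate:
                    child[j] = max(0, child[j] + rng.randint(-3, 3))  # small integer mutation
            children.append(repair(child, rng))
        pop = parents + children
    return min(pop, key=criterion)

best = run_ga()
```

In a real application the stand-in `criterion` would be replaced by the much more expensive preposterior evaluation, which is the main reason the population size and number of generations need to be chosen with care.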

One might ask whether there are any general insights regarding resource allocation when the goal is assessing system reliability. Among nodes at the same level, the component (or subsystem) with the most uncertainty will require more testing than the others. If the subsystems are connected in series, but some subsystems have components connected in series whereas others have components connected in parallel, then, in terms of component testing, the parallel-configured subsystems will require less testing; this can be explained by examining the subsystem reliability expression, which shows that the reliability of series-configured subsystems is of second order in their component reliabilities, whereas that for parallel-configured subsystems is of first order. The allocation will also depend on the testing costs relative to the amount of uncertainty reduction that the tests provide. For a series-configured subsystem, if the subsystem test cost exceeds the sum of the component test costs, then component tests will be recommended; if the subsystem test cost is less than the sum of the component test costs, then some subsystem tests may be recommended if they provide relatively more information. For complicated systems with many subsystems and components whose costs all differ, however, it will be difficult to choose an optimal allocation with these rules of thumb. The proposed methodology balances all these costs and information across the entire system in finding a nearly optimal allocation.
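One way to see why parallel-configured subsystems need less component testing is to compare how strongly subsystem reliability responds to a change in one component reliability. The sketch below does this for two-component subsystems; the two-component structure and the reliability value 0.9 are assumptions chosen for illustration.

```python
def series(p1, p2):
    """Both components must work."""
    return p1 * p2

def parallel(p1, p2):
    """At least one component must work."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

def sensitivity(system, p1, p2, eps=1e-6):
    """Central-difference derivative of subsystem reliability with respect to p1."""
    return (system(p1 + eps, p2) - system(p1 - eps, p2)) / (2.0 * eps)

p1 = p2 = 0.9
series_slope = sensitivity(series, p1, p2)      # analytically equal to p2
parallel_slope = sensitivity(parallel, p1, p2)  # analytically equal to 1 - p2
```

With highly reliable components, an error in one component reliability passes almost fully into a series subsystem (slope `p2`) but is strongly damped in a parallel subsystem (slope `1 - p2`), so component uncertainty matters less in the parallel case.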

Next, we illustrate the GA for the resource allocation problem described above for the series-parallel system depicted in Figure

Based on the proposed methodology described above, the GA produced the traces presented in Figures

GA evolution of uncertainty criterion.

GA evolution of resource allocation. Nodes 2 and 5–7 test sizes are identified.

For relatively complex systems, we have illustrated how to respond to the challenge of integrating all information available at the various levels of a system in order to estimate its reliability. Bayesian models have always been natural for doing this integration, and the computational tools have now caught up to make this practical. Moreover, because we are able to analyze such data, we can now consider the problem of allocating additional resources that best reduce the uncertainty in the system reliability assessment.

We have discussed the case of binomial test data only for systems represented by reliability block diagrams. Reference [

The authors thank C. C. Essix for her encouragement of this work and Vivian Romero for her assistance in producing the reliability block diagram figures used in this paper. We also thank the referees for helpful comments that improved the presentation of this paper.