
Reliability Estimation in Series Systems: Maximum Likelihood Techniques for Right-Censored and Masked Failure Data

This is my master’s thesis in mathematics. The problem: you have a series system (fails when any component fails), you can observe system-level failure times, but you often can’t tell which component actually caused the failure. The failure cause is “masked.” On top of that, some systems are still running at the end of the study, so their lifetimes are right-censored. You want to estimate the reliability of individual components from this incomplete data.

The challenge

Estimating component reliability is hard when:

  • You only observe system-level failure data
  • The exact component cause of failure is ambiguous (masked)
  • System lifetimes are right-censored
  • Sample sizes are small

A series system fails when any component fails, so disentangling which components are weakest from system-level observations is a non-trivial inference problem.

Likelihood model for masked data

I developed a likelihood model that handles two types of incompleteness.

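Right-censoring: system $i$ is observed until time $\tau_i$, but may not have failed yet. With $T_i$ the true system lifetime, you observe:

$$S_i = \min\lbrace \tau_i, T_i\rbrace$$

$$\delta_i = \mathbb{1}_{T_i < \tau_i}$$

so $\delta_i = 1$ marks an observed failure and $\delta_i = 0$ a censored lifetime.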

Component cause masking: when the system fails, you observe a candidate set $\mathcal{C}_i$ containing the failed component, but can’t pinpoint the exact cause.

Under three conditions (which hold in many industrial settings), the likelihood contribution simplifies to:

$$L_i(\theta) \propto \left[\prod_{j=1}^m R_j(s_i; \theta_j)\right] \times \left[\sum_{j \in \mathcal{C}_i} h_j(s_i; \theta_j)\right]^{\delta_i}$$

where $R_j$ is the reliability function and $h_j$ is the hazard function of component $j$. The three conditions are: the candidate set always contains the true failed component; given the failure time, the probability of observing a particular candidate set is the same no matter which of its components caused the failure; and the masking probabilities don’t depend on the system parameters $\theta$.
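Taking logs of the contribution above and summing over systems gives the objective that the MLE maximizes. This is a sketch that assumes the $n$ observed systems are independent; $n$ and $\ell$ are notation introduced here for illustration:

$$\ell(\theta) = \sum_{i=1}^{n} \left[ \sum_{j=1}^{m} \log R_j(s_i; \theta_j) + \delta_i \log \sum_{j \in \mathcal{C}_i} h_j(s_i; \theta_j) \right] + \text{const}$$

A censored system ($\delta_i = 0$) contributes only the reliability term, since all that is known is that every component survived past $s_i$. A failed system adds the log of the summed hazards of its candidate components.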

Weibull series systems

I focused on components with Weibull lifetimes: $T_{ij} \sim \text{Weibull}(k_j, \lambda_j)$. The shape parameter $k_j$ tells you the failure behavior: $k < 1$ is infant mortality, $k = 1$ is random failures (exponential), $k > 1$ is wear-out.

System reliability when all components are Weibull:

$$R_{T_i}(t; \theta) = \exp\left\lbrace -\sum_{j=1}^m \left(\frac{t}{\lambda_j}\right)^{k_j}\right\rbrace$$

The hazard function is additive:

$$h_{T_i}(t; \theta) = \sum_{j=1}^m \frac{k_j}{\lambda_j}\left(\frac{t}{\lambda_j}\right)^{k_j-1}$$
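Plugging the Weibull reliability and hazard into the masked-data likelihood gives a log-likelihood that is simple to evaluate. The following is a minimal Python sketch, not the thesis implementation (which is in R). The function name, the 0-based component indices, and the `(s, delta, candidates)` record layout are assumptions made for illustration.

```python
import math

def weibull_series_loglik(theta, data):
    """Log-likelihood (up to a constant) for a Weibull series system.

    theta: list of (shape k_j, scale lambda_j) pairs, one per component.
    data:  list of (s, delta, candidates) records, where s is the observed
           time, delta is 1 for a failure and 0 for a censored lifetime,
           and candidates is a set of 0-based component indices.
    """
    total = 0.0
    for s, delta, candidates in data:
        # Log system reliability: -sum_j (s / lambda_j)^(k_j)
        total -= sum((s / lam) ** k for k, lam in theta)
        if delta == 1:
            # Sum of component hazards over the candidate set only
            hazard_sum = sum(
                (theta[j][0] / theta[j][1])
                * (s / theta[j][1]) ** (theta[j][0] - 1)
                for j in candidates
            )
            total += math.log(hazard_sum)
    return total

# Hypothetical two-component system with exponential components (k = 1)
theta_example = [(1.0, 2.0), (1.0, 4.0)]
data_example = [
    (1.0, 1, {0, 1}),  # failed at t = 1, cause fully masked
    (2.0, 0, set()),   # still running at t = 2 (right-censored)
]
loglik_example = weibull_series_loglik(theta_example, data_example)
```

An optimizer would maximize this function over `theta`, with both shapes and scales constrained to be positive.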

Simulation studies

I ran extensive simulations varying three factors:

Right-censoring impact (q = 60% to 100%): Scale parameters showed positive bias with censoring. Shape parameters were more sensitive than scale parameters. The most reliable component was most affected by censoring. Convergence rate exceeded 95% for q >= 0.7.

Masking probability (p = 10% to 70%): Scale parameter confidence intervals were correctly specified up to p = 0.7 (over 90% coverage). Shape parameter CIs were correctly specified up to p = 0.4. Convergence rate exceeded 95% for p <= 0.4. Bias increases with masking probability.

Sample size (n = 50 to 500): At n = 250, the estimator was essentially unbiased despite moderate censoring and masking. Convergence rate exceeded 95% for n >= 100. Precision improved rapidly with sample size. Small samples (n < 100) need caution.
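To make the simulation setup concrete, here is a hedged Python sketch of how one masked, right-censored sample could be generated. It assumes a single fixed censoring time `tau` and a Bernoulli masking rule in which the failed component is always in the candidate set and every other component is added independently with probability `p`. The function name, parameter values, and record layout are illustrative, not taken from the thesis code.

```python
import random

def simulate_masked_series(theta, n, tau, p, rng):
    """Simulate n series systems with right-censoring and masked causes.

    theta: list of (shape k_j, scale lambda_j) pairs (hypothetical values).
    tau:   fixed right-censoring time (an assumption of this sketch).
    p:     probability that each non-failed component joins the candidate set.
    Returns a list of (s, delta, candidates) records.
    """
    m = len(theta)
    data = []
    for _ in range(n):
        # Latent component lifetimes; weibullvariate takes (scale, shape)
        lifetimes = [rng.weibullvariate(lam, k) for k, lam in theta]
        t = min(lifetimes)           # series system: first failure ends it
        cause = lifetimes.index(t)   # true cause, hidden from the analyst
        if t < tau:
            # Failure observed: true cause plus randomly added components
            candidates = {cause} | {
                j for j in range(m) if j != cause and rng.random() < p
            }
            data.append((t, 1, candidates))
        else:
            # Censored: only know the system survived past tau
            data.append((tau, 0, set()))
    return data

rng = random.Random(42)
theta_true = [(1.2, 100.0), (0.9, 150.0), (1.5, 120.0)]
sample = simulate_masked_series(theta_true, n=250, tau=80.0, p=0.3, rng=rng)
```

Because the same `p` applies to every non-failed component, this rule keeps the true cause in the candidate set and does not depend on the Weibull parameters, which is why it is compatible with the three conditions used to simplify the likelihood.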

BCa bootstrap confidence intervals

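I used the bias-corrected and accelerated (BCa) bootstrap for confidence intervals. It adjusts the percentile interval in two ways: a bias-correction term accounts for median bias in the bootstrap distribution, and an acceleration term accounts for skewness that arises when the estimator’s standard error changes with the parameter value.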

Results: good coverage probability (above 90%) even for small samples. Scale parameters were better calibrated than shape parameters. Coverage approached the nominal 95% as sample size increased.
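The mechanics of a BCa interval can be sketched in a few lines of Python. This is a generic illustration, not the thesis code: it uses the sample mean as a stand-in statistic, whereas in the thesis setting the statistic would be the MLE refit on each resample. The function name and defaults are assumptions.

```python
import random
from statistics import NormalDist, fmean

def bca_interval(data, stat, n_boot=2000, level=0.95, rng=None):
    """BCa bootstrap confidence interval for stat(data)."""
    rng = rng or random.Random()
    n = len(data)
    norm = NormalDist()
    theta_hat = stat(data)

    # Bootstrap replicates: resample with replacement and recompute
    reps = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )

    # Bias correction: how far theta_hat sits from the bootstrap median
    below = sum(r < theta_hat for r in reps)
    prop = min(max(below / n_boot, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(prop)

    # Acceleration from jackknife (leave-one-out) estimates
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(alpha):
        # Shift the nominal percentile using z0 and a
        z = norm.inv_cdf(alpha)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    alpha = (1 - level) / 2
    lo_idx = min(n_boot - 1, int(adjusted(alpha) * n_boot))
    hi_idx = min(n_boot - 1, int(adjusted(1 - alpha) * n_boot))
    return reps[lo_idx], reps[hi_idx]

# Hypothetical skewed sample: 60 exponential lifetimes with mean 10
gen = random.Random(1)
lifetimes = [gen.expovariate(1 / 10) for _ in range(60)]
lo, hi = bca_interval(lifetimes, fmean, rng=random.Random(2))
```

With `z0 = 0` and `a = 0` the adjusted percentiles reduce to the plain percentile interval, so the two corrections can be read as shifts away from that baseline.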

What works, what doesn’t

The MLE approach works well for moderate sample sizes (n >= 100), gives reasonable precision with well-specified confidence intervals, achieves high convergence rates under realistic conditions, and is robust for well-designed series systems.

Where it struggles: shape parameters are harder to estimate than scale parameters, the most reliable components need more data (they fail less often, so you see fewer of their failures), small samples need careful interpretation, and severe masking or censoring degrades things.

Practical applications

This methodology applies to industrial reliability testing with incomplete failure diagnosis, electronic systems with board-level diagnostics, mechanical systems where root cause analysis is expensive, or any series system with Weibull component lifetimes.

Implementation

Full R implementation: wei.series.md.c1.c2.c3

Includes analytical score function for efficient optimization, L-BFGS-B solver with bound constraints, BCa bootstrap confidence intervals, Bernoulli masking model for simulations, and a complete simulation framework.

Full thesis

View complete PDF (40 pages)

For more on this line of research, see my research page and publications.
