Master's Project: Reliability Estimation in Series Systems

February 19, 2024

I presented my master’s project in October 2023, finishing up my MS in statistics/mathematics at SIUE. The associated paper is titled “Reliability Estimation in Series Systems: Maximum Likelihood Techniques for Right-Censored and Masked Failure Data.”

The Problem

In reliability engineering, you often find yourself in an annoying situation: a system fails, but you do not know which component caused the failure. This is called masked failure data. On top of that, some systems are still running when you stop observing them, so you only know they survived at least that long. That is right censoring. Both are common in practice. Masking in particular is hard to avoid, because identifying the exact failed component is expensive or sometimes impossible.

The project builds a likelihood-based framework that handles both masking and censoring simultaneously, models component lifetimes with Weibull distributions, derives closed-form Fisher information for the exponential special case, and provides bootstrap methods for uncertainty quantification. I implemented it all in an R package so practitioners can actually use it.
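To make "handles both masking and censoring simultaneously" concrete, here is a minimal Python sketch of the log-likelihood for the exponential special case. It is not the R package code. It assumes C1-C2-C3 (the standard masking conditions discussed in the mdrelax post), so the masking probabilities drop out as constants. Under that assumption, a failed system contributes \(S(t)\) times the summed hazard of its candidate components, and a censored system contributes \(S(t)\) alone. The data layout, a list of `(t, candidates)` pairs, is a hypothetical choice for illustration.

```python
import math

def exp_series_loglik(rates, data):
    """Log-likelihood of component failure rates for an exponential series
    system with masked, right-censored data, assuming C1-C2-C3 hold
    (masking probabilities are then constants and are omitted).

    rates: component failure rates [lambda_1, ..., lambda_m]
    data:  list of (t, candidates) pairs; candidates is a set of 0-based
           component indices for a failed system, or None if the system
           was right-censored at time t.
    """
    total_rate = sum(rates)
    ll = 0.0
    for t, candidates in data:
        # Every system contributes log S(t) = -t * (sum of component rates).
        ll -= t * total_rate
        if candidates is not None:
            # A failed system also contributes the summed hazard of its candidates.
            ll += math.log(sum(rates[j] for j in candidates))
    return ll

# Hypothetical data: one exact diagnosis, one masked failure, one censored system.
data = [(1.0, {0}), (2.0, {0, 1}), (3.0, None)]
print(exp_series_loglik([0.5, 0.25], data))
```

For Weibull components, the same structure holds with each rate replaced by the Weibull hazard and \(S(t)\) by the product of the component survival functions. The MLE then comes from numerically maximizing this function.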

This connects to several other posts and projects:

See the full project page here.

mdrelax: When Masking Conditions Don't Hold

December 3, 2025

mdrelax extends my work on series system reliability by handling cases where the standard masking assumptions break down.

Background: The C1-C2-C3 Framework

My master’s thesis developed maximum likelihood techniques for series systems with masked failure data. The standard framework assumes three conditions:

  • C1: The failed component is always in the candidate set
  • C2: Non-informative masking (given the failure time, a candidate set is equally likely no matter which of its components actually failed)
  • C3: Masking mechanism is independent of system parameters

When these hold, the masking probabilities factor out and you can ignore them for parameter estimation. The expo-masked-fim paper derives closed-form Fisher information for the exponential case, and maskedcauses implements the general framework.
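To see why the masking probabilities factor out, introduce some notation (mine, for this sketch). Write \(\pi(c \mid t, k)\) for the probability of observing candidate set \(c\) when component \(k\) fails at time \(t\), \(h_k\) for the component hazards, and \(S\) for the system survival function. A failed system then contributes:

$$f(t, c) = \sum_{k \in c} h_k(t)\, S(t)\, \pi(c \mid t, k) = \pi(c \mid t)\, S(t) \sum_{k \in c} h_k(t)$$

The sum runs only over \(k \in c\) because of C1. The second equality uses C2: \(\pi(c \mid t, k)\) takes the same value \(\pi(c \mid t)\) for every \(k\) in \(c\). C3 says \(\pi(c \mid t)\) does not depend on the system parameters, so it is a constant factor in the likelihood and drops out of the maximization.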

The Problem

In practice, C2 and C3 are often violated.

Informative masking (C2 violation): Diagnostic tests may be better at identifying certain failure modes than others. A component that fails catastrophically is easier to identify than one that degrades subtly.

Parameter-dependent masking (C3 violation): The masking mechanism might depend on component reliabilities. Components with shorter lifetimes fail more often, so technicians get more practice diagnosing them.

If you pretend C2 and C3 hold when they don’t, your parameter estimates are biased. Sometimes badly.
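Here is a small simulation, in Python, of how bad "sometimes badly" can get. The masking scheme is hypothetical and chosen only to isolate a C2 violation. Two exponential components both have true rate 1. When component 1 fails, diagnosis isolates it 80% of the time; when component 2 fails, only 20% of the time. Otherwise the candidate set is {1, 2}. C1 and C3 still hold. With two components and no censoring, the C1-C2-C3 MLE has a closed form, which the sketch uses instead of numerical optimization.

```python
import random

def simulate_counts(n, rate1, rate2, p_exact1, p_exact2, seed=1):
    """Two-component exponential series system, no censoring. When component k
    fails, diagnosis isolates it with probability p_exact_k; otherwise the
    candidate set is {1, 2}. Unequal p_exact values violate C2."""
    rng = random.Random(seed)
    n1 = n2 = n12 = 0
    total_time = 0.0
    for _ in range(n):
        t1, t2 = rng.expovariate(rate1), rng.expovariate(rate2)
        total_time += min(t1, t2)
        if t1 < t2:
            if rng.random() < p_exact1:
                n1 += 1      # candidate set {1}
            else:
                n12 += 1     # candidate set {1, 2}
        else:
            if rng.random() < p_exact2:
                n2 += 1      # candidate set {2}
            else:
                n12 += 1     # candidate set {1, 2}
    return n1, n2, n12, total_time

def mle_assuming_c1_c2_c3(n1, n2, n12, total_time):
    """Closed-form MLE when C1-C2-C3 are assumed. The log-likelihood is
    n1*log(l1) + n2*log(l2) + n12*log(l1 + l2) - total_time*(l1 + l2);
    writing s = l1 + l2 and p = l1 / s, it is maximized at
    s = (n1 + n2 + n12) / total_time and p = n1 / (n1 + n2)."""
    s = (n1 + n2 + n12) / total_time
    p = n1 / (n1 + n2)
    return s * p, s * (1.0 - p)

# True rates are (1, 1), but component 1 is much easier to diagnose.
counts = simulate_counts(20_000, 1.0, 1.0, p_exact1=0.8, p_exact2=0.2)
est1, est2 = mle_assuming_c1_c2_c3(*counts)
print(est1, est2)
```

The bias follows from the closed form. The total rate is estimated correctly (about 2), but the share assigned to component 1 tends to 0.4 / (0.4 + 0.1) = 0.8. The estimates therefore settle near (1.6, 0.4) instead of (1, 1), and a larger sample does not fix it.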

What mdrelax Does

The package implements likelihood-based inference under relaxed masking conditions. The basic workflow, shown here with the standard C1-C2-C3 functions, looks like this:

```r
library(mdrelax)

# `data` is assumed to already hold the system lifetime data (not shown here)

# Generate masked data with Bernoulli candidate sets
md <- md_bernoulli_cand_C1_C2_C3(data, p = 0.3)

# Sample candidate sets
md <- md_cand_sampler(md)

# MLE for exponential series system
fit <- md_mle_exp_series_C1_C2_C3(md)

# Fisher information matrix evaluated at the MLE
fim <- md_fim_exp_series_C1_C2_C3(md, params(fit))
```

Key Features

  • Flexible masking models: Bernoulli, rank-based, KL-divergence constrained
  • Identifiability analysis: Tools to check when parameters can actually be estimated
  • Fisher information: Efficiency analysis under relaxed conditions
  • Simulation utilities: Monte Carlo studies for method validation

Relationship to Other Work

This package sits at the end of a progression toward generality:

| Project | Focus |
|---|---|
| expo-masked-fim | Closed-form FIM for exponential case |
| maskedcauses | General R framework for masked data likelihood |
| reliability-estimation-in-series-systems | Master’s thesis implementation |
| wei.series.md.c1.c2.c3 | Weibull series systems under C1-C2-C3 |
| mdrelax | Relaxed conditions (C2, C3 violations) |

The progression:

  1. Exponential + C1-C2-C3: Closed-form solutions
  2. Weibull + C1-C2-C3: Numerical MLE
  3. Weibull + relaxed conditions: mdrelax

Each step trades analytical tractability for realism.

When to Use It

Use mdrelax when you suspect:

  • Diagnostic accuracy varies by component type
  • Masking patterns correlate with component reliabilities
  • Standard C1-C2-C3 assumptions are too restrictive for your data

The trade-off is real: relaxed models have more parameters and may need larger samples for reliable estimation. But biased estimates from wrong assumptions aren’t free either.


Weibull Distributions: From Reliability Theory to My Own Survival Curve

April 18, 2022

The Weibull distribution models time-to-failure. In reliability engineering, that means component lifetimes. In medicine, it means survival times. I have been working with Weibull models for my thesis on series system reliability. Then I got diagnosed with cancer, and now every time I work with survival curves, I am looking at mathematical abstractions of something very concrete: how long until failure?

The Mathematics

The Weibull CDF:

F(t) = 1 - exp(-(t/λ)^k)

Two parameters:

  • λ: scale (characteristic lifetime)
  • k: shape (how failure rate changes over time)

The shape parameter k tells you the whole story:

k < 1: Decreasing hazard. If you survive early on, your risk goes down. This is the infant mortality pattern.

k = 1: Constant hazard. Memoryless. This is just the exponential distribution.

k > 1: Increasing hazard. Things wear out.

The Hazard Function

The hazard function is what makes Weibull useful for survival analysis:

h(t) = (k/λ)(t/λ)^(k-1)

This is the instantaneous failure rate. Given that you have survived to time t, it is the probability of failing in the next small interval divided by the length of that interval, in the limit as the interval shrinks.
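A quick numerical check of the formulas above, as a Python sketch. It confirms that the hazard equals the density divided by the survival function \(S(t) = 1 - F(t)\). It also shows how the shape parameter k (`shape`) sets the trend of the hazard. The scale λ (`scale`) and the time points are arbitrary illustration values.

```python
import math

def weibull_survival(t, scale, shape):
    """S(t) = 1 - F(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

def weibull_pdf(t, scale, shape):
    """f(t) = (shape/scale) * (t/scale)^(shape-1) * exp(-(t/scale)^shape)."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_hazard(t, scale, shape):
    """h(t) = (shape/scale) * (t/scale)^(shape-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

scale = 2.0
for shape in (0.5, 1.0, 3.0):
    # The hazard equals f(t) / S(t) at any t > 0.
    ratio = weibull_pdf(1.5, scale, shape) / weibull_survival(1.5, scale, shape)
    assert math.isclose(ratio, weibull_hazard(1.5, scale, shape))
    # Compare early and late hazard to see the trend set by the shape.
    print(shape, weibull_hazard(1.0, scale, shape), weibull_hazard(4.0, scale, shape))
```

With scale 2, the hazard behaves differently for each shape between t = 1 and t = 4:

  • k = 0.5: drops from about 0.35 to about 0.18 (infant mortality)
  • k = 1: stays at 0.5 (exponential)
  • k = 3: climbs from 0.375 to 6 (wear-out)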

For cancer, this is the number that matters. Some cancers have increasing hazard (the longer you have it, the worse things get). Others have decreasing hazard after initial treatment, meaning if you make it past the critical period, prognosis improves. Knowing which pattern applies to your disease changes how you think about time.

Personal Context

When you study survival analysis academically, it is abstract. When you are living it, every curve is personal.

I look at Kaplan-Meier plots and see myself somewhere on that curve. I work with hazard functions and think: is my k > 1 or k < 1? Am I in the wearing-out regime or the if-you-make-it-past-this-it-gets-better regime?

The math does not change. But the meaning does.

The Irony

I chose reliability engineering for my thesis before the cancer diagnosis. I was studying component failures in series systems, where if any one part fails, the whole system fails.

Then I became a series system. Organs, treatment response, immune function. All have to work. Failure of any one is catastrophic.

The mathematics I was studying abstractly became uncomfortably literal.


Bootstrap Methods: When Theory Meets Computation

September 10, 2021

The bootstrap is a trade: mathematical complexity for computational burden. Instead of deriving analytical formulas for sampling distributions, you simulate them.

The Idea

If you don’t know the sampling distribution of a statistic, approximate it by resampling from your data.

  1. Draw samples with replacement from the original data
  2. Compute your statistic on each resample
  3. The distribution of resampled statistics approximates the true sampling distribution

That’s it. The justification is more subtle than the procedure. Under regularity conditions, the bootstrap distribution converges to the true sampling distribution as sample size grows. This is non-parametric inference: you use the empirical distribution as a stand-in for the true distribution, without assuming a parametric form.
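The three steps translate almost line for line into code. This Python sketch uses only the standard library. It bootstraps the median of a small made-up sample and reports a bootstrap standard error and a simple percentile interval. The percentile method is one choice among several; BCa and bootstrap-t intervals are common alternatives.

```python
import random
import statistics

def bootstrap(data, statistic, n_resamples=10_000, seed=0):
    """Nonparametric bootstrap: resample the data with replacement
    and recompute the statistic on each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]

def percentile_interval(replicates, level=0.95):
    """Percentile confidence interval read off the sorted bootstrap replicates."""
    ordered = sorted(replicates)
    alpha = (1.0 - level) / 2.0
    lower = ordered[int(alpha * (len(ordered) - 1))]
    upper = ordered[int((1.0 - alpha) * (len(ordered) - 1))]
    return lower, upper

# Made-up sample for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 2.8, 3.3, 7.1, 2.5, 3.9]
reps = bootstrap(sample, statistics.median)
print("bootstrap SE:", statistics.stdev(reps))
print("95% percentile CI:", percentile_interval(reps))
```

The median has no simple closed-form standard error for small samples, which is exactly the situation where this loop earns its keep.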

When I Use It

Bootstrap is my default tool when:

  • I need confidence intervals for statistics with no closed-form variance
  • Asymptotic theory doesn’t apply (small samples, non-standard statistics)
  • I’m doing model selection via bootstrap cross-validation
  • I’m working with censored data where standard errors are intractable

That last case is the one that matters most for my research.

The Computational Trade

Better to get the right answer slowly than the wrong answer quickly.

Deriving an analytical variance formula is hard. Sometimes it’s impossible for the statistic you actually care about. Bootstrap says: just compute the statistic 10,000 times on resampled data and look at the spread. For a cheap statistic on a modest dataset, 10,000 resamples take seconds on modern hardware.

The trade is almost always worth it.

My Thesis Work

My research uses bootstrap heavily. I’m working on reliability estimation for series systems where components fail and you don’t know which one caused the system failure. This is the masked failure data problem.

For these models you can compute the MLE numerically, but in general there is no closed-form variance formula. The Fisher information matrix involves expectations over the masking distribution that don’t simplify to anything closed-form.

Bootstrap gives me confidence intervals anyway. Resample the masked failure data, recompute the MLE on each resample, and use the distribution of bootstrapped MLEs to construct intervals. It’s not elegant, but it works, and “works” is the right criterion when the alternative is “no confidence intervals at all.”
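A minimal Python sketch of that procedure for the exponential case. It is an illustration under stated assumptions, not my thesis code. The data are simulated with hypothetical rates, Bernoulli candidate sets, and a fixed censoring time. The MLE under C1-C2-C3 is computed with an EM fixed-point iteration (a swapped-in technique for this sketch). Each bootstrap resample of systems is refit, and percentile intervals are read off the replicates.

```python
import random
from collections import Counter

def exp_series_mle(data, n_components, tol=1e-12, max_iter=5000):
    """EM estimate of component failure rates for an exponential series
    system with masked, right-censored data, assuming C1-C2-C3.

    data: list of (t, candidates); candidates is a frozenset of 0-based
          component indices for a failure, or None for a censored system.
    """
    total_time = sum(t for t, _ in data)
    # For exponential components only the candidate sets matter in the E-step.
    set_counts = Counter(c for _, c in data if c is not None)
    rates = [1.0] * n_components  # arbitrary positive starting values
    for _ in range(max_iter):
        expected = [0.0] * n_components
        for cand, count in set_counts.items():
            denom = sum(rates[j] for j in cand)
            for j in cand:
                # E-step: expected number of these failures caused by component j.
                expected[j] += count * rates[j] / denom
        # M-step: expected failures per unit of total time on test.
        new_rates = [e / total_time for e in expected]
        if max(abs(a - b) for a, b in zip(new_rates, rates)) < tol:
            return new_rates
        rates = new_rates
    return rates

def simulate(true_rates, n, censor_time, p_extra, rng):
    """Bernoulli candidate sets: the failed component is always a candidate,
    and each other component joins independently with probability p_extra."""
    data = []
    for _ in range(n):
        lifetimes = [rng.expovariate(r) for r in true_rates]
        t = min(lifetimes)
        if t > censor_time:
            data.append((censor_time, None))  # right-censored
            continue
        failed = lifetimes.index(t)
        cand = {failed} | {j for j in range(len(true_rates))
                           if j != failed and rng.random() < p_extra}
        data.append((t, frozenset(cand)))
    return data

def bootstrap_intervals(data, n_components, n_resamples=200, level=0.95, seed=0):
    """Percentile intervals from refitting the MLE on resampled systems."""
    rng = random.Random(seed)
    reps = [exp_series_mle(rng.choices(data, k=len(data)), n_components)
            for _ in range(n_resamples)]
    alpha = (1.0 - level) / 2.0
    intervals = []
    for j in range(n_components):
        col = sorted(rep[j] for rep in reps)
        intervals.append((col[int(alpha * (n_resamples - 1))],
                          col[int((1.0 - alpha) * (n_resamples - 1))]))
    return intervals

TRUE_RATES = [1.0, 0.5, 0.25]  # hypothetical component rates
data = simulate(TRUE_RATES, n=500, censor_time=1.0, p_extra=0.3, rng=random.Random(42))
rates_hat = exp_series_mle(data, 3)
intervals = bootstrap_intervals(data, 3)
print(rates_hat)
print(intervals)
```

Resampling whole systems (failure time, candidate set, censoring status together) keeps the masking and censoring structure of each observation intact, which is what makes the resampled MLEs meaningful.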

Reliability Analysis and the Problem of Censored Data

August 14, 2019

One of the most interesting statistical problems I have encountered is reliability analysis with censored data: situations where you know something didn’t fail, but not when it will fail.

The Censoring Problem

Imagine testing light bulbs. You run them for 1000 hours. Some fail during the test. Others are still working when you stop.

For the survivors, you know:

  • They lasted at least 1000 hours
  • You do not know their actual lifetime

This is right censoring. The true value lies somewhere to the right of your observation. You have a lower bound, not a measurement.

Why This Matters

Censored data is everywhere:

  • Medical studies (patients still alive at study end)
  • Engineering tests (components that have not failed)
  • Customer retention (users still active)

The naive responses are both wrong. Ignoring censored observations wastes information. Treating them as failures introduces bias. You need a framework that uses the partial information you actually have.

Maximum Likelihood to the Rescue

The solution is maximum likelihood estimation with likelihood contributions that account for censoring:

  • Failure observations contribute the probability density \(f(t)\). You observed the exact failure time, so the unit contributes the density at that time (the probability of failing at one exact instant is zero).
  • Censored observations contribute the survival probability \(S(t)\). You know the unit survived to time \(t\), so its contribution is the probability of surviving at least that long.

The likelihood for the whole sample is:

$$L = \prod_{i: \text{failed}} f(t_i) \prod_{j: \text{censored}} S(t_j)$$

This lets you extract information from both failed and surviving units. The censored observations pull the estimated reliability upward; the failures pull it downward. Maximum likelihood balances them.
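For exponential lifetimes with rate λ, the likelihood above gives \(\log L = d \log \lambda - \lambda \sum_i t_i\), where \(d\) is the number of failures and the sum covers every unit's observed time. The MLE is therefore \(\hat\lambda = d / \sum_i t_i\): failures divided by total time on test. This Python sketch applies it to a hypothetical light-bulb test (true mean lifetime 800 hours, test stopped at 1000 hours). It compares the result with the two naive approaches.

```python
import random

def exp_mle_censored(times, failed):
    """MLE of an exponential failure rate with right censoring.

    log L(rate) = d*log(rate) - rate*sum(times), where d is the number of
    observed failures, so the maximizer is d / sum(times)."""
    d = sum(failed)
    return d / sum(times)

rng = random.Random(7)
TRUE_MEAN, TEST_END = 800.0, 1000.0   # hypothetical light-bulb test, in hours
lifetimes = [rng.expovariate(1.0 / TRUE_MEAN) for _ in range(200)]
times = [min(x, TEST_END) for x in lifetimes]    # what we actually observe
failed = [x <= TEST_END for x in lifetimes]      # False means censored

mle_mean = 1.0 / exp_mle_censored(times, failed)
# Naive 1: ignore censored bulbs entirely (keeps only the short lifetimes).
drop_mean = sum(t for t, f in zip(times, failed) if f) / sum(failed)
# Naive 2: pretend censored bulbs failed at TEST_END.
as_failure_mean = sum(times) / len(times)
print(mle_mean, drop_mean, as_failure_mean)
```

Both naive estimates of mean lifetime come out below the MLE whenever any unit is censored. Dropping survivors discards the longest lifetimes, and counting survivors as failures divides the same total time by too many failures.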

Series Systems Complexity

It gets more interesting with series systems, systems that fail when any component fails. If you observe system failure but do not know which component caused it, you have masked failure data.

This is the problem I am most interested in: extracting component-level reliability from system-level failures when the cause is ambiguous. The masking adds a latent variable, and the likelihood becomes a mixture. You can handle it with EM algorithms or direct optimization, but the combinatorics grow quickly with system size.
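Here is a sketch of that mixture structure. It assumes the standard masking conditions, which are not stated in this post: the failed component is always in the candidate set, and masking carries no information about the parameters. Let \(K_i\) be the latent failed component of system \(i\), \(c_i\) its candidate set, and \(h_j\) the component hazards. A failed system then contributes a mixture over its candidates. The EM E-step weight is the posterior probability that component \(j\) caused the failure:

$$L_i(\theta) \propto S(t_i) \sum_{j \in c_i} h_j(t_i), \qquad \Pr(K_i = j \mid t_i, c_i) = \frac{h_j(t_i)}{\sum_{l \in c_i} h_l(t_i)}, \quad j \in c_i$$

For exponential components with \(h_j(t) = \lambda_j\), the M-step divides each component's expected failure count by the total time on test, summed over failed and censored units alike:

$$\lambda_j^{\text{new}} = \frac{\sum_{i:\,\text{failed}} \Pr(K_i = j \mid t_i, c_i)}{\sum_i t_i}$$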

This work is laying groundwork for what will become a major focus of my mathematical statistics degree.