Masked Failure Data: Looking Back, Looking Forward

February 18, 2026

I have been working on the same statistical problem since 2020. I am now a PhD student in CS. The problem has not changed, but my understanding of it has, and the tools I have built around it look nothing like what I started with.

The problem: a series system fails when any component fails. You observe system-level failure times. But you often cannot tell which component caused the failure (masking). Some systems are still running when testing ends (censoring). Given this incomplete data, estimate component reliability.
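To make the setup concrete, here is a small Python simulation of the data-generating process. It is a sketch under stated assumptions: exponential component lifetimes, a fixed test end \(\tau\), and a hypothetical masking scheme in which every non-failed component joins the candidate set independently with probability p. That scheme always includes the failed component, and the chance of seeing a given candidate set does not depend on which of its members actually failed. The function name and data layout are illustrative, not taken from any of my packages.

```python
import random

def simulate_series(rates, n, tau, p, seed=0):
    """Simulate n exponential series systems with masking and right-censoring.

    Hypothetical sketch: component j fails at constant rate rates[j], the
    test ends at time tau, and each non-failed component joins the candidate
    set independently with probability p.
    Returns a list of (time, candidates) pairs; candidates is None if censored.
    """
    rng = random.Random(seed)
    m = len(rates)
    data = []
    for _ in range(n):
        lifetimes = [rng.expovariate(r) for r in rates]
        t = min(lifetimes)                 # series system: the first failure ends it
        if t > tau:
            data.append((tau, None))       # still running when testing ends
            continue
        cause = lifetimes.index(t)         # true cause, hidden from the analyst
        others = {j for j in range(m) if j != cause and rng.random() < p}
        data.append((t, frozenset({cause} | others)))
    return data

data = simulate_series([0.5, 0.3, 0.2], n=5000, tau=2.0, p=0.3, seed=42)
```

The analyst sees only the (time, candidates) pairs; the true cause is discarded, which is exactly what makes estimation hard.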

This is not a tutorial. It is a map of where things stand and where they are going.

Observation Functors: Composable Censoring for Series System Simulation

February 13, 2026

Last week I announced maskedcauses, the R package for estimating component reliability from masked series system failures. That post covered the three likelihood models and the path to CRAN.

This post is about what happened next: the package now supports four observation types (exact, right-censored, left-censored, and interval-censored) via composable observation functors. Along the way, I wrote four vignettes, removed the md.tools dependency, and developed a verification methodology for keeping prose honest about simulation results.
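The four observation types differ only in what each observation contributes to the likelihood: exact failures contribute \(f(t)\), right-censored observations \(S(t)\), left-censored \(F(t)\), and interval-censored \(F(b) - F(a)\). Below is a minimal Python sketch for a single exponential lifetime. The tuple encoding, `contribution`, and the two observer factories are hypothetical names chosen for illustration; they picture the idea of mapping a true failure time to an observation and are not the R interface of maskedcauses.

```python
import math

def exp_surv(t, rate):
    """Survival function S(t) = exp(-rate * t) of an exponential lifetime."""
    return math.exp(-rate * t)

def contribution(obs, rate):
    """Likelihood contribution of one observation (hypothetical encoding).

    ("exact", t)        failed at t                   -> f(t)
    ("right", t)        still working at t            -> S(t)
    ("left", t)         found failed at inspection t  -> F(t) = 1 - S(t)
    ("interval", a, b)  failed between a and b        -> S(a) - S(b)
    """
    kind = obs[0]
    if kind == "exact":
        return rate * exp_surv(obs[1], rate)
    if kind == "right":
        return exp_surv(obs[1], rate)
    if kind == "left":
        return 1.0 - exp_surv(obs[1], rate)
    if kind == "interval":
        return exp_surv(obs[1], rate) - exp_surv(obs[2], rate)
    raise ValueError(f"unknown observation type: {kind!r}")

# Observers map a true failure time to what the experimenter actually sees.
def right_censor(tau):
    return lambda t: ("exact", t) if t <= tau else ("right", tau)

def periodic_inspection(delta):
    def observe(t):
        k = math.floor(t / delta)  # index of the last inspection before the failure
        return ("interval", k * delta, (k + 1) * delta)
    return observe
```

Because an observer is just a function from a true time to an observation, the observation scheme can be swapped without touching the likelihood code.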

maskedcauses: Maximum Likelihood Estimation for Masked Series System Failures

February 5, 2026

Note (February 2026): This package has been renamed from likelihood.model.series.md to maskedcauses.

Two days ago, I submitted likelihood.model, the foundation package for composable statistical inference, to CRAN. Next in line is maskedcauses, which implements maximum likelihood estimation for series systems whose component failure causes are masked.

This package is the practical result of my master’s thesis work. Three years of theoretical development, now packaged for anyone analyzing masked failure data.

The Problem: Masked Component Failures

A series system fails when any of its \(m\) components fails. In reliability testing, you observe the system fail at time \(t\), but two layers of uncertainty obscure the full picture:

  1. Right-censoring: Some systems are still running when testing ends. You know they survived at least until time \(\tau\), but not how much longer they would have lasted.

  2. Masked cause of failure: When a system fails, you often can’t identify which component caused it. Diagnostic tests might narrow it down to a candidate set of possible causes, but the true failure component remains ambiguous.

This happens constantly in practice. Electronic systems fail with only board-level diagnostics. Industrial machinery fails without root-cause teardown. Medical devices fail with symptoms pointing to multiple possible subsystems.

The question: given this incomplete information, can you still estimate the lifetime distribution of each component?

The Package: Three Likelihood Models

maskedcauses provides three models with different complexity-accuracy tradeoffs:

| Model | Parameters | Use case |
|---|---|---|
| exp_series_md_c1_c2_c3 | \(m\) rates \((\lambda_1, \ldots, \lambda_m)\) | Memoryless components (constant failure rate) |
| wei_series_md_c1_c2_c3 | \(2m\) params \((k_1, \beta_1, \ldots, k_m, \beta_m)\) | Weibull with per-component shapes |
| wei_series_homogeneous_md_c1_c2_c3 | \(m+1\) params \((k, \beta_1, \ldots, \beta_m)\) | Weibull with shared shape parameter |

Each model implements the full inference stack: loglik(), score(), hess_loglik(), rdata(), and assumptions().

The C1-C2-C3 Conditions

The models assume three conditions that simplify the likelihood:

  • C1: The failed component is in the candidate set with probability 1
  • C2: The probability of observing a given candidate set is the same no matter which of its components actually failed
  • C3: Masking probabilities are independent of system parameters \(\theta\)

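Under these conditions, the probability of the observed candidate set contributes only a constant factor that does not depend on \(\theta\), so the masking mechanism factors out of the likelihood. You can estimate component parameters without modeling the diagnostic process itself. That's why each model name ends in "c1_c2_c3".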
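For the exponential model, the factored likelihood is short enough to write out. Assuming C1-C3, a failed system with time \(t_i\) and candidate set \(C_i\) contributes \(\exp(-t_i \sum_j \lambda_j) \sum_{j \in C_i} \lambda_j\), up to a masking constant that can be dropped, and a censored system contributes \(\exp(-t_i \sum_j \lambda_j)\). The Python sketch below computes that log-likelihood and its score; it illustrates the math only, and the data layout is an assumption rather than the R code in maskedcauses.

```python
import math

def loglik(rates, data):
    """Log-likelihood of an exponential series system under C1-C3.

    data: list of (t, cand) pairs; cand is a set of component indices for an
    observed failure, or None for a right-censored system. Masking
    probabilities are omitted because under C1-C3 they are constant in rates.
    """
    total_rate = sum(rates)
    ll = 0.0
    for t, cand in data:
        ll -= total_rate * t  # log of system survival to t
        if cand is not None:
            ll += math.log(sum(rates[j] for j in cand))  # summed hazard of candidates
    return ll

def score(rates, data):
    """Gradient of loglik with respect to each component rate."""
    total_time = sum(t for t, _ in data)
    grad = [-total_time] * len(rates)
    for _, cand in data:
        if cand is not None:
            s = sum(rates[j] for j in cand)
            for j in cand:
                grad[j] += 1.0 / s
    return grad

data = [(0.5, {0}), (1.2, {0, 1}), (0.3, {1, 2}), (2.0, None), (0.9, {2})]
print(loglik([0.4, 0.3, 0.2], data), score([0.4, 0.3, 0.2], data))
```

An analytic score like this is easy to check against finite differences of the log-likelihood, which is a cheap guard against algebra slips.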

Weibull Distributions: From Reliability Theory to My Own Survival Curve

April 18, 2022

The Weibull distribution models time-to-failure. In reliability engineering, that means component lifetimes. In medicine, it means survival times. I have been working with Weibull models for my thesis on series system reliability. Then I got diagnosed with cancer, and now every time I work with survival curves, I am looking at mathematical abstractions of something very concrete: how long until failure?

The Mathematics

The Weibull CDF:

F(t) = 1 - exp(-(t/λ)^k)

Two parameters:

  • λ: scale (characteristic lifetime)
  • k: shape (how failure rate changes over time)

The shape parameter k tells you the whole story:

k < 1: Decreasing hazard. If you survive early on, your risk goes down. This is the infant mortality pattern.

k = 1: Constant hazard. Memoryless. This is just the exponential distribution.

k > 1: Increasing hazard. Things wear out.

The Hazard Function

The hazard function is what makes Weibull useful for survival analysis:

h(t) = (k/λ)(t/λ)^(k-1)

This is the instantaneous failure rate: given that you have survived to time t, h(t)·dt approximates the probability that you fail in the next small interval dt.
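The formulas above are easy to check numerically. This Python sketch uses lam for the scale λ and k for the shape, and it confirms that the hazard equals the density divided by the survival function, h(t) = f(t) / (1 - F(t)), across the three shape regimes. The function names are my own, chosen for illustration.

```python
import math

def weibull_cdf(t, k, lam):
    """F(t) = 1 - exp(-(t/lam)^k) for t >= 0."""
    return 1.0 - math.exp(-((t / lam) ** k))

def weibull_pdf(t, k, lam):
    """f(t) = (k/lam)(t/lam)^(k-1) exp(-(t/lam)^k) for t > 0."""
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))

def weibull_hazard(t, k, lam):
    """h(t) = (k/lam)(t/lam)^(k-1) for t > 0."""
    return (k / lam) * (t / lam) ** (k - 1)

# Hazard at a few times for each regime (scale lam = 10):
# k = 0.5 decreases, k = 1.0 stays at 1/lam, k = 2.0 increases.
for k in (0.5, 1.0, 2.0):
    print(k, [round(weibull_hazard(t, k, 10.0), 4) for t in (1.0, 5.0, 20.0)])
```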

For cancer, this is the number that matters. Some cancers have increasing hazard (the longer you have it, the worse things get). Others have decreasing hazard after initial treatment, meaning if you make it past the critical period, prognosis improves. Knowing which pattern applies to your disease changes how you think about time.

Personal Context

When you study survival analysis academically, it is abstract. When you are living it, every curve is personal.

I look at Kaplan-Meier plots and see myself somewhere on that curve. I work with hazard functions and think: is my k > 1 or k < 1? Am I in the wearing-out regime or the if-you-make-it-past-this-it-gets-better regime?

The math does not change. But the meaning does.

The Irony

I chose reliability engineering for my thesis before the cancer diagnosis. I was studying component failures in series systems, where if any one part fails, the whole system fails.

Then I became a series system. Organs, treatment response, immune function. All have to work. Failure of any one is catastrophic.

The mathematics I was studying abstractly became uncomfortably literal.

Reliability Analysis and the Problem of Censored Data

August 14, 2019

One of the most interesting statistical problems I have encountered is reliability analysis with censored data: situations where you know something didn’t fail, but not when it will fail.

The Censoring Problem

Imagine testing light bulbs. You run them for 1000 hours. Some fail during the test. Others are still working when you stop.

For the survivors, you know:

  • They lasted at least 1000 hours
  • You do not know their actual lifetime

This is right censoring. The true value lies somewhere to the right of your observation. You have a lower bound, not a measurement.

Why This Matters

Censored data is everywhere:

  • Medical studies (patients still alive at study end)
  • Engineering tests (components that have not failed)
  • Customer retention (users still active)

Both naive responses are wrong. Ignoring censored observations wastes information. Treating them as failures introduces bias. You need a framework that uses the partial information you actually have.

Maximum Likelihood to the Rescue

The solution is maximum likelihood estimation with likelihood contributions that account for censoring:

  • Failure observations contribute the probability density \(f(t)\). You observed the exact failure time, so the contribution is the density evaluated at that time (a density, not a probability).
  • Censored observations contribute the survival probability \(S(t)\). You know the unit survived to time \(t\), so its contribution is the probability of surviving at least that long.

The likelihood for the whole sample is:

$$L = \prod_{i: \text{failed}} f(t_i) \prod_{j: \text{censored}} S(t_j)$$

This lets you extract information from both failed and surviving units. The censored observations pull the estimated reliability upward; the failures pull it downward. Maximum likelihood balances them.
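A quick simulation makes the light-bulb example concrete. Assuming exponential lifetimes, \(f(t) = \lambda e^{-\lambda t}\) and \(S(t) = e^{-\lambda t}\), so the likelihood above gives \(\log L = d \log\lambda - \lambda \sum_i t_i\), maximized at \(\hat\lambda = d / \sum_i t_i\), where \(d\) is the number of failures and the sum runs over every unit's observed time. The Python sketch below compares that estimate with the two naive approaches; the mean lifetime of 1500 hours is an arbitrary illustrative choice.

```python
import random

def simulate_bulbs(n, mean_life, tau, seed=1):
    """Exponential lifetimes observed until the test stops at tau hours."""
    rng = random.Random(seed)
    obs = []
    for _ in range(n):
        life = rng.expovariate(1.0 / mean_life)
        obs.append((min(life, tau), life <= tau))  # (observed time, failed?)
    return obs

def mle_rate(obs):
    """Censored-data MLE: number of failures / total time on test."""
    failures = sum(1 for _, failed in obs if failed)
    return failures / sum(t for t, _ in obs)

def rate_ignoring_censored(obs):
    """Naive: drop the survivors and fit only the failures."""
    times = [t for t, failed in obs if failed]
    return len(times) / sum(times)

def rate_censored_as_failures(obs):
    """Naive: pretend every survivor failed at tau."""
    return len(obs) / sum(t for t, _ in obs)

obs = simulate_bulbs(n=20000, mean_life=1500.0, tau=1000.0)
true_rate = 1 / 1500.0
for name, est in [("MLE", mle_rate), ("ignore censored", rate_ignoring_censored),
                  ("censored as failures", rate_censored_as_failures)]:
    print(f"{name:>22}: {est(obs) / true_rate:.2f} x true rate")
```

Both naive estimates overstate the failure rate by a factor of two or more, which is the bias described above, while the MLE lands close to the true rate.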

Series Systems Complexity

It gets more interesting with series systems, systems that fail when any component fails. If you observe system failure but do not know which component caused it, you have masked failure data.

This is the problem I am most interested in: extracting component-level reliability from system-level failures when the cause is ambiguous. The masking adds a latent variable, and the likelihood becomes a mixture. You can handle it with EM algorithms or direct optimization, but the combinatorics grow quickly with system size.
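For the exponential case, the EM route is short enough to sketch. Treat the true cause \(K_i\) of each failure as the latent variable. Assuming constant component rates and masking that satisfies C1-C3, the E-step gives \(P(K_i = j \mid t_i, C_i) = \lambda_j / \sum_{l \in C_i} \lambda_l\), and the M-step is the complete-data exponential MLE: expected failures of component \(j\) divided by total time on test. This is a Python sketch under those assumptions, not a general implementation.

```python
def em_exp_series(data, m, iters=500):
    """EM for the component rates of an exponential series system (sketch).

    data: list of (t, cand) pairs; cand is a set of candidate indices for a
    failure, or None for a right-censored system.
    """
    rates = [1.0] * m
    total_time = sum(t for t, _ in data)
    for _ in range(iters):
        expected = [0.0] * m
        for _, cand in data:
            if cand is None:
                continue
            s = sum(rates[j] for j in cand)
            for j in cand:
                expected[j] += rates[j] / s  # E-step: P(K_i = j | t_i, C_i)
        # M-step: expected failures of each component / total time on test
        rates = [e / total_time for e in expected]
    return rates

data = [(0.5, {0}), (1.2, {0, 1}), (0.3, {1, 2}), (2.0, None), (0.9, {2}), (0.7, {1})]
rates = em_exp_series(data, m=3, iters=2000)
```

Each EM iteration cannot decrease the likelihood, and a fixed point with all rates positive satisfies the score equations. Because this log-likelihood is concave in the rates, that fixed point is the same maximum that direct optimization would find.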

This work is laying groundwork for what will become a major focus of my mathematical statistics degree.