Weibull Distributions: From Reliability Theory to My Own Survival Curve

April 18, 2022

The Weibull distribution models time-to-failure. In reliability engineering, that means component lifetimes. In medicine, it means survival times. I have been working with Weibull models for my thesis on series system reliability. Then I got diagnosed with cancer, and now every time I work with survival curves, I am looking at mathematical abstractions of something very concrete: how long until failure?

The Mathematics

The Weibull CDF:

F(t) = 1 - exp(-(t/λ)^k)

Two parameters:

  • λ: scale (characteristic lifetime)
  • k: shape (how failure rate changes over time)
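Here is a minimal Python sketch of the CDF and its complement, the survival function S(t) = 1 - F(t). The function names are my own, chosen for illustration. Plugging in t = λ shows why λ is called the characteristic lifetime: (t/λ)^k = 1 there for every shape, so F(λ) = 1 - exp(-1) ≈ 0.632 no matter what k is.

```python
import math

def weibull_cdf(t: float, scale: float, shape: float) -> float:
    """F(t) = 1 - exp(-(t/scale)^shape) for t >= 0; zero before time 0."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_survival(t: float, scale: float, shape: float) -> float:
    """S(t) = 1 - F(t) = exp(-(t/scale)^shape): probability of lasting past t."""
    if t < 0:
        return 1.0
    return math.exp(-((t / scale) ** shape))

# At t = scale the exponent is 1 for any shape, so F(scale) = 1 - 1/e.
for k in (0.5, 1.0, 3.0):
    print(k, round(weibull_cdf(100.0, scale=100.0, shape=k), 4))
```

Each line of the loop prints 0.6321, whatever the shape.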

The shape parameter k tells you which way the risk is heading:

k < 1: Decreasing hazard. If you survive early on, your risk goes down. This is the infant mortality pattern.

k = 1: Constant hazard. Memoryless. This is just the exponential distribution.

k > 1: Increasing hazard. Things wear out.
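One way to see the three regimes before bringing in the hazard function is conditional survival. Given survival to time s, the chance of lasting another t units is S(s + t)/S(s). The sketch below (function names are illustrative) compares that with the unconditional S(t) for one shape in each regime.

```python
import math

def weibull_survival(t, scale, shape):
    """S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

def conditional_survival(extra, survived, scale, shape):
    """P(T > survived + extra | T > survived) = S(survived + extra) / S(survived)."""
    return weibull_survival(survived + extra, scale, shape) / weibull_survival(survived, scale, shape)

scale, s, t = 10.0, 5.0, 5.0
for k in (0.5, 1.0, 2.0):
    fresh = weibull_survival(t, scale, k)
    given = conditional_survival(t, s, scale, k)
    print(f"k={k}: S(t)={fresh:.4f}, S(t | survived to {s})={given:.4f}")
```

For k < 1, having already survived makes the next stretch more likely to be survived. For k = 1 the two numbers agree, which is the memoryless property. For k > 1 the conditional chance is lower, because the unit is wearing out.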

The Hazard Function

The hazard function is what makes Weibull useful for survival analysis:

h(t) = (k/λ)(t/λ)^(k-1)

This is the instantaneous failure rate: given that you have survived to time t, h(t)·Δt is approximately the probability that you fail in the next short interval Δt. It is a rate rather than a probability, so it can exceed 1.
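A small sketch that checks this reading numerically. It computes h(t) from the formula and compares it with a finite-difference estimate of P(fail in [t, t + Δt) | survived to t) / Δt. The step size Δt = 1e-6 and the function names are arbitrary choices made for illustration.

```python
import math

def weibull_survival(t, scale, shape):
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, scale, shape):
    """h(t) = (k/λ)(t/λ)^(k-1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def empirical_rate(t, dt, scale, shape):
    """P(fail in [t, t+dt) | survived to t) / dt; tends to h(t) as dt -> 0."""
    s_t = weibull_survival(t, scale, shape)
    return (s_t - weibull_survival(t + dt, scale, shape)) / (s_t * dt)

times = [1.0, 5.0, 10.0, 20.0]
for k in (0.5, 1.0, 2.0):
    exact = [weibull_hazard(t, 10.0, k) for t in times]
    approx = [empirical_rate(t, 1e-6, 10.0, k) for t in times]
    print(f"k={k}: h(t) = {[round(h, 4) for h in exact]}, "
          f"finite difference = {[round(a, 4) for a in approx]}")
```

The k = 0.5 row falls over time, the k = 1 row stays at 1/λ = 0.1, and the k = 2 row climbs. These are the three regimes above.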

For cancer, this is the number that matters. Some cancers have increasing hazard (the longer you have it, the worse things get). Others have decreasing hazard after initial treatment, meaning if you make it past the critical period, prognosis improves. Knowing which pattern applies to your disease changes how you think about time.

Personal Context

When you study survival analysis academically, it is abstract. When you are living it, every curve is personal.

I look at Kaplan-Meier plots and see myself somewhere on that curve. I work with hazard functions and think: is my k > 1 or k < 1? Am I in the wearing-out regime or the if-you-make-it-past-this-it-gets-better regime?

The math does not change. But the meaning does.

The Irony

I chose reliability engineering for my thesis before the cancer diagnosis. I was studying component failures in series systems, where if any one part fails, the whole system fails.

Then I became a series system. Organs, treatment response, immune function. All have to work. Failure of any one is catastrophic.

The mathematics I was studying abstractly became uncomfortably literal.


Reliability Analysis and the Problem of Censored Data

August 14, 2019

One of the most interesting statistical problems I have encountered is reliability analysis with censored data: situations where you know a unit had not failed by the time observation stopped, but not when it eventually will.

The Censoring Problem

Imagine testing light bulbs. You run them for 1000 hours. Some fail during the test. Others are still working when you stop.

For the survivors, you know:

  • They lasted at least 1000 hours
  • You do not know their actual lifetime

This is right censoring. The true value lies somewhere to the right of your observation. You have a lower bound, not a measurement.

Why This Matters

Censored data is everywhere:

  • Medical studies (patients still alive at study end)
  • Engineering tests (components that have not failed)
  • Customer retention (users still active)

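The two naive responses are both wrong. Ignoring censored observations wastes information and biases lifetime estimates downward, because the units you drop are exactly the longest-lived ones. Treating them as failures at the censoring time also biases estimates downward, because their true lifetimes are longer than what you recorded. You need a framework that uses the partial information you actually have.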
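To make the bias concrete, here is a small simulation built on assumptions that are not in the text. Lifetimes are exponential (Weibull with k = 1) with a true mean of 1500 hours, and the test stops at 1000 hours as in the light-bulb example. Each record is an (observed time, failed) pair. For the exponential model, the censoring-aware maximum likelihood estimate of the mean has a closed form: total time on test divided by the number of observed failures.

```python
import random

def simulate_test(n, true_mean, test_hours, seed=0):
    """Return (observed_time, failed) pairs for a life test stopped at test_hours."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        lifetime = rng.expovariate(1.0 / true_mean)  # rate = 1 / mean
        if lifetime <= test_hours:
            records.append((lifetime, True))      # failure observed during the test
        else:
            records.append((test_hours, False))   # still working: right-censored
    return records

records = simulate_test(n=10_000, true_mean=1500.0, test_hours=1000.0)
failures = [t for t, failed in records if failed]
total_time = sum(t for t, _ in records)

drop_censored = sum(failures) / len(failures)     # ignores the survivors
censored_as_failures = total_time / len(records)  # pretends survivors failed at 1000 h
mle = total_time / len(failures)                  # exponential MLE with censoring

print(f"drop censored:        {drop_censored:8.1f}")
print(f"censored as failures: {censored_as_failures:8.1f}")
print(f"censoring-aware MLE:  {mle:8.1f}")
```

Neither naive estimate can exceed 1000 hours, because no recorded time does. In expectation they come out near 445 and 730 hours, while the censoring-aware estimate stays close to the true 1500.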

Maximum Likelihood to the Rescue

The solution is maximum likelihood estimation with likelihood contributions that account for censoring:

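  • Failure observations contribute the probability density \(f(t)\). You observed the exact failure time. For a continuous lifetime, the probability of failing at any exact instant is zero, so the density at that time plays the role of the probability of failing then.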
  • Censored observations contribute the survival probability \(S(t)\). You know the unit survived to time \(t\), so its contribution is the probability of surviving at least that long.

The likelihood for the whole sample is:

$$L = \prod_{i: \text{failed}} f(t_i) \prod_{j: \text{censored}} S(t_j)$$
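In practice you work with the log, which turns the products into sums. The sketch below assumes Weibull lifetimes and stores the data as (time, failed) pairs, a representation I chose for illustration. Each failure adds \(\log f(t)\) and each censored unit adds \(\log S(t) = -(t/\lambda)^k\).

```python
import math

def weibull_log_likelihood(data, scale, shape):
    """log L = sum over failures of log f(t_i) + sum over censored units of log S(t_j)."""
    total = 0.0
    for t, failed in data:
        z = t / scale
        log_s = -(z ** shape)  # log S(t) = -(t/λ)^k
        if failed:
            # log f(t) = log h(t) + log S(t), where h(t) = (k/λ)(t/λ)^(k-1)
            total += math.log(shape / scale) + (shape - 1) * math.log(z) + log_s
        else:
            total += log_s
    return total

data = [(120.0, True), (340.0, True), (500.0, False), (500.0, False)]
print(weibull_log_likelihood(data, scale=600.0, shape=1.5))
```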

This lets you extract information from both failed and surviving units. The censored observations pull the estimated reliability upward; the failures pull it downward. Maximum likelihood balances them.
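One way to do the maximization for a Weibull model is sketched below. This is a standard route, not necessarily the one I ended up using. For fixed \(k\), the likelihood above is maximized in closed form by \(\lambda^k = \sum_i t_i^k / d\), where the sum runs over all units and \(d\) is the number of failures. Substituting that back leaves a single equation in \(k\):

$$\frac{1}{k} + \frac{1}{d}\sum_{i:\text{failed}} \ln t_i - \frac{\sum_i t_i^k \ln t_i}{\sum_i t_i^k} = 0$$

The left side decreases in \(k\), so bisection finds the root. The simulation settings (true \(\lambda = 1000\), \(k = 1.5\), test stopped at 1200 hours) are made up for illustration.

```python
import math
import random

def fit_weibull_censored(data, tol=1e-10):
    """Censored Weibull MLE (scale, shape) from (time, failed) pairs.

    Solves the profile equation in k by bisection, then uses the closed-form
    scale for that k. Times are divided by the largest time first: the
    equation in k is unchanged by rescaling, and u**k cannot overflow.
    """
    d = sum(1 for _, failed in data if failed)
    if d == 0:
        raise ValueError("need at least one observed failure")
    t_max = max(t for t, _ in data)
    u = [t / t_max for t, _ in data]
    log_u = [math.log(x) for x in u]
    mean_log_fail = sum(lx for lx, (_, failed) in zip(log_u, data) if failed) / d

    def score(k):
        # 1/k + mean log failure time - sum(u^k ln u) / sum(u^k); decreasing in k
        w = [x ** k for x in u]
        return 1.0 / k + mean_log_fail - sum(wi * li for wi, li in zip(w, log_u)) / sum(w)

    lo, hi = 1e-3, 1.0
    while score(hi) > 0:  # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = t_max * (sum(x ** shape for x in u) / d) ** (1.0 / shape)
    return scale, shape

# Illustrative data: true scale 1000 h, shape 1.5, test stopped at 1200 h.
rng = random.Random(1)
stop = 1200.0
data = []
for _ in range(5000):
    life = rng.weibullvariate(1000.0, 1.5)  # alpha = scale, beta = shape
    data.append((life, True) if life <= stop else (stop, False))

scale_hat, shape_hat = fit_weibull_censored(data)
print(f"scale ~ {scale_hat:.1f}, shape ~ {shape_hat:.3f}")
```

About a quarter of the simulated units survive the test. The fit still recovers parameters close to the ones that generated the data, because the censored units enter through \(S(t)\) instead of being dropped.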

Series Systems Complexity

It gets more interesting with series systems, systems that fail when any component fails. If you observe system failure but do not know which component caused it, you have masked failure data.
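As a baseline before any masking: a series system survives past \(t\) only if every component does. If the components are independent, \(S_{\text{sys}}(t) = \prod_i S_i(t)\), and the component hazards add. With Weibull components that share a shape \(k\), the product collapses to another Weibull with the same \(k\) and scale \(\left(\sum_i \lambda_i^{-k}\right)^{-1/k}\). The sketch below assumes independence and a common shape, and its function names are illustrative.

```python
import math

def series_survival(t, components):
    """System survival for independent components in series: product of S_i(t)."""
    return math.prod(math.exp(-((t / scale) ** shape)) for scale, shape in components)

def series_scale_common_shape(scales, shape):
    """Scale of the equivalent single Weibull when every component has the same shape."""
    return sum(s ** (-shape) for s in scales) ** (-1.0 / shape)

scales, k = [500.0, 800.0, 1200.0], 2.0
lam_sys = series_scale_common_shape(scales, k)
for t in (100.0, 300.0, 600.0):
    direct = series_survival(t, [(s, k) for s in scales])
    collapsed = math.exp(-((t / lam_sys) ** k))
    print(f"t={t}: product={direct:.6f}, single Weibull={collapsed:.6f}")
print(f"system scale {lam_sys:.1f} is below the weakest component's {min(scales)}")
```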

This is the problem I am most interested in: extracting component-level reliability from system-level failures when the cause is ambiguous. The masking adds a latent variable, and the likelihood becomes a mixture. You can handle it with EM algorithms or direct optimization, but the combinatorics grow quickly with system size.
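A common way to write one masked observation relies on assumptions that the text does not state. The first is that components are independent. The second is that the masking is noninformative: the chance of seeing candidate set \(C\) is the same whichever member of \(C\) actually failed, and it does not depend on the lifetime parameters. Under those assumptions, a system failure at time \(t\) with candidate set \(C\) contributes, up to a constant, \(S_{\text{sys}}(t) \sum_{j \in C} h_j(t)\). That sum over candidates is the mixture. Here is a sketch with Weibull components, with illustrative names:

```python
import math

def weibull_hazard(t, scale, shape):
    return (shape / scale) * (t / scale) ** (shape - 1)

def masked_log_contribution(t, candidates, components):
    """log[S_sys(t) * sum over j in candidates of h_j(t)] for one masked system failure.

    components: list of (scale, shape), one per component (independence assumed).
    candidates: indices of the components that could have caused the failure.
    """
    log_s_sys = -sum((t / scale) ** shape for scale, shape in components)
    hazard_sum = sum(weibull_hazard(t, *components[j]) for j in candidates)
    return log_s_sys + math.log(hazard_sum)

comps = [(500.0, 2.0), (800.0, 1.0), (1200.0, 1.5)]
print(masked_log_contribution(300.0, {0}, comps))        # cause known: component 0
print(masked_log_contribution(300.0, {0, 1}, comps))     # masked between 0 and 1
print(masked_log_contribution(300.0, {0, 1, 2}, comps))  # fully masked
```

With a single candidate \(j\), this reduces to the known-cause term \(f_j(t) \prod_{l \ne j} S_l(t)\). With every component as a candidate, it is the system density. A full fit would add these terms over all systems, plus \(\log S_{\text{sys}}\) for censored systems, and then maximize with EM or direct optimization.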

This work is laying groundwork for what will become a major focus of my mathematical statistics degree.