Estimating_es_conf_moving_avg_bootstrap

Bootstrap Methods: When Theory Meets Computation

September 10, 2021

The bootstrap is a trade: mathematical complexity for computational burden. Instead of deriving analytical formulas for sampling distributions, you simulate them.

The Idea

If you don’t know the sampling distribution of a statistic, approximate it by resampling from your data.

1. Draw samples with replacement from the original data.
2. Compute your statistic on each resample.
3. The distribution of resampled statistics approximates the true sampling distribution.

That’s it. The justification is more subtle than the procedure. Under regularity conditions, the bootstrap distribution converges to the true sampling distribution as sample size grows. This is non-parametric inference: you use the empirical distribution as a stand-in for the true distribution, without assuming a parametric form.
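The procedure can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production routine: the function name `bootstrap_distribution`, the choice of the median as the statistic, the simulated exponential data, and the 2,000 resamples are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_resamples=2000, seed=0):
    """Return the statistic computed on n_resamples resamples of data."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original data.
        resample = rng.choices(data, k=n)
        # Compute the statistic on the resample.
        replicates.append(statistic(resample))
    # The replicates stand in for the unknown sampling distribution.
    return replicates


# Illustrative data only: 50 draws from a skewed (exponential) distribution.
data_rng = random.Random(42)
sample = [data_rng.expovariate(1.0) for _ in range(50)]

replicates = bootstrap_distribution(sample, statistics.median)
bootstrap_se = statistics.stdev(replicates)
print(f"sample median: {statistics.median(sample):.3f}")
print(f"bootstrap standard error: {bootstrap_se:.3f}")
```

The spread of `replicates` is the bootstrap estimate of the median's standard error, a quantity with no simple closed form for an arbitrary distribution.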

When I Use It

Bootstrap is my default tool when:

I need confidence intervals for statistics with no closed-form variance
Asymptotic approximations are unreliable (small samples, non-standard statistics)
I’m doing model selection via bootstrap cross-validation
I’m working with censored data where standard errors are intractable

That last case is the one that matters most for my research.

The Computational Trade

Better to get the right answer slowly than the wrong answer quickly.

Deriving an analytical variance formula is hard, and sometimes impossible for the statistic you actually care about. The bootstrap says: just compute the statistic 10,000 times on resampled data and look at the spread. On modern hardware, 10,000 resamples take seconds when the statistic itself is cheap to compute.
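To make "look at the spread" concrete, here is a hedged sketch of a percentile confidence interval built from 10,000 resamples. The statistic (the coefficient of variation), the simulated lognormal data, and the helper name `percentile_interval` are assumptions chosen for illustration. The percentile interval is only one of several bootstrap interval constructions.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


def percentile_interval(data, statistic, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval using simple order-statistic cut-offs."""
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return replicates[lower_index], replicates[upper_index]


# Illustrative data only: 40 positive, right-skewed observations.
data_rng = random.Random(7)
sample = [data_rng.lognormvariate(0.0, 0.5) for _ in range(40)]

lower, upper = percentile_interval(sample, coefficient_of_variation)
print(f"point estimate: {coefficient_of_variation(sample):.3f}")
print(f"95% percentile interval: ({lower:.3f}, {upper:.3f})")
```

The interval comes straight from the sorted replicates: no variance formula for the coefficient of variation is ever derived.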

The trade is almost always worth it.

My Thesis Work

My research uses bootstrap heavily. I’m working on reliability estimation for series systems where components fail and you don’t know which one caused the system failure. This is the masked failure data problem.

For these models, the MLE exists and you can compute it, but there is no closed-form expression for its variance. The Fisher information matrix involves expectations over the masking distribution that don’t simplify to anything closed-form.

Bootstrap gives me confidence intervals anyway. Resample the masked failure data, recompute the MLE on each resample, and use the distribution of bootstrapped MLEs to construct intervals. It’s not elegant, but it works, and “works” is the right criterion when the alternative is “no confidence intervals at all.”
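The resample-and-refit loop can be illustrated on a toy version of the problem. The sketch below is not the thesis model. It assumes exponential component lifetimes, a candidate set that always contains the failed component, and masking that adds each other component independently with a fixed probability. The MLE is approximated with a fixed number of EM iterations. All names (`simulate_masked_data`, `fit_rates`) and all numeric values are hypothetical.

```python
import random


def simulate_masked_data(true_rates, n_systems, mask_prob, rng):
    """Simulate (system failure time, candidate set) pairs."""
    data = []
    for _ in range(n_systems):
        lifetimes = [rng.expovariate(rate) for rate in true_rates]
        failure_time = min(lifetimes)  # a series system fails at the first failure
        failed = lifetimes.index(failure_time)
        # The candidate set always holds the failed component; every other
        # component is added independently with probability mask_prob.
        candidates = tuple(
            j for j in range(len(true_rates))
            if j == failed or rng.random() < mask_prob
        )
        data.append((failure_time, candidates))
    return data


def fit_rates(data, n_components, n_iter=50):
    """Approximate MLE of the exponential failure rates via EM."""
    total_time = sum(t for t, _ in data)
    rates = [1.0] * n_components
    for _ in range(n_iter):
        expected_failures = [0.0] * n_components
        for _, candidates in data:
            weight = sum(rates[j] for j in candidates)
            for j in candidates:
                # E-step: probability that component j caused this failure.
                expected_failures[j] += rates[j] / weight
        # M-step: expected failure counts divided by total time on test.
        rates = [count / total_time for count in expected_failures]
    return rates


rng = random.Random(1)
true_rates = [0.5, 1.0, 1.5]  # hypothetical values for the simulation
systems = simulate_masked_data(true_rates, n_systems=100, mask_prob=0.3, rng=rng)
mle = fit_rates(systems, len(true_rates))

# Bootstrap: resample whole systems with replacement and refit each time.
n_resamples = 300
boot = [
    fit_rates(rng.choices(systems, k=len(systems)), len(true_rates))
    for _ in range(n_resamples)
]

intervals = []
for j in range(len(true_rates)):
    values = sorted(estimate[j] for estimate in boot)
    lower = values[int(0.025 * n_resamples)]
    upper = values[int(0.975 * n_resamples) - 1]
    intervals.append((lower, upper))
    print(f"component {j}: estimate {mle[j]:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

The resampling unit is the whole observation (failure time together with its candidate set), so each resample preserves the masking structure of the original data.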

IEEE Paper: Estimating Encrypted Search Confidentiality via Bootstrap

November 2, 2016

This is my first IEEE publication, co-authored with Professor Hiroshi Fujinoki. The problem: if you encrypt search queries but an adversary can observe the ciphertext traffic, how many queries do they need before a frequency attack succeeds?

We used the Moving Average Bootstrap (MAB) method to estimate that threshold. The idea is that encrypted search leaks frequency information (how often each ciphertext appears), and an adversary can correlate those frequencies against known plaintext distributions. The bootstrap lets us estimate confidence intervals on the number of observations needed without closed-form solutions.
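To show the kind of leak involved, here is a toy frequency attack. It is not the paper's MAB estimator. It assumes a Zipf-like query distribution that the adversary knows exactly, models deterministic encryption as a secret permutation of keyword ids, and has the adversary match ciphertexts to keywords by frequency rank. The function name `attack_accuracy` and every parameter value are hypothetical.

```python
import random
from collections import Counter


def attack_accuracy(n_queries, vocabulary_size=50, seed=0):
    """Fraction of observed queries a rank-matching adversary decodes correctly."""
    rng = random.Random(seed)
    # Assumed plaintext distribution: keyword id k has weight 1 / (k + 1),
    # so lower ids are more likely. The adversary knows this ranking.
    weights = [1.0 / rank for rank in range(1, vocabulary_size + 1)]
    # Deterministic encryption modelled as a secret permutation of keyword ids.
    ciphertext_of = list(range(vocabulary_size))
    rng.shuffle(ciphertext_of)

    plaintexts = rng.choices(range(vocabulary_size), weights=weights, k=n_queries)
    observed = [ciphertext_of[p] for p in plaintexts]

    # Adversary: guess that the k-th most frequent ciphertext is keyword k.
    ranked_ciphertexts = [c for c, _ in Counter(observed).most_common()]
    guess = {c: rank for rank, c in enumerate(ranked_ciphertexts)}

    correct = sum(1 for p, c in zip(plaintexts, observed) if guess[c] == p)
    return correct / n_queries


accuracies = {n: attack_accuracy(n) for n in (100, 1_000, 10_000)}
for n, accuracy in accuracies.items():
    print(f"{n:>6} observed queries: {accuracy:.1%} decoded correctly")
```

Running the simulation at several traffic volumes gives a rough empirical picture of how the adversary's success depends on the number of observed queries, which is the quantity the bootstrap intervals are built around.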

This came out of my MS thesis work on encrypted search at SIU. The core question (how much does encrypted search actually leak?) turns out to be harder than it sounds, because the answer depends on the plaintext distribution, the query distribution, and how patient the adversary is. The bootstrap approach gives us a way to answer it empirically.

For more related work, see my research page and publications.