Statistics


February 19, 2024

Fine-Tuning Tiny LLMs for ElasticSearch DSL

I am creating a tiny LLM for ElasticSearch DSL as a proof of concept.

February 19, 2024

Approximations of Solomonoff Induction

I experiment with simple predictive / generative models to approximate Solomonoff induction for a relatively simple synthetic data-generating process.

large language models solomonoff induction synthetic data algorithmic data n-gram models

December 15, 2023

When Simplicity Unlocks Insight: Closed-Form Fisher Information for Masked Exponential Data

Sometimes making stronger assumptions doesn’t limit you—it illuminates the problem. This paper, developed before my master’s thesis, shows what happens when you simplify both the distribution (exponential) and the masking model: you get …

statistics fisher information masked data series systems reliability

March 29, 2023

Model Selection in Weibull Series Systems

In my paper, Reliability Estimation in Series Systems, I set aside a lot of research that may be worth pursuing further. This post is about using homogeneous shape parameters for the Weibull series system, which can greatly simplify the …

maximum-likelihood-estimation data-generating-process statistics siue

June 30, 2022

likelihood.model: Composable Statistical Inference in R

Most R packages hardcode specific likelihood models. likelihood.model provides a generic framework where likelihoods are first-class composable objects—designed to work seamlessly with algebraic.mle for maximum likelihood estimation.

The Core Concept …

R statistics maximum likelihood inference

March 25, 2022

hypothesize: A Consistent API for Statistical Tests in R

R’s hypothesis testing functions are inconsistent—t.test() returns different structures than chisq.test(), making generic workflows painful. hypothesize provides a unified API so any test returns the same interface: p-value, test statistic, …

R statistics hypothesis testing likelihood ratio test

October 30, 2021

Computational Statistics - SIUe - STAT 575 - Problem Set 2

This problem set covers the expectation-maximization (EM) algorithm for right-censored normal data with known variance.

statistics R computation EM algorithm statistical inference
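
For readers who have not seen the setup, here is a minimal sketch of EM for the mean of right-censored normal data with known variance. It is written in Python for illustration and is not taken from the problem set. The function names and the data layout (a list of recorded values plus a list of censoring flags) are assumptions. The E-step replaces each censored value with its conditional expectation above the censoring point, and the M-step averages the completed data.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_sf(z):
    """Standard normal survival function, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def em_censored_normal_mean(values, censored, sigma, tol=1e-10, max_iter=1000):
    """Estimate mu for N(mu, sigma^2) data with right censoring and known sigma.

    values[i] is the observed value, or the censoring point when censored[i] is True.
    (Hypothetical interface, chosen for this illustration only.)
    """
    n = len(values)
    mu = sum(values) / n  # naive starting value that treats censoring points as data
    for _ in range(max_iter):
        total = 0.0
        for y, is_censored in zip(values, censored):
            if is_censored:
                # E-step: E[X | X > y] = mu + sigma * phi(z) / (1 - Phi(z))
                # Note: norm_sf(z) can underflow for very large z; not handled here.
                z = (y - mu) / sigma
                total += mu + sigma * norm_pdf(z) / norm_sf(z)
            else:
                total += y
        new_mu = total / n  # M-step: mean of the completed data
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Example: the third observation is only known to exceed 3.0.
estimate = em_censored_normal_mean([1.0, 2.0, 3.0], [False, False, True], sigma=1.0)
```

With no censored observations the procedure stops at the sample mean after one pass. With censoring, the estimate moves above the naive mean because each censored value is known only to exceed its recorded censoring point.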

October 30, 2021

Review: A Symbolic Representation of Time Series, with Implications for Streaming Algorithms

In [1], the authors present a method for constructing a symbolic (nominal) representation for real-valued time series data. A symbolic representation is desirable because it makes it possible to use many of the effective algorithms that require …

statistics time series computation symbolic data mining

August 20, 2021

dfr.dist: Specify the Hazard Function Directly

Most survival analysis forces you to pick from a catalog—Weibull, exponential, log-normal. dfr.dist flips this: you specify the hazard function directly, and it handles all the math.

The Core Insight

Instead of choosing Weibull(shape, scale), you …

R statistics reliability survival analysis dynamic failure rate
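
To make "specify the hazard function directly" concrete, here is a minimal Python sketch of the underlying math. It does not use or reproduce the dfr.dist API, and every function name in it is hypothetical. Given a hazard h(t), the cumulative hazard is H(t) = ∫ h(u) du from 0 to t, the survival function is S(t) = exp(-H(t)), and the density is f(t) = h(t) S(t). The integral is approximated with the trapezoidal rule, which assumes the hazard is finite at t = 0.

```python
import math

def cumulative_hazard(hazard, t, steps=1000):
    """Approximate H(t), the integral of hazard(u) over [0, t], by the trapezoidal rule."""
    if t <= 0:
        return 0.0
    dt = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))  # endpoints get half weight
    for i in range(1, steps):
        total += hazard(i * dt)
    return total * dt

def survival(hazard, t, steps=1000):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(hazard, t, steps))

def density(hazard, t, steps=1000):
    """f(t) = h(t) * S(t)."""
    return hazard(t) * survival(hazard, t, steps)

# A constant hazard recovers the exponential distribution.
def constant_hazard(t):
    return 0.5

# A linearly increasing hazard h(t) = 2t is a Weibull with shape 2 and scale 1.
def linear_hazard(t):
    return 2.0 * t

s_const = survival(constant_hazard, 2.0)   # close to exp(-1)
s_linear = survival(linear_hazard, 1.5)    # close to exp(-2.25)
```

The two example hazards are chosen because their survival functions have known closed forms, which makes the numerical result easy to check. A hazard that is unbounded at zero, such as a Weibull with shape below 1, would need a different integration scheme.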

May 15, 2021

algebraic.mle: MLEs as First-Class Algebraic Objects

Maximum likelihood estimators have rich mathematical structure—they’re consistent, asymptotically normal, efficient. algebraic.mle exposes this structure through an algebra where MLEs are objects you compose, transform, and query.

The Core Idea …

R statistics maximum likelihood algebra

February 1, 2021

algebraic.dist: Treating Distributions as First-Class Algebraic Objects in R

Most statistical software treats probability distributions as static parameter sets you pass to sampling or density functions. algebraic.dist takes a different approach: distributions are algebraic objects that compose, transform, and combine using …

statistics probability distributions R functional programming