statsandstuff a blog on statistics and machine learning

The relationship between correlation, mutual information, and p-values

Feature selection is often necessary before building a machine learning or statistical model, especially when there are many, many irrelevant features. To be more concrete, suppose we want to predict/explain some response \(Y\) using some features \(X_1, \ldots, X_k\). A natural first step is to find the features that are “most related” to the... Read more
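As a toy illustration of this kind of screening (my own sketch, not the post's code), the snippet below ranks synthetic features by absolute Pearson correlation and by estimated mutual information with the response; the data and all names here are made up, and scikit-learn's `mutual_info_regression` is used as one common MI estimator.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_regression

# Synthetic data: only the first feature actually drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 2 * X[:, 0] + rng.normal(size=500)

# Rank features by absolute Pearson correlation with the response ...
corrs = [abs(pearsonr(X[:, j], y)[0]) for j in range(X.shape[1])]
# ... and by estimated mutual information.
mis = mutual_info_regression(X, y, random_state=0)

print("top feature by |correlation|:", int(np.argmax(corrs)))
print("top feature by mutual information:", int(np.argmax(mis)))
```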

Controlling error when testing many hypotheses

In a hypothesis test, we compute some test statistic \(T\) that is designed to distinguish between a null and alternative hypothesis.  We then compute the probability \(p(T)\) of observing a test statistic at least as extreme as \(T\) under the null hypothesis, and reject the null hypothesis if the p-value \(p(T)\) is sufficiently small. (As an aside, ... Read more
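To make the definition concrete, here is a minimal sketch (mine, not from the post) of a two-sided z-test: the statistic is the standardized sample mean, which is approximately standard normal under the null, and the p-value is the null probability of a statistic at least as extreme as the one observed.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical example: two-sided test of "the mean is zero".
rng = np.random.default_rng(1)
x = rng.normal(loc=0.2, scale=1.0, size=100)

# Test statistic: standardized sample mean (~ standard normal under the null).
T = np.sqrt(len(x)) * x.mean() / x.std(ddof=1)

# p-value: null probability of a statistic at least as extreme as T.
p_value = 2 * norm.sf(abs(T))
print(f"T = {T:.3f}, p-value = {p_value:.4f}")
```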

Calibration in logistic regression and other generalized linear models

In general, scores returned by machine learning models are not necessarily well-calibrated probabilities (see my post on ROC space and AUC).  The probability estimates from a logistic regression model (without regularization) are partially calibrated, though.  In fact, many generalized linear models, including linear regression, logistic regress... Read more
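One concrete face of this partial calibration is calibration in the large: for an unregularized logistic regression with an intercept, the score equations force the average fitted probability to equal the observed positive rate. A sketch checking this with scikit-learn, on synthetic data, where a very large `C` stands in for "no regularization":

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary outcomes driven by one feature.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 1))
y = (rng.random(1000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)

# A very large C approximates an unregularized fit.
model = LogisticRegression(C=1e10).fit(X, y)
p_hat = model.predict_proba(X)[:, 1]

# The score equations force the average prediction to match the observed rate.
print(f"mean predicted probability: {p_hat.mean():.4f}")
print(f"observed positive rate:     {y.mean():.4f}")
```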

Geometric interpretations of linear regression and ANOVA

In this post, I explore the connection of linear regression to geometry.  In particular, I discuss the geometric meaning of fitted values, residuals, and degrees of freedom.  Using geometry, I derive coefficient interpretations and discuss omitted variable bias.  I finish by connecting ANOVA (both hypothesis testing and power analysis) to the un... Read more
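As a quick numerical illustration of the geometric picture (my own sketch, not the post's code): the fitted values are the orthogonal projection of \(y\) onto the column space of the design matrix, so the residual vector is orthogonal to every column.

```python
import numpy as np

# Synthetic design matrix with an intercept column.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=50)

# Least-squares fit: fitted values are the orthogonal projection of y
# onto the column space of X.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
residuals = y - y_hat

# The residual vector is orthogonal to every column of X.
print(np.round(X.T @ residuals, 10))  # ~ all zeros
```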

Inference based on entropy maximization

Entropy For a discrete random variable, the surprisal (or information content) of an outcome with probability \(p\) is \(-\log p\).  Rare events have a lot of surprisal.  For a discrete random variable with \(n\) outcomes that occur with probabilities \(p_1, \ldots, p_n\), the entropy \(H\) is the average surprisal \[H(p_1,\ldots,p_n) = \sum_{i=1}^n p_i(-\log p_i).\] ... Read more
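A minimal sketch of these two definitions in NumPy (the function names are mine; natural log, so entropy is measured in nats):

```python
import numpy as np

def surprisal(p):
    """Information content -log p of an outcome with probability p."""
    return -np.log(p)

def entropy(probs):
    """Average surprisal sum_i p_i * (-log p_i) of a discrete distribution."""
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(probs * -np.log(probs)))

# A fair coin has higher entropy (more average surprise) than a biased one.
print(entropy([0.5, 0.5]))   # ~0.693 nats (log 2)
print(entropy([0.9, 0.1]))   # ~0.325 nats
```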