statsandstuff: a blog on statistics and machine learning

Randomized response

Randomized response is a technique, introduced in the mid-1960s, for surveying people about sensitive topics. Suppose a teacher decides to survey his students on whether they cheat. Rather than directly asking each student “Have you cheated?” (a question cheaters are unlikely to answer truthfully), the teacher can employ randomized response. Rando...
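The excerpt stops before describing the mechanism, so here is a minimal Python sketch of one common "forced response" variant. This is an assumption; the full post may use a different design. Each student privately flips a coin: on heads they answer truthfully; on tails they flip again and answer "yes" on heads and "no" on tails. Because a "yes" may come from the coin, no single answer is incriminating, yet the cheating rate p is recoverable from P(yes) = p/2 + 1/4. The function names and the simulated 30% cheating rate are hypothetical.

```python
import random
from typing import Sequence


def randomized_answer(cheated: bool, rng: random.Random) -> bool:
    """One student's answer under an assumed forced-response design.

    First coin: heads -> answer truthfully.
    Tails -> second coin decides: heads -> "yes", tails -> "no".
    """
    if rng.random() < 0.5:      # first coin came up heads
        return cheated
    return rng.random() < 0.5   # second coin: True means "yes"


def estimate_cheating_rate(answers: Sequence[bool]) -> float:
    """Invert P(yes) = 0.5 * p + 0.25 to estimate p from the observed yes rate."""
    yes_rate = sum(answers) / len(answers)
    return 2 * yes_rate - 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    true_rate = 0.3             # hypothetical fraction of students who cheat
    n = 100_000
    students = [rng.random() < true_rate for _ in range(n)]
    answers = [randomized_answer(c, rng) for c in students]
    print(f"estimated cheating rate: {estimate_cheating_rate(answers):.3f}")
```

The privacy comes at a cost: the estimate is noisier than a direct (truthful) survey of the same size, and in small samples it can even land slightly outside [0, 1].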

Multiple Hypothesis Tests

In hypothesis testing, rejecting the null hypothesis typically means we found something interesting — a drug performs better than the placebo, there is a difference between two fertilizers, the new textbook helps students achieve better test scores than the current one, etc. Rejecting the null hypothesis when it is true amounts to a false discov...
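To make the false-discovery problem concrete, here is a small Python sketch, assuming m independent tests in which every null hypothesis is true and each test is run at level alpha. The expected number of false discoveries is m * alpha, and the chance of at least one is 1 - (1 - alpha)^m; with 20 tests at alpha = 0.05 that chance is about 64%. The Bonferroni correction (test each hypothesis at alpha / m) is included as one standard remedy; I don't know which corrections the full post covers, and the function names are made up for illustration.

```python
def expected_false_discoveries(m: int, alpha: float) -> float:
    """Expected number of rejections when all m null hypotheses are true."""
    return m * alpha


def prob_any_false_discovery(m: int, alpha: float) -> float:
    """Family-wise error rate: P(at least one rejection) for m independent true nulls."""
    return 1 - (1 - alpha) ** m


def bonferroni_level(m: int, alpha: float) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    m, alpha = 20, 0.05
    print("expected false discoveries:", expected_false_discoveries(m, alpha))
    print("P(at least one):", round(prob_any_false_discovery(m, alpha), 3))
    per_test = bonferroni_level(m, alpha)
    print("P(at least one) after Bonferroni:", round(prob_any_false_discovery(m, per_test), 3))
```

The 1 - (1 - alpha)^m formula relies on independence; the Bonferroni guarantee does not, which is one reason it is so widely used despite being conservative.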

Interpreting regression coefficients

Suppose we regress a response \(Y\) on covariates \(X_j\) for \(j = 1, \ldots, p\). In linear regression, we get the model \[Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p.\] How do we interpret \(\beta_j = 0\)? Does it mean that the \(j\)th covariate is uncorrelated with the response? The answer is no! It means the \(j\)th covariate is uncor...
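Here is a small NumPy simulation with made-up data that shows why the answer is no. The response depends only on \(X_1\), while \(X_2\) is a noisy copy of \(X_1\). Marginally, \(X_2\) is clearly correlated with \(Y\) (the population correlation is 0.5), yet once \(X_1\) is in the model the fitted coefficient on \(X_2\) is essentially zero, because \(X_2\) carries no information about \(Y\) beyond what \(X_1\) already provides. The helper `fit_ols` is hypothetical, not from the post.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients [beta_0, beta_1, ...], with an intercept column prepended."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)  # X2 is X1 plus independent noise
y = x1 + rng.normal(size=n)   # Y depends on X1 only

coef = fit_ols(np.column_stack([x1, x2]), y)
corr_x2_y = np.corrcoef(x2, y)[0, 1]

print(f"corr(X2, Y) = {corr_x2_y:.2f}")  # close to 0.5
print(f"beta_1      = {coef[1]:.3f}")    # close to 1
print(f"beta_2      = {coef[2]:.3f}")    # close to 0
```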

Fun with GPS

In this post, I want to discuss the number of GPS satellites you need to determine your location. I'll illustrate with an example. Suppose Joe is lost in London but knows where certain landmarks are; e.g., he knows where the library, the museum, and Buckingham Palace are. Joe stops me in the street to find his bearings. I tell him,...
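The excerpt cuts off before the argument, so as a hedged illustration of the landmark analogy only, here is a NumPy sketch of 2D trilateration. Given exact distances from an unknown point to known landmarks, subtracting the first circle equation \(|p - a_0|^2 = d_0^2\) from the others leaves linear equations in \(p\), which we solve by least squares. With only two landmarks the two circles generally meet in two points, so a third landmark, not collinear with the first two, is needed to pick one. The landmark coordinates and the function name are invented, and the sketch ignores measurement error and everything specific to real GPS receivers.

```python
import numpy as np


def trilaterate_2d(landmarks: np.ndarray, distances: np.ndarray) -> np.ndarray:
    """Recover a 2D position from exact distances to three or more known points.

    For each i >= 1, subtracting |p - a_0|^2 = d_0^2 from |p - a_i|^2 = d_i^2 gives
        2 (a_i - a_0) . p = d_0^2 - d_i^2 + |a_i|^2 - |a_0|^2,
    which is linear in the unknown position p.
    """
    a0, d0 = landmarks[0], distances[0]
    A = 2 * (landmarks[1:] - a0)
    b = (d0**2 - distances[1:] ** 2
         + np.sum(landmarks[1:] ** 2, axis=1) - np.sum(a0**2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position


# Hypothetical map coordinates (say, in km) for the library, museum, and palace.
landmarks = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
true_position = np.array([1.0, 2.0])  # where Joe actually is
distances = np.linalg.norm(landmarks - true_position, axis=1)

print(trilaterate_2d(landmarks, distances))  # approximately [1. 2.]
```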

Distinguishing proportions: the risk ratio

Suppose I conduct an experiment to determine whether to use font A or font B in an online ad. After running the experiment, I find that there is a 1% chance that a user clicks on the font A ad and a 0.8% chance that the user clicks on the font B ad. We can compare these click rates on an absolute scale (font A increases the click rate by ...
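For the numbers in this example, the two scales give quite different-sounding answers. On the absolute scale, font A's click rate is 0.2 percentage points higher than font B's; the risk ratio is 1.0% / 0.8% = 1.25, i.e. font A's click rate is 25% higher relative to font B's. A quick Python sketch of both calculations (the function names are my own, for illustration):

```python
def absolute_difference(p_a: float, p_b: float) -> float:
    """Difference in click rates on the absolute scale (as a proportion)."""
    return p_a - p_b


def risk_ratio(p_a: float, p_b: float) -> float:
    """Relative scale: how many times the click rate of A is that of B."""
    return p_a / p_b


p_a, p_b = 0.010, 0.008  # click rates for the font A and font B ads
print(f"absolute difference: {absolute_difference(p_a, p_b) * 100:.1f} percentage points")
print(f"risk ratio: {risk_ratio(p_a, p_b):.2f}")
```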