Why you can’t calculate aging trajectories with a standard regression

I found myself in a little Twitter discussion last week about using regression to analyze player aging. I argued that regression won’t give you accurate results, and that the less elegant “delta method” is the better way to go. Although I did a small example to try to make my point, Tango suggested I do a bigger simulation and a blog post. That’s this.
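For concreteness, here is a minimal sketch of the kind of delta-method calculation being referred to (my own illustration, not code from the post): average the year-over-year change in performance for players observed at consecutive ages, then chain those averages into a trajectory. Real analyses typically weight each player by playing time, which this sketch omits, and the column names (player, age, rate) are hypothetical.

# Hypothetical sketch of the delta method for aging curves (not the post's code).
import pandas as pd

def delta_method_aging_curve(df):
    # df has one row per player-season with columns: player, age, rate.
    nxt = df.copy()
    nxt["age"] = nxt["age"] - 1          # shift so that age t lines up with age t+1
    pairs = df.merge(nxt, on=["player", "age"], suffixes=("_t", "_t1"))
    # Average change in rate from age t to age t+1, over players seen at both ages.
    deltas = (pairs["rate_t1"] - pairs["rate_t"]).groupby(pairs["age"]).mean()
    # Chain the average deltas into a cumulative trajectory relative to the youngest age.
    return deltas.sort_index().cumsum()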

Read Full Story

The Best Of Both Worlds: Hierarchical Linear Regression in PyMC3

Today’s blog post is co-written by my student Danne Elbers, who is doing her master’s thesis with me on computational psychiatry using Bayesian methods. This post also borrows heavily from a Notebook by Chris Fonnesbeck.
The power of Bayesian modelling really clicked for me when I was first introduced to hierarchical modelling.
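To give a flavor of what that looks like in practice, here is a minimal hierarchical (partially pooled) linear regression sketch in PyMC3 on synthetic grouped data; the variable names, priors, and data are illustrative and not taken from the post.

# Minimal hierarchical linear regression sketch in PyMC3 (illustrative, not the post's model).
import numpy as np
import pymc3 as pm

# Synthetic grouped data: 8 groups, each with its own intercept, sharing one slope.
rng = np.random.RandomState(42)
n_groups, n_per_group = 8, 30
group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.uniform(size=group.size)
y = rng.normal(1.0, 0.5, size=n_groups)[group] + 2.0 * x + rng.normal(0.0, 0.3, size=group.size)

with pm.Model() as hierarchical_model:
    # Hyperpriors: the group-level intercepts come from a common distribution (partial pooling).
    mu_a = pm.Normal("mu_a", mu=0.0, sd=10.0)
    sigma_a = pm.HalfNormal("sigma_a", sd=1.0)
    # Group-level intercepts, a shared slope, and the observation noise.
    a = pm.Normal("a", mu=mu_a, sd=sigma_a, shape=n_groups)
    b = pm.Normal("b", mu=0.0, sd=10.0)
    sigma_y = pm.HalfNormal("sigma_y", sd=1.0)
    # Likelihood: each observation uses its own group's intercept.
    y_obs = pm.Normal("y_obs", mu=a[group] + b * x, sd=sigma_y, observed=y)
    trace = pm.sample(1000, tune=1000, cores=1)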

Read Full Story

To do: Construct a build-your-own-relevant-statistics-class kit.

Alexis Lerner, who took a couple of our courses on applied regression and communicating data and statistics, designed a new course, “Jews: By the Numbers,” at the University of Toronto:
But what does it mean to work with data and statistics in a Jewish studies course?

Read Full Story

On the abuse of notation in regression models

In a somewhat ritual way, I always begin my regression course by returning to an important point of statistics: abuses of notation! Everyone uses the same letters (especially Greek ones) to denote objects of quite different natures. In most textbooks, the same page can tell us both that $\widehat{\theta}=2.35$ and that $\text{Var}(\widehat{\theta})=1$.
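The tension, of course, is that a fixed number cannot have a variance. One way to make the two uses of the same symbol explicit (my notation, not the post’s):

$\widehat{\theta} = \widehat{\theta}(X_1,\dots,X_n)$: the estimator, a random variable, for which $\text{Var}(\widehat{\theta})=1$ is meaningful;
$\widehat{\theta} = \widehat{\theta}(x_1,\dots,x_n) = 2.35$: the estimate, the realized value of that estimator on the observed sample $x_1,\dots,x_n$, which is a constant and has no variance.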

Read Full Story

Why Do We Plot Predictions on the x-axis?

When studying regression models, one of the first diagnostic plots most students learn is the plot of residuals versus the model’s predictions (that is, with the predictions on the x-axis). Here’s a basic example.
# build an “ideal” linear process.
set.seed(34524)
N = 100
x1 = runif(N)
x2 = runif(N)
noise = 0.25*rnorm(N)
y = x1 + x2 + noise
df = data.frame(x1 = x1, x2 = x2, y = y)
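The excerpt cuts off before the model fit and the plot itself. Purely to illustrate the diagnostic being described (the post’s own code is in R), here is a minimal Python sketch that fits a linear model and puts the predictions on the x-axis:

# Minimal sketch of the residuals-vs-predictions diagnostic (not the post's R code).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(34524)
N = 100
x1, x2 = rng.uniform(size=N), rng.uniform(size=N)
y = x1 + x2 + 0.25 * rng.normal(size=N)

# Ordinary least squares fit of y on x1 and x2 (with an intercept).
X = np.column_stack([np.ones(N), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ beta
resid = y - pred

# The diagnostic: predictions on the x-axis, residuals on the y-axis.
plt.scatter(pred, resid)
plt.axhline(0.0, linestyle="--")
plt.xlabel("prediction")
plt.ylabel("residual")
plt.show()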

Read Full Story

How to Evaluate the Logistic Loss and not NaN trying

A naive implementation of the logistic regression loss can result in numerical indeterminacy even for moderate argument values. This post takes a closer look at the source of these instabilities and discusses more robust Python implementations.
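Not the post’s code, but a typical illustration of the problem and one standard fix: the term log(1 + exp(-z)) overflows for large negative z, whereas np.logaddexp(0, -z) computes the same quantity without overflow.

# Sketch of the instability and a standard fix (not the post's implementation).
import numpy as np

def logistic_loss_naive(z):
    # log(1 + exp(-z)): exp() overflows for large negative z, giving inf (and NaN downstream).
    return np.log(1.0 + np.exp(-z))

def logistic_loss_stable(z):
    # logaddexp(0, -z) = log(exp(0) + exp(-z)), evaluated without forming exp(-z) directly.
    return np.logaddexp(0.0, -z)

z = np.array([-1000.0, 0.0, 1000.0])
print(logistic_loss_naive(z))   # [inf 0.693... 0.]  plus an overflow warning
print(logistic_loss_stable(z))  # [1000. 0.693... 0.]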

Read Full Story

Logistic Regression in R: A Classification Technique to Predict Credit Card Default

Logistic regression is a statistical technique used in machine learning to build prediction models. It is one of the most popular classification algorithms, used mostly for binary classification problems (those with two class values), although some variants handle multiple classes as well. It is applied to a wide range of research and industrial problems.
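The post works through this in R with a credit card default dataset; purely to illustrate the technique itself, here is a minimal scikit-learn sketch on synthetic data (the dataset, feature names, and coefficients below are all made up, not taken from the post):

# Illustrative logistic-regression classification sketch in Python (the post itself works in R).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a credit default dataset: two numeric features, binary outcome.
rng = np.random.RandomState(0)
n = 1000
balance = rng.normal(1500, 500, size=n)
income = rng.normal(40000, 10000, size=n)
p_default = 1.0 / (1.0 + np.exp(-(-6.0 + 0.003 * balance + 0.00001 * income)))
default = rng.binomial(1, p_default)

X = np.column_stack([balance, income])
X_train, X_test, y_train, y_test = train_test_split(X, default, test_size=0.25, random_state=0)

# Standardize the features, then fit the logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("predicted default probabilities:", model.predict_proba(X_test[:3])[:, 1])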

Read Full Story