The hardest logarithm to compute

Suppose you want to compute the natural logarithms of every floating point number, correctly truncated to a floating point result. Here by floating point number we mean an IEEE standard 64-bit float, what C calls a double. Which logarithm is hardest to compute?
We’ll get to the hardest logarithm shortly, but we’ll start with a warm-up problem.

Conversion rates – you are (most likely) computing them wrong

How hard can it be to compute a conversion rate? Take the total number of users that converted and divide it by the total number of users. Done. Except… it’s a lot more complicated when you have any sort of significant time lag.
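To make the time-lag problem concrete, here is a minimal sketch in Python. The user records, the `today` cutoff, and the 28-day horizon are all made up for illustration, and restricting the calculation to users who have been observed for a full horizon is just one common way to handle lag, not necessarily the approach the article takes.

```python
from datetime import date

# Hypothetical records: (signup_date, conversion_date or None).
users = [
    (date(2024, 1, 1), date(2024, 1, 20)),
    (date(2024, 1, 1), None),
    (date(2024, 2, 1), date(2024, 2, 5)),
    (date(2024, 2, 25), None),  # signed up recently, may still convert
    (date(2024, 2, 27), None),  # signed up recently, may still convert
]
today = date(2024, 3, 1)  # assumed date of the analysis


def naive_rate(users):
    """Converted users divided by all users, ignoring how long each was observed."""
    converted = sum(1 for _, conv in users if conv is not None)
    return converted / len(users)


def rate_within(users, today, horizon_days):
    """Share of users converting within horizon_days of signup,
    counting only users observed for at least horizon_days."""
    eligible = [(s, c) for s, c in users if (today - s).days >= horizon_days]
    if not eligible:
        return None
    converted = sum(
        1 for s, c in eligible
        if c is not None and (c - s).days <= horizon_days
    )
    return converted / len(eligible)


print(naive_rate(users))              # recent signups drag this down
print(rate_within(users, today, 28))  # only users with 28+ days of history
```

The naive figure counts the two recent signups as failures even though they have had only a few days to convert, which is the kind of distortion a time lag introduces.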
Prelude – a story
Fresh out of school I joined Spotify as the first data analyst. One of my first projects was to understand conversion rates.

Computed IDs and privacy implications

Thirty years ago, a lot of US states thought it would be a good idea to compute someone’s driver’s license number (DLN) from their personal information [1]. In 1991, fifteen states simply used your Social Security Number as your DLN. Eleven other states computed DLNs by applying a hash function to personal information such as name, birth date, and sex.

From Infinite Matrices to New Integration Formula

This is another interesting problem, off the beaten path. It ends with a formula to compute the integral of a function based solely on its derivatives.
For simplicity, I’ll start with some notation used in the context of matrix theory, familiar to everyone: T(f) = g, where f and g are vectors, and T is a square matrix.

Computing PCA with SVD

PCA is a great tool for performing dimensionality reduction. Two reasons you might want to use SVD to compute PCA:
SVD is more numerically stable if the columns are close to collinear. I have seen this happen in text data, when certain terms almost always appear together.
Spark’s PCA implementation currently doesn’t support very wide matrices. The SVD implementation does, however.
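The idea can be sketched in a few lines. This is a minimal illustration in NumPy rather than Spark, so it does not use or reflect Spark’s API; the function name `pca_via_svd` and the random test matrix are assumptions made for the example. It centers the columns, takes the SVD of the centered data, and reads the principal directions off the right singular vectors, without ever forming the covariance matrix.

```python
import numpy as np


def pca_via_svd(X, k):
    """Return the top-k principal directions, their variances, and the projected data."""
    # Center each column; PCA is defined on mean-centered data.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                                   # rows are principal directions
    explained_variance = (s[:k] ** 2) / (X.shape[0] - 1)  # eigenvalues of the sample covariance
    scores = Xc @ components.T                            # data projected onto the top-k directions
    return components, explained_variance, scores


# Hypothetical data with two nearly collinear columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 3] + 1e-6 * rng.normal(size=200)

components, variances, scores = pca_via_svd(X, 2)
print(components.shape, scores.shape)
```

Working on the centered data directly avoids computing the covariance matrix, whose condition number is the square of the data matrix’s, which is why the SVD route tends to behave better when columns are nearly collinear.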