The Insights & Analytics Ecosystem of the Future

I submit that our Market Research industry has been experiencing what I’m calling an “Inverted Moore’s Law.”
I’m sure you are familiar with the concept of Moore’s Law: the prediction, first made in 1965 by Gordon Moore (who went on to co-found Intel) and largely borne out since, that computing power would double roughly every two years while costs halved.

The software engineering rule of 3

Here’s a dumb (but extremely accurate) rule I’m postulating for software engineering projects: you need at least 3 examples before you solve the right problem.
This is what I’ve noticed:
Don’t factor out shared code between two classes. Wait until you have at least three.
The first two attempts to solve a problem will fail because you misunderstood the problem. The third time it will work.

Netflix Prize Summary: Factorization Meets the Neighborhood

(Way back when, I went through all the Netflix prize papers. I’m now (very slowly) trying to clean up my notes and put them online. Eventually, I hope to have a more integrated tutorial, but here’s a rough draft for now.)
This is a summary of Koren’s 2008 paper, “Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model.”

I’m looking for data engineers

I’m interrupting the regular programming for a quick announcement: we’re looking for data engineers at Better. You would be the first one to join and would work a lot directly with me.
Some fun things you could work on (these are all projects I’m working on right now):
Building a forecasting model using MCMC to predict volume over the next few months.

New approximate nearest neighbor benchmarks

As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I’m the author of Annoy, a library with 3,500+ stars on GitHub as of today. It offers fast approximate nearest neighbor search, with the additional benefit that you can load data super fast from disk using mmap.

Taking a year to explain computer things

I’ve been working on explaining computer things I’m learning on this blog for 6 years.
I wrote one of my first posts, “what does a shell even do?”, on Sept 30, 2013. Since then, I’ve written 11 zines, 370,000 words on this blog, and given 20 or so talks. So it seems like I like explaining things a lot.

Software Engineering for Data Scientists and Successfully Delivering Data Science Projects – 2 courses for May

In May I’m running 2 courses:
May 9th – Successfully Delivering Data Science Projects
May 17th – Software Engineering for Data Scientists
The first is aimed at data scientists who have had a bad or wobbly delivery and want to learn better ways to design projects, de-risk their stages and deliver more reliably.

How to Build Your Own Blockchain Part 4.1 — Bitcoin Proof of Work Difficulty Explained

If you’re wondering why this is part 4.1 instead of part 4, and why I’m not continuing to build the local jbc, it’s because explaining Bitcoin’s Proof of Work difficulty at a somewhat lower level takes a lot of space. So, despite what the title says, this part of the series is not about how to build a blockchain. It’s about how an existing blockchain is built.

Takeaways from the World’s largest Kaggle Grandmaster Panel

Personally, I’m a firm believer in and fan of Kaggle, and I definitely look at it as the home of Data Science. Kaggle Grandmasters are the heroes of Kaggle, or at least definitely mine. I’ve been on a quest to document and understand their journey into the field, and also to find out whether they’re still human or have passed into an alternate reality (still not sure about that one).

SQLite is really easy to compile

In the last week I’ve been working on another SQL website (https://sql-steps.wizardzines.com/, a list of SQL examples). I’m running all the queries on that site with sqlite, and I wanted to use window functions in one of the examples.
But I’m using the version of sqlite from Ubuntu 18.04, and that version is too old and doesn’t support window functions.
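
As a quick way to see whether a given sqlite is new enough, here is a minimal sketch using Python’s built-in sqlite3 module. It relies on the fact that window functions were added in SQLite 3.25.0; the helper names (supports_window_functions, running_total) are made up for illustration and are not from the original post.

```python
import sqlite3

# Window functions were added in SQLite 3.25.0.
WINDOW_FUNCTIONS_MIN_VERSION = (3, 25, 0)


def supports_window_functions(version_info=sqlite3.sqlite_version_info):
    """Return True if this SQLite library version has window functions.

    Hypothetical helper: it only compares version tuples.
    """
    return tuple(version_info) >= WINDOW_FUNCTIONS_MIN_VERSION


def running_total(values):
    """Small window-function demo: a running total via SUM() OVER.

    Assumes the values are distinct, so each row is its own peer group
    under the default window frame.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
        return conn.execute(
            "SELECT x, SUM(x) OVER (ORDER BY x) FROM t ORDER BY x"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Version of the SQLite library that Python is linked against.
    print("SQLite library version:", sqlite3.sqlite_version)
    if supports_window_functions():
        print(running_total([1, 2, 3]))
    else:
        print("This sqlite predates window functions (needs 3.25.0 or newer).")
```

Note that this reports the SQLite library Python was linked against, which may not be the same as the sqlite3 command-line binary on the same machine.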

Miscellaneous unsolicited (and possibly biased) career advice

No one asked for this, but I’m something like 12 years into my career and have had my fair share of mistakes and luck, so I thought I’d share some of those things.
Honestly, I feel like I’ve mostly benefitted from luck. Some of the things I did on a whim turned out to be excellent choices many years later. Some of the things were clear blind spots in hindsight.
