Webinar: Growing Your Online Sales Starts With the Upcoming Holiday Season

There’s a lot riding on your online holiday strategy. The last three months of the year account for 30% of online CPG sales, and they welcome a flood of first-time online buyers. A consumer’s online experience—with your brand or an important e-tailer like Amazon—can make or break 2020 engagement.

Some Observations on Winsorization and Trimming

Over the last few months, I’ve had a lot of conversations with people about the use of winsorization to deal with heavy-tailed data that is positively skewed because of large outliers. After a conversation with my friend Chris Said this past week, it became clear to me that I needed to do some simulation studies to understand the design space of techniques for dealing with outliers.

Getting Started with Apache Spark and Neo4j Using Docker Compose

I’ve received a lot of interest in Neo4j Mazerunner since first announcing it a few months ago. People from around the world have reached out to me, excited about the possibilities of using Apache Spark and Neo4j together: from authors who are writing new books about big data to PhD researchers who need it to solve the world’s most challenging problems.

Presentation Tips for Data Professionals

Not so long ago, I saw one of my data scientist friends speak in front of his whole company. He worked for two months on a great data project and he got some very cool findings with significant business value. He had 15 minutes to present everything. Unfortunately, he is not an experienced public speaker and his presentation didn’t go well.

Analyzing the Graph of Thrones · William Lyon

A few months ago, mathematicians Andrew Beveridge and Jie Shan published “Network of Thrones” in Math Horizons magazine, where they analyzed a network of character interactions from the novel “A Storm of Swords”, the third book in the popular “A Song of Ice and Fire” series, which is the basis for the Game of Thrones TV series.

RAIN Project: evolution of the game development dream

Eleven months ago, on a long train ride home, I wrote the first lines of code for a small platforming game. Little did I know that this prototype was the start of something much more than just a game: it was a dream that would become shared within an amazing team, and it was the greatest step in a personal journey that had begun over eight years ago.

Dealing with Corrupt Files in Hadoop

As I’ve been working with Hadoop a lot in the last several months, I’ve come to realize that it doesn’t deal gracefully with corrupt files (e.g., malformed gzip files). I would throw a cluster at a couple hundred thousand files (of which one or two were bad) and the job would die two hours into execution, throwing EOFException errors all over the place.

Pandas & Burritos – Analyzing Chipotle Order Data

A few months back, the New York Times ran an article titled “At Chipotle, How Many Calories Do People Really Eat?”, which looked at the number of calories in a typical Chipotle order. They found that:
“The typical order at Chipotle has about 1,070 calories. That’s more than half of the calories that most adults are supposed to eat in an entire day.”

Omakase 2: Suwapond

With this month’s Omakase game, I wanted to take a super simple game idea and work on polishing the feel of the core mechanic: not necessarily the mechanic itself, but how satisfying it is to the player.
I was also traveling for a good half of the month, so I put a few constraints on myself to not go overboard:
I will write the movement mechanics from scratch.

Box-Plots for Education Recap

A couple of months ago, I stumbled across a brand new ML competition platform, DrivenData, which describes itself as hosting “data science competitions to save the world”. Basically, think Kaggle for non-profits. They had launched their first prize-awarding competition, Box-Plots for Education, which aimed to automatically classify education expenses into various categories.
