Scraping Responsibly with R

I recently wrote a blog post here
comparing the number of CRAN downloads an R package gets relative to its number of
stars on GitHub. What I didn’t really think about during my analysis was whether or
not scraping CRAN was a violation of its Terms and Conditions. I simply copied and
pasted some code from R-bloggers
that seemed to work and went on my merry way.

Read Full Story

Re-exporting the magrittr pipe operator

… or how I stopped worrying and wrote a blog post to remember it ad infinitum.
Magrittr’s pipe operator is one of those newish R-universe features that I
really want to have around whenever I put some lines into an R-console.
This is even TRUE when writing a package.
So the first thing I do is put magrittr into the DESCRIPTION file and add
an __imports.

Read Full Story

How to Improve Your Subscription-Based Business by Predicting Churn

A couple of weeks ago I wrote a guest post on churn prediction for Kissmetrics, and they just published it.
Churn prediction is one of the most popular Big Data use cases in business. It consists of detecting which customers are likely to cancel a subscription to a service, based on how they use that service.

Read Full Story

facebook like translations for your rails app

07 October 2009
Last year I wrote a plugin for Sanbit (http://sanbit.com), the language learning site I was working at at the time, that lets you add a Facebook-style translation system to your application. I call it Sanbit Translations (http://github.com/jtoy/sanbit_translations).

Read Full Story

Teach the tidyverse to beginners

A few years ago, I wrote a post Don’t teach built-in plotting to beginners (teach ggplot2). I argued that ggplot2 was not an advanced approach meant for experts, but rather a suitable introduction to data visualization.
Many teachers suggest I’m overestimating their students: “No, see, my students are beginners…”.

Read Full Story

Trump’s Android and iPhone tweets, one year later

A year ago today, I wrote up a blog post Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half.
My analysis, shown below, concludes that the Android and iPhone tweets are clearly from different people, posting during different times of day and using hashtags, links, and retweets in distinct ways.

Read Full Story

Truncated Bi-Level Optimization

In 2012, I wrote a paper that I probably should have called “truncated bi-level optimization”.  I vaguely remembered telling the reviewers I would release some code, so I’m finally getting around to it.
The idea of bilevel optimization is quite simple. Imagine that you would like to minimize some function. However, that function is itself defined through some optimization.
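
The excerpt has lost its notation, so the following is only the generic textbook shape of a bilevel problem. All symbols here (w, y, L, Q, E) are assumptions chosen for illustration and are not taken from the paper:

```latex
\min_{w} \; L(w),
\qquad
L(w) = Q\bigl(y^{*}(w)\bigr),
\qquad
y^{*}(w) = \arg\min_{y} \; E(y, w)
```

The outer problem picks w, but evaluating its objective requires first solving the inner problem in y for that w.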

Read Full Story

RAIN Project: evolution of the game development dream

Eleven months ago on a long train ride home, I wrote the first lines of code for a small platforming game. Little did I know that this prototype was the start of something much more than just a game: it was a dream that would become shared within an amazing team, and it was the greatest step in a personal journey that had begun over eight years ago.

Read Full Story

Improving Twitter Search with Real-Time Human Computation

(This is a post from the Twitter Engineering Blog that I wrote with Alpa Jain.)
One of the magical things about Twitter is that it opens a window to the world in real-time. An event happens, and just seconds later, it’s shared for people across the planet to see.
Consider, for example, what happened when Flight 1549 crashed in the Hudson.
http://twitpic.

Read Full Story

Breakfast under Bill – A look at my morning on the front page of Hacker News

Tuesday night I wrote a short blog post about how I used python to find cheap tickets to a music festival. I finished up pretty late so I decided to post it online the next morning. I woke up pretty early and posted the article on a few websites around seven. I started watching my google analytics page and the hits started coming in very fast, much faster than normal.

Read Full Story

Building a budgeting service

A post-hoc analysis, part 2
As I wrote in my last blog post, around 3 years ago I decided to try to build a budgeting service like mint.com for the Norwegian market. After around a year, having reached the prototype stage, I decided to take a short break from further building to think about the business details. This quickly turned into an … extended break.

Read Full Story

pystruct: more structured prediction with python

Some time ago I wrote about a structured learning project I have been working on, called pystruct. After not working on it for a while, I think it has come quite a long way in the last couple of weeks as I picked up work on structured SVMs again. So here is a quick update on what you can do with it.

Read Full Story

Overview and benchmark of traditional and deep learning models in text classification

This article is an extension of a previous one I wrote when I was experimenting with sentiment analysis on Twitter data. Back then, I explored a simple model: a two-layer feed-forward neural network trained with Keras. The input tweets were represented as document vectors resulting from a weighted average of the embeddings of the words composing the tweet.
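
As a rough illustration of the "weighted average of the embeddings" idea, here is a minimal pure-Python sketch. The function name `document_vector`, the toy embeddings, and the per-word weights are all hypothetical; the excerpt does not say which weighting scheme or embedding model was used.

```python
def document_vector(tokens, embeddings, weights):
    """Return the weighted average of the embeddings of the known tokens.

    tokens:     list of words in the document (e.g. a tweet)
    embeddings: dict mapping word -> list of floats (all the same length)
    weights:    dict mapping word -> float weight (assumed; defaults to 1.0)
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary words
        w = weights.get(tok, 1.0)
        for i, x in enumerate(embeddings[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: fall back to the zero vector
    return [x / weight_sum for x in total]


# Toy data, made up for illustration only.
toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
toy_weights = {"good": 3.0, "movie": 1.0}
vec = document_vector(["good", "movie", "unknown"], toy_embeddings, toy_weights)
print(vec)
```

The resulting fixed-length vector is what a feed-forward classifier would take as input, regardless of how many words the tweet contains.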

Read Full Story

The Data Incubator Unofficial Frequently Asked Questions

About a year ago I wrote a review of The Data Incubator (updated review is here). I always know when the Data Incubator application season is here because I always get a few people who have found my blog reaching out with questions about the process. I decided to put together a short list of some of the most common questions I get asked.
Should I do the program?
The program has pros and cons.

Read Full Story