I’m on my way to Washington, D.C., for the Family Online Safety Institute’s (FOSI) annual conference. I have attended almost all of these events since they started in 2007, usually as a speaker, in my role as CEO of ConnectSafely.org.
There, I said it. I said the “R” word. And no, I’m not talking about the political “R” word. I’m talking about the potential of a… r-e-c-e-s-s-i-o-n.
I submit that our Market Research industry has been experiencing what I’m calling an “Inverted Moore’s Law.”
I’m sure you are familiar with the concept of Moore’s Law: the observation made in 1965 by Gordon Moore, who later co-founded Intel, and largely borne out since, that computing power would double every two years while costs halved.
Here’s a dumb extremely accurate rule I’m postulating* for software engineering projects: you need at least 3 examples before you solve the right problem.
This is what I’ve noticed:
Don’t factor out shared code between two classes. Wait until you have at least three.
The first two attempts to solve a problem will fail because you misunderstood the problem. The third time it will work.
(Way back when, I went through all the Netflix prize papers. I’m now (very slowly) trying to clean up my notes and put them online. Eventually, I hope to have a more integrated tutorial, but here’s a rough draft for now.)
This is a summary of Koren’s 2008 Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model.
Funding open source software development is a complicated subject. I’m excited
to announce that I’ve founded Ursa Labs (https://ursalabs.org), an
independent development lab with the mission of innovation in data science
I’m interrupting the regular programming for a quick announcement: we’re looking for data engineers at Better. You would be the first one to join and would work a lot directly with me.
Some fun things you could work on (these are all projects I’m working on right now):
Building a forecasting model using MCMC to predict volume over the next few months.
I’m excited to share with you that I’ll be presenting two unique sessions at DATAVERSITY’s Data Architecture Summit in Chicago, IL on November 13-16, 2017. I’m also happy to be able to offer 10% off your registration package with coupon code ATHENA.
As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I’m the author of Annoy, a library with 3,500+ stars on GitHub as of today. It offers fast approximate search for nearest neighbors with the additional benefit that you can load data super fast from disk using mmap.
This week I’m preparing a presentation for DataDay Texas about Natural Language Processing with graph databases and Neo4j.
I’m currently competing in the Second Annual Data Science Bowl at Kaggle. This is by far the most difficult competition that I have entered to date. At the time of writing I am placed 62nd out of 755 entries, with only a day remaining to lock down my methodology.
I’ve been working on explaining computer things I’m learning on this blog for 6 years.
I wrote one of my first posts, what does a shell even do? on
Sept 30, 2013. Since then, I’ve written 11 zines, 370,000 words on this blog, and
given 20 or so talks. So it seems like I like explaining things a lot.
In May I’m running 2 courses:
May 9th – Successfully Delivering Data Science Projects
May 17th – Software Engineering for Data Scientists
The first is aimed at data scientists who have had a bad or wobbly delivery and want to learn better ways to design projects, de-risk their stages and deliver more reliably.
I’m taking my qualifying exam this Tuesday, and it may surprise some of
you that I haven’t already done it! This is mostly due to logistical
kerfuffles as I transferred Ph.D. programs, and I also tend to avoid
coursework like the plague.
Each university has its own culture around an oral or qualifying exam.
When I travel, I always want to get the most out of the city I’m visiting. One way is to talk to local people and get advice about which spots you shouldn’t miss.
If you’re wondering why this is part 4.1 instead of part 4, and why I’m not talking about continuing to build the local jbc, it’s because explaining Bitcoin’s Proof of Work difficulty at a somewhat lower level takes a lot of space. So unlike what this title says, this post in part 4 is not how to build a blockchain. It’s about how an existing blockchain is built.
This time I’m going to introduce tools for visual data exploration. They represent each time series as a point in 2-dimensional space. When the original time series are similar, the corresponding points will be close to each other.
Personally, I’m a firm believer in and fan of Kaggle, and I definitely look at it as the home of Data Science. Kaggle Grandmasters are the heroes of Kaggle, or definitely mine. I’ve been on a pursuit to depict and understand their journey into the field, and to find out whether they’re still human or have passed into an alternate reality (still not sure about that one).
In the last week I’ve been working on another SQL website
(https://sql-steps.wizardzines.com/, a list of SQL examples). I’m running all
the queries on that site with sqlite, and I wanted to use window functions in
one of the examples (this one).
But I’m using the version of sqlite from Ubuntu 18.04, and that version is too
old and doesn’t support window functions.
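One way to check for this kind of version problem up front is sketched below. It is a minimal sketch, assuming Python's built-in `sqlite3` module rather than whatever the site actually uses to run its queries; the `sales` table and the function names `supports_window_functions` and `running_total` are made up for illustration. Window functions were added in SQLite 3.25.0, so the check compares the linked library version against that.

```python
import sqlite3

# Window functions were introduced in SQLite 3.25.0.
WINDOW_FUNCTIONS_MIN_VERSION = (3, 25, 0)


def supports_window_functions(version_info=sqlite3.sqlite_version_info):
    """Return True if the given SQLite version tuple supports window functions."""
    return tuple(version_info) >= WINDOW_FUNCTIONS_MIN_VERSION


def running_total(rows):
    """Compute a running sum of `amount` ordered by `day` with a window function.

    `rows` is a list of (day, amount) tuples; the table is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT day, SUM(amount) OVER (ORDER BY day) FROM sales ORDER BY day"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    if supports_window_functions():
        print(running_total([(1, 10), (2, 20), (3, 5)]))
    else:
        print("This SQLite is too old for window functions.")
```

Note that `sqlite3.sqlite_version` reports the SQLite library Python is linked against, which may differ from the `sqlite3` command-line tool installed on the same machine.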
No one asked for this, but I’m something like ~12 years into my career and have had my fair share of mistakes and luck so I thought I’d share some of those things.
Honestly, I feel like I’ve mostly benefitted from luck. Some of the things I did on a whim turned out to be excellent choices many years later. Some of the things were clear blind spots in hindsight.