
Document worth reading: “A Survey of Utility-Oriented Pattern Mining”

The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of patterns, many techniques/constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g.


Formatting numbers at the command line

The utility numfmt, part of GNU Coreutils, formats numbers. The main uses are grouping digits and converting to and from unit suffixes like k for kilo and M for mega. This is somewhat useful for individual invocations, but like most command line utilities, the real value is using it as part of a pipeline.
The --grouping option will separate digits according to the rules of your locale.
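
As a rough illustration of the uses described above, here is a minimal sketch that assumes GNU numfmt is installed and on the PATH. The sample numbers and the variable names si and bytes are made up for the example. The grouped output depends on your locale, so no expected result is shown for it.

```shell
# Convert a plain number to an SI unit suffix (M for mega).
si=$(numfmt --to=si 1500000)
echo "$si"

# Use numfmt inside a pipeline: expand IEC suffixes (K = 1024, M = 1024^2)
# into plain numbers, one value per input line.
bytes=$(printf '1K\n2M\n' | numfmt --from=iec)
echo "$bytes"

# Group digits according to the current locale. In the C/POSIX locale
# there is no grouping separator, so the number may print unchanged.
numfmt --grouping 1234567
```

The first two commands do not depend on locale settings. Only --grouping consults the locale to decide which separator to insert, if any.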


Comment on Breakthrough: How to Avert Analytics’ Most Treacherous Pitfall by Brandon

A different form of statistical analysis could prove beneficial, but I think the main thing to keep in mind is that data mining algorithms just show you what trends there are in the data, rather than prove anything concretely. If a trend is found in the data, that is the beginning rather than the end of the research.


The Big Change In Blame

Law is our main system of official blame; it is how we officially blame people for things. So it is a pretty big deal that, over the last few centuries, changes to law have induced big changes in who officially blames who for most things that go wrong. These changes may be having big bad effects.
Long ago most everyone could use law to blame most everyone else.


Introducing pipe, The Automattic Machine Learning Pipeline

One of the main projects I’ve been working on over the past year.
Data for Breakfast
A generalized machine learning pipeline, pipe serves the entire company and helps Automatticians seamlessly build and deploy machine learning models to predict the likelihood that a given event may occur, e.g., installing a plugin, purchasing a plan, or churning.


Interacting with ML Models

The main difference between data analysis today and a decade or two ago is the way that we interact with it. Previously, the role of statistics was primarily to extend our mental models by discovering new correlations and causal rules. Today, we increasingly delegate parts of our reasoning processes to algorithmic models that live outside our mental models.
