How to Create an HTML Form That Sends You an Email

Sometimes, improving UX can cost a lot of money.
And oftentimes, the problems website visitors run into have easy, simple fixes.
That raises the question: How can you find out if customers are enjoying their website experience?
The answer may be simpler than you think.
Having forms on your website is an effective way to get customer feedback about their experience during their visit.

Read Full Story

Data science is now recognized as the key to bridging the gap between tech and business within the corporate governance of companies

Data science has become an integral part of any enterprise because it solves problems. Companies can forecast the success rate of their strategies and, ultimately, manage operations more efficiently. At present, firms are data-rich. This means that they have a great deal of data that enables them to gain insight through proper analysis.

Read Full Story

Using Apache Spark and Neo4j for Big Data Graph Analytics

As engineers, when we think about how to solve big data problems, evaluating technologies becomes a choice between scalable and not scalable. Ideally we choose the technologies that can scale to a variety of business problems without hitting a ceiling down the road. Database technologies have evolved to be able to store big data, but are largely inflexible.

Read Full Story

Common Pitfalls in Machine Learning

Over the past few years I have worked on numerous machine learning problems. Along the way I have fallen foul of many pitfalls, some subtle and some not so subtle, when building models. Falling into these pitfalls often means that a model you think is great actually performs terribly in real life.

Read Full Story

Memory Latency, Hashing, Optimal Golomb Rulers and Feistel Networks

In many problems involving hashing we want to look up a range of elements from a vector, e.g. of the form \(v[h(i,j)]\) for arbitrary \(i\) and for a range of \(j \in \{1, \ldots, n\}\) where \(h(i,j)\) is a hash function. This happens e.g. for multiclass classification, collaborative filtering, and multitask learning.
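As a rough illustration of the lookup pattern described here, the sketch below reads v[h(i, j)] for a fixed i and j = 1..n. The hash function `h` (built on BLAKE2b) and the helper `lookup_range` are hypothetical names chosen for this example, not taken from the article; the point is only that consecutive values of j generally land at unrelated positions in v.

```python
import hashlib
from typing import List, Sequence


def h(i: int, j: int, size: int) -> int:
    """Hypothetical hash function: maps the pair (i, j) to an index in [0, size)."""
    digest = hashlib.blake2b(f"{i}:{j}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % size


def lookup_range(v: Sequence[float], i: int, n: int) -> List[float]:
    """Return [v[h(i, 1)], ..., v[h(i, n)]] for a fixed i."""
    return [v[h(i, j, len(v))] for j in range(1, n + 1)]


# Toy vector where each entry equals its own index, so the result
# shows directly which positions were touched.
v = [float(k) for k in range(1000)]
touched = lookup_range(v, i=7, n=5)
```

Because each of the n reads can hit a different region of memory, a lookup like this may be dominated by memory latency when v is large.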

Read Full Story

Token Lumpers and Splitters: Spider-Man vs. Superman

One of the perennial problems in tokenization for search, classification, or clustering of natural language data is which words and phrases are spelled as compounds and which are separate words. For instance, consider “dodgeball” (right) vs. “dodge ball” (wrong) and “golfball” (wrong) vs. “golf ball” (right). It’s a classic lumpers vs. splitters problem.

Read Full Story

Performance Measures for Multi-Class Problems

For classification problems, classifier performance is typically defined according to the confusion matrix associated with the classifier. Based on the entries of the matrix, it is possible to compute sensitivity (recall), specificity, and precision. For a single cutoff, these quantities lead to balanced accuracy (sensitivity and specificity) or to the F1-score (recall and precision).
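The quantities named above can be sketched for the binary case as follows. This is a minimal illustration, assuming a single cutoff has already produced the four confusion-matrix counts; the class name `ConfusionMatrix` and the example counts are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts (assumes no denominator below is zero)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self) -> float:
        """Recall: share of actual positives that were predicted positive."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of actual negatives that were predicted negative."""
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        """Share of predicted positives that are actually positive."""
        return self.tp / (self.tp + self.fp)

    def balanced_accuracy(self) -> float:
        """Mean of sensitivity and specificity."""
        return (self.sensitivity() + self.specificity()) / 2

    def f1(self) -> float:
        """Harmonic mean of precision and recall."""
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


# Made-up counts for illustration only.
cm = ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)
print(f"balanced accuracy = {cm.balanced_accuracy():.3f}, F1 = {cm.f1():.3f}")
```

For a multi-class problem, one common approach is to compute these per class (one class vs. the rest) and then average them.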

Read Full Story

7 Simple Tricks to Handle Complex Machine Learning Issues

We propose simple solutions to important problems that all data scientists face almost every day. In short, a toolbox for the handyman, useful to busy professionals in any field.
1. Eliminating sample size effects. Many statistics, such as correlations or R-squared, depend on the sample size, making it difficult to compare values computed on two data sets of different sizes.

Read Full Story

A Guide to Gradient Boosted Trees with XGBoost in Python

XGBoost has become incredibly popular on Kaggle in the last year for any problems dealing with structured data. I was already familiar with sklearn’s version of gradient boosting and have used it before, but I hadn’t really considered trying XGBoost instead until I became more familiar with it. I was perfectly happy with sklearn’s version and didn’t think much of switching.

Read Full Story

How to make a self-service BI tools deployment less painful

Why does self-service BI have to hurt so much? In theory, it should be the answer to all your problems. But so many enterprises end up disappointed. They’re overwhelmed with data, yet suffering from a shortage of useful information. They can’t gain insights. And it’s causing data chaos and anarchy.
>>> Read the full article on SearchBusinessAnalytics.

Read Full Story

The epistemology crisis

We have a crisis of epistemology. A tsunami of bad tools, bad ideas, biased actors, and unresolved problems. Among our many issues, we have: Predictions treated as facts, and inherently fuzzy historical data presented without error bars. Small scale studies on college students and professional guinea pigs extrapolated out to whole populations.

Read Full Story

Chasing Fermat’s Last Theorem

Fermat’s Last Theorem is one of the most celebrated problems in mathematics. It is a problem in the field of Number Theory that’s very simple to understand, yet extremely difficult to solve. In fact, it’s so difficult that it remained unsolved for 358 years even though it got a lot of attention from some of the best mathematicians in history. Pierre de Fermat first came up with it in 1637.

Read Full Story