Big Data Sets Impressive New Standards On Integrated Business Systems

In 2013, Wired published a very interesting article about the role of big data in the field of integrated business systems. Author James Kobielus, the lead AI and data analyst for Wikibon and former IBM expert, said that there are a number of ways that integrated business systems are tapping the potential of AI and big data.

Read Full Story

Newsvendor Problem – The Tale of the First Formula in the Textbook

Introduction
Near our offices here at Stitch Fix, there’s a weekly farmers market.
We’ve noticed an interesting pattern. When the market opens, seasonal produce, bread, and nuts fill tables and food stands. By the end of the day, however, the table with the strawberries is almost empty, while you can still select from a range of nuts at another table.

Read Full Story

Recommender Systems: We’re doing it (all) wrong

A few days back, there was an interesting post by Judy Robertson in the Communications of the ACM blog. The post, entitled “Stats: We’re doing it wrong”, builds upon a paper from last year’s CHI conference, which reports that more than 90% of HCI researchers used the wrong statistical tools when analyzing and reporting on Likert-scale data.

Read Full Story

Portfolio Optimization with Python

There are a lot of interesting applications of convex optimization; in this post I’ll explore an application of convex optimization in finance. I’ll walk through using convex optimization to allocate a stock portfolio so that it maximizes return for a given risk level. We’ll use real data for a mock portfolio, and solve the problem using Python.
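As a rough illustration of the kind of problem described above, here is a minimal sketch using only NumPy. It is not the post's method: it swaps in the closely related formulation "minimize variance for a given target return" (the same efficient frontier seen from the other side), allows short positions, and solves the resulting equality-constrained problem through its KKT linear system. The function name `min_variance_weights` and all numbers are hypothetical, made up for the example rather than taken from real data.

```python
import numpy as np


def min_variance_weights(mu, cov, target_return):
    """Weights minimizing w' cov w subject to sum(w) = 1 and mu' w = target_return.

    Assumption: short positions are allowed (there is no w >= 0 constraint),
    which is what makes a closed-form linear solve possible.
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mu.size
    ones = np.ones(n)

    # KKT system for the equality-constrained quadratic program:
    #   [ 2*cov  1  mu ] [ w  ]   [ 0             ]
    #   [ 1'     0  0  ] [ l1 ] = [ 1             ]
    #   [ mu'    0  0  ] [ l2 ]   [ target_return ]
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * cov
    kkt[:n, n] = ones
    kkt[:n, n + 1] = mu
    kkt[n, :n] = ones
    kkt[n + 1, :n] = mu

    rhs = np.zeros(n + 2)
    rhs[n] = 1.0
    rhs[n + 1] = target_return

    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]  # drop the two Lagrange multipliers


# Hypothetical expected returns and covariance matrix for three assets.
mu = np.array([0.08, 0.12, 0.15])
cov = np.array([
    [0.040, 0.006, 0.010],
    [0.006, 0.090, 0.020],
    [0.010, 0.020, 0.160],
])

weights = min_variance_weights(mu, cov, target_return=0.11)
portfolio_return = float(mu @ weights)
portfolio_risk = float(np.sqrt(weights @ cov @ weights))

print("weights:", weights)
print("expected return:", portfolio_return)
print("risk (std dev):", portfolio_risk)
```

Adding a no-shorting constraint or a risk cap turns this into a general convex program, at which point a dedicated convex solver is the more natural tool.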

Read Full Story

Learning from users faster using machine learning

I had an interesting idea a few weeks ago, best explained through an example. Let’s say you’re running an e-commerce site (I kind of do) and you want to optimize the number of purchases.
Let’s also say we try to learn as much as we can from users, both through A/B tests and through basic slicing and dicing of the data.

Read Full Story

Perceptual Straightening of Natural Videos

Video is an interesting domain for unsupervised, or self-supervised, representation learning. But we still don’t know what type of inductive biases will enable us to best exploit the information encoded in the temporal sequence of video frames. Slow Feature Analysis (SFA) and its more recent cousin Learning to Linearize (e.g. Goroshin et al.

Read Full Story

From Infinite Matrices to New Integration Formula

This is another interesting, off-the-beaten-path problem. It ends with a formula that computes the integral of a function based solely on its derivatives.
For simplicity, I’ll start with some notation used in the context of matrix theory, familiar to everyone: T(f) = g, where f and g are vectors, and T is a square matrix.

Read Full Story

Deep learning for cancer research

Some interesting recent news from Stanford concerns identifying skin cancer from images using deep learning. A different project, featured by NVIDIA, uses deep learning for breast cancer research, where they claim that the error went down 85%. Unrelated: I heard today about Grail, which raised $100M for cancer detection in blood tests.

Read Full Story

Some misc news

I just learned that my postdoc roommate Yisong Yue from Caltech released an interesting new paper, Factorized Variational Autoencoders for Modeling Audience Reactions to Movies, a joint work with Disney Research published at CVPR 2017. Another interesting paper, Accelerating Innovation Through Analogy Mining, just received the best paper award at KDD 2017.

Read Full Story

Deepgram – Audio Search with Deep Learning

A very interesting podcast by Sam Charrington, who interviews Scott Stephenson from DeepGram. DeepGram uses deep learning activations to create indexes that allow searching for text in voice recordings. DeepGram has also released Kur, a high-level abstraction over deep learning frameworks that allows network layouts to be defined quickly.

Read Full Story

Anomaly Detection with Wikipedia Page View Data

Today, the Twitter engineering team released another very interesting open-source R package for working with time series data: “AnomalyDetection”. This package uses the Seasonal Hybrid ESD (S-H-ESD) algorithm to identify local anomalies (= variations inside seasonal patterns) and global anomalies (= variations that cannot be explained with seasonal patterns).

Read Full Story

How Much Memory Does A Data Scientist Need?

Recently, I discovered an interesting blog post by Szilard Pafka, “Big RAM is eating big data — Size of datasets used for analytics”. He says that “Big RAM is eating big data”. This phrase means that memory sizes are growing much faster than the data sets that a typical data scientist processes. So data scientists do not need as much data as the industry offers them.

Read Full Story

Pascal’s Triangle

I had a recent conversation with a friend who asked me “what makes number theory interesting?”. I loved the question, mainly because it gave me an opportunity to talk about math in a positive manner. More importantly though, it was an opportunity to talk about one of my favorite courses in mathematics (along with discrete mathematics and set theory).

Read Full Story