5 Data Science Conferences to Attend in Asia

Previously, I’ve written posts about 2019 and 2020 data science conferences to attend. Researching both of those posts has given me a fair amount of knowledge about conferences happening in different regions around the world. In this article, I’m going to cover 5 data science conferences in Asia that you should consider attending in 2020.

Read Full Story

NLP for Log Analysis – Tokenization

This is part 1 of a series of posts based on a presentation I gave at the Silicon Valley Cyber Security Meetup on behalf of my company, Insight Engines. Some of the ideas are speculative and I do not know if they are used in practice. If you have any experience applying these techniques on logs, please share in the comments below.
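Tokenizing logs differs from tokenizing ordinary prose because tokens such as IP addresses, clock times and key=value pairs should usually survive as single units. The sketch below is purely illustrative and is not taken from the presentation: the regular expression `TOKEN_RE`, the helper `tokenize_log_line` and the sample log line are all hypothetical, chosen only to show one way of keeping those structured tokens intact while splitting everything else on whitespace and punctuation.

```python
import re

# Hypothetical token pattern (an assumption for illustration). Alternatives are
# tried left to right, so the structured patterns win over the plain-word fallback.
TOKEN_RE = re.compile(
    r"\d{1,3}(?:\.\d{1,3}){3}"   # IPv4 address, e.g. 10.0.0.5
    r"|\d{2}:\d{2}:\d{2}"        # clock time, e.g. 10:32:07
    r"|\w+=[^\s,]+"              # key=value pair, e.g. user=alice
    r"|\w+"                      # plain word or number
)


def tokenize_log_line(line: str) -> list[str]:
    """Return the tokens of one log line; punctuation outside tokens is dropped."""
    return TOKEN_RE.findall(line)


line = "Jan 12 10:32:07 sshd[811]: Failed password user=alice from 10.0.0.5 port 22"
print(tokenize_log_line(line))
```

Ordering the alternatives from most to least specific is the main design choice here: if `\w+` came first, `10.0.0.5` would be broken into four separate numbers.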

Read Full Story

Top 5 Text Analytics Tips of The Year

Happy 2019 & Top Posts of the Year
Thank you all for your readership in 2018. We’re starting out the New Year with some changes to our website, so please bear with us as we migrate our older blog posts over and get things updated. Our first post of 2019 will be our annual post on the changes in popular slang, a favorite among trend watchers and those following Millennials and Gen Y.

Read Full Story

Selecting good features – Part IV: stability selection, RFE and everything side by side

In my previous posts, I looked at univariate methods, linear models and regularization, and random forests for feature selection. In this post, I’ll look at two other methods: stability selection and recursive feature elimination (RFE), both of which can be considered wrapper methods.
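As a rough illustration of the wrapper idea behind RFE, here is a minimal sketch using scikit-learn’s `RFE` class with a plain linear regression. The synthetic dataset (10 features, 3 of them informative) and the choice of `LinearRegression` as the wrapped estimator are assumptions made for the example, not the setup used in the post.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): 10 features, only 3 informative.
# coef=True also returns the true coefficients so the selection can be checked.
X, y, true_coef = make_regression(n_samples=200, n_features=10, n_informative=3,
                                  noise=0.1, coef=True, random_state=0)

# RFE repeatedly fits the estimator, drops the feature with the smallest
# coefficient magnitude, and refits until n_features_to_select remain.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1)
rfe.fit(X, y)

print("selected features:", np.flatnonzero(rfe.support_))
print("truly informative:", np.flatnonzero(true_coef))
print("ranking (1 = kept):", rfe.ranking_)
```

Because the estimator is refit after every elimination, RFE costs one model fit per removed feature; `step` can be raised to drop several features per round when that becomes expensive.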

Read Full Story

PROC MIXED: More errors with repeated measures

Since the last few posts detailed errors in repeated measures with PROC GLM, I thought I should acknowledge that people seem to struggle just as much with PROC MIXED.
Forgetting data needs to be multiple rows
This is one of the first points of confusion for students. When you run PROC MIXED, you need multiple records for each person.
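The “multiple records for each person” layout is usually called long format: one row per person per measurement occasion, rather than one row per person with a column for each occasion. In SAS this reshaping is typically done with a DATA step or PROC TRANSPOSE; the sketch below shows the same idea in Python with pandas, using a small hypothetical dataset (the column names `id`, `score_t1` to `score_t3`, `time` and `score` are made up for illustration).

```python
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per visit.
wide = pd.DataFrame({
    "id": [1, 2],
    "score_t1": [10, 12],
    "score_t2": [11, 15],
    "score_t3": [13, 14],
})

# Reshape to long format: one row per subject per visit.
long = wide.melt(id_vars="id", var_name="time", value_name="score")

# Turn the column label "score_t2" into the numeric visit index 2.
long["time"] = long["time"].str.replace("score_t", "", regex=False).astype(int)

# Group each subject's records together, in visit order.
long = long.sort_values(["id", "time"]).reset_index(drop=True)
print(long)
```

After the reshape, each subject contributes three rows, which is the layout a repeated-measures model expects; the wide version would look like a single observation per person.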

Read Full Story

Where to find terabyte-size dataset for machine learning

In previous blog posts we played with a large multi-gigabyte dataset. That 34 GB dataset is based on stackoverflow.com data. A couple of days ago I found another great large dataset: a two-terabyte snapshot of the Reddit website. This dataset is perfect for text mining and NLP experimentation. The image was taken from this web page.

Read Full Story

Random forest interpretation with scikit-learn

In one of my previous posts I discussed how random forests can be turned into a “white box”, such that each prediction is decomposed into a sum of contributions from each feature, i.e. (prediction = bias + feature_1 contribution + … + feature_n contribution). I’ve had quite a few requests for code to do this.
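The decomposition can be sketched directly on top of scikit-learn’s fitted tree structures. This is a minimal sketch, not the author’s code: the helper names `tree_contributions` and `forest_contributions` and the synthetic dataset are hypothetical. For each tree it walks the root-to-leaf path of one sample, takes the root value as the bias, and credits every change in node value to the feature split on at the parent node; the forest-level result is the average over trees, mirroring how `RandomForestRegressor` averages tree predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def tree_contributions(tree, x):
    """Split one tree's prediction for sample x into (bias, per-feature contributions)."""
    t = tree.tree_
    # Node ids on the root-to-leaf path; a child's id is always larger than its
    # parent's, so sorting the ids puts them in path order.
    path = np.sort(tree.decision_path(x.reshape(1, -1)).indices)
    values = t.value[:, 0, 0]  # mean target value stored at each node
    bias = values[path[0]]     # root value: mean target over this tree's training sample
    contrib = np.zeros(x.shape[0])
    for parent, child in zip(path[:-1], path[1:]):
        # Credit the change in node value to the feature used for the split at the parent.
        contrib[t.feature[parent]] += values[child] - values[parent]
    return bias, contrib


def forest_contributions(forest, x):
    """Average the per-tree decompositions, as the forest averages tree predictions."""
    parts = [tree_contributions(est, x) for est in forest.estimators_]
    bias = np.mean([b for b, _ in parts])
    contrib = np.mean([c for _, c in parts], axis=0)
    return bias, contrib


# Synthetic data (an assumption for illustration).
X, y = make_regression(n_samples=300, n_features=5, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

x = X[0]
bias, contrib = forest_contributions(forest, x)
print("prediction:", forest.predict(x.reshape(1, -1))[0])
print("bias + sum(contributions):", bias + contrib.sum())
```

The two printed numbers agree (up to floating-point error) because the per-step differences along each path telescope from the root value to the leaf value, which is exactly that tree’s prediction.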

Read Full Story

Random forest interpretation – conditional feature contributions

In two of my previous blog posts, I explained how the black box of a random forest can be opened up by tracking decision paths along the trees and computing feature contributions. This way, any prediction can be decomposed into contributions from features, such that (prediction = bias + feature_1 contribution + … + feature_n contribution).

Read Full Story

Δ ℚuantitative √ourney | Deep Reinforcement Learning in Action (Announcement)

Punchline: Go check out Deep Reinforcement Learning in Action
In part due to the attention my posts on reinforcement learning have attracted, I teamed up with my friend Alex, a bona fide machine learning engineer most recently at Amazon, to write a book about Deep Reinforcement Learning. I’m happy to say that we are publishing with Manning, the big technical publishing company.

Read Full Story

Observing the Logic Behind Mergers and Acquisitions

It’s January, which means that I’ve been working on my annual posts on investments and M&A in the social media intelligence market. As always, I find myself mentally dividing the transactions into several buckets: the serial acquirers, the complementary products, the geographic expansions.

Read Full Story