Handling Structured & Imbalanced Datasets with Deep Learning

Guest Blog from Analytics Vidhya
Introduction
While Deep Learning has shown remarkable success on unstructured data, in tasks such as image classification, text analysis and speech recognition, there is very little literature on applying Deep Learning to structured / relational data. This investigation focuses on structured data because we are generally more comfortable working with structured data than with unstructured data.
After extensive investigation, Deep Learning does appear to have the potential to perform well on structured data. We study class imbalance in particular because it is a challenging problem for anomaly detection. In this report, a deep Multilayer Perceptron (MLP) was implemented in Python using Theano, and experiments were conducted to explore how choices such as network depth, cost-sensitive learning and dropout affect performance.
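To make "depth" concrete, here is a minimal NumPy sketch of an MLP forward pass in which the number of hidden layers is simply the length of a list. It is not the author's Theano code: the tanh activations, Glorot-style initialisation, layer sizes and the names `init_mlp` and `mlp_forward` are assumptions for illustration. An optional inverted-dropout step is included because dropout is one of the settings the report compares.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create one (W, b) pair per layer; layer_sizes = [n_in, h1, ..., hk, n_out]."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Glorot-style uniform initialisation (an assumption, common with tanh units)
        bound = np.sqrt(6.0 / (n_in + n_out))
        W = rng.uniform(-bound, bound, size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(params, X, dropout_rate=0.0, train=False, rng=None):
    """Return class probabilities; every (W, b) pair except the last is a hidden layer."""
    if rng is None:
        rng = np.random.default_rng()
    h = X
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
        if train and dropout_rate > 0.0:
            # Inverted dropout: drop units at random and rescale the survivors so the
            # expected activation matches the no-dropout (evaluation) pass
            keep = rng.random(h.shape) >= dropout_rate
            h = h * keep / (1.0 - dropout_rate)
    W_out, b_out = params[-1]
    return softmax(h @ W_out + b_out)

# Hypothetical sizes: 41 input features, four hidden layers of 100 units, 5 classes
params = init_mlp([41, 100, 100, 100, 100, 5])
X = np.random.default_rng(1).normal(size=(8, 41))
probs = mlp_forward(params, X)  # evaluation pass: no dropout
probs_train = mlp_forward(params, X, dropout_rate=0.5, train=True,
                          rng=np.random.default_rng(2))
print(probs.shape)  # (8, 5)
```

Making the network deeper or shallower only means adding or removing entries in the list passed to `init_mlp`. The input width of 41 and the five output classes are placeholders; the real input width depends on how the categorical fields are encoded during pre-processing, and training (e.g. gradient descent on a cross-entropy loss) is omitted to keep the sketch short.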
Increasing the depth of the neural network was seen to help in detecting minority classes. Cost-sensitive learning was also observed to work well for dealing with the class imbalance. However, adding dropout did not seem to give any improvement on the KDD Cup 1999 dataset used here, where feature complexity was relatively high.
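Cost-sensitive learning can be implemented in several ways. One common variant, shown below as a minimal NumPy sketch, scales each example's cross-entropy by a weight attached to its true class, so mistakes on rare classes cost more. The inverse-frequency weighting and the function names are assumptions for illustration; the actual costs used in the report are not given in this section.

```python
import numpy as np

def inverse_frequency_weights(y, n_classes):
    """Weight w_c = N / (K * N_c): rarer classes receive larger weights (assumed scheme)."""
    counts = np.bincount(y, minlength=n_classes).astype(float)
    counts[counts == 0] = 1.0  # guard against division by zero for absent classes
    return len(y) / (n_classes * counts)

def weighted_cross_entropy(probs, y, class_weights):
    """Mean over examples of -log p(true class), each term scaled by its class weight."""
    eps = 1e-12  # keeps log() finite if a predicted probability is exactly 0
    p_true = probs[np.arange(len(y)), y]
    return np.mean(-np.log(p_true + eps) * class_weights[y])

# Toy imbalanced labels: nine majority (class 0) examples, one minority (class 1)
y = np.array([0] * 9 + [1])
weights = inverse_frequency_weights(y, n_classes=2)
# A lazy classifier that always favours the majority class
probs = np.tile([0.9, 0.1], (len(y), 1))
plain = weighted_cross_entropy(probs, y, np.ones(2))
costly = weighted_cross_entropy(probs, y, weights)
print(plain < costly)  # True: the missed minority example now dominates the loss
```

With unit weights the lazy classifier gets a small loss, because the single minority example is outvoted; with inverse-frequency weights that one mistake dominates, which pushes training away from the trivial majority-class solution.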

Table of Contents:
1. Overview

Dataset Used
Introduction to KDD Cup 1999
Class Imbalance in KDD Cup 1999 Data Set
Results from the winning entry
Evaluation Metrics

2. Methods

Computational tools
Deep Learning library for the implementation
MNIST Experiment
KDD Cup 1999 dataset pre-processing
Details of MLP Experiments on KDD Cup 1999 data

3. Results

Standard one hidden layer MLP for KDD Cup 1999 data
Experiments to deal with class imbalance
Cost-sensitive learning in MLP for KDD Cup 1999 data

4. Deep MLP Experiments

Standard two hidden layers MLP for KDD Cup 1999 data
Standard three hidden layers MLP for KDD Cup 1999 data
Standard four hidden layers MLP for KDD Cup 1999 data
Cost-sensitive learning in four hidden layers MLP for KDD Cup 1999 data
Cost-sensitive learning with dropout in four hidden layers MLP for KDD Cup 1999 data

5. Discussion and Evaluation
The full report, covering each of these sections, is available in the original guest post.