Book: Python Data Science Handbook

Jupyter notebook content for my O'Reilly book, the Python Data Science Handbook.
This repository contains the full listing of IPython notebooks used to create the book, including all text and code. The code was written and tested with Python 3.5, though most (but not all) snippets will work correctly in Python 2.7.
See also the free companion project, A Whirlwind Tour of Python: a fast-paced introduction to the Python language aimed at researchers and scientists.

Table of Contents
Preface
1. IPython: Beyond Normal Python
Help and Documentation in IPython
Keyboard Shortcuts in the IPython Shell
IPython Magic Commands
Input and Output History
IPython and Shell Commands
Errors and Debugging
Profiling and Timing Code
More IPython Resources
2. Introduction to NumPy
Understanding Data Types in Python
The Basics of NumPy Arrays
Computation on NumPy Arrays: Universal Functions
Aggregations: Min, Max, and Everything In Between
Computation on Arrays: Broadcasting
Comparisons, Masks, and Boolean Logic
Fancy Indexing
Sorting Arrays
Structured Data: NumPy’s Structured Arrays
3. Data Manipulation with Pandas
Introducing Pandas Objects
Data Indexing and Selection
Operating on Data in Pandas
Handling Missing Data
Hierarchical Indexing
Combining Datasets: Concat and Append
Combining Datasets: Merge and Join
Aggregation and Grouping
Pivot Tables
Vectorized String Operations
Working with Time Series
High-Performance Pandas: eval() and query()
Further Resources
4. Visualization with Matplotlib
Simple Line Plots
Simple Scatter Plots
Visualizing Errors
Density and Contour Plots
Histograms, Binnings, and Density
Customizing Plot Legends
Customizing Colorbars
Multiple Subplots
Text and Annotation
Customizing Ticks
Customizing Matplotlib: Configurations and Stylesheets
Three-Dimensional Plotting in Matplotlib
Geographic Data with Basemap
Visualization with Seaborn
Further Resources
5. Machine Learning
What Is Machine Learning?
Introducing Scikit-Learn
Hyperparameters and Model Validation
Feature Engineering
In Depth: Naive Bayes Classification
In Depth: Linear Regression
In Depth: Support Vector Machines
In Depth: Decision Trees and Random Forests
In Depth: Principal Component Analysis
In Depth: Manifold Learning
In Depth: k-Means Clustering
In Depth: Gaussian Mixture Models
In Depth: Kernel Density Estimation
Application: A Face Detection Pipeline
Further Machine Learning Resources
Appendix: Figure Code
The book is available here.