# The Fundamentals of Data Science

The Fundamentals of Data Science

Guest blog post by Mic Farris. Mic is a Decision Science & Analytics Leader at CenturyLink.

Two of the biggest buzzwords in our industry are “big data” and “data science”. Big Data seems to have a lot of interest right now, but Data Science is fast becoming a very hot topic.

Source for picture: click here

I think there’s room to really define the science of data science – what are those fundamentals that are needed to make data science truly a science we can build upon?

What follows are such a set of fundamentals:

Fundamentals of Data Science

Introduction

The easiest thing for people within the big data / analytics / data science disciplines is to say “I do data science”. However, when it comes to data science fundamentals, we need to ask the following critical questions: What really is “data”, what are we trying to do with data, and how do we apply scientific principles to achieve our goals with data?

What is Data?

The Goal of Data Science

The Scientific Method

Probability and Statistics

The world is a probabilistic one, so we work with data that is probabilistic – meaning that, given a certain set of preconditions, data will appear to you in a specific way only part of the time. To apply data science properly, one must become familiar and comfortable with probability and statistics.

The Two Characteristics of Data

Examples of Statistical Data

Introduction to Probability

Probability Distributions

Connection with Statistical Distributions

Statistical Properties (Mean, Mode, Median, Moments, Standard Deviation, etc.)

Common Probability Distributions (Discrete, Binomial, Normal)

Other Probability Distributions (Chi-Square, Poisson)

Joint and Conditional Probabilities

Bayes’ Rules

Bayesian Inference

Decision Theory

This section is one of the key fundamentals of data science. Whether applied in scientific, engineering, or business fields, we are trying to make decisions using data. Data itself isn’t useful unless it’s telling us something, which means we’re making a decision about what it is telling us. How do we come up with those decisions? What are the factors that go into this decision making process? What is the best method for making decisions with data? This section tell us…

Hypothesis Testing

Binary Hypothesis Test

Likelihood Ratio and Log Likelihood Ratio

Bayes Risk

Neyman-Pearson Criterion

Receiver Operating Characteristic (ROC) Curve

M-ary Hypothesis Test

Optimal Decision Making

Estimation Theory

Sometimes we make characterizations of data – averages, parameter estimates, etc. Estimation from data is essentially an extension of decision making, a natural next section from Decision Theory.

Estimation as Extension of M-ary Hypothesis Test

Unbiased Estimation

Minimum Mean Square Error (MMSE)

Maximum Likelihood Estimation (MLE)

Maximum A Posteriori Estimation (MAP)

Kalman Filter

Coordinate Systems

To bring various data elements together into a common decision making framework, we need to know how to align the data. Knowledge of coordinate systems and how they are used becomes important to lay a solid foundation for bringing disparate data together.

Introduction to Coordinate Systems

Euclidian Spaces

Orthogonal Coordinate Systems

Properties of Orthogonal Coordinate Systems (angle, dot product, coordinate transformations, etc.)

Cartesian Coordinate System

Polar Coordinate System

Cylindrical Coordinate System

Spherical Coordinate System

Transformations Between Coordinate Systems

Linear Transformations

Once we understand coordinate systems, we can learn why to transform the data to get at the underlying information. This section describe how we can transform our data into other useful data products through various types of transformations, including the popular Fourier transform.

Introduction to Linear Transformations

Properties of Linear Transformations

Matrix Multiplication

Fourier Transform

Properties of Fourier Transforms (time-frequency relationship, shift invariance, spectral properties, Parseval’s Theorem, Convolution Theorem, etc.)

Discrete and Continuous Fourier Transforms

Uncertainty Principle and Aliasing

Wavelet and Other Transforms

Effects of Computation on Data

An often overlooked aspect of data science is the impact the algorithms we apply have on the information we are seeking to find. Merely applying algorithms and computations to create analytics and other data products has an impact on the effectiveness data-driven decision making ability. This section take us on a journey of advanced aspects of data science.

Mathematical Representation of Computation

Reversible Computations (Bijective Mapping)

Irreversible Computations

Impulse Response Functions

Transformation of Probability Distributions (due to addition, subtraction, multiplication, division, arbitrary computations, etc.)

mpacts on Decision Making

Prototype Coding / Programming

One of the key elements to data science is the willingness of practitioners to “get their hands dirty” with data. This means being able to write programs that access, process, and visualize data in important languages in science and industry. This section takes us on a tour of these important elements.

Introduction to Programming

Data Types, Variables, and Functions

Data Structures (Arrays, etc.)

Loops, Comparisons, If-Then-Else

Functions

Scripting Languages vs. Compilable Langugages

SQL

SAS

R

Python

C++

Graph Theory

Graphs are ways to illustrate connections between different data elements, and they are important in today’s interconnected world.

Introduction to Graph Theory

Undirected Graphs

Directed Graphs

Various Graph Data Structures

Route and Network Problems

Algorithms

Key to data science is understanding the use of algorithms to compute important data-derived metrics. Popular data manipulation algorithms are included in this section.

Introduction to Algorithms

Recursive Algorithms

Serial, Parallel, and Distributed Algorithms

Exhaustive Search

Divide-and-Conquer (Binary Search)

Gradient Search

Sorting Algorithms

Linear Programming

Greedy Algorithms

Heuristic Algorithms

Randomized Algorithms

Shortest Path Algorithms for Graphs

Machine Learning

No data science fundamentals course would be complete without exposure to machine learning. However, it’s important to know that these techniques build upon the fundamentals described in previous sections. This section gives practitioners an understanding of useful and popular machine learning techniques and why they are applied.

Introduction to Machine Learning

Linear Classifiers (Logistic Regression, Naive Bayes Classifier, Support Vector Machines)

Decision Trees (Random Forests)

Bayesian Networks

Hidden Markov Models

Expectation-Maximization

Artificial Neural Networks and Deep Learning

Vector Quantization

K-Means ClusteringComment

Originally posted here.

DSC Resources

Services: Hire a Data Scientist | Search DSC | Classifieds | Find a Job

Contributors: Post a Blog | Ask a Question

Follow us: @DataScienceCtrl | @AnalyticBridge

Popular Articles

Difference between Machine Learning, Data Science, AI, Deep Learning, and Statistics

What is Data Science? 24 Fundamental Articles Answering This Question

Hitchhiker’s Guide to Data Science, Machine Learning, R, Python

Advanced Machine Learning with Basic Excel

Link: The Fundamentals of Data Science