Learning Mathematics for Data Science



I am halfway through my journey of becoming mathematically literate enough to understand and work comfortably with Data Science books, posts, articles and journals. I wrote about my learning sabbatical earlier here. Before I go on, I want to reiterate a few things that have shaped my way of learning and working. Whenever I want to learn something new, I learn the fundamentals first and then build on them. There is no point in cramming or skipping basic concepts and ideas: it not only causes frustration, you also waste a lot of time learning and relearning the same material over and over because you neglected the basics.

That is my personal way of learning. Then there is the professional way, shaped by business and the practicalities of life. Professionally (Software Development earlier and now Data Science), if I need to learn a completely new topic, sub-topic or concept, I take a quick look, read a bit about the basics, and then find correct and established ways to apply that concept to the problem at hand. The aim is more on the applied side of things.

The world is moving at high speed. Technology has shaken it up, and with this 4th industrial revolution the lines have blurred between your career (a job) and the business. You have to deliver results quickly, otherwise it may cost you your job and may also turn into a huge loss for your employer. If you don’t understand this and keep ignoring it, it may cost you your career in the long term. Practically speaking, a delivered solution to a problem is better than a great solution that takes 10x longer, and far better than a perfect solution that takes a year to arrive.

Take domain knowledge, for example. Its importance in Data Science is already established: Data Science and domain expertise go hand in hand. Now, if you are a Data Scientist who was never a Biology student, and your employer gets a client whose business is primarily Biology, you can make it a painful experience or a happy one, but you have to learn either way. Before you think I am making things up, read about Computational Biology. With no Biology background, there are a few ways to learn from the ground up:

get a big heavy biology basics/intro book which takes several months to finish
join a 6+ week MOOC on biology and data science
learn the 4th industrial revolution way

The first two options will take 1-2 months or more (ideally 4 months) before you can even start working on your project, while your employer pays you for not producing anything. That is both unfair and stupid: unfair to your employer, and stupid on your part because you could use that time much more effectively. You need to learn to separate the immediate requirement of your employer from your own need to produce great work. The 4th industrial revolution way says you can do this in a week or two; you don’t need to spend 2 months right away. Narrow your focus from Biology as a whole to the current requirement of your client, and learn the basic terminology and enough fundamentals to get started on it in Python and the libraries that deal with problems in Biology (e.g. PySB and Biopython). This has to be your attitude if you want to finish work that comes with a deadline. Even if it does not come with a deadline, I would still say go the 4th industrial revolution way, because it will not only help you solve problems faster but will also save you from the downward spiral of “I did not learn enough”.

Now, this does not mean you may create a bad solution because you were not a Biology student in school. If you do that, then you are not systematic in your approach to solving problems, which means you have a lot to learn about productivity, efficiency and emotional intelligence. It also means your inability to deliver a really good solution extends beyond your work into your personal, financial and professional life on a whole other level, and you need to take it very seriously and start working on yourself right away (your everyday thinking needs to change).
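To make “enough fundamentals to get started” a bit more concrete, here is a minimal sketch. It assumes, purely for illustration, that the client’s data is DNA sequences (the scenario above does not say so). It uses plain Python rather than PySB or Biopython, and the function names gc_content and reverse_complement are my own hypothetical choices, not any library’s API.

```python
# Two first-week concepts for someone new to sequence data:
# GC content and the reverse complement of a DNA strand.
# Assumption: sequences contain only the letters A, C, G and T.

# Translation table mapping each base to its complementary base.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGCGC"  # made-up example sequence
    print(gc_content(dna))
    print(reverse_complement(dna))
```

The point is not the code itself but the order of learning: a handful of terms and one small working script first, the heavy textbook later.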
You may think all this has nothing to do with Mathematics for Data Science. Actually, it does. Mathematics is a way to model the world: a way to find patterns in problems and to use tried and tested models to solve them. It is both an art and a science. Mathematics gives us tools to solve problems much faster than we could otherwise imagine, and it is precise enough to let us handle a mess before things get out of hand. Now, how does this apply to our situation? I will come to this in a minute.

As I described in part-1 and part-2 of my blog posts, I have finished my Algebra and Calculus study by taking a few MOOCs and working through a lot of self-study material: a mix of books, articles and blogs. I have now started on Probability. Probability, Statistics and then Linear Algebra are the areas of Mathematics that overlap with Data Science (whenever I say Data Science, think Data Analysis and Machine Learning). What I observed is that, just like Algebra and Calculus, these topics are vast, and you can spend almost the next year just studying their basics. What do we do then? We become smart and practical. What I use is the Two-Fold Technique.

1st-Fold: use the 4th industrial revolution way of solving problems that I described above. I remember that my primary work is Data Analysis and Machine Learning, and that my goal is to get the necessary fundamentals of Mathematics for Data Science, so that I don’t stare blankly or blow my mental fuses when I see mathematical jargon and formulas, and so that I know the basic axioms and theoretical foundations whenever I come across Math in Data Analysis and Machine Learning articles, blogs and library documentation. If I come across something I don’t know, I can figure it out because I have the fundamentals.

2nd-Fold: lifelong learning of Mathematics. This is the opposite of the 1st-Fold. Whenever I am free, I open a book, watch a few videos of a self-paced MOOC (I no longer subscribe to time-based MOOCs for non Data Science work) or solve some mathematical problems (the Schaum’s series is very well known; if you know anything better, please do let me know). In a broader sense, I made Mathematics my hobby instead of watching videos on YouTube. Think about problems the way a mathematician does, and spend as much time as you can on finding out how. I started this journey with Innumeracy: Mathematical Illiteracy and Its Consequences by John Allen Paulos.
In a narrower sense, look at a mathematical formula and find out why it was created. Read about its history, the areas in which it is applied, the scope of its applications and its evolution from one domain to another. Read about it in as much detail as possible, or read brief treatments of a concept from different mathematicians and books and then read it in detail. For an excellent article on why you need to learn Math, read The Mathematical Hacker by Evan Miller if you have a background in programming. Another very good article on how to learn Math the right way is by Steve Yegge. If you do just the 1st-Fold, your knowledge of Mathematics will stay quite limited, and years down the road you will not know enough when decisions have to be made about which models to use for which problems. If you do just the 2nd-Fold, you will never have enough time to finish anything. You have to mix and match the two; that’s why I call it the Two-Fold method, and in fact it can be applied not only to Math but to almost any kind of learning. If you know a better way of learning Math, I will be more than happy to hear about it.
Since my knowledge of Probability, Statistics and Linear Algebra is almost zero, I recently started a course from Purdue University, which has a 2nd part too. One last thing: the best MOOC I ever took was Learning How to Learn at Coursera. And yes, it has nothing to do with Mathematics or Data Science, but this is how life is: your skills and achievements in any area or subject largely depend on your personal thinking. That is a topic for another blog post. Lifelong learning: never give up, no matter how frustrating it gets.
