
Data Science and Machine Learning: How to Start and Where to Learn

Data Science and Machine Learning have become very popular. As a result, many students dedicate their diploma theses to these fields, and a growing number of graduates want to work in them, attracted by high salaries and the chance to keep up with developing technologies.

Ten years ago the main problem was a lack of information. Now the problem is the opposite: there is so much material that learners need structure and help in choosing an education that gives the minimum skills such jobs require.

This article is written for those who want to try their hand at Data Science and Machine Learning but do not know how to start or what to learn.

What Are Data Science and Machine Learning?

Before talking about machine learning, let’s start by defining the terminology.

Data Science is the umbrella term for the disciplines that work with data, and Machine Learning is a branch of Data Science that deals with building models that learn from data. Such models can be used to predict a user’s purchases, to make recommendations in social networks (recommendation systems), to recognize images, and so on.

Data Science specialists are engaged in research. In IT companies, such roles correspond to research engineers: they are primarily mathematicians who work on the theoretical side of the algorithms and investigate patterns in data.

Machine Learning engineers, in turn, build models based on the obtained data. But this division exists only in theory or only in some countries.

Previously, Data Science and Machine Learning were used as synonyms, but now the two concepts are distinguished. In practice, positions that require knowledge of Machine Learning are often advertised as data scientist roles, and vice versa. So if you want to work with data, you should learn both.

The learning process for Data Science and Machine Learning can be divided into five parts:

  • Mathematics.
  • Programming Language.
  • Basic Machine Learning Algorithms.
  • Deep Learning.
  • Individual Specializations.

Let’s examine each of them more closely.

Mathematics

To begin with, let’s figure out whether mathematics is needed for Data Science and Machine Learning at all. The short answer is: yes, it is. Admittedly, there are many examples of successful Data Scientists without a technical background winning prizes in Kaggle competitions. But even they would agree that math skills provide a significant advantage in Data Science.

Although almost all algorithms are already implemented in Python and R libraries, understanding the basic mathematical concepts will make both studying and applied work much easier. Moreover, most machine learning papers contain mathematical derivations that are difficult to read without that background.

There are three sections of mathematics that you need to know to be successful:

  • Fundamentals of linear algebra.
  • Fundamentals of mathematical analysis (integration, derivatives, and partial derivatives).
  • Fundamentals of probability theory and mathematical statistics.
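
To make the three areas less abstract, here is a minimal sketch that touches each of them using only the Python standard library. The function names, the `loss` function, and all numbers are assumptions chosen for illustration; in practice you would normally rely on a numerical library rather than hand-written helpers like these.

```python
import statistics

# Linear algebra: a matrix-vector product, the core operation behind
# the predictions of a linear model.
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Mathematical analysis: a central-difference estimate of the partial
# derivative of f with respect to its i-th argument.
def partial_derivative(f, point, i, h=1e-6):
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)

def loss(w0, w1):
    # A hypothetical two-parameter function, chosen only for illustration.
    return w0 ** 2 + 3 * w0 * w1

features = [[1.0, 2.0], [3.0, 4.0]]
weights = [0.5, -1.0]
predictions = matvec(features, weights)

# Analytically, d(loss)/d(w0) = 2*w0 + 3*w1 and d(loss)/d(w1) = 3*w0.
grad_w0 = partial_derivative(loss, [1.0, 2.0], 0)
grad_w1 = partial_derivative(loss, [1.0, 2.0], 1)

# Statistics: the mean and the population standard deviation of a sample.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = statistics.mean(sample)
spread = statistics.pstdev(sample)

print(predictions, grad_w0, grad_w1, mean, spread)
```

Comparing the numerical estimates with the analytic derivatives in the comment is a useful habit: it is the same check people use to verify hand-derived gradients.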

Programming Language

You need to know how to program in order to work with data: for instance, to load and parse data, synthesize new features, or bring any of your ideas to life. The primary programming language for most Data Science professionals is Python.

Python is a simple language with many libraries for data processing and analysis. R and Matlab, once popular, are now less common, so if you are just beginning to learn Data Science, focus on Python.
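
As a rough illustration of what loading, parsing, and synthesizing a feature can look like, here is a small sketch using only the standard library. The column names and values are hypothetical and inlined as a string so the example runs on its own; real projects usually read from files and often use a dedicated data library instead.

```python
import csv
import io

# Hypothetical data, inlined so the example is self-contained.
# In practice this text would come from a file on disk.
RAW_CSV = """user_id,price,quantity
1,10.50,2
2,3.00,5
3,20.00,1
"""

def load_rows(text):
    """Parse CSV text into a list of dicts with numeric fields converted."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "user_id": int(record["user_id"]),
            "price": float(record["price"]),
            "quantity": int(record["quantity"]),
        })
    return rows

def add_total(rows):
    """Synthesize a new feature: the total amount spent in each row."""
    return [{**row, "total": row["price"] * row["quantity"]} for row in rows]

orders = add_total(load_rows(RAW_CSV))
for order in orders:
    print(order["user_id"], order["total"])
```

The new `total` column did not exist in the raw data; deriving columns like this from existing ones is what "synthesizing features" means here.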

Basic Machine Learning Algorithms

To start your professional path in machine learning, you need to know the basic Machine Learning tasks, the existing algorithms, and the approaches used to solve each kind of task. You also need to be able to tell the algorithms used in different specializations apart and understand their advantages and disadvantages.
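
To give a feel for what a basic algorithm looks like, here is a minimal sketch of k-nearest neighbors classification, one of the simplest classical methods. It is my own illustrative choice rather than something a particular course prescribes, and the toy points and labels are made up.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up toy data: two well-separated clusters in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (1.8, 1.6)))
print(knn_predict(points, labels, (6.2, 6.4)))
```

The trade-offs are easy to see even at this size: there is no training step at all, but every prediction has to scan the whole dataset, which becomes slow as the data grows.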

Coursera has a great course with a straightforward presentation that will help you understand all of these aspects. Even though this course uses Octave rather than Python, you should take it. This way, you will learn the basics and principles of machine learning, as well as gain the necessary knowledge of linear algebra. The course does not require any prior preparation and is suitable for anyone planning to study Data Science.

The theoretical part of the course on Coursera is free, while the practical part is paid. Besides, there are various specialized courses from universities such as Stanford, Harvard, Michigan, and Duke.

Also, do not forget that machine learning is a practical discipline, so it is crucial to apply your knowledge to real data. Make it a rule to check out Kaggle, a competition platform for Data Science. There, you’ll find plenty of datasets, and you can study other participants’ solutions and practice your analytical skills. Eventually, you can try your luck in an open competition.

Deep Learning

With a basic understanding of Machine Learning and knowledge of Python, you can start exploring Deep Learning. It is a branch of Machine Learning based on neural networks. Here I recommend studying the Deep Learning Specialization course on Coursera.
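
To show what "based on neural networks" means at the smallest possible scale, here is a sketch of a two-layer network that computes XOR. The weights are hand-picked for illustration, not learned, and the step activation is a simplification; real deep learning models learn their weights from data and use differentiable activations.

```python
def step(x):
    """Threshold activation: 1 if the input is positive, otherwise 0."""
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_network(a, b):
    # Hidden layer: one neuron behaves like OR, the other like AND.
    h_or = neuron([a, b], [1.0, 1.0], -0.5)
    h_and = neuron([a, b], [1.0, 1.0], -1.5)
    # Output layer: fires when OR is on and AND is off.
    return neuron([h_or, h_and], [1.0, -1.0], -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

A single neuron cannot represent XOR, but two layers can. That is the basic reason depth matters, and it is why the hidden layer is there.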

Individual Specializations

Individual specializations in Machine Learning can be pursued once you have studied the material in blocks 1-3 and solved several applied cases.

To sum up, your education may look as follows:

  • Mathematics – the foundation of Data Science and Machine Learning, which you will need in order to understand Data Analysis and Machine Learning fundamentals in depth.
  • Programming language – learn Python.
  • Basic Machine Learning algorithms – the obligatory starting point if you are a complete beginner in data science.
  • Deep Learning – start after you have mastered point 3.
  • Individual specializations – start after you have mastered points 3 and 4.

Is it Hard to Learn Data Science?

It all depends on your background and mindset. With well-developed analytical and math skills, your path to Data Science will be pretty easy. If you are currently at school or university, try participating in math competitions. This will help you form the basis of analytical thinking and make it much easier to master the profession in the future.

If you have decided to switch to Data Science from another field, I would recommend solving practical tasks on Kaggle. Working on them and looking at other people’s solutions help develop logical and analytical skills. Pay attention to the blogs and YouTube channels of different Data Scientists, where they analyze and describe how they built a model and what logic they put into the solution.

Besides, there is a lot of freely available data that you can practice on. For example, take the COVID-19 incidence statistics and look for a pattern (such a contest was recently held on Kaggle). You can look at other people’s solutions, analyze the logic, and gradually improve your knowledge of algorithms. With constant practice and analytical thinking, you’ll soon see your first progress in Data Science.
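
A simple first step when looking for a pattern in daily counts is to smooth them with a moving average, so that the weekly reporting rhythm does not hide the trend. The sketch below is only an illustration: the numbers are invented and are not real incidence data, and `moving_average` is a hypothetical helper name.

```python
def moving_average(values, window=7):
    """Average each run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Invented daily counts with a weekend dip. NOT real incidence data.
daily_cases = [100, 120, 130, 125, 140, 60, 50, 110, 135, 145, 150, 160, 70, 65]

smoothed = moving_average(daily_cases, window=7)
for value in smoothed:
    print(round(value, 1))
```

With a seven-day window each average contains exactly one of every weekday, so the weekend dip cancels out and what remains is the underlying direction of the series.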

What to Read?

While specialized literature can help you learn, keep in mind that technology evolves extremely fast, and the information in books gets outdated. Practice and an understanding of the subject area, the challenges, and the tools you have are essential for succeeding in Data Science.

Still, I recommend reading:

  • Hands-on Machine Learning with Scikit-Learn and TensorFlow;
  • Deep Learning (Adaptive Computation and Machine Learning series);
  • arXiv – a resource with scientific articles on Machine Learning and other sciences, including the basic ones.
