Deep Learning: An Overview
Let us begin with machine learning
Machine learning means that machines learn from large data sets rather than following hard-coded rules. It is the core of artificial intelligence and the fundamental way to make computers intelligent. It relies mainly on induction and synthesis rather than deduction. Machine learning allows computers to learn by themselves; this kind of learning benefits from the powerful processing capacity of modern computers, which can easily handle large data sets.
What is supervised/unsupervised learning?
Supervised learning uses a labeled data set containing input values and the expected output values. When training an AI with supervised learning, we feed it an input value and tell it the expected output. If the output the AI produces is incorrect, it adjusts its calculations. This process repeats across the data set until the AI no longer makes mistakes.
A typical application of supervised learning is a weather-forecasting AI. The AI uses historical data to learn how to predict the weather; the training data includes input values (air pressure, humidity, wind speed, etc.) and output values (temperature, etc.).
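To make this concrete, here is a minimal sketch of supervised learning in Python: fitting a linear model to a tiny weather data set. The numbers are invented for illustration, not real measurements, and a real forecaster would be far more sophisticated.

```python
import numpy as np

# Hypothetical toy data set: each row is (air pressure in hPa, humidity in %),
# and each label is the observed temperature in degrees C. Values are invented.
X = np.array([[1013.0, 40.0],
              [1005.0, 80.0],
              [1020.0, 30.0],
              [ 998.0, 90.0]])
y = np.array([22.0, 15.0, 25.0, 12.0])

# Fit a linear model y ~= X.w + b by least squares, a simple supervised learner:
# the "expected output values" y guide the fit.
A = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the temperature for a new day the model has never seen.
new_day = np.array([1010.0, 50.0, 1.0])
prediction = new_day @ coeffs
```

The key point is the shape of the workflow: labeled examples in, a fitted predictor out.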
Formal introduction to deep learning
The concept of deep learning stems from research on artificial neural networks and is a relatively new field within machine learning. Its purpose is to build and simulate neural networks modeled on the human brain for analytical learning; it mimics the mechanism the brain uses to interpret data such as images, sound, and text. The supervised and unsupervised learning mentioned above are two of the learning methods used in deep learning.
As a machine learning method, deep learning allows us to train AI to predict output values with a given input value. Both supervised learning and unsupervised learning can be used to train AI, and the way to train is to use neural networks.
Neural network implementation
Like the brain of a human or animal, the "brain" of an artificial intelligence system is built from neuron-like units. The neurons are organized into three kinds of layers:
1. Input layer
2. Hidden layer (there may be several)
3. Output layer
The input layer receives the input data and passes it to the first hidden layer. The hidden layers perform mathematical operations on the data. Deciding how many hidden layers to use, and how many neurons to put in each, is still one of the challenges of constructing a neural network. Finally, the output layer returns the output data.
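The flow through the three kinds of layers can be sketched as a small feed-forward pass in NumPy. The layer sizes, random weights, and ReLU activation below are arbitrary choices for illustration, not a prescription:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny feed-forward network: 3 inputs -> 4 hidden neurons -> 1 output.
W1 = rng.normal(size=(3, 4))   # input layer -> hidden layer weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))   # hidden layer -> output layer weights
b2 = np.zeros(1)

def relu(z):
    # A simple activation used in the hidden layer.
    return np.maximum(0.0, z)

def forward(x):
    hidden = relu(x @ W1 + b1)   # the hidden layer does the math
    return hidden @ W2 + b2      # the output layer returns the result

output = forward(np.array([0.5, -1.0, 2.0]))
```

With randomly initialized weights the output is meaningless; training, discussed below, is what gives it meaning.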
Suppose we need to design a tool that can predict flight fares. The word "deep" in deep learning refers to having more than one hidden layer in the neural network. Each connection between neurons carries a weight, which determines the importance of the corresponding input value; the initial weights are set randomly. For example, when predicting the ticket price of a flight, the departure date is one of the most important factors, so the connections from the departure-date neurons will have large weights. Each neuron also has an activation function. Without a certain amount of mathematical background these functions are hard to understand fully, and since this article is for beginners I won't go into the deeper mathematics here.
In simple terms, one goal of these functions is to "normalize" the output value of each neuron. Once a set of input data has passed through every layer of the network, the AI returns an output value through the output layer. Does that guarantee the final output matches expectations? The answer is no.
You also need to train the neural network
A neural network only becomes useful after being trained on a large amount of data, so that the functions in the algorithm can be gradually corrected toward the right results. To train the AI, we feed it input values from the data set and compare its outputs to the outputs recorded in the data set. Since the AI has not yet been trained, its outputs will contain many errors. Once all the data in the data set has been run through, we can build a function that shows how far the AI's outputs deviate from the true outputs. This function is called the "cost function". Ideally we want the cost function to be zero, which happens only when the AI's outputs are identical to the outputs in the data set.
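One common concrete choice of cost function is the mean squared error. The ticket prices below are invented for illustration:

```python
import numpy as np

def cost(predictions, targets):
    # Mean squared error: the average of the squared differences
    # between the AI's outputs and the true outputs.
    return np.mean((predictions - targets) ** 2)

targets     = np.array([200.0, 150.0, 320.0])   # true ticket prices in the data set
predictions = np.array([180.0, 160.0, 300.0])   # what the untrained AI produced

c = cost(predictions, targets)
```

Note that the cost is zero exactly when the predictions match the targets, which is the ideal described above.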
How to reduce the cost function?
We mentioned the concept of "weight" above. Weights play a crucial role in reducing the cost function: changing the weights between neurons changes the cost. We could change them randomly until the cost function gets close to zero, but that method is very inefficient.
This is where a technique called gradient descent comes in. Gradient descent is a way to find the minimum of a function, and finding the minimum of our model's cost function depends on it. Gradient descent works by changing the weights in small increments after each iteration over the data set. By computing the derivative (or gradient) of the cost function with respect to the weights, we can see in which direction the minimum lies.
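The idea can be sketched with a single-weight model trained by gradient descent. The data, step count, and learning rate are toy values chosen for illustration:

```python
import numpy as np

# Fit y = w * x to toy data by gradient descent on the mean squared error.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # the underlying relationship is w = 2

w = 0.0                 # the weight starts at an arbitrary value
learning_rate = 0.05    # size of each small increment

for _ in range(200):
    predictions = w * x
    # Derivative of cost = mean((w*x - y)^2) with respect to w:
    gradient = np.mean(2.0 * (predictions - y) * x)
    w -= learning_rate * gradient   # step downhill, against the gradient
```

Each step moves the weight a little way down the slope of the cost function, so the loop converges toward the minimum rather than searching randomly.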
To minimize the cost function we need to iterate over the data set many times, which requires serious computing power. Updating the weights with gradient descent happens automatically; that is the magic of deep learning! In addition, there are many types of neural networks, and different AIs use different ones: for example, computer vision uses convolutional neural networks, and natural language processing uses recurrent neural networks.
Deep learning uses a neural network to mimic the intelligence of an animal.
A neural network has three kinds of layers: the input layer, the hidden layer (there can be several), and the output layer.
The connections between neurons are related to weights, which determine the importance of the input values.
Applying an activation function to the data allows the neuron’s output values to be “normalized”.
To train a neural network, you need a big data set.
Iterating over the data set and comparing the AI's outputs to the data set's outputs yields a cost function, which measures the difference between the AI's output and the real output.
After each iteration over the data set, the weights between the neurons are adjusted along the negative gradient, reducing the value of the cost function.