Basics of machine learning - Traffic prediction with ML and Azure ML Studio - Part 1
What is machine learning
Machine learning is an area of artificial intelligence devoted to algorithms that improve their performance through experience. Using training data, a machine learning algorithm builds a mathematical model. This model predicts output data from input data without being explicitly programmed for that purpose.
By analyzing large data sets, machine learning can uncover patterns that adapt as the data changes. It rests on statistical analysis carried out at a scale and with an efficiency that are difficult to achieve with other methods.
Examples of machine learning in use include the estimation of stock values (where the data is treated as a time series) and the prediction of a physical quantity from known data. In the latter case, however, an argument is often raised against machine learning: such a model works like a black box. Only the input data and the result are known, while physicists often want to understand which aspects of the input data affect the final value.
Depending on the available data and the required application, machine learning can be divided into four basic techniques:
- supervised learning,
- semi-supervised learning,
- unsupervised learning,
- reinforcement learning.
Supervised learning
Supervised learning is a learning process based on known inputs and outputs. It uses labeled training data: pairs consisting of an input and its associated label (the expected result).
This approach works well when you know what you want the system to learn.
The machine learning algorithm analyzes the training data, and the result of this analysis is a model that maps that data to the expected output. The correctness of the model should be checked on test data that it has not seen during training, to verify that the generated results are in line with our expectations. The model has to handle this test data correctly using only what it learned from the training data, which requires it to generalize well.
Supervised learning algorithms are often used in conjunction with traffic data. They are used, among other things, to recognize vehicles and pedestrians, determine their speed or predict the intensity of traffic at a specific time of the day.
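The train-then-test workflow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the traffic counts are synthetic (generated in the code, not measured), the model is a simple least-squares line, and the names `fit_line` and `mean_absolute_error` are made up for this example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, xs, ys):
    """Average absolute difference between predictions and labels."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic labeled data (an assumption for illustration): vehicles per hour
# grow roughly linearly during the morning, plus Gaussian noise.
rng = random.Random(0)
hours = [5 + 0.1 * i for i in range(40)]
data = [(h, 100 * h + 50 + rng.gauss(0, 20)) for h in hours]

# Hold out 10 of the 40 pairs; the model never sees them during training.
rng.shuffle(data)
train, test = data[:30], data[30:]

model = fit_line([h for h, _ in train], [v for _, v in train])
test_mae = mean_absolute_error(model, [h for h, _ in test], [v for _, v in test])
print(f"slope={model[0]:.1f}, intercept={model[1]:.1f}, test MAE={test_mae:.1f}")
```

A low error on the held-out pairs suggests that the model generalizes; a low error on the training pairs alone would not show that.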
Semi-supervised learning
Semi-supervised learning sits between supervised and unsupervised learning: during training it combines a small amount of labeled data with a large amount of unlabeled data. Such a combination can significantly improve accuracy compared with using the labeled data alone. Semi-supervised methods can be transductive or inductive. The goal of transductive learning is to infer labels only for the given unlabeled data. Inductive learning, on the other hand, finds general rules and structures that can then be used to make predictions on new data.
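One simple way to combine labeled and unlabeled data is self-training, sketched below with a nearest-centroid classifier on one-dimensional data. Self-training is just one of several semi-supervised techniques, chosen here for brevity. The speed values, the labels "congested" and "free", and all function names are hypothetical.

```python
def centroid(values):
    return sum(values) / len(values)

def fit_centroids(examples):
    """Compute one centroid per label from (value, label) pairs."""
    labels = {label for _, label in examples}
    return {label: centroid([x for x, y in examples if y == label])
            for label in labels}

def nearest_label(x, centroids):
    """Classify x by the closest centroid."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def self_train(labeled, unlabeled, rounds=5):
    """Repeatedly pseudo-label the unlabeled data and refit the centroids."""
    centroids = fit_centroids(labeled)
    for _ in range(rounds):
        pseudo_labeled = [(x, nearest_label(x, centroids)) for x in unlabeled]
        centroids = fit_centroids(labeled + pseudo_labeled)
    return centroids

# Hypothetical average speeds in km/h: two labeled and eight unlabeled readings.
labeled = [(20.0, "congested"), (45.0, "free")]
unlabeled = [10.0, 12.0, 15.0, 18.0, 60.0, 65.0, 70.0, 75.0]

supervised_only = fit_centroids(labeled)
semi_supervised = self_train(labeled, unlabeled)

print(nearest_label(35.0, supervised_only))  # decision from labeled data alone
print(nearest_label(35.0, semi_supervised))  # decision after self-training
```

Assigning pseudo-labels to the eight unlabeled readings is the transductive part; classifying the new reading of 35 km/h with the refitted centroids is the inductive part. In this toy data the unlabeled readings move the decision boundary, so the two models disagree on that reading.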
This type of learning is typically used when obtaining a complete labeled training set is impossible or very costly, while unlabeled data is readily available. In such situations, semi-supervised learning has great practical value.
Research on semi-supervised learning also studies how computers and natural systems (such as the human brain) learn in an environment with both labeled and unlabeled data. The goal is to understand how mixing the two kinds of data changes learning behavior, and to design algorithms that take advantage of it.
Unsupervised learning
Unsupervised learning is a method that allows you to automatically analyze large data sets in search of patterns and relationships between variables. It is most often used to group (cluster) data based on their statistical properties.
Unlike supervised learning, the data used in training is not labeled, so we do not have the expected model output. This method can be especially useful when data from different sources must be integrated with each other in order to obtain a coherent and comprehensive model.
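Grouping unlabeled data can be illustrated with k-means clustering, a common unsupervised algorithm (the text above does not name a specific one, so this choice is an assumption). The sensor readings and the `k_means` helper below are hypothetical, and the features are left unscaled to keep the sketch short.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def k_means(points, centroids, iterations=10):
    """Plain k-means on 2D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical readings: (average speed in km/h, vehicles per minute).
# No labels are provided; the algorithm only sees the numbers.
points = [(90, 5), (85, 8), (95, 6), (20, 40), (25, 45), (15, 42)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])

for center, members in zip(centroids, clusters):
    print(center, members)
```

The algorithm separates fast, sparse traffic from slow, dense traffic without ever being told that these two categories exist. Interpreting the groups is left to the analyst.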
Unsupervised learning is used, among other things, to detect sentiment and identify emotional states from material published, for example, on social media. This allows you to automate customer satisfaction surveys.
Reinforcement learning
Reinforcement learning does not require labeled input and output pairs, nor does it require suboptimal actions to be explicitly corrected. Instead, it tries to find a balance between the exploration of new possibilities and the exploitation of current knowledge.
This type of machine learning is used, among other places, in systems that interact with an environment. During training, the system makes both good and bad decisions. Correct actions are rewarded (reinforced), and in this way the system learns how to react.
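The balance between exploration and exploitation can be sketched with an epsilon-greedy agent on a multi-armed bandit, one of the simplest reinforcement learning settings. Everything specific here is an assumption made for illustration: the three actions could stand for three hypothetical signal timing plans, and `reward_probs` gives the made-up probability that each plan earns a reward.

```python
import random

def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by trial and error on a simulated bandit."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was tried
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(len(values)), key=values.__getitem__)
        # The simulated environment answers with a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: rewarded actions see their estimate rise.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = epsilon_greedy([0.2, 0.5, 0.8])
print("estimated values:", [round(v, 2) for v in values])
print("times chosen:", counts)
```

No one tells the agent which action is correct. It only observes rewards, and the action that is rewarded most often ends up being chosen most of the time, while the small `epsilon` keeps it trying the alternatives occasionally.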