Statistical Machine Learning Essentials - Articles

Statistical machine learning refers to a set of powerful automated algorithms used to predict an outcome variable from multiple predictor variables. The algorithms automatically improve their performance by “learning” from the data; that is, they are data-driven and do not seek to impose a linear or other overall structure on the data (P. Bruce and Bruce 2017). In this sense, they are non-parametric.

The different machine learning methods can be used for both:

classification, where the outcome variable is categorical, for example positive vs negative,
and regression, where the outcome variable is continuous.

In this part, we’ll cover the following methods:

K-Nearest Neighbors, which predicts the outcome of a new observation x from the k most similar observations to x: the average of their outcomes in regression, or their most common class in classification (Chapter @ref(knn-k-nearest-neighbors)).
Decision trees, which build a set of decision rules describing the relationship between the predictors and the outcome. These rules are used to predict the outcome of new observations (Chapter @ref(decision-tree-models)).
Ensemble learning, including bagging, random forest and boosting. These machine learning algorithms are based on decision trees. They build many tree models from the training data and combine them, for example by averaging their predictions, into a single predictive model. This results in some of the top-performing predictive modeling techniques. See Chapters @ref(bagging-and-random-forest) and @ref(boosting).
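
To make the ensemble idea above concrete, here is a minimal sketch of bagging for regression: several one-split trees ("stumps") are fitted on bootstrap samples of the training data, and their predictions are averaged. It is an illustration only, not the implementation used in the chapters. The function names (fit_stump, bagged_stumps), the toy data and the use of plain Python rather than R are all assumptions made for this example.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a 'stump') on a single predictor."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors when each side predicts its own mean
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values are identical: fall back to the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Fit n_trees stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n observations with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean([tree(x) for tree in trees])


# Hypothetical toy data: the outcome jumps from about 1 to about 5 after x = 4
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]

predict = bagged_stumps(xs, ys)
print(predict(2), predict(7))
```

Random forest and boosting build on the same ingredients but differ in how the trees are grown and combined; the sketch covers plain bagging only.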

References

Bruce, Peter, and Andrew Bruce. 2017. Practical Statistics for Data Scientists. O’Reilly Media.

KNN: K-Nearest Neighbors Essentials

By kassambara, on 11/03/2018, in Statistical Machine Learning Essentials

The k-nearest neighbors (KNN) algorithm is a simple machine learning method used for both classification and regression. The KNN algorithm predicts the outcome of a new observation by comparing...
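
As a rough illustration of how KNN serves both tasks, the sketch below finds the k training observations closest to a new point, then returns their most common class (classification) or their average outcome (regression). The function name knn_predict, the Euclidean distance, the toy data and the choice of plain Python rather than R are assumptions made for this example; they are not taken from the article itself.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x_new, k=3, task="classification"):
    """Predict the outcome for x_new from its k nearest training observations."""
    # Euclidean distance from x_new to every training observation
    distances = [(math.dist(x, x_new), y) for x, y in zip(train_x, train_y)]
    # Keep the outcomes of the k closest observations
    distances.sort(key=lambda pair: pair[0])
    neighbors = [y for _, y in distances[:k]]
    if task == "regression":
        # Continuous outcome: average of the neighbors' outcomes
        return sum(neighbors) / len(neighbors)
    # Categorical outcome: most common class among the neighbors
    return Counter(neighbors).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of points
train_x = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]   # categorical outcome
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]            # continuous outcome

print(knn_predict(train_x, labels, (6.2, 6.1), k=3))                      # pos
print(knn_predict(train_x, values, (1.2, 1.1), k=3, task="regression"))  # 2.0
```

In practice the predictors are usually put on a common scale before distances are computed, and k is chosen by cross-validation; both steps are left out of this sketch.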

CART Model: Decision Tree Essentials

By kassambara, on 11/03/2018, in Statistical Machine Learning Essentials

The decision tree method is a powerful and popular predictive machine learning technique that is used for both classification and regression. So, it is also known as Classification and...

Bagging and Random Forest Essentials

By kassambara, on 11/03/2018, in Statistical Machine Learning Essentials

In Chapter @ref(decision-tree-models), we described how to build decision trees for predictive modeling. The standard decision tree model, CART for classification and regression trees,...

Gradient Boosting Essentials in R Using XGBOOST

By kassambara, on 11/03/2018, in Statistical Machine Learning Essentials

Previously, we described the bagging and random forest machine learning algorithms for building a powerful predictive model (Chapter @ref(bagging-and-random-forest)). Recall that bagging...