What you will learn
Large amounts of data are recorded every day in many fields, including marketing, biomedicine and security. To discover knowledge from these data, you need machine learning techniques, which fall into two categories:
- Unsupervised machine learning methods:
These mainly include clustering and principal component analysis methods. The goal of clustering is to identify patterns or groups of similar objects within a data set of interest. Principal component methods summarize and visualize the most important information contained in a multivariate data set.
These methods are “unsupervised” because we are not guided by a priori ideas of which variables or samples belong in which clusters or groups. The machine learning algorithm “learns” how to cluster or summarize the data.
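To make the idea of clustering concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The book's own labs use R. The function names (`kmeans`, `squared_distance`, `mean_point`) and the toy points below are illustrative assumptions and do not come from the book. For reproducibility, the sketch uses a simple deterministic farthest-point initialization instead of the random starting centers that are more common in practice. No group labels are supplied: the algorithm forms the groups from similarity alone.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def kmeans(points, k, max_iter=100):
    """Minimal k-means (Lloyd's algorithm) on 2-D points; returns (labels, centers)."""
    # Deterministic farthest-point initialization (an assumption made for
    # reproducibility; random restarts are more common in practice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:  # converged: the centers no longer move
            break
        centers = new_centers
    return labels, centers


# Toy, made-up data: two visually obvious groups, with no labels supplied.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The two steps alternate until the centers stop moving. On real data, the result depends on the starting centers, which is why several random starts are usually tried.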
- Supervised machine learning methods:
Supervised learning consists of building mathematical models for predicting the outcome of future observations. Predictive models can be classified into two main groups:
Regression analysis, for predicting a continuous variable. For example, you might want to predict life expectancy based on socio-economic indicators.
Classification, for predicting the class (or group) of individuals. For example, you might want to predict the probability that a patient is diabetes-positive based on the glucose concentration in their plasma.
These methods are supervised because we build the model based on known outcome values. That is, the algorithm learns from observations whose outcomes are known in order to predict the outcome of future cases.
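The contrast between the two kinds of predictive models can be sketched in a few lines of plain Python. The first example fits a least-squares line for a continuous outcome. The second fits a logistic regression by gradient descent for a binary (0/1) outcome. The data, function names (`fit_linear`, `fit_logistic`) and settings (learning rate, number of iterations) are made-up assumptions for illustration. None of them come from the book, whose labs use R. In both cases the model is learned from observations with known outcomes and then applied to new cases.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def fit_logistic(xs, ys, lr=0.5, n_iter=2000):
    """Logistic regression P(y=1 | x) = sigmoid(w*z + c), where z is the
    standardized x, fitted by batch gradient descent on the mean log-loss.
    Returns a function that maps a new x to a predicted probability."""
    n = len(xs)
    mx = sum(xs) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    zs = [(x - mx) / sx for x in xs]  # standardizing helps gradient descent converge
    w = c = 0.0
    for _ in range(n_iter):
        ps = [1.0 / (1.0 + math.exp(-(w * z + c))) for z in zs]
        # Gradients of the mean log-loss with respect to w and c (both use the same ps).
        w -= lr * sum((p - y) * z for p, y, z in zip(ps, ys, zs)) / n
        c -= lr * sum(p - y for p, y in zip(ps, ys)) / n
    return lambda x: 1.0 / (1.0 + math.exp(-(w * (x - mx) / sx + c)))


# Regression: made-up data lying exactly on y = 1 + 2x.
a, b = fit_linear([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(a, b)       # 1.0 2.0
print(a + b * 6)  # 13.0

# Classification: made-up 0/1 outcomes that tend to switch from 0 to 1 as x grows.
predict_proba = fit_logistic([1, 2, 3, 4, 6, 7, 8, 9], [0, 0, 0, 1, 0, 1, 1, 1])
print(predict_proba(2) < 0.5, predict_proba(8) > 0.5)  # True True
```

The regression model returns a number on a continuous scale. The classification model returns a probability, which is turned into a class by choosing a cut-off (0.5 here).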
Here, we present a practical guide to machine learning methods for exploring data sets, as well as for building predictive models.
You’ll learn the basic ideas behind each method, along with reproducible R code for computing a wide range of machine learning techniques.
Key features of the tutorials
Our goal was to write a practical guide to machine learning for everyone.
The main parts of the book include:
- Unsupervised learning methods, to explore and discover knowledge from a large multivariate data set using clustering and principal component methods. You will learn hierarchical clustering, k-means, principal component analysis and correspondence analysis methods.
- Regression analysis, to predict a quantitative outcome value using linear regression and non-linear regression strategies.
- Classification techniques, to predict a qualitative outcome value using logistic regression, discriminant analysis, the naive Bayes classifier and support vector machines.
- Advanced machine learning methods, to build robust regression and classification models using k-nearest neighbors, decision tree models and ensemble methods (bagging, random forest and boosting).
- Model selection methods, to automatically select the best combination of predictor variables for building an optimal predictive model. These include best subset selection, stepwise regression and penalized regression (ridge, lasso and elastic net regression models). We also present principal component-based regression methods, which are useful when the data contain multiple correlated predictor variables.
- Model validation and evaluation techniques for measuring the performance of a predictive model.
- Model diagnostics for detecting and fixing potential problems in a predictive model.
The book presents the basic principles of these tasks and provides many examples in R. It offers solid guidance in data mining for students and researchers.
- Covers machine learning algorithms and their implementation
- Presents key mathematical concepts
- Short, self-contained chapters with practical examples, so you don’t need to read the chapters in sequence.
At the end of each chapter, we present R lab sections in which we systematically work through applications of the various methods discussed in that chapter.