While machine learning algorithms have the potential to solve an enormous number of business problems, many companies struggle to identify where their limited resources for machine learning solutions are best applied. Thankfully, the problems that are most effectively solved with machine learning share a few key characteristics.

In financial services, for instance, fraud detection is a promising area for machine learning because companies hold large volumes of transaction data, and fraudulent transactions share a few key indicators. On the other hand, predicting the stock market is not a compelling machine learning use case; despite the abundance of data available, market prices are subject to a nearly unlimited number of unpredictable factors.

A telecommunications company might be interested in using machine learning to predict customer churn. A model can identify customers at high risk of churning, and analysts can then target the high-value customers in that group in order to maximize profit.

Healthcare presents a wide variety of machine learning use cases, one of which is identifying defective sensors. Use cases such as patient diagnosis and the creation of treatment plans are promising areas of development for machine learning, but algorithms won’t replace the judgement of medical professionals any time soon.

One of the most powerful applications of machine learning is predictive maintenance. Consider the oil and gas industry, where efficiency and value depend heavily on asset reliability. The ability to predict asset failure and take action before it occurs is an excellent opportunity to employ machine learning, since historical data on equipment maintenance and failure is often readily accessible.

The business problems best suited for machine learning solutions are similar in nature: most seek to answer a targeted, specific question whose answer can be found in an abundance of historical data, and they have the potential to increase efficiency and add value for the company.

Two primary machine learning methods are supervised and unsupervised learning. Supervised learning requires training data labeled according to the target variable, and a supervised learning model learns to predict the target on unseen data. Unsupervised learning does not require labeled training data and does not output a prediction; instead, unsupervised learning algorithms derive patterns and other useful insights from a dataset.

Classes of ML

Supervised Learning

At its core, supervised learning is the method of using historical data to answer a question about future behavior, such as “will this piece of equipment fail?” or “will this customer leave the company?” A supervised learning model learns from training data labeled with the target variable and predicts that variable for new data points. Supervised learning problems fall into one of four categories: binary classification, multi-class classification, multi-label classification, and regression. Every class of ML problem can be solved with multiple ML algorithms, but no single algorithm suits every class of problem. The next post in this series will cover ML algorithms in greater detail and how to decide which algorithm to use.

Binary Classification: Binary classification sorts data into one of two categories, so each input returns either a “yes” or “no” output. Logistic regression, decision trees, and Naïve Bayes algorithms can all solve binary classification problems.

For the problem of predicting customer churn in telecommunications, a binary classification model would use historical data with customer profiles and information on which customers churned in order to predict if a current customer will churn.
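As a rough illustration of the churn example, the sketch below trains a small logistic regression model from scratch using only the Python standard library. The two features (support calls last month, years as a customer) and the six training rows are invented for illustration; a real project would more likely rely on an established ML library and far more data.

```python
import math

def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, features):
    """Weighted sum of the features plus the bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights with stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = sigmoid(score(weights, bias, features)) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def will_churn(weights, bias, features, threshold=0.5):
    """Return True when the predicted churn probability reaches the threshold."""
    return sigmoid(score(weights, bias, features)) >= threshold

# Hypothetical features: [support calls last month, years as a customer]
history = [[5, 0.5], [4, 1.0], [6, 0.3], [0, 5.0], [1, 4.0], [0, 6.0]]
churned = [1, 1, 1, 0, 0, 0]  # 1 = churned, 0 = stayed

weights, bias = train_logistic_regression(history, churned)
print(will_churn(weights, bias, [7, 0.2]))  # frequent caller, new customer
print(will_churn(weights, bias, [0, 7.0]))  # no calls, long tenure
```

The 0.5 threshold is a default rather than a rule; a business that cares more about catching every likely churner could lower it at the cost of more false alarms.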

Multi-class Classification: Multi-class classification is more complex because there are more than two output labels; instead of a “yes” or “no” output, the model must predict one of several output categories for each input. Naïve Bayes and decision tree algorithms handle multiple classes naturally, while logistic regression is binary in its basic form and must be extended (for example, with a multinomial or one-vs-rest formulation) to handle more than two classes.

For example, a retail enterprise might use multi-class classification to predict whether a customer will be a high, medium, or low spender.
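The spender example could be sketched with a small Gaussian Naïve Bayes classifier, written here in plain Python. The features (monthly store visits, average basket value) and all of the data points are assumptions made up for this illustration, and the Gaussian variant is just one of several Naïve Bayes flavors.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(rows, labels):
        by_class[label].append(features)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A tiny floor keeps the variance from being exactly zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model

def predict_nb(model, features):
    """Return the class with the highest log posterior for the given features."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for v, m, var in zip(features, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical features: [store visits per month, average basket value]
customers = [[1, 10], [2, 15], [1, 12],
             [4, 40], [5, 45], [6, 50],
             [10, 120], [12, 150], [11, 130]]
spend_level = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

model = fit_gaussian_nb(customers, spend_level)
print(predict_nb(model, [5, 44]))    # resembles the medium spenders
print(predict_nb(model, [11, 140]))  # resembles the high spenders
```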

Regression: Regression is the method of predicting a continuous (as opposed to categorical) value: a regression model predicts the numerical value of a dependent variable based on a series of independent variables, or predictors, drawn from historical data. For example, a regression model might predict how a price changes over time or the number of days before a piece of equipment fails. Linear regression is the simplest algorithm for solving regression problems. Regularization techniques can improve the accuracy of linear regression models on unseen data by reducing the risk of overfitting. (An overfitted model is fit so closely to the training data that it does not generalize well enough to make accurate predictions on unseen data.) Decision trees are popular for more complex regression problems.

For an oil and gas company looking to increase uptime through predictive maintenance, a regression model can predict the time left before a piece of equipment is likely to fail, which helps maintenance teams prioritize which repairs are most urgent and complete them before the failure.
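To make the predictive maintenance example concrete, here is a minimal sketch of single-feature linear regression with an optional L2 (ridge) penalty, one common regularization technique. The vibration readings and days-to-failure values are invented, and using a single predictor is a simplification chosen to keep the arithmetic easy to follow.

```python
def fit_ridge_line(xs, ys, penalty=0.0):
    """Fit y = intercept + slope * x with an L2 penalty on the slope.

    penalty=0.0 gives ordinary least squares; larger values shrink the
    slope toward zero. The intercept is left unpenalized.
    """
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical history: vibration level vs. days until the pump failed
vibration = [1.0, 2.0, 3.0, 4.0, 5.0]
days_to_failure = [90.0, 70.0, 50.0, 30.0, 10.0]

slope, intercept = fit_ridge_line(vibration, days_to_failure)
predicted_days = intercept + slope * 2.5  # estimate for a new reading of 2.5

ridge_slope, ridge_intercept = fit_ridge_line(vibration, days_to_failure, penalty=10.0)
print(slope, intercept, predicted_days)
print(ridge_slope, ridge_intercept)
```

On this perfectly linear toy data the penalty only flattens the line; its benefit shows up on noisy data with many predictors, where an unpenalized fit can chase noise.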

Unsupervised Learning: Clustering

Clustering helps analysts recognize patterns that can explain certain phenomena, identify outliers in a dataset, and inform decisions based on customer demographics. A clustering model groups similar data points based on a number of features, and analysts can then derive insights about each group. Clustering is also a powerful tool for anomaly detection. Consider a sensor health use case in healthcare: clustering data from one type of sensor helps indicate which sensors are malfunctioning, since faulty sensors may all display a similar set of anomalous behaviors that are difficult or impossible to identify by examining data from a single sensor.
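The sensor example might look something like the following k-means sketch. The per-sensor summaries (mean reading, noise level) are made up, the starting centroids are chosen by hand to keep the run deterministic, and treating the smaller cluster as the suspect group is an assumption that would need to be validated on real data.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    def nearest(point):
        return min(range(len(centroids)),
                   key=lambda i: math.dist(point, centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point)].append(point)
        # Move each centroid to the mean of its members; keep it if empty.
        centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centroids)
        ]
    labels = [nearest(point) for point in points]
    return centroids, labels

# Hypothetical per-sensor summaries: [mean reading, noise level]
sensors = [[20.1, 0.5], [19.8, 0.4], [20.3, 0.6], [20.0, 0.5], [19.9, 0.5],
           [25.2, 3.1], [24.8, 2.9]]

centroids, labels = kmeans(sensors, centroids=[sensors[0], sensors[-1]])

# Assumption: the minority cluster holds the malfunctioning sensors.
minority = min(set(labels), key=labels.count)
suspects = [i for i, label in enumerate(labels) if label == minority]
print(labels)
print(suspects)
```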

Unsupervised Learning: Dimensionality Reduction

Dimensionality reduction is the process of reducing the number of features in a dataset by removing irrelevant features (feature selection) or consolidating them into a smaller number of more powerful features (feature extraction). When a dataset has many features, a larger amount of training data becomes necessary to accurately reflect all possible combinations of features and outcomes; otherwise, the model is unlikely to perform well on unseen data. While dimensionality reduction is unlikely to be used in isolation, it can greatly improve the performance of other models trained on the dataset.

Dimensionality reduction is particularly important in cases where model interpretability, or the ability to explain why a model made the decision it did, is essential. The problem of fraud detection is a compelling use case for dimensionality reduction in conjunction with a clustering or classification algorithm; when a model identifies a transaction as fraudulent, it’s critical that its reason for doing so is clear and explainable. Using a feature selection algorithm to eliminate features that don’t directly affect the likelihood of fraud allows analysts to easily identify why another algorithm flagged a transaction as fraudulent.
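One simple way to approach feature selection is a correlation filter: rank each feature by how strongly it correlates with the fraud label and keep only the top few. The sketch below assumes this filter approach, which is only one of many selection methods, and uses invented feature names and transactions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_features(names, rows, labels, keep):
    """Keep the `keep` features most correlated (in absolute value) with the label."""
    columns = list(zip(*rows))
    ranked = sorted(zip(names, columns),
                    key=lambda pair: abs(pearson(pair[1], labels)),
                    reverse=True)
    return [name for name, _ in ranked[:keep]]

# Hypothetical transaction features and fraud labels (1 = fraudulent)
names = ["amount", "distance_from_home_km", "account_id_last_digit"]
transactions = [[20, 2, 3], [35, 5, 7], [15, 1, 1],
                [900, 400, 4], [1200, 650, 6], [25, 3, 8]]
is_fraud = [0, 0, 0, 1, 1, 0]

selected = select_features(names, transactions, is_fraud, keep=2)
print(selected)
```

A downstream fraud model trained only on the selected columns has fewer inputs to reason about, which is what makes its decisions easier to explain.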