Introduction to Classification
Learning Objectives
After this unit, students should be able to
- define the classification problem.
- state the types of classification problems.
A classification problem in machine learning involves predicting the category or class of an input based on its features. In these problems, the goal is to map input data to predefined labels or categories. The output of the regression problem is a score, which is a numeric variable, whereas the output of the classification problem in a label, which is a categorical variable.
Classification problems are classified based on the number of classes as follows:
-
Binary classification. It aims at assigning one of the two possible labels to every datapoint. Typical examples include spam filters (classify the input email as a spam or not a spam), medical diagnosis (classify a tumor is benign or malignant), customer churn prediction (classify if the customer of an e-commerce platform will take), image classification (classify an image as one of the two possible classes).
-
Multi-class classification. It aims at assigning one of the \(K\) possible labels to every datapoint. Typical examples include topic modeling (classify the specified document in one of the possible topics), image classification (classifying images of animals into their categories), genre tagging (classifying a movie into one of the possible genres).
-
Multi-label classification. It aims at assigning one or more possible labels to every datapoint. This differs from multi-class classification, where each instance is assigned to only one label. Typical examples include topic modeling wherein a document may belong multiple topics at the same time, medical diagnosis wherein every patient may be labeled with multiple symptoms, tagging posts in social media where a single post carries multiple tags.
-
Anomaly detection. It can be considered a special kind of binary classification. The subtle difference exists in the representation of data from anomalous classes. Typically anomalous data is sparser than the normal data. Typical example include fraud detection, quality control systems in manufacturing industries, network security monitors, etc.
Classification problems are classified based on the membership of the datapoint to the classes as follows:
-
Deterministic Classifier. Deterministic classifier assigns a specific class label to an input without ambiguity.
-
Probabilistic Classifier. Probabilistic classifier assigns a probability distribution over the possible classes for an input. They make predictions based on the likelihood of each class.