Multi-Label Classification


Classification has been a go-to approach for various problems for many years now. However, as problems become more and more specific, a simple classification model can’t be the solution for all of them. Rather than assigning one label/class to an instance, it is sometimes more appropriate to assign a subset of labels. This is exactly how multi-label classification differs from multi-class classification. For example, deciding whether an audio file is music or not is a binary classification problem, while identifying all the genres present in a piece of fusion music is a multi-label classification problem.


How can we assign multiple labels to one instance?

An intuitive approach is to transform a multi-label problem into multiple single-label problems so that existing binary classifiers can be used. The scikit-multilearn library, which is built on top of scikit-learn, provides ready-made tools for multi-label classification.

Let’s discuss the two main families of approaches to multi-label classification:

1. Problem Transformation
2. Adaptive Algorithm

Problem Transformation

As the name suggests, we transform the multi-label problem into one or more single-label problems. Let’s see the types of transformation we can apply.

I. Binary Relevance

This is very similar to the “One Vs Rest” approach: we build one independent binary classifier per label, and an instance receives every label whose classifier predicts it. Any correlation between the labels is ignored, because the labels are assumed to be independent of each other.

For example, in music genre classification we make a binary decision for each genre on each instance: “is the music pop or not”, “is the music classic or not”, and so on for all the labels present.

X1[pop, jazz]
X3[jazz, instrumental]

Transformed data of multiple binary classifications.

With the scikit-multilearn module, we can train a binary relevance classifier using existing single-label classifiers.
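To make the idea concrete, here is a minimal sketch of binary relevance written by hand, assuming only NumPy and scikit-learn are installed. It uses one `DecisionTreeClassifier` per label instead of scikit-multilearn’s `BinaryRelevance` class, so that the independence of the per-label classifiers is visible. The toy features, the label names, and the helper `predict_labels` are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 2 features per track, 3 genre labels.
labels = ["pop", "jazz", "instrumental"]
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# One row per track, one 0/1 column per label (the binary relevance targets).
Y = np.array([
    [1, 1, 0],   # pop + jazz
    [1, 0, 0],   # pop
    [0, 1, 1],   # jazz + instrumental
    [0, 0, 1],   # instrumental
])

# Train one independent binary classifier per label column.
models = [
    DecisionTreeClassifier(random_state=0).fit(X, Y[:, j])
    for j in range(Y.shape[1])
]

def predict_labels(x_new):
    """Return every label whose own classifier predicts 1."""
    row = np.asarray(x_new, dtype=float).reshape(1, -1)
    return [labels[j] for j, m in enumerate(models) if m.predict(row)[0] == 1]

print(predict_labels([0.0, 0.0]))
```

Note that no classifier ever sees another label’s column, which is exactly why label correlations are lost in this approach.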

II. Classifier Chains

In this transformation, we construct a chain of binary classifiers. Each classifier in the chain is trained on the independent variables plus the labels handled by the previous classifiers in the chain. Let’s understand this better with the diagram below:

C1, C2, and C3 are the individual binary classifiers in the chain, one per label.

Classifier chains address the drawback of ignoring label correlation. However, they are not well suited to data with a large number of labels, as quality depends strongly on the label ordering in the chain. Let’s try to implement classifier chains using the scikit-multilearn library.
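The chaining mechanism itself can be sketched in a few lines. The version below is hand-rolled with NumPy and scikit-learn (an assumption made so it runs without scikit-multilearn, which offers its own `ClassifierChain` in `skmultilearn.problem_transform`). The toy data and the helper names `fit_chain` and `predict_chain` are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 4 instances, 2 features, 3 labels.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
])

def fit_chain(X, Y):
    """Train C1, C2, ... where classifier j sees X plus the true labels before j."""
    chain = []
    for j in range(Y.shape[1]):
        X_aug = np.hstack([X, Y[:, :j]])   # features + earlier labels
        chain.append(DecisionTreeClassifier(random_state=0).fit(X_aug, Y[:, j]))
    return chain

def predict_chain(chain, X):
    """Predict label by label, feeding earlier predictions forward."""
    preds = np.zeros((X.shape[0], len(chain)), dtype=int)
    for j, clf in enumerate(chain):
        X_aug = np.hstack([X, preds[:, :j]])
        preds[:, j] = clf.predict(X_aug)
    return preds

chain = fit_chain(X, Y)
print(predict_chain(chain, X))
```

Here the chain order is simply the column order of `Y`. Reordering the columns gives a different chain, which is where the sensitivity to label ordering comes from.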


III. Label Powerset

In this approach, we transform the multi-label classification problem into a multi-class classification problem. Every unique combination of labels in the training set is assigned its own class. This transformation captures the correlation among the labels. Let’s look at an example of this transformation.

Training Data


Label Powerset Transformation


One potential problem of this transformation is the complexity of the model. As the number of distinct label combinations grows (up to 2^n for n labels), the number of classes also increases, which may lead to computational infeasibility. It is also very prone to underfitting with large label spaces, since many combinations appear only a few times in the training data. Implementation of label powerset using the scikit-multilearn library is given below:
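As a rough illustration of what the transformation does under the hood, here is a hand-written sketch that uses only NumPy and scikit-learn rather than scikit-multilearn’s `LabelPowerset` class. The toy data and the names `combo_to_class` and `predict_powerset` are assumptions made for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 5 instances, 2 features, 3 labels.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.1, 0.1]])
Y = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 0],   # same combination as the first row
])

# Map each distinct label combination to one class id.
combos = sorted({tuple(int(v) for v in row) for row in Y})
combo_to_class = {combo: i for i, combo in enumerate(combos)}
y_multiclass = np.array([combo_to_class[tuple(int(v) for v in row)] for row in Y])

# Any ordinary multi-class classifier can now be trained on the class ids.
clf = DecisionTreeClassifier(random_state=0).fit(X, y_multiclass)

def predict_powerset(X_new):
    """Predict a class id, then map it back to its label combination."""
    class_ids = clf.predict(np.asarray(X_new, dtype=float))
    return np.array([combos[c] for c in class_ids])

print(y_multiclass)
print(predict_powerset([[1.0, 1.0]]))
```

Five instances produce four classes here, and a combination that never occurs in training can never be predicted, which hints at why this approach struggles as the label space grows.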

There are many more methods to solve multi-label classification, such as “LabelSpacePartitioningClassifier”, “MajorityVotingClassifier” and more. See [1] for a detailed description of these methods.

Adaptive Algorithm

These algorithms focus on adapting existing classification algorithms to multi-label classification, generally by modifying their cost functions.

Let’s see an implementation of a multi-label version of KNN, known as MLkNN.
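The sketch below is a bare-bones NumPy version of the ML-kNN idea: count how many of the k nearest neighbours carry each label, then make a per-label maximum a posteriori decision using smoothed counts from the training set. It is written from scratch for illustration (scikit-multilearn ships its own `MLkNN` in `skmultilearn.adapt`). The class name `SimpleMLkNN`, the brute-force distance computation, and the toy data are all assumptions.

```python
import numpy as np

class SimpleMLkNN:
    """Bare-bones ML-kNN: neighbour label counts + per-label MAP decision."""

    def __init__(self, k=2, s=1.0):
        self.k = k   # number of neighbours
        self.s = s   # Laplace smoothing constant

    def _neighbor_label_counts(self, dist):
        # For each row of `dist`, count how many of the k nearest
        # training points carry each label.
        nearest = np.argsort(dist, axis=1)[:, :self.k]
        return self.Y[nearest].sum(axis=1)

    def fit(self, X, Y):
        self.X, self.Y = np.asarray(X, float), np.asarray(Y, int)
        m, n_labels = self.Y.shape
        k, s = self.k, self.s
        # Smoothed prior probability that each label is present.
        self.prior1 = (s + self.Y.sum(axis=0)) / (2 * s + m)
        # Neighbour counts for every training point, excluding itself.
        dist = np.linalg.norm(self.X[:, None] - self.X[None, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        counts = self._neighbor_label_counts(dist)
        # Likelihood of seeing j neighbours with the label, given that the
        # point has (cond1) or lacks (cond0) that label.
        self.cond1 = np.zeros((n_labels, k + 1))
        self.cond0 = np.zeros((n_labels, k + 1))
        for l in range(n_labels):
            has = self.Y[:, l] == 1
            c1 = np.bincount(counts[has, l], minlength=k + 1)
            c0 = np.bincount(counts[~has, l], minlength=k + 1)
            self.cond1[l] = (s + c1) / (s * (k + 1) + c1.sum())
            self.cond0[l] = (s + c0) / (s * (k + 1) + c0.sum())
        return self

    def predict(self, X_new):
        X_new = np.asarray(X_new, float)
        dist = np.linalg.norm(X_new[:, None] - self.X[None, :], axis=2)
        counts = self._neighbor_label_counts(dist)
        labels = np.arange(self.Y.shape[1])
        post1 = self.prior1 * self.cond1[labels, counts]
        post0 = (1 - self.prior1) * self.cond0[labels, counts]
        return (post1 > post0).astype(int)

# Hypothetical data: two clusters; the third label is shared by both.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
Y = [[1, 0, 1]] * 4 + [[0, 1, 1]] * 4

model = SimpleMLkNN(k=2).fit(X, Y)
print(model.predict([[0.5, 0.5], [10.5, 10.5]]))
```

Unlike the transformation methods, there is a single model here: the nearest-neighbour search is shared by all labels, and only the final decision is made per label.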

Multi-Label Deep Learning

Scikit-multilearn also supports single-class and multi-class DNNs to solve multi-label problems through the transformation methods described above. The major difference when defining a multi-label network is the output activation function. As we consider the labels to be independent of each other, we can use the sigmoid activation function. This turns multi-label classification into n binary classification problems (n is the number of labels) and predicts the probability of each label. As each output is now a binary classification, we can use “binary_crossentropy” as the loss.
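A small worked example in plain Python may help here. It applies a sigmoid to each output separately and averages the per-label binary cross-entropy terms. The logits and the target vector are made-up numbers, and the helper functions are written out by hand rather than taken from any library.

```python
import math

def sigmoid(z):
    """Squash a raw score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob):
    """Mean of the per-label binary cross-entropy terms."""
    terms = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for t, p in zip(y_true, y_prob)
    ]
    return sum(terms) / len(terms)

# Hypothetical raw network outputs (logits) for 3 labels of one instance.
logits = [2.0, -1.0, 0.5]
probs = [sigmoid(z) for z in logits]

# Each probability is computed independently, so they need not sum to 1
# (a softmax output would).
predicted = [int(p >= 0.5) for p in probs]
loss = binary_crossentropy([1, 0, 1], probs)

print([round(p, 3) for p in probs], predicted, round(loss, 3))
```

Because every label is thresholded on its own, any number of labels from zero to n can be switched on for one instance.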

Let’s look at the implementation of this in Keras:
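A minimal sketch of such a network, assuming TensorFlow with its bundled Keras is installed. It uses plain Keras rather than scikit-multilearn’s Keras wrapper; the layer sizes, the random data, and the 0.5 threshold are arbitrary choices for illustration.

```python
import numpy as np
from tensorflow import keras

n_features, n_labels = 4, 3

# Hypothetical random data, only to show the shapes involved.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, n_features)).astype("float32")
Y = rng.integers(0, 2, size=(32, n_labels)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(16, activation="relu"),
    # One sigmoid unit per label: each output is an independent probability.
    keras.layers.Dense(n_labels, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, Y, epochs=2, batch_size=8, verbose=0)

probs = model.predict(X[:2], verbose=0)   # shape (2, n_labels)
predicted = (probs >= 0.5).astype(int)    # threshold each label separately
print(predicted)
```

The only multi-label-specific parts are the sigmoid output layer with one unit per label and the `binary_crossentropy` loss.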

We can also use the Label Powerset multi-label to multi-class transformation, along with the other advanced label space division methods available in scikit-multilearn. Note: set the second parameter of the Keras wrapper to True, as the base problem is multi-class.

In this article, I introduced the multi-label classification problem and various approaches to solving it with the scikit-multilearn library in Python. Hope this gives you a head start when you face these kinds of problems.



