Active learning is a data labeling (annotation) strategy in machine learning. Instead of labeling data at random or exhaustively, active learning uses a machine learning model to find the difficult samples, typically the ones the model is least certain about, and sends only those to a human for labeling.
Here’s roughly how it works.
- Train an ML model with the currently labeled samples.
- Using the model, predict on all the unlabeled data.
- Pick the sample that the model is most uncertain about. For binary classification, this would be the sample whose predicted probability is closest to 0.5.
- Ask the human to label that sample to get the ground truth. Add this new sample to your training set.
- Go back to the first step and repeat until there is enough labeled data to train your model to sufficient accuracy.
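The loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the model is a tiny 1-D logistic regression, `oracle` is a hypothetical stand-in for the human annotator, and a fixed `budget` of queries replaces the "sufficient accuracy" stopping rule. All function and parameter names here are made up for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, epochs=500, lr=0.5):
    """Fit p(y=1 | x) = sigmoid(w*x + b) with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def oracle(x):
    """Hypothetical annotator: the 'true' label is 1 when x > 0.3 (assumed)."""
    return 1 if x > 0.3 else 0


def active_learning(pool, labeled_x, labeled_y, budget):
    """Run `budget` rounds of uncertainty sampling; inputs are not mutated."""
    pool = list(pool)
    labeled_x, labeled_y = list(labeled_x), list(labeled_y)
    for _ in range(budget):
        if not pool:
            break
        # Train on the currently labeled samples.
        w, b = train_logistic(labeled_x, labeled_y)
        # Predict on all the unlabeled data.
        probs = [sigmoid(w * x + b) for x in pool]
        # Pick the sample whose predicted probability is closest to 0.5.
        idx = min(range(len(pool)), key=lambda i: abs(probs[i] - 0.5))
        x = pool.pop(idx)
        # Ask the (simulated) human for the label and add it to the training set.
        labeled_x.append(x)
        labeled_y.append(oracle(x))
    # Final fit on everything labeled so far.
    return train_logistic(labeled_x, labeled_y), labeled_x, labeled_y


# Example: a small grid of unlabeled points and two seed labels.
pool = [i / 10 for i in range(-9, 10)]
(w, b), xs, ys = active_learning(pool, [-1.0, 1.0], [0, 1], budget=5)
print("queried samples:", xs[2:], "labels:", ys[2:])
```

In practice the stopping rule would usually be a check against a held-out validation set rather than a fixed query count, and many setups query a small batch per round instead of a single sample to reduce retraining cost.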
https://distill.pub/2020/bayesian-optimization/