Limited data volume and labels

The typical enterprise machine learning project will be medium-scope, cost-conscious and specialized. In those circumstances it is common to have either a limited quantity of training data, or a good quantity of training data of which only a small part is labeled.

Therefore the mass-scale “supervised learning” that might be applied to a moonshot project is often not practical. The good news is that there are alternatives:

Semi-supervised techniques can combine a small amount of labeled data with a large amount of unlabeled data. Some of these techniques can also help identify latent features that are useful for providing explainability and assessing generalizability.
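To make the semi-supervised idea concrete, here is a minimal sketch of self-training, which is one semi-supervised technique among several. It assumes a toy nearest-centroid classifier on one-dimensional points; the data, the function names and the `min_margin` threshold are all invented for illustration and do not come from any particular library.

```python
# Self-training sketch: start from a few labeled points, then repeatedly
# pseudo-label the unlabeled points the model is confident about.
# All data, names and thresholds here are illustrative assumptions.

def centroids(labeled):
    """Mean feature value per class, from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(cents, x):
    """Return (label, margin): the nearest class and its lead over the runner-up."""
    dists = sorted((abs(x - c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_train(labeled, unlabeled, min_margin=2.0, max_rounds=10):
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for x in pool:
            label, margin = predict(cents, x)
            if margin >= min_margin:
                confident.append((x, label))  # trust this pseudo-label
            else:
                rest.append(x)                # too ambiguous, keep unlabeled
        if not confident:
            break
        labeled.extend(confident)
        pool = rest
    return centroids(labeled), pool

labeled = [(1.0, "low"), (9.0, "high")]                        # two labeled examples
unlabeled = [0.5, 1.5, 2.0, 3.0, 7.0, 8.0, 8.5, 9.5, 5.1]      # many unlabeled ones
cents, leftover = self_train(labeled, unlabeled)
```

With this toy data, eight of the nine unlabeled points are absorbed into the training set, while the ambiguous point at 5.1 is left unlabeled rather than guessed at. A real project would use a proper classifier and a calibrated confidence measure in place of the distance margin.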

Transfer learning takes a general-purpose model trained on a large set of labeled public data and uses it as the starting point for a specialized model that meets the needs of your specific enterprise, which is usually far more efficient than training from scratch. Andrew Ng gives an example of this in this video.

Unsupervised learning can be used to identify patterns and generate insights that are helpful for decision support, even when they don’t lead in one step to high-accuracy, fully automated systems.
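As a small illustration of finding patterns without labels, the sketch below runs k-means clustering, one common unsupervised technique, on a handful of one-dimensional values. The numbers stand in for something like order sizes and are invented; the function name and the deterministic way of choosing starting centers are assumptions made to keep the example short.

```python
# k-means sketch on 1-D values: group similar values together without labels.
# The data and the initialization scheme are illustrative assumptions.

def kmeans_1d(values, k, rounds=20):
    # Deterministic start: spread the initial centers across the sorted values.
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centers = [ordered[round(i * step)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(rounds):
        # Assignment step: each value joins its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group.
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, groups

order_sizes = [12, 15, 14, 13, 95, 102, 98, 510, 495]
centers, groups = kmeans_1d(order_sizes, k=3)
```

On this data the three centers settle at 13.5, roughly 98.3 and 502.5, which suggests three distinct customer segments. Nothing here is automated decision making; the output is a pattern for a person to interpret.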

In pedagogical learning, the learning process is seeded with concepts provided by domain experts. While this limits the solution space, it enables much more efficient learning. The Inkling language is an example of this approach.

Reinforcement learning is a learn-as-you-go approach: the system tries actions, observes the results and adjusts. Think of it as on-the-job learning for AI.
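The learn-as-you-go idea can be sketched with an epsilon-greedy multi-armed bandit, which is about the simplest reinforcement learning setting. The three "arms" and their win probabilities are made up, as are the function name and the parameter defaults; the agent never sees the probabilities and has to estimate them from the rewards it collects.

```python
import random

# Epsilon-greedy bandit sketch: mostly pick the option that has paid off best
# so far, but explore a random option a small fraction of the time.
# The win probabilities and parameters are illustrative assumptions.

def run_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(win_probs)    # how often each arm was tried
    values = [0.0] * len(win_probs)  # running average reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(win_probs))                       # explore
        else:
            arm = max(range(len(win_probs)), key=lambda i: values[i])  # exploit
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda i: values[i])
```

No labeled training set is involved at any point. The agent starts out knowing nothing, and over time it should spend most of its tries on the arm with the highest payoff while still occasionally checking the others.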

Tools and methodologies that work well for mass-scale supervised learning may not work well with these other techniques. This is part of the reason why enterprise projects need an approach distinct from that of the moonshots we often envision as typical machine learning projects.
