Rather than putting guardrails around a model, some researchers suggest training the model itself to signal low confidence for results generated from out-of-distribution inputs. This paper is a good example.
Deep neural networks “perform well only when evaluated on instances very similar to those from the training set. When evaluated on slightly different distributions, neural networks often provide incorrect predictions with strikingly high confidence … systems quickly degrade in performance as the distributions of training and testing data differ slightly from each other … This problem is one of the most central challenges in deep learning … vanilla neural networks spread the training data widely throughout the representation space, and assign high confidence predictions to almost the entire volume of representations. This leads to major drawbacks since the network will provide high-confidence predictions to examples off the data manifold, thus lacking enough incentives to learn discriminative representations about the training data. To address these issues, we … encourages the neural network to be uncertain across the volume of the representation space unseen during training. This leads to concentrating the representations of the real training examples in a low dimensional subspace, resulting in more discriminative features”.
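To make the general recipe concrete, here is a minimal sketch, not the paper's actual method: fit the real training data with the usual cross-entropy loss, while pushing the network's predictions toward the uniform distribution on off-manifold inputs, so that confidence stays low away from the training distribution. The `training_step` function, the use of random noise as a stand-in for off-manifold samples, and the `uniform_weight` knob are all illustrative assumptions, written here in PyTorch.

```python
import torch
import torch.nn.functional as F

def training_step(model, x_real, y_real, uniform_weight=0.5):
    # Standard supervised loss on real, in-distribution training examples.
    logits_real = model(x_real)
    loss_fit = F.cross_entropy(logits_real, y_real)

    # Hypothetical off-manifold samples: random noise with the same shape
    # as the real inputs (the paper's construction may differ).
    x_off = torch.rand_like(x_real)
    logits_off = model(x_off)
    log_probs_off = F.log_softmax(logits_off, dim=1)

    # Cross-entropy against the uniform target (up to a constant, KL(uniform || p)):
    # minimised when the predicted distribution on off-manifold inputs is uniform,
    # i.e. the network is maximally uncertain there.
    loss_uniform = -log_probs_off.mean()

    return loss_fit + uniform_weight * loss_uniform

# Illustrative usage: loss = training_step(model, images, labels); loss.backward()
```

The detail to notice is that the low confidence off the data manifold is something the network is trained for, rather than a check bolted on afterwards; how the off-manifold region is sampled and penalised is exactly where the paper's contribution lies.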