This paper by Papernot and McDaniel highlights a fundamental failure of many machine learning models: they do not recognize when an input is “out-of-distribution”, and so they return overconfident predictions for exactly the inputs they are least equipped to handle.
Their proposed solution is interesting, but it will certainly not be the only one put forward. Time will tell which approach proves most robust in practice.
More interesting is their clear summary of the problem:
More generally, as the ML community moves towards an end-to-end approach to learning, models are taking up roles that used to be fulfilled by pre-processing pipelines. Features are no longer manually engineered to extract a representation of the input. A good example of that is machine translation—significant progress was made by replacing systems engineered for several decades with a holistic sequence-to-sequence model. This lack of pre-processing pipeline generally means that the input domain is less constrained. Despite this, models are deployed with little input validation, which implicitly boils down to expecting the classifier to correctly classify any input that can be represented by their input layer. This goes against one of the fundamental assumptions of machine learning: models should be presented at test time with inputs that fall on their training manifold. Hence, if we deploy a model in an environment where inputs may fall outside of this data manifold, we need mechanisms for figuring out whether a specific input/output pair is acceptable for a given ML model. In security, we sometimes refer to this as admission control. This is what we’d like to achieve by estimating the training data support for a particular prediction.
This problem is most dramatically highlighted by adversarial attacks, such as those illustrated in this paper and this one. However, ordinary real-world variation in inputs, with no bad actors involved, can trigger similar failures.
In practice, many data scientists are not conscious of these risks and blithely recommend deploying models without proposing any validating guardrails to go with them.
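To make the idea of a validating guardrail concrete, here is a minimal sketch of an admission check placed in front of a model. It is not the method from the paper: it is a much simpler stand-in that measures how far a new input lies from its k-th nearest training point and rejects inputs that are farther away than almost all training points are from each other. The function names, the choice of k = 5, the 0.99 quantile, and the synthetic Gaussian training data are all assumptions made for illustration.

```python
import math
import random


def kth_neighbor_distance(x, points, k, skip_index=None):
    """Euclidean distance from x to its k-th nearest point in `points`.

    skip_index lets a training point ignore itself (leave-one-out).
    """
    dists = sorted(
        math.dist(x, p) for i, p in enumerate(points) if i != skip_index
    )
    return dists[k - 1]


def fit_threshold(train, k=5, quantile=0.99):
    """Pick a distance threshold from the training data itself.

    Each training point is scored by the distance to its k-th nearest
    other training point; the threshold is an empirical quantile of
    those scores (an illustrative choice, not a tuned one).
    """
    scores = sorted(
        kth_neighbor_distance(p, train, k, skip_index=i)
        for i, p in enumerate(train)
    )
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def admit(x, train, threshold, k=5):
    """Return True if x looks close enough to the training data."""
    return kth_neighbor_distance(x, train, k) <= threshold


# Synthetic 2-D training set, purely for demonstration.
random.seed(0)
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
threshold = fit_threshold(train)

in_dist = admit((0.0, 0.0), train, threshold)     # centre of the training data
out_dist = admit((10.0, 10.0), train, threshold)  # far from anything seen
print(in_dist, out_dist)
```

In a deployment, a check like this would run before the model is asked for a prediction, and rejected inputs would be routed to a fallback or flagged for review. Raw-feature distances are a crude proxy for "on the training manifold", especially in high dimensions, so this should be read as the shape of a guardrail rather than a recommended detector.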