Skin cancer diagnosis AI illustrates why it is important to provide explainability and to look for connections between a model's behavior and the actual causal mechanisms operating in the system. By understanding why a prediction is being made, and what the model implies about the underlying system, we can catch issues like the ones described below.
When dermatologists are looking at a lesion that they think might be a tumor, they’ll break out a ruler—the type you might have used in grade school—to take an accurate measurement of its size. Dermatologists tend to do this only for lesions that are a cause for concern. So in the set of biopsy images, if an image had a ruler in it, the algorithm was more likely to call a tumor malignant, because the presence of a ruler correlated with an increased likelihood that a lesion was cancerous. Unfortunately, as Novoa emphasizes, the algorithm doesn’t know why that correlation makes sense, so it could easily misinterpret a random ruler sighting as grounds to diagnose cancer.
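One lightweight way to surface this kind of shortcut is occlusion sensitivity: hide each region of the image in turn and see how much the prediction moves. The sketch below is a minimal illustration, not the method used in the study; the patch size, stride, and the toy `predict_fn` standing in for a trained classifier are all assumptions for demonstration.

```python
import numpy as np

def occlusion_sensitivity(predict_fn, image, patch=32, stride=16, fill=0.5):
    """Slide a gray patch across the image and record how much the
    malignancy score changes when each region is hidden. Regions whose
    occlusion moves the score the most are what the model relies on;
    if the ruler's location lights up, the model has latched onto it."""
    h, w = image.shape[:2]
    base = predict_fn(image)
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heatmap = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch] = fill
            heatmap[i, j] = base - predict_fn(occluded)
    return heatmap

# Toy stand-in for a trained classifier: its "malignancy score" is just
# the mean brightness of the image's right half, where a ruler might sit.
toy_predict = lambda img: float(img[:, img.shape[1] // 2:].mean())

img = np.random.default_rng(0).random((128, 128))
print(occlusion_sensitivity(toy_predict, img).round(3))
```

With the toy scorer, only patches in the right half of the image change the score, which is exactly the pattern you would see if a model were keying on a ruler rather than the lesion.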
When going from a proof of concept to a production application of this AI, the lesion photos might change. For example, you might standardize photo capture so that the ruler is always present. The production observations would then be inconsistent with the training set, and that inconsistency would be one source of “model rot”.
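A simple guard against this kind of train/production skew is to monitor the distribution of suspect features in live traffic and compare it to the training set. The sketch below is hypothetical: the counts are invented, and tracking a `ruler_visible` flag per image is an assumption; the point is only the shape of the check.

```python
import numpy as np

def binary_feature_drift(train_vals, prod_vals):
    """Compare the rate of a binary feature (e.g. 'ruler visible')
    between training data and production traffic. A large gap is an
    early warning of model rot before accuracy metrics degrade."""
    train_rate = float(np.mean(train_vals))
    prod_rate = float(np.mean(prod_vals))
    return train_rate, prod_rate, abs(train_rate - prod_rate)

# Hypothetical audit: rulers appeared in ~30% of training images, but
# standardized photo capture puts one in every production image.
train_ruler_visible = np.array([0] * 70 + [1] * 30)
prod_ruler_visible = np.ones(50)
print(binary_feature_drift(train_ruler_visible, prod_ruler_visible))
# -> (0.3, 1.0, 0.7): the feature's distribution has shifted badly.
```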
In another paper, a similar issue was found because doctors sometimes use purple markers to highlight potentially malignant skin cancers for easier examination. Some argue that the purple marks are a real signal that should be incorporated into the model, just as the visual appearance of the tumor itself is. However, if your goal is robust generalizability over time, it is probably best not to have your AI treat the human-applied purple marks as signal, because the standards for applying those marks may vary across teams and over time. In any case, you certainly want to be aware that those purple marks are part of what is driving the model’s predictions, so that you can make a conscious decision about whether you want that to be the case. It is through a commitment to explainability, and a habit of looking for underlying causation, that you become aware of these sorts of effects.
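One way to make that decision consciously is to ablate the marks and measure how much the predictions move. Below is a minimal sketch, assuming RGB images scaled to [0, 1] and a crude color-distance threshold for marker ink; a real pipeline would use proper segmentation and inpainting, and `predict_fn` is a hypothetical scoring function.

```python
import numpy as np

MARKER_PURPLE = np.array([0.5, 0.0, 0.5])  # rough RGB of surgical-marker ink

def mask_purple_marks(image, delta=0.05):
    """Replace pixels close to marker purple with the image's median
    color; return the cleaned image and the mask of replaced pixels.
    If predictions move materially once the marks are gone, the marks
    are part of what is driving them."""
    close = np.linalg.norm(image - MARKER_PURPLE, axis=-1) < delta
    cleaned = image.copy()
    cleaned[close] = np.median(image.reshape(-1, 3), axis=0)
    return cleaned, close

img = np.random.default_rng(1).random((64, 64, 3))
img[20:30, 20:30] = MARKER_PURPLE   # simulate a marker stroke
cleaned, mask = mask_purple_marks(img)
print(mask.sum())                   # ~100 pixels replaced
# With a trained model, compare predict_fn(img) vs. predict_fn(cleaned):
# a large score shift means the purple marks are carrying the signal.
```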