This is interesting because it runs counter to what many people think about current AI: that its performance is directly linked to the quality of its training data. Here the opposite is happening; it has poor training data and still outperforms humans. It's not surprising that humans would do badly in this situation too; it's hard to stay up to date on things you may only encounter once or twice in your entire career. It's interesting to extrapolate from this observation, as it applies to many other fields.
I mean, recognition is literally the task that is always used in intro to machine learning: facial recognition and other biometrics, handwriting, object recognition. It isn't a surprise that "AI" can outperform humans at this task, since it can sometimes pick up features that are too subtle for us to notice. The problem is LLMs being hailed as truth machines or AGI. LLMs are to NLP what CNNs and GANs are to image processing tasks.
They should provide that instantly if the patient wants it (once the scan is developed). Add whatever disclaimers and waivers you want, but I wouldn't mind an instant answer.
If you tell a professional that the answer is "B" while the professional had "A" in mind, you will have to convince them why "B" is the correct answer, or they will ignore your suggestion. I think a good LLM should be able to tell which features it valued most in its reasoning. That would make it much easier to get used to as a tool.