I’ve been thinking about the concept of “prompt overfitting”. In this context, there is a distinction between model overfitting and prompt overfitting. Say you want to use a large language model as a classifier. You may give it several example inputs and the expected outputs. I don’t have hard data to go by, but it feels meaningful to keep the prompt generic or abstract where possible rather than enumerating overly specific cases in a way that obfuscates the broader pattern you’re hoping to apply. I hypothesize these overly specific examples could interfere with the model output in unintended, overly restrictive ways.