I’ve seen a lot of “GPT detection” products floating around lately. Sebastian discusses some of the products and their approaches in this article. Some products claim to have developed an “algorithm with an accuracy rate of text detection higher than 98%”. Unfortunately, this same algorithm determined a GPT-4 generated response from the prompt “write a paragraph in the style of Edgar Allan Poe” was 0% AI GPT. In my experience, you don’t need to try very hard to trick “AI-detection” systems. It seems that adding “in the style of…” to pretty much any prompt can thwart detections approaches. Even though these products don’t seem to work, there is clearly a market for them and many products in that market, which seems to indicate a desire for them to work. From these products’ marketing and news references, it appears folks in education are quite interested in them. I can’t begin to appreciate the challenges educators must be experiencing as they attempt to adjust to the changes brought on by the accessibility of LLMs. However, as someone who struggled to learn in the traditional education system, I do hope teachers will pivot their energies to adopting their curriculums rather than trying to maintain the status quo.

Andrej Karpathy lead a session at Microsoft Build giving an overview on models, training and the software ecosystem emerging around them. I particularly appreciated his “default recommendations” on building LLM applications:

  • Achieve your top possible performance
  • Optimize costs

He notes it’s hard to give generic advice and to me, reinforces that prompting is still as much an art as it is a science. We’ve found of a number of approaches that perform better relative to others, but at this point, there is no substitution for experimentation given your use case.