I ran the code from my Fine-tuning “Connections” post using gpt-4o-mini.
I was hoping the results might be a bit better, which could motivate an effort to fine-tune the model.
I’m not sure where my original version of this code went, so I reconstructed a repo for it.
Once I was done, I ran 100 prompts through the model to get a sense of where its baseline performance was.
- Correct: 2.00%
- Incorrect: 98.00%
- Total Categories Correct: 19.25%
Not great, and not much different from gpt-3.5-turbo.
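For context, here is a minimal sketch of what that baseline eval loop might look like. The prompt wording, the response parsing, and the `load_puzzles` helper are placeholders I've assumed for illustration, not the repo's actual code; the scoring mirrors the metrics above (whole-puzzle correctness plus the fraction of the four categories matched).

```python
# Sketch of a Connections baseline eval; prompt/parsing details are assumptions.
from openai import OpenAI

client = OpenAI()

def solve_puzzle(words: list[str]) -> list[set[str]]:
    """Ask the model to group 16 words into 4 categories of 4."""
    prompt = (
        "Group these 16 words into 4 categories of 4 related words. "
        "Answer with one comma-separated group per line:\n" + ", ".join(words)
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.strip().splitlines()
    return [{w.strip().upper() for w in line.split(",")} for line in lines[:4]]

def score(guess: list[set[str]], answer: list[set[str]]) -> tuple[bool, int]:
    """Return (puzzle fully correct, number of categories matched)."""
    matched = sum(1 for g in guess if g in answer)
    return matched == len(answer), matched

# Hypothetical loader: yields (words, answer_groups) pairs for 100 puzzles.
puzzles = load_puzzles("connections.jsonl")

correct = categories = 0
for words, answer in puzzles:
    solved, matched = score(solve_puzzle(words), answer)
    correct += solved
    categories += matched

print(f"Correct: {correct / len(puzzles):.2%}")
print(f"Total Categories Correct: {categories / (4 * len(puzzles)):.2%}")
```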
With these kinds of results, I wasn’t particularly motivated to put in the effort to do more fine-tunes.
I read through the instructor, marvin, and open-interpreter docs for the first time in a while.
It has been interesting to see these libraries grow and diverge.
I also read through how Jason has been structuring an evals repo for instructor.