I tried stacking multiple pages of a PDF vertically into a single image for a model, then doing data extraction from it. It didn't work: the inference seemed to output made-up data. I imagine this is because models aren't trained on much data like this.

An interesting pitch written by Hillel for preferring reStructuredText to Markdown.

Multiple studies have shown that hallucinations can be significantly reduced by giving the model the right context via retrieval or tools the model can use to gather context.
I wrote and screen-recorded myself building a Python app that calls a model to extract structured data from an image, making heavy use of codegen with Cursor. The same protobuf serves both as instructions in the prompt and to unpack the result returned by the model into an instance of the class protoc generates from it. I'm planning to open source this pattern once I get it into a better state.
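Roughly, the pattern looks like the sketch below. The receipt.proto schema, the Receipt message, and the model choice are placeholder assumptions of mine, not the actual project's:

```python
# Minimal sketch: the same .proto text instructs the model and validates its output.
# Assumes a hypothetical receipt.proto compiled via `protoc --python_out=. receipt.proto`.
import base64

from google.protobuf import json_format
from openai import OpenAI

import receipt_pb2  # protoc-generated module (hypothetical schema)

PROTO_SCHEMA = open("receipt.proto").read()

client = OpenAI()


def extract(image_path: str) -> receipt_pb2.Receipt:
    image_b64 = base64.b64encode(open(image_path, "rb").read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Extract the data in this image as JSON matching "
                            f"this protobuf schema:\n{PROTO_SCHEMA}"
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
        response_format={"type": "json_object"},
    )
    # The schema that instructed the model now unpacks its output:
    # json_format.Parse raises if the JSON doesn't fit the message definition.
    return json_format.Parse(response.choices[0].message.content, receipt_pb2.Receipt())
```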
My thesis is clearer now. I'm short on "complex reasoning and agents" because it is often a scapegoat for poorly described problem spaces. My thoughts on capabilities are about figuring out the 80/20 and baking "complex reasoning" into specific tools, so you put fewer… — jason liu (@jxnlco) July 28, 2024

This point resonates with me. The more time I spend prompting models, the clearer it becomes that the clarity of the instructions is what matters most.
I ran the code from my Fine-tuning “Connections” post using gpt-4o-mini, hoping the results might be a bit better, which could motivate an effort to fine-tune the model. I'm not sure where my original version of this code went, so I reconstructed a repo for it. Once I was done, I ran 100 prompts through the model to get a sense of its baseline performance. Correct: 2.
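For flavor, the scoring loop has roughly this shape; `load_puzzles` and `solve` are stand-in names I'm using here, not the repo's actual API:

```python
def score(puzzles, solve) -> int:
    """Count puzzles where the model reproduces all four gold groups exactly."""
    correct = 0
    for puzzle in puzzles:
        predicted = solve(puzzle.words)  # model's guess: four groups of four words
        gold = {frozenset(group) for group in puzzle.groups}
        if {frozenset(group) for group in predicted} == gold:
            correct += 1
    return correct


# e.g. score(load_puzzles()[:100], solve)
```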
Tried to join in on the llama3.1-405b hype using Groq but sadly, no dice:

```shell
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-405b-reasoning",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
```

```json
{"error":{"message":"The model `llama-3.1-405b-reasoning` does not exist or you do not have access to it.","type":"invalid_request_error","code":"model_not_found"}}
```

The queue to try it out in their chat is also quite long, so I guess either the infra needs to scale up or the hype needs to die down.
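Since the endpoint is OpenAI-compatible (it's literally under /openai/v1), the same request can go through the OpenAI Python SDK; a sketch, with a models.list() call as a quick way to see what a key can actually reach:

```python
import os

from openai import OpenAI

# Point the OpenAI SDK at Groq's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

# Check which models this key has access to before burning a request.
print([model.id for model in client.models.list()])

# This raises the same model_not_found error as the curl above.
completion = client.chat.completions.create(
    model="llama-3.1-405b-reasoning",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(completion.choices[0].message.content)
```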
I’ve been wanting to create a chat component for this site for a while, because I really don’t like quoting conversations and manually formatting them each time. When using a model playground, there is usually a code snippet option that generates Python code you can copy out into a script. Using that feature, I can now copy the message list and paste it as JSON into a Hugo shortcode and get results like this:
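The middle step is small; something like this (with made-up messages) produces the JSON to paste into the shortcode:

```python
import json

# The `messages` list copied out of a playground-generated code snippet.
messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing well, thanks for asking!"},
]

# Pretty-printed JSON, ready to paste into the Hugo shortcode.
print(json.dumps(messages, indent=2))
```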

2024-07-21

espanso

I tried out espanso to configure text expansions, rather than using Alfred, just to try something new. This is the PR to add it to my Nix configurations. The existing examples are a toy configuration; the tool seems to support far more complex configuration that I still need to look into further.

gpt-4o-mini

people frame this like it’s somehow a win over llama, when in fact the goal of llama has wildly succeeded: commoditize models and drive token cost to zero

2024-07-20

Incredible writing and insight by Linus in Synthesizer for thought. I will probably need to revisit this work several times.
How can I add videos to Google Gemini as context (is this even what their newest model is called anymore?), and why is it so hard to figure out? https://gemini.google.com only lets me upload images. I assume I need to pay for something.

I played around with Cohere’s chat. They support web search, a calculator, and a Python interpreter as tools, as well as files and an internet search connector.
Research and experimentation with models presents different problems than I am used to dealing with on a daily basis. The structure of what you want to try out changes often, so I understand why some folks prefer to use notebooks. Personally, notebooks haven’t caught on for me, so I’m still just writing scripts. Several times now, I’ve run a relatively lengthy (and expensive) batch of prompts through a model only to realize something about my setup wasn’t quite right.
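One cheap guard against that (a sketch of mine, not something from the original note): cache each completion to disk keyed by a hash of the prompt, so a run with a flawed setup at least doesn't burn the same tokens twice.

```python
import hashlib
import json
import os

CACHE_DIR = "cache"


def cached_completion(prompt: str, call_model) -> str:
    """Return a cached response for `prompt`, calling the model only on a miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(prompt.encode()).hexdigest()
    path = os.path.join(CACHE_DIR, f"{key}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["response"]
    response = call_model(prompt)  # any callable that hits the model API
    with open(path, "w") as f:
        json.dump({"prompt": prompt, "response": response}, f)
    return response
```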