2024-09-08

I was going to write a quick guide on how to get up and running using Google’s Gemini model via API, since I found it quite straightforward and Twitter is currently dunking on Google for how hard this is. When I tried to retrace my steps, the CSS for the documentation was failing to load with a 503, so I guess this will have to wait until another day.

2024-09-07

I am continuing to see a lot of buzz about ColPali and Qwen2-VL. I’d like to try these out but haven’t put together enough of the pieces to make sense of it yet. I am also seeing a lot of conversation about how traditional OCR-to-LLM pipelines will be superseded by these approaches. Based on my experience with VLMs, this seems directionally correct. The overall amount of noise makes it tough to figure out what is worth focusing on and what is real vs. hype.

2024-09-05

Played around a bit with baml for extracting structured data with a VLM. It’s an interesting approach and has better ergonomics and tooling than most things I’ve tried so far. I like how you can declare test cases in the same place as the object schemas and that there is a built-in playground. I still need to see how to handle multi-step pipelines. I experimented with doing data extraction from pictures of menus.
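The shape of that extraction step can be sketched without baml itself: a plain dataclass stands in for the schema, and the VLM call is stubbed out since the exact client depends on the model. The `MenuItem` fields and the sample reply here are illustrative, not from my actual run.

```python
import json
from dataclasses import dataclass

@dataclass
class MenuItem:
    name: str
    price: float
    description: str = ""

def parse_menu(model_reply: str) -> list[MenuItem]:
    """Turn a VLM's JSON reply (a list of item objects) into typed records."""
    return [MenuItem(**item) for item in json.loads(model_reply)]

# In the real pipeline the reply would come from a VLM given a menu photo;
# this is a hand-written stand-in.
reply = '[{"name": "Margherita", "price": 12.5, "description": "tomato, mozzarella"}]'
items = parse_menu(reply)
print(items[0].name, items[0].price)
```

What baml adds over this bare version is exactly the part that's hard to hand-roll: prompt templates tied to the schema, test cases next to the types, and the playground for iterating on them.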
> Benchmarking >80 LLMs shows: The best model is not necessarily the best for your programming language 😱
> - Best overall: Anthropic’s Sonnet 3.5
> - Best for Go: Meta’s Llama 3.1 405B
> - Best for Java: OpenAI’s GPT-4 Turbo
> - Best for Ruby: OpenAI’s GPT-4o
> Good models for one… pic.twitter.com/EYUphEI5rH
>
> — Markus Zimmermann (@zimmskal) September 2, 2024

Great to see more concrete results published on how different models are “the best” at writing different programming languages.

2024-08-31

Language models can’t generate instructions for knitting patterns. Language models can’t generate crossword puzzles from scratch. Language models can generate Connections puzzles.

2024-08-29

Incredible read: https://eieio.games/essays/the-secret-in-one-million-checkboxes/ I made many failed attempts at getting Sonnet to write code to display a folder structure, from the output of a `tree -F` command, using shortcodes. After a lot of prompting, I wrote a mini design doc on how the feature needed to be implemented and used it as context for Sonnet. I tried several variants of the instructions in the design doc, including trying to improve it for clarity with the model itself.

2024-08-25

I tried Townie. As has become tradition, I tried to build a writing editor for myself. Townie got a simple version of this working with the ability to send a highlighted selection of text to the backend and run it through a model along with a prompt. This experience was relatively basic, using a textarea and a popup. From here, I got Townie to add the ability to show diffs between the model proposal and original text.
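The diff step is the part that generalizes beyond Townie. A minimal sketch, assuming the original text and the model's proposal arrive as plain strings (this is not Townie's actual implementation), using Python's `difflib`:

```python
import difflib

def diff_texts(original: str, proposal: str) -> list[str]:
    """Return unified-diff lines between the original text and the model's proposal."""
    return list(difflib.unified_diff(
        original.splitlines(),
        proposal.splitlines(),
        fromfile="original",
        tofile="proposal",
        lineterm="",
    ))

# A one-line edit shows up as a -/+ pair the UI can render as a diff.
original = "The quick brown fox\njumps over the dog"
proposal = "The quick brown fox\nleaps over the lazy dog"
for line in diff_texts(original, proposal):
    print(line)
```

A frontend would render the `-` and `+` lines as deletions and insertions, with accept/reject controls per hunk.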

2024-08-23

I’ve been trying out Cursor’s hyped composer mode with Sonnet. I am a bit disappointed. Maybe I shouldn’t be. I think it fell short of my expectations because I hold Cursor to a higher bar than the other developer tools out there. It’s possible it’s over-hyped, or that I am using it suboptimally. But it’s more or less the same quality as most of the tools at the same level of abstraction, like aider.
I tried out OpenRouter for the first time. My struggles to find an API that hosted llama3.1-405B motivated me to try this out. There are too many companies providing inference APIs to keep track of. OpenRouter seems to be aiming to make all of these available from a single place, sort of like AWS Bedrock, but not locked in cloud configuration purgatory. The first thing I tried was playing a game of Connections with nousresearch/hermes-3-llama-3.
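Part of the appeal is that OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so switching hosted models mostly comes down to changing the model string. A minimal sketch (the model slug and prompt are illustrative, and you'd need an `OPENROUTER_API_KEY` to actually send the request):

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload for OpenRouter."""
    return {
        "model": model,  # illustrative slug; browse openrouter.ai for the real list
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload to OpenRouter and return the parsed JSON response."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request(
    "meta-llama/llama-3.1-405b-instruct",
    "Group these 16 words into four Connections categories: ...",
)

# Only hit the network if a key is configured.
if os.environ.get("OPENROUTER_API_KEY"):
    print(send(payload, os.environ["OPENROUTER_API_KEY"]))
```

Since the payload shape matches OpenAI's, existing client code mostly ports over by swapping the base URL and key.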

2024-08-21

An interesting read about how the world works through an economic lens. But what is success? You can quantify net worth, but can you quantify the good you have brought to others' lives? It is not all about the TAM monster: doing cool things that are NOT ECONOMICALLY VALUABLE but ARTISTICALLY VALUABLE is equally important.