I spent some more time experimenting with thought partnership with language models. I’ve previously experimented with this idea when building write-partner. Referring back to this work, the prompts still seemed pretty effective for the goal at hand. My original idea was to incrementally construct and iterate on a document by having a conversation with a language model. A separate model would analyze that conversation and update the working draft of the document to include new information, thoughts or insights from the conversation.

While I didn’t have much success getting gpt-4o to perform Task 1 - Counting Line Intersection from the Vision Language Models Are Blind paper, I pulled down some code and did a bit of testing with Claude 3.5 Sonnet. The paper reports the following success rate for Sonnet for this line intersection task:
Thickness   Sonnet 3.5
2           80.00
3           79.00
4           73.00
Average     77.33

I used the code from the paper to generate 30 similar images of intersecting (or non-intersecting) lines with line thickness 4.

We probably are living in a simulation and we’re probably about to create the next one.
Martin Casado
https://podcasts.apple.com/us/podcast/invest-like-the-best-with-patrick-oshaughnessy/id1154105909?i=1000661628717

VLMs are Blind showed a number of interesting cases where vision language models fail to solve problems that humans can easily solve. I spent some time trying to build examples with additional context that could steer the model to correctly complete Task 1: Counting line intersections, but didn’t have much success.

Kent wrote this post on how to engage an audience by switching the first and second slide of a presentation. The audience focuses more as they try to fill in the gaps of what you’ve introduced them to so far.

I’ve been chatting with qwen2, a model from Alibaba. I mostly chatted with it in English but it appears to support several other languages and I noticed a bit of Chinese leaking through even though I don’t speak it, so I’m not sure how I would have introduced it to the conversation.
user: do you have a name
assistant: As an AI, I don’t have personal names or identities like humans do.

I was inspired by Daniel’s post to add sidenotes to this blog. I used claude-3.5-sonnet to generate the CSS and HTML shortcode to do this. I was impressed by how well it turned out. Now I need to read the CSS in more detail to understand what Claude did. It was almost too easy. I’m not the most competent CSS writer and I had never written a Hugo shortcode before. In several turns with Sonnet in Cursor, I was able to create a basic styled shortcode for a sidenote that appeared as a superscript number to start.

A nice read by Stuart on Python development tools. This introduced me to the pyproject.toml configuration file, which is more comprehensive than a requirements file. It’s something I’ll need to research a bit more before I’m ready to confidently adopt it.
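
As a rough sketch of what "more comprehensive than a requirements file" can mean, the snippet below parses a small pyproject.toml from a string and looks at what it holds. The project name, version, dependency pins, build backend and the ruff setting are all hypothetical values I made up for illustration; none of them come from Stuart's post. It assumes Python 3.11+ for the standard-library tomllib module, or the tomli backport on older versions.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older Pythons: assumes the tomli backport is installed
    import tomli as tomllib

# Hypothetical pyproject.toml contents; every name and version here is illustrative.
PYPROJECT = """
[project]
name = "example-app"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "requests>=2.31",
    "rich",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.ruff]
line-length = 100
"""

config = tomllib.loads(PYPROJECT)

# The dependency list covers roughly what a requirements file would hold...
dependencies = config["project"]["dependencies"]

# ...while the same file also carries build and tool settings in other tables.
other_sections = sorted(key for key in config if key != "project")

print(dependencies)
print(other_sections)
```

The point of the sketch is only that one file can hold project metadata, dependencies, build configuration and per-tool settings side by side, where a requirements file holds just the dependency lines.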
Claude’s character
A video about the personality of the AI, Claude. I’ve not yet become a big “papers” person, so this was my first introduction to “Constitutional AI”, which is a training approach where you use the model to train itself, by having it evaluate its own responses against the principles with which it was trained.

I reproduced Josh’s claude-3.5-sonnet mirror test. I hadn’t realized gpt-4 and claude-3-opus had also been “passing” this test since back in March. More interesting still, Sonnet actually seems to resist speaking in the first person about itself. Fascinating research and evolution of the models’ behaviors. After reading a bit more, apparently this type of model behavior has been around at least since Bing/Sydney (paywall, sorry).
https://onemillioncheckboxes.com is an amusing, massively-parallel art project(?

I spent some time experimenting with OpenDevin using claude-3-opus (I couldn’t find an easy way to use claude-3.5-sonnet). The agentic capabilities were not bad. I gave a prompt and behind the scenes, the agent iterated, created files, ran code and course corrected. I didn’t love that there wasn’t an obvious way to interrupt or help course correct. My first attempt was with the same prompt I sent to Sonnet to build Tactic.