I’ve been digging more into evals. I wrote a simple Claude completion function in openai/evals to better understand how the different pieces fit together. Quick and dirty code:

    from anthropic import Anthropic
    from evals.api import CompletionFn, CompletionResult
    from evals.prompt.base import is_chat_prompt


    class ClaudeChatCompletionResult(CompletionResult):
        def __init__(self, response) -> None:
            self.response = response

        def get_completions(self) -> list[str]:
            return [self.response.strip()]


    class ClaudeChatCompletionFn(CompletionFn):
        def __init__(self, **kwargs) -> None:
            self.client = Anthropic()

        def __call__(self, prompt, **kwargs) -> ClaudeChatCompletionResult:
            if is_chat_prompt(prompt):
                messages = prompt
                system_prompt = next((p for p in messages if p.
I can’t believe I am saying this, but if you play around with language models locally, a 1 TB drive might not be big enough for very long.
As someone learning to draw, I really enjoyed this article: https://maggieappleton.com/still-cant-draw. I’ve watched the first three videos in this playlist so far and have been sketching random objects from around the house. I find that I’m not too big of a fan of my drawing as I’m doing it but when I return to it later, I seem to like it more. Apparently, this is a common experience for creatives.
I’m taking a break from sketchybar for now. I’m currently looking into building an NL-to-SQL plugin or addition to datasette that uses a language model to write queries.
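A rough sketch of the two pieces an NL-to-SQL helper would probably need around the model call: a prompt built from the database schema, and a check that whatever comes back is a single SELECT that SQLite can plan. This uses only the standard library `sqlite3` module; the function names `schema_prompt` and `is_valid_select` and the prompt wording are made up for illustration, and none of this is datasette's plugin API.

```python
import sqlite3


def schema_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Build a prompt from every CREATE statement in the database plus the question."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    schema = "\n\n".join(row[0] for row in rows)
    # The prompt wording is an assumption, not a tested recipe.
    return (
        "Given this SQLite schema:\n\n"
        f"{schema}\n\n"
        "Write a single SELECT statement that answers the question below. "
        "Reply with SQL only.\n\n"
        f"Question: {question}"
    )


def is_valid_select(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the text is a SELECT that SQLite can prepare against this schema."""
    statement = sql.strip().rstrip(";").strip()
    if not statement.lower().startswith("select"):
        return False
    try:
        # EXPLAIN QUERY PLAN prepares the statement without running the query itself.
        conn.execute(f"EXPLAIN QUERY PLAN {statement}")
    except (sqlite3.Error, sqlite3.Warning):
        # Unknown tables, syntax errors, and multi-statement input all end up here.
        return False
    return True
```

The model call itself is left out on purpose: the prompt goes to whichever model is being tried, and the reply goes through `is_valid_select` before anything is executed.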
🤖 Connections (claude-3-opus) Puzzle #287
🟩🟩🟩🟩
🟨🟨🟨🟨
🟦🟦🟦🟦
🟪🟪🟪🟪

I got this result twice in a row. gpt-4 couldn’t solve it. Here is one attempt.

🤖 Connections (gpt-4) Puzzle #287
🟩🟪🟩🟩
🟩🟩🟩🟩
🟦🟨🟦🟨
🟦🟨🟦🟨
🟦🟨🟦🟦

I tried https://echochess.com/. Kind of fun.

I remember when my high school teachers used to tell us Wikipedia wasn’t a legitimate source. It sort of feels like education is having this type of moment now with language models.
“One of the greatest misconceptions concerning LLMs is the idea that they are easy to use. They really aren’t: getting great results out of them requires a great deal of experience and hard-fought intuition, combined with deep domain knowledge of the problem you are applying them to.”

The whole “LLMs are useful” section hits for me. I have an experience similar to Simon’s and I also wouldn’t claim LLMs are without issue or controversy.
Did a bit more work on an LLM evaluator for Connections. I’m mostly trying it with gpt-4 and claude-3-opus. On today’s puzzle, the best either did was 2/4 correct. I’m unsure how much more improvement is possible with prompting or even fine-tuning, but it’s an interesting challenge. Darwin, who kept a notebook where he wrote down facts that contradicted him, observed that frustrating, cognitively dissonant things were the first to slip his memory.
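For a sense of what an “N/4 correct” score could mean in code, here is a minimal sketch that counts how many guessed groups exactly match a solution group. It is not the evaluator from this post: the `score_guess` function, the data shapes, and the toy puzzle are all invented for illustration.

```python
def score_guess(guess: list[list[str]], solution: dict[str, list[str]]) -> int:
    """Count guessed groups whose words exactly match one solution group."""
    # Compare as case-insensitive sets so word order inside a group does not matter.
    answers = {frozenset(w.strip().lower() for w in words) for words in solution.values()}
    guessed = {frozenset(w.strip().lower() for w in group) for group in guess}
    return len(answers & guessed)


# A made-up puzzle, not a real Connections board.
solution = {
    "fruits": ["apple", "pear", "plum", "fig"],
    "colors": ["red", "blue", "green", "gold"],
    "metals": ["iron", "tin", "zinc", "lead"],
    "birds": ["crow", "swan", "dove", "wren"],
}

# The model swapped "gold" and "lead", so only two groups are fully right.
guess = [
    ["Apple", "pear", "plum", "fig"],
    ["red", "blue", "green", "lead"],
    ["iron", "tin", "zinc", "gold"],
    ["wren", "dove", "swan", "crow"],
]

print(f"{score_guess(guess, solution)}/4 correct")
```

Scoring whole groups rather than individual words matches how the game itself accepts or rejects a guess, though a per-word score might be a more forgiving signal when comparing prompts.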
Set up a Temporal worker in Ruby and got familiar with its ergonomics. Tried out this gpt-4v demo repo. Experimented with the OCR capabilities of open source multimodal language models. Tried llava:32b (1.6) and bakllava, but neither came close to gpt-4-vision’s performance. It was cool to see the former run on a MacBook, though.
I use the hyper+u keyboard shortcut to open a language model playground for convenience. I might use this 10-20 times a day. For the last year or so that I’ve been doing this, it has always pointed to https://platform.openai.com/playground. As of today, I’ve switched it to point to https://console.anthropic.com/workbench?new=1. Lately, I’ve preferred claude-3-opus to gpt-4. For a while, I had completely stopped looking for other models as gpt-4 seemed to be unchallenged, but it’s exciting to see new options available.
I tried setting up sqlite-vss with Deno following these instructions but got stuck on this error

    ❯ deno task dev
    Task dev deno run --allow-env --allow-read --allow-write --allow-net --unstable-ffi --allow-ffi --watch main.ts
    Watcher Process started.
    error: Uncaught (in promise) TypeError: readCstr is not a function
    export const SQLITE_VERSION = readCstr(sqlite3_libversion());
                                  ^
        at https://deno.land/x/[email protected]/src/database.ts:101:31
        at eventLoopTick (ext:core/01_core.js:169:7)

so I pivoted to Python. That effort eventually turned into this post.