My LLM workflows, June 2025

Some notes on my LLM workflows. These things keep evolving.

How I think about what I’m doing

My workflows are built around managing context. An LLM or LLM chatbot is a model of $P(Y \mid X, T)$, where $T$ is the training data plus model/agent structure and $X$ is the context. Querying the model elicits a specific value $y$ from the set of possible outputs $Y$. Day-to-day, specific use is about $X$; pipeline and architecture work is about both $X$ and $T$. Say $y^\star$ is the desired answer, or class of answers, I want to elicit. “Managing context” then means “constructing $x^\star$ to maximize $P(Y=y^\star \mid X=x^\star,T)$”. When $T$ is fixed let’s call this $P(Y=y^\star \mid X=x^\star)$.
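As a toy illustration of this framing, context selection can be viewed as picking the candidate $x$ with the best empirical hit rate for $y^\star$. This is only a sketch: `sample_model` is a made-up stand-in for an LLM, not a real one, and the helper names are hypothetical.

```python
import random

def sample_model(context: str, rng: random.Random) -> str:
    """Made-up stand-in for P(Y | X = context): more constraint
    detail in the context raises the chance of the desired output."""
    p_star = min(0.9, 0.1 + 0.2 * context.count("constraint"))
    return "y_star" if rng.random() < p_star else "y_other"

def hit_rate(context: str, n: int = 1000, seed: int = 0) -> float:
    """Empirical estimate of P(Y = y* | X = context)."""
    rng = random.Random(seed)
    return sum(sample_model(context, rng) == "y_star" for _ in range(n)) / n

# "Managing context" = choosing x* to maximize P(Y = y* | X = x*).
candidates = [
    "write the program",
    "write the program; constraint: must pass the tests",
    "write the program; constraint: must pass the tests; constraint: stdlib only",
]
best = max(candidates, key=hit_rate)
```

In this toy world the most constrained context wins; the real work is that real contexts are not a tidy list of candidates, and $P$ can only be probed by sampling.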

I don’t always know what $y^\star$ looks like in detail until I see it, but I always have ideas or some desiderata. Usually it’s something like “a program that does this verifiable thing” or “a sentence that conveys an idea”, with some constraints. $y^\star$ is probably always a dense, connected set in $Y$. This also means $x^\star$ is probably ill-posed and not unique: it can only ever be as well-defined as $y^\star$, and even then there are going to be multiple (possibly dense and connected) $x^\star$ that get to $y^\star$. It’s worth spending time to reach a clear problem statement and a vision of the solution.

Preparing for productive work

I can articulate simpler problem statements and solution concepts pretty fast, often in a single prompt or chat. For more complicated things I often write vignettes, draw diagrams, and make mock outputs. I put these things into the context while I’m developing initial architecture and planning documents. I don’t usually keep them around once I’m in the weeds unless they seem very relevant. It’s not clear to me whether diagrams improve the context; I haven’t run any experiments on this.

I often start chats with requests to answer as a particular type of expert talking to a colleague familiar with things I actually know. I do this because I like having scaffolding tied to things I know, and because it seems to improve the outputs. It also results in more analogies. Sometimes the analogies are very bad; sometimes there are interesting connections.

Code is nice because I can run it and see if it’s doing what I want, getting better or worse, etc. On a developed project I can do a lot of useful fiddling even when I’m mentally tired. Writing is still writing: it improves with editing, and editing takes time and thought.

IDE and web tools

I use Cursor Pro for most of my code editing. I use Claude Pro for most of my web stuff, and Gemini for the remainder. Gemini is decent, free, and it doesn’t use up any Claude tokens. I like Sonnet or even Haiku for most things; I find Opus overthinks but is fun to watch in Research Mode.

I find it much harder to manage the context window in Cursor, but much easier there to make sure I’m working with the latest code state and to update things spread across files. I don’t use it for project planning or architecture; I much prefer Claude Projects for that. I really like the “Project Knowledge” feature. I’m sure all the LLM chatbots have something like it. To be honest, I don’t see much differentiation between web chatbots in their specific features. They all keep changing and generally accumulating more stuff. I also tend not to notice much difference in the quality of “factual” answers across LLM vendors; they all have their own quirks and tricks to get performance. But I like Claude’s “voice” best, and I find productive interaction with Claude more pleasant than with the others. (I like void’s voice more for social interaction. I think void currently uses Gemini, so there’s clearly room to use post-training architecture to design a voice.)

I have “writing style” documents I carry across projects with notes on how I like to express and receive ideas. Some of the notes are high-level principles (“use active voice”, “every sentence earns its place”), others are particular pet peeves (“don’t use terms like ‘exponential growth’ unless that’s precisely what you mean to say”). All writing outputs are inputs to my editing process; it’s nice to start with better material. I often find it easier to dispose of or chop up proposals than to generate them. Funnily enough, in my more productive human collaborations I often find generating a proposal easier than improving one.

Chats and artifacts

Chats are cheap and disposable. Artifacts are worth constructing and keeping so that chats can quickly be given useful context. A lot of my workflows involve running chats to generate and update artifacts before or while I work on a task. These artifacts describe architectures, outlines, plans, details, specific errors or language, and other chunks of useful information. Having artifacts also makes it psychologically easier for me to dispose of chats quickly, even when they contain useful details: whatever I need can easily be moved to a new chat.

Artifacts let me split the context $x$ into artifacts $a$ and chats $c$. Artifacts approaching $a^\star$ are iteratively constructed in chats, which is useful because $c$ tends to diverge from $c^\star$ as a chat continues. So to get all the $y^\star$ I need, I often end up running many chats with $P(Y \mid X=(a^\star,c))$ to construct $c^\star$. Chats are like bread starters; artifacts are yeast samples. Sometimes I keep a small number of chats that are close to $c^\star$ in different ways to help me bake a good answer.
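The split can be sketched as plain context assembly. Everything here is hypothetical (the helper and file names are illustrative, not a real API): the same artifact $a^\star$ is prepended to each fresh, disposable chat $c$.

```python
def build_context(artifact: str, chat_turns: list[str]) -> str:
    """Assemble X = (a*, c): the durable artifact first, then the live chat."""
    return "\n\n".join([artifact, *chat_turns])

# One artifact, many cheap chats that can be thrown away.
artifact = "PLAN.md: architecture, outline, constraints, known errors."
chat_a = build_context(artifact, ["Draft the parser module."])
chat_b = build_context(artifact, ["Write the tests.", "Cover the edge cases."])

# Each chat starts from the same a*, so disposing of a chat loses little:
# anything worth keeping gets folded back into the artifact.
```

The design point is the asymmetry: the artifact is updated deliberately and survives, while each chat drifts and gets discarded.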