
What local model keeps NanoClaw useful on a 32 GB MacBook Pro?

This started as a very practical shopping question: if I want NanoClaw to run locally on a 14-inch MacBook Pro with 32 GB of RAM, what model should I actually pick if I still need enough memory left over for a browser, normal desktop apps, and probably a few Docker containers? The first answer was easy. The honest answer took a little longer.

Drafted March 2026 - AI and agents - NanoClaw, Ollama, local models

The easy answer is to name a model. The problem is that model names age faster than the underlying system constraint. The original conversation that prompted this draft landed on qwen2.5-coder:7b-instruct-q4_K_M through Ollama. If I re-ran the same search today, the current Ollama docs would push me toward newer options like qwen3-coder or qwen3.5. That does not mean the earlier answer was foolish. It means the more durable question is not "which badge is newest?" It is "what model class still leaves the machine usable once the agent starts calling real tools?"

That distinction matters more with NanoClaw than with a plain chat app. NanoClaw is not just a text box. It is an agent runtime that wants to route messages, call tools, keep state, and sometimes hand work off to containers or other processes. The model is not the whole system. It is one part of a machine that also needs headroom for everything around it.

The first durable constraint is not the benchmark table

NanoClaw's own README makes the integration path pretty clear. There is now a supported /add-ollama-provider path for local open-weight models through Ollama, and for one-off experiments the repo still accepts any Claude API-compatible endpoint through ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN. Ollama's current docs make the other half clear: it exposes an Anthropic-compatible Messages API specifically so tools built for Claude-style backends can point at a local runtime instead.
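To make that concrete, here is a minimal smoke test against the local runtime. It is a sketch under two assumptions I have not verified against the current docs: that Ollama is listening on its default port 11434, and that its Anthropic-compatible surface mirrors the standard /v1/messages request shape:

# Sketch, not gospel: assumes the default Ollama port and that the
# Anthropic-compatible endpoint follows the standard /v1/messages shape.
# The key is a placeholder; a local Ollama instance does not need a real one.
curl http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Reply with one short sentence."}]
  }'

If that returns a message instead of an error, the plumbing underneath the agent is already working.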

That is why I would still keep the stack as simple as possible. One local runtime. One local model loaded by default. No translation proxies unless I actually need them. The biggest mistake on a laptop setup is usually not picking a model that is too dumb. It is picking a model that turns the rest of the system into collateral damage.

Why I still prefer the small-model class on this machine

If the requirement is "few moving pieces, local by default, still enough RAM for browser and Docker workloads," I would still start with a 7B to 9B coding-capable model class, not a 20B or 30B default. The reason is not that bigger models are bad. The reason is that the rest of the machine matters.

Qwen's own model card for Qwen2.5-Coder 7B describes a 7.61B-parameter coding model with long-context support and an explicit real-world code-agent use case. That is exactly the kind of class I want here: strong enough to do coding, shell glue, and agent orchestration competently, but small enough that I am not turning every browser launch into a resource fight.

The current Ollama docs are actually useful here because they reveal the shape of the tradeoff. Their Anthropic-compatibility page now recommends newer coding models such as qwen3-coder, and the same docs note that the 30B variant wants at least 24 GB of VRAM to run smoothly. That is a great coding benchmark story. It is not the same thing as a good whole-machine story on a laptop that also needs to run the rest of the agent stack.
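A rough back-of-envelope calculation makes that gap concrete. The figures below are assumptions, not measurements: roughly 4.5 bits per parameter for a q4_K_M-style quantization, with nothing counted for KV cache, Metal buffers, or runtime overhead:

# Rough weight-footprint estimate under the assumptions stated above.
for params in 7.61 30; do
  awk -v p="$params" 'BEGIN { printf "%sB params -> ~%.1f GB of weights\n", p, p * 1e9 * 4.5 / 8 / 1e9 }'
done

Call it 4 to 5 GB of weights for the 7B class against roughly 17 GB for the 30B class. On a 32 GB unified-memory laptop, the first number leaves room for a browser and containers; the second, plus cache and overhead, is where the whole-machine story starts to give.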

The real choice is which part of the workflow gets premium resources

Default choice | What you gain | What you pay
7B to 9B local coder model | Simple setup, fast enough responses, and enough headroom for browser sessions, containers, and tool calls. | You give up some long-horizon reasoning and deep repository synthesis compared with larger models.
20B local general-purpose model | Stronger single-model reasoning and a better chance of handling mixed assistant plus coding work. | More memory pressure, more latency, and less room for the rest of the runtime to stay responsive.
30B local coding model | Much better pure coding ceiling and a setup closer to current model-recommendation pages. | The model starts competing directly with the rest of the machine, which is exactly what this setup was trying to avoid.
Hybrid local plus cloud routing | Best overall capability if you reserve hard reasoning for a stronger remote model. | More moving pieces, more policy work, and less of the clean "one local stack" answer.

That table is why I still like the original answer more than the flashier one. If the machine were a bigger desktop or a dedicated always-on node, I would be much more willing to bias upward. But that is not the actual problem. The actual problem is a single laptop that needs to keep being a laptop while the agent runs.

The exact setup I would start with

If you forced me to give one exact starting point for this hardware profile, I would still begin with Ollama and one small coder-oriented model, with the older qwen2.5-coder:7b-instruct-q4_K_M recommendation as a sensible example of the class. The interesting part is not whether that exact model remains the forever winner. The interesting part is the operational shape:

# Pull the model once; this quant is roughly a 4-5 GB download.
ollama pull qwen2.5-coder:7b-instruct-q4_K_M
# Point any Claude API-compatible client at the local runtime; the token is
# just a placeholder, since a local Ollama instance does not need a real key.
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
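Before pointing the agent at it, a quick sanity check that the model loads and answers is worth the minute. Both commands are standard Ollama CLI (ollama ps appears in recent versions):

# One-off prompt to confirm the model loads and responds.
ollama run qwen2.5-coder:7b-instruct-q4_K_M "Reply with OK."
# Show what is currently loaded and roughly how much memory it occupies.
ollama ps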

Or, if I wanted the more repo-native route, I would let NanoClaw wire Ollama through its supported provider flow instead of hand-setting environment variables every time.

I would also keep the operating posture conservative:

- One runtime and one default model, with no translation proxies unless something actually requires them.
- Let the local model handle the high-volume, low-stakes work: classifying, summarizing, scaffolding, planning, and preparing.
- Keep anything that spends money or triggers live external actions behind explicit approval instead of letting the local model authorize it on its own.

That last point is not mainly about finance. It is about architecture. A laptop-local model can be good enough to classify, summarize, scaffold, plan, and prepare. That does not automatically make it the right component to authorize live external actions.
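To make the architectural point concrete, here is a purely hypothetical sketch of that posture. None of this is NanoClaw's actual mechanism; the tool names and the tools/ directory are invented for illustration:

#!/bin/sh
# Hypothetical gate in front of tool execution, illustration only.
# Read-only tools run freely; anything that touches the outside world
# waits for an explicit human yes.
ALLOWED_LOCAL="read_file list_dir grep summarize"
tool="$1"; shift

if echo "$ALLOWED_LOCAL" | grep -qw "$tool"; then
  exec "./tools/$tool" "$@"    # invented path, for illustration
fi

printf 'External action "%s" requested. Run it? [y/N] ' "$tool"
read -r answer
if [ "$answer" = "y" ]; then
  exec "./tools/$tool" "$@"
else
  echo "Blocked: $tool" >&2
  exit 1
fi

The exact mechanism matters less than the shape: the local model proposes, something else approves.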

What changed once I stopped asking for the best model

Once I framed the problem as a system-budget question instead of a leaderboard question, the answer got much calmer. I stopped caring so much about whether the latest recommendation page had moved on to a shinier model family. I cared more about whether the full machine would still feel trustworthy after NanoClaw, Ollama, a browser session, and a few background tools all started competing for the same resources.
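In practice that meant occasionally checking the whole machine rather than the model alone. On macOS the built-in vm_stat and memory_pressure tools give a rough read while the full stack is running:

# Whole-machine read while NanoClaw, Ollama, a browser, and Docker all run.
vm_stat | head -n 6    # page-level view of free vs. active memory
memory_pressure        # prints current system memory statistics on macOS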

That is the broader lesson I keep coming back to with agent systems. The model is never just the model. The real unit of evaluation is the whole operating loop: runtime, memory pressure, tool surface, latency, failure recovery, and what still works when the exciting demo becomes an actual daily system.

Where I landed

If I were setting this up today on that exact machine, I would still choose the small local coder-model class and a single Ollama runtime before I chased maximum benchmark prestige. I would rather have a slightly weaker model inside a responsive system than a stronger model that leaves the agent sitting in a beautiful wreck of swap pressure and stalled tools.

That is why my real answer is not "pick the newest model Ollama happens to feature this month." It is "pick the strongest local model that still leaves the rest of NanoClaw's world alive." On a 32 GB MacBook Pro, that still looks much more like a disciplined 7B to 9B default than a laptop-scale attempt to brute-force a 30B coding setup.

Sources I found useful: NanoClaw README, NanoClaw spec, Ollama Anthropic compatibility, Ollama Claude Code integration, and Qwen2.5-Coder 7B model card.