ZDNET's key takeaways
- Goose acts as the agent that plans, iterates, and applies changes.
- Ollama is the local runtime that hosts the model.
- Qwen3-coder is the coding-focused LLM that generates results.
If you've been programming for any number of years, you've pretty much lived through a bunch of hype cycles. Whether it's a new development environment, a new language, a new plugin, or some new online service with an oh-so-powerful time-saving API, it's all "revolutionary" and "world-changing," at least according to the PR reps hawking The Big New Thing.
And then there's agentic AI coding. When a tool can help you do four years of product development in four days, the impact is world-changing. While vibe coding has its detractors (for good reason), AI coding agents like OpenAI's Codex and Claude Code really are revolutionary. They are radically transforming the software industry.
Also: I tried a Claude Code alternative that's local, open source, and completely free - how it works
In my testing, I determined you can get a few hours of agentic coding done here and there with the $20/month plans from the AI companies. But if you're going to put in full days of coding, you'll need to upgrade to $100 or $200/month plans. Otherwise, you'll risk getting put on hold until your token allocation resets.
While both OpenAI and Anthropic have repeatedly said they respect the privacy of code bases, the fact is that both do their work on cloud infrastructure, which carries inherent security risk. Using these technologies might also violate agreements governing how you manage your source code or even where your work is done.
Recently, however, a possible solution to these challenges has been released. By combining three separate tools, it may be possible to replace pricy cloud-based coding platforms with a free AI agent that runs on your local computer.
Also: I've tested free vs. paid AI coding tools - here's which one I'd actually use
In my previous article, I showed you how to set up this environment and did some basic testing. I was able to confirm that this setup can run agentic coding (although I only gave it a simple problem, and it did have some challenges).
In this article, I'm going to take you through the three tools (Goose, Ollama, and Qwen3-coder) and explain what each contributes to the overall solution.
Then, in a follow-on article, I'll attempt to use this system to build a big project, extending the iPhone, Mac, and Apple Watch app I built with Claude Code to the iPad. Instead of using Claude Code for the project, I'm going to see if these three batches of bits can do the whole thing on my Mac, and for free.
Qwen3: The coding LLM
Let's start with Qwen3-coder, the coding-specific large language model. I picked Qwen because of Jack Dorsey's post on X, saying "goose + qwen3-coder = wow", and also because ZDNET's Jack Wallen recommended it to me when I asked about downloadable coding models.
That point is worth reinforcing. We know models like OpenAI's GPT-5.2-codex and Anthropic's Opus-4.5 are great at coding, but they're cloud-based and come with a fee. We're looking at Qwen3-coder because it's free and downloadable.
Let's talk about what a large language model is. Think about ChatGPT. When you use it, you can choose a model (or, with the free version, a model is usually chosen for you). The interface, or the chatbot, is a separate piece of software from the model.
If we were to use a car analogy, the model is the engine, and the chatbot is the passenger compartment with the steering wheel and dashboard.
Qwen3-coder is a specialized version of the Qwen3 LLM from Alibaba. It's the piece of software that actually writes the code. This model generates code from prompts and understands programming languages, frameworks, and patterns. It can refactor code (make code-wide changes), run diffs (compare code), create code explanations, and fix code.
Also: Xcode 26.3 finally brings agentic coding to Apple's developer tools
The coding model is incapable of managing multi-step workflows. It doesn't know when to stop working on a problem or when to iterate on a problem. The model also has no memory of anything beyond the currently running context.
Ollama: The model runtime
Ollama is the local model runtime and serving layer. Models don't run on their own. If we borrow a database analogy, the model is like the database itself: a giant, static repository of knowledge.
Ollama is like the database engine, the software that actually reads from that repository. Unlike a full database engine, though, Ollama never writes data back into the model; it only extracts information from it. That makes Ollama more of a runtime (a system that runs something previously built by another system) than a full engine.
Ollama is the infrastructure that actually runs large language models on your machine and makes them available to other processes via a local API. It downloads, installs, and manages local LLMs. It runs inference processes on your hardware (CPU or GPU). It makes the models available to other processes through a consistent API endpoint. It also handles model switching, versioning, and resource control.
On the other hand, Ollama does not understand your project goals. It does not manage conversations or tasks.
There's one other thing to note. Ollama itself isn't a specialized coding tool. It only knows coding if the LLM it's currently running knows coding.
Because it accepts API calls for LLM access, Ollama is something of an AI server, sitting between the LLM and the chatbot interface.
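To make that concrete, here's a minimal sketch of how any client (Goose included) might talk to Ollama's local `/api/generate` endpoint. The endpoint and port are Ollama's documented defaults; the model name and prompt are illustrative, and the `send()` helper assumes you have Ollama running locally.

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body Ollama expects for a one-shot generation call."""
    return {
        "model": model,    # e.g. "qwen3-coder", previously fetched with `ollama pull`
        "prompt": prompt,
        "stream": False,   # request a single JSON response instead of a token stream
    }

def send(payload: dict) -> str:
    """POST the payload to the local Ollama server (requires Ollama to be running)."""
    import urllib.request
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Build (but don't send) a request, so this sketch runs without a server.
payload = build_request("qwen3-coder", "Write a Python function that reverses a string.")
print(json.dumps(payload, indent=2))
```

Any tool that can make an HTTP request can use the same endpoint, which is exactly what makes Ollama a drop-in "AI server" for agents like Goose.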
Goose: The coding manager
Goose is basically the agent part of the puzzle, providing orchestration for the other main components. It's the part that understands intent, manages tasks, and decides what to ask the model to do next.
Goose interprets your programming prompts. If you like the idea of vibe coding, Goose decodes the vibes you give it and breaks work into steps related to analysis, planning, code generation, and testing. It's the part of the system that maintains the conversational and task context across iterations.
Also: How to create your first iPhone app with AI - no coding experience needed
In concert with the human guiding it, Goose decides whether a change merits rewriting a module or block, or whether existing code can simply be modified in place. It also handles workflow commands like "scan the repo, propose changes, apply diffs."
Goose doesn't generate code itself. It doesn't run the models directly (although it talks to them). And it doesn't know anything about code syntax unless the model it's using helps out.
Goose is essentially the director and project manager of the vibe coding process.
A typical workflow
So, let's look at how all three components work together to enable you to generate code:
- The human provides a prompt describing a programming goal.
- Goose interprets that goal and decides what to do.
- Goose sends a precise coding prompt to Ollama.
- Ollama runs Qwen3-coder locally on your computer.
- Qwen3-coder returns code or analysis.
- Goose decides whether to apply it, refine it, or ask again.
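The steps above can be sketched as a simple loop. This is a toy illustration of the agent pattern, not Goose's actual implementation: the stub functions stand in for Goose's planning, Ollama's model call, and the agent's evaluation step.

```python
def agent_loop(goal: str, max_iterations: int = 3) -> str:
    """Toy version of the agentic loop: plan, ask the model, evaluate, repeat."""
    code = ""
    for step in range(max_iterations):
        prompt = plan_next_prompt(goal, code)  # Goose: turn intent into a precise prompt
        code = call_model(prompt)              # Ollama: run Qwen3-coder, return its output
        if looks_done(goal, code):             # Goose: decide to apply, refine, or ask again
            break
    return code

# Stub implementations so the sketch runs on its own; a real agent would call
# Ollama's API and run tests or linters to judge whether the result is done.
def plan_next_prompt(goal, previous_code):
    return f"Write code to: {goal}"

def call_model(prompt):
    return f"# generated for: {prompt}"

def looks_done(goal, code):
    return bool(code)

print(agent_loop("reverse a string"))
```

The key insight is that the model itself is stateless; all the looping, judging, and remembering happens in the agent layer.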
This workflow model is why vibe coding feels fluid. You can stay abstract and intuitive while the system translates your prompts into tangible code changes.
Also: I used Claude Code to vibe code a Mac app in 8 hours, but it was more work than magic
While this approach works really well for these three tools, other agentic coding environments like Claude Code or OpenAI Codex have their own mix of the coding LLM, the model runtime, and the programming manager. They're just all running behind the front-end interface that the coding products present to their developer users.
In terms of the three tools we're talking about here, this architecture provides a lot of flexibility and control. For example, you can swap out the Qwen3-coder LLM for another coding model without changing Goose. You can update or optimize Ollama without touching your workflows. Over time, Goose may evolve into a smarter agent without retraining models. Plus, everything is local, inspectable, and modular.
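That swap-ability follows from the API boundary: the request shape the agent sends to Ollama stays identical, and only the model name changes. A quick sketch (the alternative model names are examples of coding models you might pull with `ollama pull`, not recommendations):

```python
def build_request(model: str, prompt: str) -> dict:
    """Same request shape regardless of which local model is serving it."""
    return {"model": model, "prompt": prompt, "stream": False}

prompt = "Refactor this function to remove duplication."
# Swapping models means changing one string; nothing upstream has to know.
for model in ["qwen3-coder", "deepseek-coder", "codellama"]:
    payload = build_request(model, prompt)
    print(payload["model"])
```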
Your software engineering department in a box
Here's a fun way to think about this approach. Once you set up Goose, Ollama, and Qwen3-coder on your local machine, you effectively have a software engineering department in a box. Goose is the senior engineer guiding the session. Ollama is the infrastructure engineer who manages your computing environment. Qwen3-coder is a fast, talented junior developer who's writing code.
What about you? Have you tried local, agent-based coding tools like Goose with Ollama and a downloadable coding model? Or, are you still relying on cloud-based services like Claude Code or Codex?
Does the idea of keeping your code and prompts entirely on your own machine appeal to you, or do you see trade-offs that would make this approach impractical for your work? How do you feel about mixing and matching components, such as swapping models or runtimes, instead of using an all-in-one coding platform? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.