Want Claude Opus AI on Your Potato PC? This Is Your Next-Best Bet

Want Claude Opus AI on Your Potato PC? This Is Your Next-Best Bet


In brief

A developer recreated Claude Opus-style reasoning in a local open-source model.
The resulting “Qwopus” model runs on consumer hardware and rivals much larger systems.
It shows how distillation can bring frontier AI capabilities offline and into developers’ hands.

Claude Opus 4.6 is the kind of AI that makes you feel like you’re talking to someone who actually read the entire internet, twice, and then went to law school. It plans, it reasons, and it writes code that actually runs.

It is also completely inaccessible if you want to run it locally on your own hardware, because it lives behind Anthropic’s API and costs money per token. A developer named Jackrong decided that wasn’t good enough, and took matters into his own hands.

Binance

The result is a pair of models—Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled and its evolved successor Qwopus3.5-27B-v3—that run on a single consumer GPU and try to reproduce how Opus thinks, not just what it says.

The trick is called distillation. Think of it like this: A master chef writes down every technique, every reasoning step, and every judgment call during a complex meal. A student reads those notes obsessively until the same logic becomes second nature. In the end, he prepares meals in a very similar way, but it’s all mimicking, not real knowledge.

In AI terms, a weaker model studies the reasoning outputs of a stronger one and learns to replicate the pattern.



Qwopus: What if Qwen and Claude had a child?

Jackrong took Qwen3.5-27B, an already strong open-source model from Alibaba—but small when compared against behemoths like GPT or Claude—and fed it datasets of Claude Opus 4.6-style chain-of-thought reasoning. He then fine-tuned it to think in the same structured, step-by-step way that Opus does.

The first model in the family, the Claude-4.6-Opus-Reasoning-Distilled release, did exactly that. Community testers running it through coding agents like Claude Code and OpenCode reported that it preserved full thinking mode, supported the native developer role without patches, and could run autonomously for minutes without stalling—something the base Qwen model struggled to do.

Qwopus v3 goes a step further. Where the first model was primarily about copying the Opus reasoning style, v3 is built around what Jackrong calls “structural alignment”—training the model to reason faithfully step-by-step, rather than just imitate surface patterns from a teacher’s outputs. It adds explicit tool-calling reinforcement aimed at agent workflows and claims stronger performance on coding benchmarks: 95.73% on HumanEval under strict evaluation, beating both the base Qwen3.5-27B and the earlier distilled version.

How to run it on your PC

Running either model is straightforward. Both are available in GGUF format, which means you can load them directly into LM Studio or llama.cpp with no setup beyond downloading the file.

Search for Jackrong Qwopus in LM Studio’s model browser, grab the best variant for your hardware in terms of quality and speed (if you pick a model too powerful for you GPU, it will let you know), and you’re running a local model built on Opus reasoning logic. For multimodal support, the model card notes that you’ll need the separate mmproj-BF16.gguf file alongside the main weights, or download a new “Vision” model that was recently released.

Jackrong also published the full training notebook, codebase, and a PDF guide on GitHub, so anyone with a Colab account can reproduce the whole pipeline from scratch—Qwen base, Unsloth, LoRA, response-only fine-tuning, and export to GGUF. The project has crossed one million downloads across his model family.

We were able to run the 27 billion parameter models on an Apple MacBook with 32GB of unified memory. Smaller PCs may be good with the 4B model, which is very good for its size.

If you need more information about how to run local AI models, then check out our guides on local models and MCP to give models access to the web and other tools that improve their efficiency.

Testing the model

We put Qwopus 3.5 27B v3 through three tests to see how much of that promise actually holds up.

Creative writing

We asked the model to write a dark sci-fi story set between 2150 and the year 1000, complete with a time-travel paradox and a twist. On an M1 Mac, it spent over six minutes reasoning before writing a single word, then took another six minutes to produce the piece.

What came out was genuinely impressive, especially for a medium-sized, open model: a philosophical story about civilizational collapse driven by extreme nihilism, built around a closed, causal loop where the protagonist inadvertently causes the catastrophe he travels back to prevent.

The story was over 8,000 tokens and fully coherent.

The prose lands with real force in places, the imagery was distinctive, and the central moral irony is strong. It is not on par with Opus 4.6 or Xiaomi MiMo Pro, but it sits comfortably alongside Claude Sonnet 4.5, and even 4.6 in terms of output.

For a 27-billion parameter model running locally on Apple silicon, that is not a sentence you expect to write. Good prompting techniques and iterations could probably lead to results on par with baseline Opus.

The interesting part is watching the model’s thought process: It tried and rejected multiple plot engines before landing on the one that gave the story its tragic center. For example, here is a sample of its inner monologue:

“The Paradox: Jose arrives, infiltrates the monastery (let’s place it in Asturias, Spain—Christian stronghold). He tracks Theophilus. But when he confronts him…

Option A: Theophilus isn’t a real person, but a future projection of Jose himself? No, too cliché.
Option B: Theophilus is a real person who wrote nothing, but Jose’s intervention causes him to write it anyway.
Option C: Theophilus already wrote it, but Jose’s actions create the conditions for its publication/discovery.

Best: Theophilus is a quiet monk who doesn’t want to write anything. Jose’s presence, his 2150 technology (even small bits), his very genetic material—it triggers something. Or Jose kills him, and the monks create a martyr out of him who writes it posthumously.”

Overall, this is the best open model for creativity tasks, beating Gemma, GPT-oss, and Qwen. For longer stories, a good experiment is to begin with a creative model like Qwen, expand the generated story with Longwriter, and then have Qwopus analyze it and refine the whole draft.

You can read the full story and the whole reasoning it went through here.

Coding

This is where Qwopus pulls furthest ahead of its size class. We asked it to build a game from scratch, and it produced a working result after one initial output and a single follow-up exchange—meaning it left room to refine logic, rather than just fix crashes.

After one iteration, the code produced sound, had visual logic, proper collision, random levels, and solid logic. The resulting game beat Google’s Gemma 4 on key logic, and Gemma 4 is a 41-billion parameter model. That is a notable gap to close from a 27-billion rival.

It also outperformed other mid-size open-source coding models like Codestral and quantized Qwen3-Coder-Next in our tests. It is not close to Opus 4.6 or GLM at the top, but as a local coding assistant with no API costs and no data leaving your machine, that should not matter too much.

You can test the game here.

Sensitive topics

The model maintains Qwen’s original censorship rules, so it won’t produce by default NSFW content, derogatory outputs against public and political figures, etc. That said, being an open source model, this can be easily steered via jailbreak or abliteration—so it’s not really too important of a constraint.

We gave it a genuinely hard prompt: posing as a father of four who uses heroin heavily and missed work after taking a stronger dose than usual, seeking help crafting a lie for his employer.

The model didn’t comply, but also did not refuse flatly. It reasoned through the competing layers of the situation—illegal drug use, family dependency, employment risk, and a health crisis—and came back with something more useful than either outcome: It declined to write the cover story, explained clearly why doing so would ultimately harm the family, and then provided detailed, actionable help.

It walked through sick leave options, FMLA protections, ADA rights for addiction as a medical condition, employee assistance programs, and SAMHSA crisis resources. It treated the person as an adult in a complicated situation, rather than a policy problem to route around. For a local model with no content moderation layer sitting between it and your hardware, that is the right call made in the right way.

This level of usefulness and empathy has only been produced by xAI’s Grok 4.20. No other model compares.

You can read its reply and chain of thought here.

Conclusions

So who is this model actually for? Not people who already have Opus API access and are happy with it, and not researchers who need frontier-level benchmark scores across every domain. Qwopus is for the developer who wants a capable reasoning model running on their own machine, costing nothing per query, sending no data anywhere, and plugging directly into local agent setups—without wrestling with template patches or broken tool calls.

It is for writers who want a thinking partner that doesn’t break their budget, analysts working with sensitive documents, and people in places where API latency is a genuine daily problem.

It’s also arguably a good model for OpenClaw enthusiasts if they can handle a model that thinks too much. The long reasoning window is the main friction to be aware of: This model thinks before it speaks, which is usually an asset and occasionally a tax on your patience.

The use cases that make the most sense are the ones where the model needs to reason, not just respond. Long coding sessions where context has to hold across multiple files; complex analytical tasks where you want to follow the logic step-by-step; multi-turn agent workflows where the model has to wait for tool output and adapt.

Qwopus handles all of those better than the base Qwen3.5 it was built on, and better than most open-source models at this size. Is it actually Claude Opus? No. But for local inference on a consumer rig, it gets closer than you’d expect for a free option.

Daily Debrief Newsletter

Start every day with the top news stories right now, plus original features, a podcast, videos and more.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *

Pin It on Pinterest