Qwen 3.6 Open vs Opus 4.7 vs Gemma 4
A same-day contrast between open local multimodal models and a closed frontier service.
Qwen’s 35B-A3B open-weight release landed the same day as Opus 4.7, putting a consumer-workstation local model next to a closed frontier service, with Gemma 4 as the clearest open-weight reference point.
On April 16, 2026, Anthropic released Claude Opus 4.7. Qwen released the first open-weight Qwen3.6 checkpoint the same day. Gemma 4 had arrived two weeks earlier. Those releases make a useful snapshot.
Gemma 4 and Qwen3.6 belong in the same comparison. They are open-weight multimodal models you can download, quantize, serve locally, and keep inside your own data path. Opus 4.7 belongs in another comparison. It is a managed frontier service you reach through Claude, the Anthropic API, Bedrock, Vertex AI, or Foundry.
This piece uses that split and leaves the full 2026 model derby aside. Gemini 3.1 Pro, GPT-5.4 Pro, and several other frontier systems sit outside this frame. The narrower question is simpler: how far have open-weight local multimodal models come, and how much distance still separates them from a top closed service?
Start With Gemma 4 and Qwen3.6
Gemma 4 is Google’s return to serious open-weight multimodal models. Qwen3.6 is Qwen’s answer in the same local tier. That is the first comparison.
The small Gemma 4 E models still matter. They make the family easier to adopt on laptops and edge devices. They sit outside the core comparison here. The meaningful open-model matchup is Gemma 4 26B A4B and Gemma 4 31B against Qwen3.6-35B-A3B.
At that tier, Qwen3.6 currently looks stronger on the work many local power users actually care about. On Qwen’s published tables, it leads Gemma 4 26B A4B across the main coding, agent, repo, and document benchmarks Qwen chose to foreground, and it often beats or matches Gemma 4 31B as well. The clearest wins are in repo work, terminal work, and document-heavy multimodal tasks. Gemma 4 31B still holds some ground on broader vision-reasoning tests such as MMMU-Pro, so this is not a clean sweep. The short version is simpler: Qwen3.6 currently looks like the sharper open-weight checkpoint in this size class. The full tables are in Qwen’s blog and model card.
These are vendor numbers. They need third-party replication. The pattern is still hard to dismiss. Qwen3.6 is pushing for the strongest open-weight local model in this class for coding, document work, and tool-heavy workflows.
That matters beyond the benchmark table. After Qwen’s recent reorg, the first open 3.6 checkpoint had to show that the line still had momentum. It did.
Gemma 4 looks better as a product line with a smoother open-weight stack. Google shipped a clearer family, more complete official docs, and a cleaner local on-ramp. Qwen3.6 currently looks like the sharper checkpoint edging ahead in performance.
The Local Story Already Shows the Philosophical Split
Google’s local story is unusually orderly. Gemma 4 has official paths for LM Studio, Ollama, llama.cpp, MLX, LiteRT-LM, Transformers, and vLLM. mlx-vlm already gives Apple Silicon users a direct route into Gemma-style multimodal workflows.
Qwen3.6 is more pieced together. The official model card points to Transformers, vLLM, SGLang, and KTransformers. Unsloth fills in a lot of the local story with GGUF builds, llama.cpp, and Unsloth Studio. The same guide treats 24 GB class hardware as the practical floor for a comfortable 35B-A3B local setup, and current Qwen multimodal GGUF flows still depend on a separate mmproj vision file. That is workable. It is less tidy.
The architecture tracks the same split. Gemma 4 offers dense and MoE options inside one family. Qwen3.6 opens with a capability-first 35B-A3B release built from a hybrid Gated DeltaNet and full-attention stack, with 256 experts, 3B activated parameters, and 262,144 native context. Gemma feels designed for adoption. Qwen feels designed to win the table that ambitious local users actually read.
Opus 4.7 Belongs to Another Product Class
Opus 4.7 is part of the same market conversation and a different model category.
Anthropic is selling frontier capability as a managed service. The release note puts the emphasis on advanced software engineering, difficult long-running tasks, better self-verification, higher-resolution vision, and stronger output for interfaces, slides, and docs. Opus 4.7 is generally available across Claude, the Anthropic API, Bedrock, Vertex AI, and Foundry. Pricing stays at $5 per million input tokens and $25 per million output tokens.
That package carries a different promise. Anthropic’s deal is simple: hand us the hard work, skip the model operations, and pay by use. Gemma 4 and Qwen3.6 offer another deal: download the weights, choose your stack, absorb more setup, and keep more control over cost, deployment, and data.
The same-day release makes that split easy to see.
Where They Overlap
The overlap starts with the work.
Anthropic describes Opus 4.7 as stronger on hard coding, long-running tasks, visual reasoning, interfaces, slides, and docs. Qwen3.6 is pitched around repo-level coding, tool use, document understanding, frontend generation, and multimodal reasoning. Gemma 4 is pitched around reasoning, coding, function calling, image understanding, and long context.
Put plainly, all three releases are aimed at repository-scale coding, document-heavy reasoning, multimodal office work, tool-using agents, and interface generation. The code gap is still real. The multimodal gap looks smaller in everyday work. Anthropic is pitching Opus 4.7 on long-running coding, higher-resolution vision, and stronger document work. Qwen is pitching Qwen 3.6 on repo work, tool use, and document-heavy multimodal tasks you can keep on your own machine.
That overlap matters because it shows where the value is. Open local models are being tuned for the same categories of work frontier vendors use to justify premium API pricing.
Where the Gap Is Still Large
The biggest remaining distance sits in reliability over long runs and in the managed service wrapped around the model.
Anthropic’s release note and partner quotes keep returning to the same points: Opus 4.7 carries work through long execution chains, checks its own work, recovers from failures, and produces cleaner output under real production conditions. Those are curated testimonials. They are weaker than neutral lab measurements. They still line up with the product pitch.
Opus 4.7 still looks like the stronger model for code and agentic tool use. Qwen 3.6 still matters because it brings a surprising amount of that workload into an open-weight model you can run locally. Gemma 4 stays in the picture as the cleaner open-weight product line, while Qwen 3.6 looks like the more ambitious checkpoint.
The top end of long-horizon engineering work still leans closed and managed.
Where the Gap Looks Smaller
The smaller gap shows up inside bounded, repeatable workflows.
If the task is document parsing, screenshot QA, diagram understanding, repo exploration, code editing inside a known codebase, or UI generation inside a constrained loop, open local models are much closer than they were even a year ago. Qwen3.6’s benchmark spread against Gemma 4 supports that claim. Gemma 4 supports it from the adoption side: the family is easier to run, easier to serve, and easier to slot into existing workflows.
Cost also changes the shape of the comparison. An API bill is simpler for occasional heavy use. Local open models start to look much better once the work is daily, the inputs are private, or the throughput needs are steady. Frequency, privacy, and setup tolerance decide that trade more than raw benchmark rank does.
The systems layer keeps helping the open side. Mixture-of-Experts matters here because MoE models keep many experts in memory while activating only a few per token. Total memory remains heavy. Per-token compute falls. TurboQuant pushes on the KV cache instead of the weights. In mlx-vlm, a 3.5-bit TurboQuant cache cuts Gemma 4 31B’s KV memory at 128K context from 13.3 GB to 4.9 GB, with peak memory dropping from 75.2 GB to 65.8 GB. REAP trims MoE experts while preserving router control over the ones that remain. Open-weight models can keep improving after release through vendor work and through downstream tooling.
That is one reason the middle of the gap keeps moving.
Gemma 4 and Qwen3.6 Also Point to Different Open-Weight Futures
Gemma 4 looks like Google’s attempt to make open models easier to live with. The family is broad. The docs are good. The supported engines are already in place. Apple Silicon users have a better path today. Google is packaging openness as a product.
Qwen3.6 looks like a capability push. The first open 3.6 release is a 35B-A3B model aimed straight at coding, tool use, long context, and repo work. The setup is rougher. The performance is harder to ignore.
Qwen’s recent release pattern makes the next step fairly easy to guess. Qwen3 spread across dense and MoE sizes. Qwen3.5 pushed native multimodal hybrids down into smaller checkpoints. Qwen3.6 open weights currently start at 35B-A3B. More 3.6 open checkpoints would fit that pattern, although Qwen has not announced them.
The Same-Day Release Still Matters
The April 16 pairing keeps the categories separate and makes the strategic contrast unusually clear.
Gemma 4 and Qwen3.6 show how far open-weight local multimodal models have moved into serious work. Opus 4.7 shows what a frontier managed service looks like when a vendor keeps the model, runs the infrastructure, and charges for reliability and polish.
The overlap is already large in coding, document work, multimodal understanding, and agent workflows. The biggest distance still sits in long-horizon reliability and service convenience. The middle of the gap is moving faster than many people expected.
Opus 4.7 is the stronger remote worker. Qwen 3.6 gives a powerful argument for local multimodal AI. One is a managed frontier service. The other is a consumer-workstation-class open-weight model that is getting uncomfortably close on parts of the workload that used to belong only to the closed frontier.
The cleanest way to feel that difference is to run one of the open models yourself. Start with Gemma 4 if you want the smoother setup. Start with Qwen3.6 if you want the stronger checkpoint and have the hardware. Use a messy PDF, a UI screenshot, or a repo you actually need to change. Then spend a few hours with Opus 4.7 on the same class of work. The philosophical split becomes concrete once the work is real.
Resources
Anthropic release note for Opus 4.7: https://www.anthropic.com/news/claude-opus-4-7
Anthropic pricing: https://claude.com/pricing
Qwen3.6-35B-A3B model card: https://huggingface.co/Qwen/Qwen3.6-35B-A3B
Qwen3.6 blog: https://qwen.ai/blog?id=qwen3.6-35b-a3b
Unsloth Qwen3.6 local guide: https://unsloth.ai/docs/models/qwen3.6
Gemma 4 launch blog: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/
Gemma run docs: https://ai.google.dev/gemma/docs/run
Qwen3 launch blog: https://qwenlm.github.io/blog/qwen3/
Qwen3.5 model cards: https://huggingface.co/Qwen/Qwen3.5-0.8B, https://huggingface.co/Qwen/Qwen3.5-2B, https://huggingface.co/Qwen/Qwen3.5-4B, https://huggingface.co/Qwen/Qwen3.5-9B
Sebastian Raschka on Gemma 4 architecture: https://magazine.sebastianraschka.com/i/168650848/23-gemma-4
mlx-vlm: https://github.com/Blaizzy/mlx-vlm
TurboQuant: https://github.com/0xSero/turboquant



