397B-parameter MoE multimodal LLM with 17B active params, 262K context, 201 languages
10K+
Qwen3.5-397B-A17B is a cutting-edge multimodal large language model developed by Alibaba's Qwen team, representing a significant advancement in AI foundation models. This model employs a hybrid Mixture-of-Experts (MoE) architecture combining Gated Delta Networks with sparse expert routing, achieving 397 billion total parameters with only 17 billion activated during inference. This efficient design enables high-performance inference with minimal latency and computational overhead.
The model features unified vision-language capabilities through early fusion training, seamlessly handling both text and image inputs. Qwen3.5 was trained using reinforcement learning at massive scale across million-agent environments, resulting in robust real-world adaptability and exceptional performance across reasoning, coding, agent tasks, and visual understanding benchmarks. With native support for 201 languages and dialects, the model provides truly global linguistic coverage with nuanced cultural and regional understanding.
Designed for enterprise and research applications, Qwen3.5-397B-A17B supports up to 262,144 tokens natively and can be extended to over 1 million tokens, making it suitable for complex long-context tasks including document analysis, multi-turn conversations, and agentic workflows.
| Attribute | Value |
|---|---|
| Provider | Qwen (Alibaba) |
| Architecture | Mixture of Experts (MoE) with Gated DeltaNet and Gated Attention |
| Total Parameters | 397B |
| Active Parameters | 17B |
| Context Length | 262,144 tokens (native), extensible to 1,010,000 tokens |
| Languages | 201 languages and dialects |
| Input modalities | Text, Image |
| Output modalities | Text |
| License | Apache 2.0 |
| Model Type | qwen3_5_moe |
docker model run qwen3.5-safetensors
For more information, check out the Docker Model Runner docs.

The model employs a sophisticated hybrid architecture:
Gated DeltaNet:
Gated Attention:
Mixture of Experts:
| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| MMLU-Pro | 87.4 | 89.5 | 89.8 | 85.7 | 87.1 | 87.8 |
| MMLU-Redux | 95.0 | 95.6 | 95.9 | 92.8 | 94.5 | 94.9 |
| SuperGPQA | 67.9 | 70.6 | 74.0 | 67.3 | 69.2 | 70.4 |
| C-Eval | 90.5 | 92.2 | 93.4 | 93.7 | 94.0 | 93.0 |
| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| IFEval | 94.8 | 90.9 | 93.5 | 93.4 | 93.9 | 92.6 |
| IFBench | 75.4 | 58.0 | 70.4 | 70.9 | 70.2 | 76.5 |
| MultiChallenge | 57.9 | 54.2 | 64.2 | 63.3 | 62.7 | 67.6 |
| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| AA-LCR | 72.7 | 74.0 | 70.7 | 68.7 | 70.0 | 68.7 |
| LongBench v2 | 54.5 | 64.4 | 68.2 | 60.6 | 61.0 | 63.2 |
| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| GPQA | 92.4 | 87.0 | 91.9 | 87.4 | 87.6 | 88.4 |
| HLE | 35.5 | 30.8 | 37.5 | 30.2 | 30.1 | 28.7 |
| HLE-Verified | 43.3 | 38.8 | 48.0 | 37.6 | -- | 37.6 |
| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| LiveCodeBench v6 | 87.7 | 84.8 | 90.7 | 85.9 | 85.0 | 83.6 |
| HMMT Feb 25 | 99.4 | 92.9 | 97.3 | 98.0 | 95.4 | 94.8 |
| HMMT Nov 25 | 100 | 93.3 | 93.3 | 94.7 | 91.1 | 92.7 |
| IMOAnswerBench | 86.3 | 84.0 | 83.3 | 83.9 | 81.8 | 80.9 |
| AIME26 | 96.7 | 93.3 | 90.6 | 93.3 | 93.3 | 91.3 |
| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| BFCL-V4 | 63.1 | 77.5 | 72.5 | 67.7 | 68.3 | 72.9 |
| TAU2-Bench | 87.1 | 91.6 | 85.4 | 84.6 | 77.0 | 86.7 |
| VITA-Bench | 38.2 | 56.3 | 51.6 | 40.9 | 41.9 | 49.7 |
| DeepPlanning | 44.6 | 33.9 | 23.3 | 28.7 | 14.5 | 34.3 |
| Tool Decathlon | 43.8 | 43.5 | 36.4 | 18.8 | 27.8 | 38.3 |
| MCP-Mark | 57.5 | 42.3 | 53.9 | 33.5 | 29.5 | 46.1 |
| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| HLE w/ tool | 45.5 | 43.4 | 45.8 | 49.8 | 50.2 | 48.3 |
| BrowseComp | 65.8 | 67.8 | 59.2 | 53.9 | 74.9 | 69.0/78.6 |
| BrowseComp-zh | 76.1 | 62.4 | 66.8 | 60.9 | -- | 70.3 |
| WideSearch | 76.8 | 76.4 | 68.0 | 57.9 | 72.7 | 74.0 |
| Seal-0 | 45.0 | 47.7 | 45.5 | 46.9 | 57.4 | 46.9 |
| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| MMMLU | 89.5 | 90.1 | 90.6 | 84.4 | 86.0 | 88.5 |
| MMLU-ProX | 83.7 | 85.7 | 87.7 | 78.5 | 82.3 | 84.7 |
This model card was automatically generated using cagent-action. Want to learn more about Docker Model Runner? Check out the project repository: https://github.com/docker/model-runner.
Content type
Model
Digest
sha256:80154e164…
Size
67 GB
Last updated
3 months ago
docker model pull ai/qwen3.5-safetensors:35B-A3BPulls:
483
Jun 1 to Jun 7