ai/qwen3.5-safetensors

Verified Publisher

By Docker

•Updated 3 months ago

397B-parameter MoE multimodal LLM with 17B active params, 262K context, 201 languages

Model

10K+

Overview Tags

ai/qwen3.5-safetensors repository overview

⁠Qwen3.5

Qwen3.5-397B-A17B is a cutting-edge multimodal large language model developed by Alibaba's Qwen team, representing a significant advancement in AI foundation models. This model employs a hybrid Mixture-of-Experts (MoE) architecture combining Gated Delta Networks with sparse expert routing, achieving 397 billion total parameters with only 17 billion activated during inference. This efficient design enables high-performance inference with minimal latency and computational overhead.

The model features unified vision-language capabilities through early fusion training, seamlessly handling both text and image inputs. Qwen3.5 was trained using reinforcement learning at massive scale across million-agent environments, resulting in robust real-world adaptability and exceptional performance across reasoning, coding, agent tasks, and visual understanding benchmarks. With native support for 201 languages and dialects, the model provides truly global linguistic coverage with nuanced cultural and regional understanding.

Designed for enterprise and research applications, Qwen3.5-397B-A17B supports up to 262,144 tokens natively and can be extended to over 1 million tokens, making it suitable for complex long-context tasks including document analysis, multi-turn conversations, and agentic workflows.

⁠Characteristics

Attribute	Value
Provider	Qwen (Alibaba)
Architecture	Mixture of Experts (MoE) with Gated DeltaNet and Gated Attention
Total Parameters	397B
Active Parameters	17B
Context Length	262,144 tokens (native), extensible to 1,010,000 tokens
Languages	201 languages and dialects
Input modalities	Text, Image
Output modalities	Text
License	Apache 2.0
Model Type	qwen3_5_moe

⁠Using this model with Docker Model Runner

docker model run qwen3.5-safetensors

For more information, check out the Docker Model Runner docs⁠.

⁠Architecture Details

Architecture Benchmark

The model employs a sophisticated hybrid architecture:

Hidden Dimension: 4096
Vocabulary Size: 248,320 tokens (padded)
Number of Layers: 60
Layer Configuration: 15 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))

Gated DeltaNet:

Linear Attention Heads: 64 for V, 16 for QK
Head Dimension: 128

Gated Attention:

Attention Heads: 32 for Q, 2 for KV
Head Dimension: 256
Rotary Position Embedding Dimension: 64

Mixture of Experts:

Total Experts: 512
Active Experts: 10 routed + 1 shared
Expert Intermediate Dimension: 1024

⁠Benchmarks

⁠Knowledge

Benchmark	GPT5.2	Claude 4.5 Opus	Gemini-3 Pro	Qwen3-Max-Thinking	K2.5-1T-A32B	Qwen3.5-397B-A17B
MMLU-Pro	87.4	89.5	89.8	85.7	87.1	87.8
MMLU-Redux	95.0	95.6	95.9	92.8	94.5	94.9
SuperGPQA	67.9	70.6	74.0	67.3	69.2	70.4
C-Eval	90.5	92.2	93.4	93.7	94.0	93.0

⁠Instruction Following

Benchmark	GPT5.2	Claude 4.5 Opus	Gemini-3 Pro	Qwen3-Max-Thinking	K2.5-1T-A32B	Qwen3.5-397B-A17B
IFEval	94.8	90.9	93.5	93.4	93.9	92.6
IFBench	75.4	58.0	70.4	70.9	70.2	76.5
MultiChallenge	57.9	54.2	64.2	63.3	62.7	67.6

⁠Long Context

Benchmark	GPT5.2	Claude 4.5 Opus	Gemini-3 Pro	Qwen3-Max-Thinking	K2.5-1T-A32B	Qwen3.5-397B-A17B
AA-LCR	72.7	74.0	70.7	68.7	70.0	68.7
LongBench v2	54.5	64.4	68.2	60.6	61.0	63.2

⁠STEM

Benchmark	GPT5.2	Claude 4.5 Opus	Gemini-3 Pro	Qwen3-Max-Thinking	K2.5-1T-A32B	Qwen3.5-397B-A17B
GPQA	92.4	87.0	91.9	87.4	87.6	88.4
HLE	35.5	30.8	37.5	30.2	30.1	28.7
HLE-Verified	43.3	38.8	48.0	37.6	--	37.6

⁠Reasoning

Benchmark	GPT5.2	Claude 4.5 Opus	Gemini-3 Pro	Qwen3-Max-Thinking	K2.5-1T-A32B	Qwen3.5-397B-A17B
LiveCodeBench v6	87.7	84.8	90.7	85.9	85.0	83.6
HMMT Feb 25	99.4	92.9	97.3	98.0	95.4	94.8
HMMT Nov 25	100	93.3	93.3	94.7	91.1	92.7
IMOAnswerBench	86.3	84.0	83.3	83.9	81.8	80.9
AIME26	96.7	93.3	90.6	93.3	93.3	91.3

⁠General Agent

Benchmark	GPT5.2	Claude 4.5 Opus	Gemini-3 Pro	Qwen3-Max-Thinking	K2.5-1T-A32B	Qwen3.5-397B-A17B
BFCL-V4	63.1	77.5	72.5	67.7	68.3	72.9
TAU2-Bench	87.1	91.6	85.4	84.6	77.0	86.7
VITA-Bench	38.2	56.3	51.6	40.9	41.9	49.7
DeepPlanning	44.6	33.9	23.3	28.7	14.5	34.3
Tool Decathlon	43.8	43.5	36.4	18.8	27.8	38.3
MCP-Mark	57.5	42.3	53.9	33.5	29.5	46.1

⁠Search Agent

Benchmark	GPT5.2	Claude 4.5 Opus	Gemini-3 Pro	Qwen3-Max-Thinking	K2.5-1T-A32B	Qwen3.5-397B-A17B
HLE w/ tool	45.5	43.4	45.8	49.8	50.2	48.3
BrowseComp	65.8	67.8	59.2	53.9	74.9	69.0/78.6
BrowseComp-zh	76.1	62.4	66.8	60.9	--	70.3
WideSearch	76.8	76.4	68.0	57.9	72.7	74.0
Seal-0	45.0	47.7	45.5	46.9	57.4	46.9

⁠Multilingualism

Benchmark	GPT5.2	Claude 4.5 Opus	Gemini-3 Pro	Qwen3-Max-Thinking	K2.5-1T-A32B	Qwen3.5-397B-A17B
MMMLU	89.5	90.1	90.6	84.4	86.0	88.5
MMLU-ProX	83.7	85.7	87.7	78.5	82.3	84.7

⁠Links

⁠Considerations

Computational Requirements: As a 397B parameter model with 17B activated, this model requires significant GPU memory and computational resources for inference
Extended Context: While the model supports context lengths up to 1M tokens through extension, native support is for 262K tokens
Managed Service: For production deployments without infrastructure management, Alibaba Cloud Model Studio offers Qwen3.5-Plus, the hosted version with 1M context by default
Multimodal Capabilities: The model supports both text and image inputs, making it suitable for vision-language tasks
Tool Use: The model includes built-in support for tool calling and function execution, ideal for agentic workflows
Reasoning Mode: The model supports a thinking/reasoning mode with structured reasoning traces

⁠Generated by

This model card was automatically generated using cagent-action⁠. Want to learn more about Docker Model Runner? Check out the project repository: https://github.com/docker/model-runner⁠.

Tag summary

Recent tags

Content type

Model

Digest

sha256:80154e164…

Size

67 GB

Last updated

3 months ago

docker model pull ai/qwen3.5-safetensors:35B-A3B

This week's pulls

Pulls:

483

Jun 1 to Jun 7

Learn more⁠