ai/qwen3.5-safetensors

Verified Publisher

By Docker

Updated 3 months ago

397B-parameter MoE multimodal LLM with 17B active params, 262K context, 201 languages

Model
1

10K+

ai/qwen3.5-safetensors repository overview

Qwen3.5

Qwen3.5-397B-A17B is a cutting-edge multimodal large language model developed by Alibaba's Qwen team, representing a significant advancement in AI foundation models. This model employs a hybrid Mixture-of-Experts (MoE) architecture combining Gated Delta Networks with sparse expert routing, achieving 397 billion total parameters with only 17 billion activated during inference. This efficient design enables high-performance inference with minimal latency and computational overhead.

The model features unified vision-language capabilities through early fusion training, seamlessly handling both text and image inputs. Qwen3.5 was trained using reinforcement learning at massive scale across million-agent environments, resulting in robust real-world adaptability and exceptional performance across reasoning, coding, agent tasks, and visual understanding benchmarks. With native support for 201 languages and dialects, the model provides truly global linguistic coverage with nuanced cultural and regional understanding.

Designed for enterprise and research applications, Qwen3.5-397B-A17B supports up to 262,144 tokens natively and can be extended to over 1 million tokens, making it suitable for complex long-context tasks including document analysis, multi-turn conversations, and agentic workflows.


Characteristics

AttributeValue
ProviderQwen (Alibaba)
ArchitectureMixture of Experts (MoE) with Gated DeltaNet and Gated Attention
Total Parameters397B
Active Parameters17B
Context Length262,144 tokens (native), extensible to 1,010,000 tokens
Languages201 languages and dialects
Input modalitiesText, Image
Output modalitiesText
LicenseApache 2.0
Model Typeqwen3_5_moe

Using this model with Docker Model Runner

docker model run qwen3.5-safetensors

For more information, check out the Docker Model Runner docs.

Architecture Details

Architecture Benchmark

The model employs a sophisticated hybrid architecture:

  • Hidden Dimension: 4096
  • Vocabulary Size: 248,320 tokens (padded)
  • Number of Layers: 60
  • Layer Configuration: 15 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))

Gated DeltaNet:

  • Linear Attention Heads: 64 for V, 16 for QK
  • Head Dimension: 128

Gated Attention:

  • Attention Heads: 32 for Q, 2 for KV
  • Head Dimension: 256
  • Rotary Position Embedding Dimension: 64

Mixture of Experts:

  • Total Experts: 512
  • Active Experts: 10 routed + 1 shared
  • Expert Intermediate Dimension: 1024

Benchmarks

Knowledge
BenchmarkGPT5.2Claude 4.5 OpusGemini-3 ProQwen3-Max-ThinkingK2.5-1T-A32BQwen3.5-397B-A17B
MMLU-Pro87.489.589.885.787.187.8
MMLU-Redux95.095.695.992.894.594.9
SuperGPQA67.970.674.067.369.270.4
C-Eval90.592.293.493.794.093.0
Instruction Following
BenchmarkGPT5.2Claude 4.5 OpusGemini-3 ProQwen3-Max-ThinkingK2.5-1T-A32BQwen3.5-397B-A17B
IFEval94.890.993.593.493.992.6
IFBench75.458.070.470.970.276.5
MultiChallenge57.954.264.263.362.767.6
Long Context
BenchmarkGPT5.2Claude 4.5 OpusGemini-3 ProQwen3-Max-ThinkingK2.5-1T-A32BQwen3.5-397B-A17B
AA-LCR72.774.070.768.770.068.7
LongBench v254.564.468.260.661.063.2
STEM
BenchmarkGPT5.2Claude 4.5 OpusGemini-3 ProQwen3-Max-ThinkingK2.5-1T-A32BQwen3.5-397B-A17B
GPQA92.487.091.987.487.688.4
HLE35.530.837.530.230.128.7
HLE-Verified43.338.848.037.6--37.6
Reasoning
BenchmarkGPT5.2Claude 4.5 OpusGemini-3 ProQwen3-Max-ThinkingK2.5-1T-A32BQwen3.5-397B-A17B
LiveCodeBench v687.784.890.785.985.083.6
HMMT Feb 2599.492.997.398.095.494.8
HMMT Nov 2510093.393.394.791.192.7
IMOAnswerBench86.384.083.383.981.880.9
AIME2696.793.390.693.393.391.3
General Agent
BenchmarkGPT5.2Claude 4.5 OpusGemini-3 ProQwen3-Max-ThinkingK2.5-1T-A32BQwen3.5-397B-A17B
BFCL-V463.177.572.567.768.372.9
TAU2-Bench87.191.685.484.677.086.7
VITA-Bench38.256.351.640.941.949.7
DeepPlanning44.633.923.328.714.534.3
Tool Decathlon43.843.536.418.827.838.3
MCP-Mark57.542.353.933.529.546.1
Search Agent
BenchmarkGPT5.2Claude 4.5 OpusGemini-3 ProQwen3-Max-ThinkingK2.5-1T-A32BQwen3.5-397B-A17B
HLE w/ tool45.543.445.849.850.248.3
BrowseComp65.867.859.253.974.969.0/78.6
BrowseComp-zh76.162.466.860.9--70.3
WideSearch76.876.468.057.972.774.0
Seal-045.047.745.546.957.446.9
Multilingualism
BenchmarkGPT5.2Claude 4.5 OpusGemini-3 ProQwen3-Max-ThinkingK2.5-1T-A32BQwen3.5-397B-A17B
MMMLU89.590.190.684.486.088.5
MMLU-ProX83.785.787.778.582.384.7

Considerations

  • Computational Requirements: As a 397B parameter model with 17B activated, this model requires significant GPU memory and computational resources for inference
  • Extended Context: While the model supports context lengths up to 1M tokens through extension, native support is for 262K tokens
  • Managed Service: For production deployments without infrastructure management, Alibaba Cloud Model Studio offers Qwen3.5-Plus, the hosted version with 1M context by default
  • Multimodal Capabilities: The model supports both text and image inputs, making it suitable for vision-language tasks
  • Tool Use: The model includes built-in support for tool calling and function execution, ideal for agentic workflows
  • Reasoning Mode: The model supports a thinking/reasoning mode with structured reasoning traces
Generated by

This model card was automatically generated using cagent-action. Want to learn more about Docker Model Runner? Check out the project repository: https://github.com/docker/model-runner.

Tag summary

Content type

Model

Digest

sha256:80154e164

Size

67 GB

Last updated

3 months ago

docker model pull ai/qwen3.5-safetensors:35B-A3B

This week's pulls

Pulls:

483

Jun 1 to Jun 7