| Category | Benchmark | Qwen3-235B | DeepSeek V3 | Qwen3-235B (Thinking) | DeepSeek R1 | Llama-4-Scout-Instruct |
|---|---|---|---|---|---|---|
| General | MMLU (0-shot, CoT) | 86.8 | 87.2 | 87.5 | 81.3 | 80.6 |
| General | MMLU Pro (5-shot, CoT) | 82.4 | 83.0 | 82.7 | 78.1 | 77.8 |
| Instruction Following | IFEval | 76.3 | 84.5 | 78.1 | 82.8 | 79.6 |
| Code | HumanEval (0-shot) | 73.2 | 76.5 | 74.8 | 70.4 | 65.2 |
| Code | MBPP EvalPlus (base) (0-shot) | 81.6 | 83.8 | 82.0 | 77.1 | 72.9 |
| Math | MATH (0-shot, CoT) | 58.7 | 59.3 | 61.0 | 44.2 | 40.8 |
| Reasoning | ARC Challenge (0-shot) | 94.5 | 95.2 | 94.8 | 92.1 | 91.0 |
| Tool Use | BFCL v2 (0-shot) | 87.6 | 91.2 | 88.9 | 93.5 | 90.8 |
| Long Context | NIH/Multi-needle | 92% | 96% | 93% | 95% | 89% |
| Multilingual | Multilingual MGSM (0-shot) | 76.4 | 79.2 | 77.3 | 73.5 | 68.1 |

Qwen3-235B

Strong on Chinese-language tasks - Alibaba's open-source large model supports long text and multimodality, making it well suited to enterprise-level deployment in complex scenarios.

  • Development Team: Alibaba
  • Launch Date: 2025.04
  • Model Parameters: 235B
  • Features: Large-scale open-source MoE model, supporting multimodality and high-performance inference, suitable for complex tasks and long-text processing.

DeepSeek V3

Logic-reasoning enhanced - excels at mathematics, coding, and chain-of-thought tasks, making it suited to scenarios that require rigorous reasoning.

  • Development Team: DeepSeek
  • Launch Date: 2024.12
  • Model Parameters: 671B
  • Features: Supports 128K long-context understanding, offers strong code and mathematical capabilities, is open source, and suits complex reasoning and generation tasks.

Qwen3-235B (Thinking)

Long-text expert – supports 128K context, offers strong code and mathematical capabilities, is open source and commercially usable, and delivers an excellent price-performance ratio.

  • Development Team: Alibaba
  • Launch Date: 2025.04
  • Model Parameters: 235B
  • Features: Reasoning-enhanced Qwen3, optimized for logical reasoning and chain-of-thought (CoT); suited to complex decision-making and mathematical derivation.

DeepSeek R1

The terminator of complex problems - optimized for reasoning and suited to high-difficulty tasks such as math competitions and algorithm research.

  • Development Team: DeepSeek
  • Launch Date: 2025.01
  • Model Parameters: 671B
  • Features: Reinforcement-learning-optimized version; enables efficient fine-tuning and suits practical applications such as search and recommendation.

Llama-4-Scout-Instruct

An open-source, instruction-tuned LLM from Meta's Llama 4 family, distributed via Hugging Face and optimized for English with multilingual support.

  • Development Team: Meta (distributed via Hugging Face)
  • Launch Date: 2025.04
  • Model Parameters: 109B total (17B active, MoE)
  • Features: Fully open source and free, suitable for private deployment. Well suited for developers and enterprises building English-focused AI assistants, from auto-writing emails and reviewing contracts to private knowledge-base QA, while handling 128K long-text contexts.