Zhipu AI Models

Explore all 6 models from Zhipu AI with detailed pricing, pros & cons, and developer recommendations.

Models: 6
Lowest Input: $0.050 per 1M tokens
Max Context: 1M
Quality Tiers: 3

Quick Recommendations

Best Value: GLM-4-Flash ($0.050 per 1M input tokens)
Best Quality: GLM-5.1

GLM-5.1

Flagship

Complex coding, long-horizon agentic tasks, open-source deployment

When to use: Open-source coding assistant, internal developer tooling, agentic coding workflows, and teams needing self-hosted frontier-capable models.

Upgrade Highlights

  • 754B-parameter MoE with open weights under the MIT license; full commercial use
  • Matches GPT-5.4 on SWE-bench: frontier coding performance
  • 8-hour autonomous task execution on a single problem
  • Rumination: iterative internal reasoning for correctness
  • Self-host on your own GPUs — no vendor lock-in

Input Price: $0.830 per 1M tokens
Output Price: $3.31 per 1M tokens
Cached Input: $0.170 per 1M tokens
Batch Input: not listed
Context Window: 1M
Max Output: 16,384 tokens
Knowledge Cutoff: 2026-04
Feature badges (availability not indicated): Vision, Function Calling, Fine-tuning, JSON Mode, Free Tier
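
As a rough illustration of how the per-1M-token prices above translate into a per-request cost, the sketch below multiplies token counts by the listed GLM-5.1 rates. It assumes that cached input tokens are billed at the cached rate instead of the regular input rate and that the three charges simply add up. The function name and the example token counts are hypothetical, and actual billing rules may differ.

```python
# Listed GLM-5.1 prices in USD per 1M tokens (taken from this page).
INPUT_PRICE = 0.830
OUTPUT_PRICE = 3.31
CACHED_INPUT_PRICE = 0.170


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from cache
    and is billed at the cached rate instead of the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    return (
        fresh_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    ) / 1_000_000


# Hypothetical request: 200,000 input tokens (150,000 of them cached), 4,000 output tokens.
cost = request_cost(200_000, 4_000, cached_tokens=150_000)
print(f"${cost:.5f}")  # prints $0.08024
```

In this example the cached portion costs about a fifth of what it would at the regular input rate, which is why long, repeated prompts (system prompts, shared documents) are where the cached price matters most.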

Pros

  • 754B MoE open-weight (MIT license)
  • Matches GPT-5.4 on SWE-bench coding
  • 8-hour sustained autonomous task execution
  • Self-hostable with full commercial rights
  • Rumination architecture for deep reasoning

Cons

  • 754B parameters require substantial GPU infrastructure to self-host
  • Weaker English than closed frontier models on generalist tasks
  • No vision on base model
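
To give a sense of what "substantial GPU infrastructure" could mean for a 754B-parameter model, the sketch below estimates the memory needed for the weights alone at a few precisions. The 80 GB GPU size and the precisions are assumptions chosen for illustration, not figures from this page. The estimate ignores KV cache, activations, and runtime overhead, so real deployments would need more. It also assumes all MoE experts are held in memory, which is typical even though only some are active per token.

```python
import math

PARAMS_BILLION = 754  # parameter count listed on this page


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def min_gpus(params_billion: float, bits_per_param: int, gpu_memory_gb: float = 80) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(params_billion, bits_per_param) / gpu_memory_gb)


for bits in (16, 8, 4):
    gb = weight_memory_gb(PARAMS_BILLION, bits)
    gpus = min_gpus(PARAMS_BILLION, bits)
    print(f"{bits}-bit: {gb:.0f} GB of weights, at least {gpus} x 80 GB GPUs")
# 16-bit: 1508 GB of weights, at least 19 x 80 GB GPUs
# 8-bit: 754 GB of weights, at least 10 x 80 GB GPUs
# 4-bit: 377 GB of weights, at least 5 x 80 GB GPUs
```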

Performance

Output Speed: ~40 tok/s
Rate Limit: 3,000 RPM
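
The output speed and rate limit can be turned into rough latency and throughput numbers. The sketch below is a back-of-the-envelope calculation using the figures listed for GLM-5.1. It only counts decode time and ignores queueing, network latency, and time to first token, so it should be read as a lower bound rather than a measured result. The function name is hypothetical.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Decode time only; ignores queueing, network, and time to first token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Figures listed on this page for GLM-5.1.
SPEED_TOK_S = 40            # approximate output speed
MAX_OUTPUT_TOKENS = 16_384  # maximum output length
RATE_LIMIT_RPM = 3_000      # requests per minute

seconds = generation_seconds(MAX_OUTPUT_TOKENS, SPEED_TOK_S)
print(f"{seconds:.1f} s (about {seconds / 60:.1f} min) for a maximum-length response")
print(f"{RATE_LIMIT_RPM / 60:.0f} requests per second at the rate limit")
```

A maximum-length response therefore takes several minutes to stream at the listed speed, which matters when choosing request timeouts.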

Multimodal

Modalities listed (availability not indicated): Image Input, Image Output, Audio Input, Audio Output

Benchmarks

MMLU (CN): 91.2%
C-Eval: 93.5%
SWE-Verified: 78.6%
CMMLU: 92.1%

GLM-4.6

Flagship

Chinese language tasks, enterprise AI

When to use: Chinese-language enterprise applications, customer service bots, and content generation targeting Chinese markets.

Upgrade Highlights

  • Top-tier Chinese NLU and generation — beats GPT-4 on Chinese benchmarks
  • 128K context with 16K max output — longest output in class
  • Full function calling for agent workflows
  • Fine-tuning available for domain adaptation
  • $0.50 input / $2.00 output per 1M tokens: competitive with GPT-4o at half the price

Input Price: $0.500 per 1M tokens
Output Price: $2.00 per 1M tokens
Cached Input: $0.100 per 1M tokens
Batch Input: not listed
Context Window: 128K
Max Output: 16,000 tokens
Knowledge Cutoff: 2025-03
Feature badges (availability not indicated): Vision, Function Calling, Fine-tuning, JSON Mode, Free Tier

Pros

  • Best Chinese language performance
  • 128K context, 16K output
  • Strong function calling
  • Fine-tuning support

Cons

  • Weaker English than GPT-4
  • No vision on base model
  • Smaller ecosystem

Performance

Output Speed: ~60 tok/s
Rate Limit: 5,000 RPM

Multimodal

Modalities listed (availability not indicated): Image Input, Image Output, Audio Input, Audio Output

Benchmarks

MMLU (CN): 84.5%
C-Eval: 89.2%
CMMLU: 88.7%

GLM-4.5

Mid-tier

Balanced Chinese/English tasks

When to use: Bilingual applications needing good Chinese and English at mid-tier pricing.

Upgrade Highlights

  • Strong bilingual: competitive in both Chinese and English
  • 128K context at $0.30/1M — affordable long-context
  • 16K max output for long-form generation
  • Fine-tuning support for customization

Input Price: $0.300 per 1M tokens
Output Price: $1.20 per 1M tokens
Cached Input: $0.080 per 1M tokens
Batch Input: not listed
Context Window: 128K
Max Output: 16,000 tokens
Knowledge Cutoff: 2025-03
Feature badges (availability not indicated): Vision, Function Calling, Fine-tuning, JSON Mode, Free Tier

Pros

  • Strong bilingual performance
  • 128K context
  • 16K max output
  • Cost-effective

Cons

  • Less capable than GLM-4.6
  • No vision
  • Smaller model ecosystem

Performance

Output Speed: ~75 tok/s
Rate Limit: 8,000 RPM

Multimodal

Modalities listed (availability not indicated): Image Input, Image Output, Audio Input, Audio Output

Benchmarks

MMLU: 76.8%
C-Eval: 83.1%

GLM-4-Plus

Mid-tier

General purpose, API integration

When to use: General-purpose API integration, chatbots, and content generation at budget-friendly pricing.

Upgrade Highlights

  • Versatile mid-tier model for most use cases
  • 128K context at just $0.20/1M input
  • Full function calling for tool use
  • Fine-tuning available

Input Price: $0.200 per 1M tokens
Output Price: $0.800 per 1M tokens
Cached Input: $0.050 per 1M tokens
Batch Input: not listed
Context Window: 128K
Max Output: 8,192 tokens
Knowledge Cutoff: 2025-03
Feature badges (availability not indicated): Vision, Function Calling, Fine-tuning, JSON Mode, Free Tier

Pros

  • Good all-rounder
  • 128K context
  • Affordable pricing
  • Function calling

Cons

  • 8K max output
  • No vision
  • Weaker on complex reasoning

Performance

Output Speed: ~85 tok/s
Rate Limit: 10,000 RPM

Multimodal

Modalities listed (availability not indicated): Image Input, Image Output, Audio Input, Audio Output

Benchmarks

MMLU: 73.5%
C-Eval: 79.8%

GLM-4-Flash

Lite

High-throughput, low-latency tasks

When to use: High-volume tasks like classification, summarization, and simple Q&A where speed and cost matter.

Upgrade Highlights

  • Fastest GLM model — optimized for throughput
  • $0.05/1M input — ultra-budget friendly
  • 128K context despite lite tier
  • Free tier: 1M tokens/day for development

Input Price: $0.050 per 1M tokens
Output Price: $0.200 per 1M tokens
Cached Input: $0.010 per 1M tokens
Batch Input: not listed
Context Window: 128K
Max Output: 8,192 tokens
Knowledge Cutoff: 2025-03
Feature badges (availability not indicated): Vision, Function Calling, Fine-tuning, JSON Mode, Free Tier
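
For planning development work against the free tier mentioned above (1M tokens/day), a quick calculation shows how many requests fit in the daily allowance. This sketch assumes that both input and output tokens count against the allowance, which the page does not state. The request sizes are hypothetical.

```python
FREE_TOKENS_PER_DAY = 1_000_000  # free-tier allowance listed on this page


def free_requests_per_day(input_tokens: int, output_tokens: int) -> int:
    """Whole requests that fit in the daily allowance.

    Assumption: input and output tokens both count against the allowance.
    """
    tokens_per_request = input_tokens + output_tokens
    if tokens_per_request <= 0:
        raise ValueError("a request must use at least one token")
    return FREE_TOKENS_PER_DAY // tokens_per_request


# Hypothetical classification call: 600 prompt tokens, 25 output tokens.
print(free_requests_per_day(600, 25))  # 1_000_000 // 625 = 1600
```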

Pros

  • Extremely fast inference
  • 128K context
  • Very low cost
  • Free tier available

Cons

  • Basic reasoning only
  • No fine-tuning
  • No vision

Performance

Output Speed: ~200 tok/s
Rate Limit: 30,000 RPM

Multimodal

Modalities listed (availability not indicated): Image Input, Image Output, Audio Input, Audio Output

Benchmarks

MMLU: 65.2%
C-Eval: 72.1%

GLM-4V-Plus

Mid-tier

Chinese multimodal, document AI

When to use: Chinese document analysis, receipt/invoice processing, and visual Q&A for Chinese markets.

Upgrade Highlights

  • Native multimodal with strong Chinese OCR
  • Document AI: receipts, invoices, forms
  • Visual Q&A optimized for Chinese content
  • Function calling for multimodal agent workflows

Input Price: $0.300 per 1M tokens
Output Price: $1.20 per 1M tokens
Cached Input: $0.080 per 1M tokens
Batch Input: not listed
Context Window: 8K
Max Output: 4,096 tokens
Knowledge Cutoff: 2025-03
Feature badges (availability not indicated): Vision, Function Calling, Fine-tuning, JSON Mode, Free Tier

Pros

  • Native vision-language
  • Strong Chinese OCR
  • Document and chart understanding
  • Function calling

Cons

  • 8K context only
  • 4K max output
  • No fine-tuning
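
With an 8K context window, multi-page documents usually have to be split across several requests. The sketch below greedily packs pages into batches that fit a token budget. It assumes that "8K" means 8,192 tokens, that the context window is shared between the prompt and the generated output, and that per-page token counts are already known. How image inputs are counted in tokens is not stated on this page, so the page counts here are purely hypothetical.

```python
def pack_pages(page_tokens, context_window=8_192, reserved_output=1_024):
    """Greedily group consecutive pages so each batch fits the prompt budget.

    Returns a list of batches, each a list of page indices.
    Assumption: prompt and output share the same context window.
    """
    budget = context_window - reserved_output
    batches, current, used = [], [], 0
    for index, tokens in enumerate(page_tokens):
        if tokens > budget:
            raise ValueError(f"page {index} alone exceeds the prompt budget of {budget} tokens")
        if used + tokens > budget:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        batches.append(current)
    return batches


# Hypothetical token counts for a five-page document.
print(pack_pages([3000, 2500, 2000, 4000, 1000]))  # [[0, 1], [2, 3, 4]]
```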

Performance

Output Speed: ~50 tok/s
Rate Limit: 3,000 RPM

Multimodal

Modalities listed (availability not indicated): Image Input, Image Output, Audio Input, Audio Output

Benchmarks

MMMU (CN): 62.8%
DocVQA: 85.3%

Side-by-Side Comparison

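Model       | Tier     | Input (per 1M tokens) | Output (per 1M tokens) | Context
GLM-5.1     | Flagship | $0.830                | $3.31                  | 1M
GLM-4.6     | Flagship | $0.500                | $2.00                  | 128K
GLM-4.5     | Mid-tier | $0.300                | $1.20                  | 128K
GLM-4-Plus  | Mid-tier | $0.200                | $0.800                 | 128K
GLM-4-Flash | Lite     | $0.050                | $0.200                 | 128K
GLM-4V-Plus | Mid-tier | $0.300                | $1.20                  | 8K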
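
The comparison above can also be expressed as data and queried. The sketch below encodes the six rows and picks the cheapest model whose context window is large enough for a given workload. It reads "K" and "M" as decimal thousands and millions, which is an assumption. The class and function names are hypothetical, and the selection looks only at price and context, not at quality, vision support, or speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    context: int         # tokens; "K"/"M" read as decimal units (assumption)


CATALOG = [
    Model("GLM-5.1", "Flagship", 0.830, 3.31, 1_000_000),
    Model("GLM-4.6", "Flagship", 0.500, 2.00, 128_000),
    Model("GLM-4.5", "Mid-tier", 0.300, 1.20, 128_000),
    Model("GLM-4-Plus", "Mid-tier", 0.200, 0.800, 128_000),
    Model("GLM-4-Flash", "Lite", 0.050, 0.200, 128_000),
    Model("GLM-4V-Plus", "Mid-tier", 0.300, 1.20, 8_000),
]


def cheapest_model(min_context: int,
                   input_tokens: int = 1_000_000,
                   output_tokens: int = 1_000_000) -> Model:
    """Return the lowest-cost model whose context window is at least min_context."""
    candidates = [m for m in CATALOG if m.context >= min_context]
    if not candidates:
        raise ValueError("no listed model has a large enough context window")
    return min(
        candidates,
        key=lambda m: input_tokens * m.input_price + output_tokens * m.output_price,
    )


print(cheapest_model(100_000).name)  # GLM-4-Flash
print(cheapest_model(500_000).name)  # GLM-5.1
```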