Zhipu AI Models

Explore all 6 models from Zhipu AI with detailed pricing, pros & cons, and developer recommendations.

Models: 6

Lowest Input: $0.050 per 1M tokens

Max Context: 1M

Quality Tiers: Flagship, Mid-tier, Lite

Quick Recommendations

Best Value: GLM-4-Flash ($0.050 per 1M input tokens)

Best Quality: GLM-5.1

GLM-5.1

Flagship

Complex coding, long-horizon agentic tasks, open-source deployment

When to use: Open-source coding assistant, internal developer tooling, agentic coding workflows, and teams needing self-hosted frontier-capable models.

Upgrade Highlights

◆ 754B MoE open-weight model under the MIT license, with full commercial use
◆ SWE-bench results reported to match GPT-5.4
◆ Up to 8 hours of autonomous execution on a single problem
◆ Rumination: iterative internal reasoning for correctness
◆ Self-hostable on your own GPUs, with no vendor lock-in

Input Price: $0.830 per 1M tokens

Output Price: $3.31 per 1M tokens

Cached Input: $0.170 per 1M tokens

Batch Input: not listed

Context Window: 1M

Max Output: 16,384 tokens

Knowledge Cutoff: 2026-04
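
To make the per-1M-token prices concrete, here is a minimal sketch of a per-request cost estimate using the GLM-5.1 rates listed above. The function name and the sample workload are hypothetical, and the billing rule is an assumption: cached tokens are treated as a subset of the input that is charged at the cached rate instead of the regular input rate.

```python
# Prices in USD per 1M tokens, copied from the GLM-5.1 listing.
INPUT_PRICE = 0.830
CACHED_INPUT_PRICE = 0.170
OUTPUT_PRICE = 3.31


def estimate_cost_usd(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption (not stated in the listing): cached_tokens is a subset of
    input_tokens and is billed at the cached rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / 1_000_000


# Hypothetical workload: 200,000-token prompt (150,000 cached), 4,000 output tokens.
cost = estimate_cost_usd(200_000, 150_000, 4_000)
print(f"${cost:.4f}")  # prints $0.0802
```

Under these assumptions, the 150,000 cached tokens cost less than the 50,000 uncached ones, so prompt reuse dominates the bill for long-context workloads.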

Vision | Function Calling | Fine-tuning | JSON Mode | Free Tier

Pros

754B MoE open-weight (MIT license)
Matches GPT-5.4 on SWE-bench coding
8-hour sustained autonomous task execution
Self-hostable with full commercial rights
Rumination architecture for deep reasoning

Cons

754B parameters require substantial GPU infrastructure to self-host
Weaker English than closed frontier models on generalist tasks
No vision on base model

Performance

Output Speed: ~40 tok/s

Rate Limit: 3,000 RPM
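
As a rough illustration of what the output speed means in practice, the sketch below converts a tokens-per-second figure into generation time for a full-length response. The helper name is hypothetical, and the model is deliberately simple: it assumes a steady decode rate and ignores queueing, prompt processing, and time to first token.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream output_tokens at a steady decode rate.

    Simplification: ignores queueing, prompt processing, and time to first token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Figures from the GLM-5.1 listing: ~40 tok/s and a 16,384-token max output.
full_response = generation_seconds(16_384, 40)
print(f"{full_response:.1f} s (~{full_response / 60:.1f} min)")  # prints 409.6 s (~6.8 min)
```

The same helper applies to the other models on this page by substituting their listed speed and max output.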

Multimodal

Image Input | Image Output | Audio Input | Audio Output

Benchmarks

MMLU (CN): 91.2%

C-Eval: 93.5%

SWE-Verified: 78.6%

CMMLU: 92.1%

GLM-4.6

Flagship

Chinese language tasks, enterprise AI

When to use: Chinese-language enterprise applications, customer service bots, and content generation targeting Chinese markets.

Upgrade Highlights

◆ Top-tier Chinese NLU and generation, reported to beat GPT-4 on Chinese benchmarks
◆ 128K context with 16K max output
◆ Full function calling for agent workflows
◆ Fine-tuning available for domain adaptation
◆ $0.50 input / $2.00 output per 1M tokens, positioned as competitive with GPT-4o at about half the price

Input Price: $0.500 per 1M tokens

Output Price: $2.00 per 1M tokens

Cached Input: $0.100 per 1M tokens

Batch Input: not listed

Context Window: 128K

Max Output: 16,000 tokens

Knowledge Cutoff: 2025-03

Vision | Function Calling | Fine-tuning | JSON Mode | Free Tier

Pros

Best Chinese language performance
128K context, 16K output
Strong function calling
Fine-tuning support

Cons

Weaker English vs GPT-4
No vision on base model
Smaller ecosystem

Performance

Output Speed: ~60 tok/s

Rate Limit: 5,000 RPM

Multimodal

Image Input | Image Output | Audio Input | Audio Output

Benchmarks

MMLU (CN): 84.5%

C-Eval: 89.2%

CMMLU: 88.7%

GLM-4.5

Mid-tier

Balanced Chinese/English tasks

When to use: Bilingual applications needing good Chinese and English at mid-tier pricing.

Upgrade Highlights

◆ Strong bilingual: competitive in both Chinese and English
◆ 128K context at $0.30 per 1M input tokens for affordable long-context work
◆ 16K max output for long-form generation
◆ Fine-tuning support for customization

Input Price: $0.300 per 1M tokens

Output Price: $1.20 per 1M tokens

Cached Input: $0.080 per 1M tokens

Batch Input: not listed

Context Window: 128K

Max Output: 16,000 tokens

Knowledge Cutoff: 2025-03

Vision | Function Calling | Fine-tuning | JSON Mode | Free Tier

Pros

Strong bilingual performance
128K context
16K max output
Cost-effective

Cons

Less capable than GLM-4.6
No vision
Smaller model ecosystem

Performance

Output Speed: ~75 tok/s

Rate Limit: 8,000 RPM

Multimodal

Image Input | Image Output | Audio Input | Audio Output

Benchmarks

MMLU: 76.8%

C-Eval: 83.1%

GLM-4-Plus

Mid-tier

General purpose, API integration

When to use: General-purpose API integration, chatbots, and content generation at budget-friendly pricing.

Upgrade Highlights

◆ Versatile mid-tier model for most use cases
◆ 128K context at $0.20 per 1M input tokens
◆ Full function calling for tool use
◆ Fine-tuning available

Input Price: $0.200 per 1M tokens

Output Price: $0.800 per 1M tokens

Cached Input: $0.050 per 1M tokens

Batch Input: not listed

Context Window: 128K

Max Output: 8,192 tokens

Knowledge Cutoff: 2025-03

Vision | Function Calling | Fine-tuning | JSON Mode | Free Tier

Pros

Good all-rounder
128K context
Affordable pricing
Function calling

Cons

8K max output
No vision
Weaker on complex reasoning

Performance

Output Speed: ~85 tok/s

Rate Limit: 10,000 RPM

Multimodal

Image Input | Image Output | Audio Input | Audio Output

Benchmarks

MMLU: 73.5%

C-Eval: 79.8%

GLM-4-Flash

Lite

High-throughput, low-latency tasks

When to use: High-volume tasks like classification, summarization, and simple Q&A where speed and cost matter.

Upgrade Highlights

◆ Fastest GLM model in this lineup (~200 tok/s), optimized for throughput
◆ $0.05 per 1M input tokens, the lowest price listed
◆ 128K context despite the Lite tier
◆ Free tier: 1M tokens/day for development

Input Price: $0.050 per 1M tokens

Output Price: $0.200 per 1M tokens

Cached Input: $0.010 per 1M tokens

Batch Input: not listed

Context Window: 128K

Max Output: 8,192 tokens

Knowledge Cutoff: 2025-03

Vision | Function Calling | Fine-tuning | JSON Mode | Free Tier

Pros

Extremely fast inference
128K context
Very low cost
Free tier available

Cons

Basic reasoning only
No fine-tuning
No vision

Performance

Output Speed: ~200 tok/s

Rate Limit: 30,000 RPM

Multimodal

Image Input | Image Output | Audio Input | Audio Output

Benchmarks

MMLU: 65.2%

C-Eval: 72.1%

GLM-4V-Plus

Mid-tier

Chinese multimodal, document AI

When to use: Chinese document analysis, receipt/invoice processing, and visual Q&A for Chinese markets.

Upgrade Highlights

◆ Native multimodal with strong Chinese OCR
◆ Document AI: receipts, invoices, forms
◆ Visual Q&A optimized for Chinese content
◆ Function calling for multimodal agent workflows

Input Price: $0.300 per 1M tokens

Output Price: $1.20 per 1M tokens

Cached Input: $0.080 per 1M tokens

Batch Input: not listed

Context Window: 8K

Max Output: 4,096 tokens

Knowledge Cutoff: 2025-03

Vision | Function Calling | Fine-tuning | JSON Mode | Free Tier

Pros

Native vision-language
Strong Chinese OCR
Document and chart understanding
Function calling

Cons

8K context only
4K max output
No fine-tuning

Performance

Output Speed: ~50 tok/s

Rate Limit: 3,000 RPM

Multimodal

Image Input | Image Output | Audio Input | Audio Output

Benchmarks

MMMU (CN): 62.8%

DocVQA: 85.3%

Side-by-Side Comparison

Model	Tier	Input ($/1M tokens)	Output ($/1M tokens)	Cached Input ($/1M tokens)	Context	Max Output (tokens)
GLM-5.1	Flagship	$0.830	$3.31	$0.170	1M	16,384
GLM-4.6	Flagship	$0.500	$2.00	$0.100	128K	16,000
GLM-4.5	Mid-tier	$0.300	$1.20	$0.080	128K	16,000
GLM-4-Plus	Mid-tier	$0.200	$0.800	$0.050	128K	8,192
GLM-4-Flash	Lite	$0.050	$0.200	$0.010	128K	8,192
GLM-4V-Plus	Mid-tier	$0.300	$1.20	$0.080	8K	4,096
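
The comparison table can also be used programmatically. The following is a minimal sketch, not an official selection tool, that picks the lowest-cost model able to fit a given request. The names `Model`, `MODELS`, and `cheapest_model` are hypothetical. Two assumptions are made: context sizes such as "128K" are read as 128,000 tokens, and the context window must hold the prompt plus the output.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    context: int         # tokens; "128K" read as 128,000 (assumption)
    max_output: int      # tokens


# Rows copied from the comparison table above.
MODELS = [
    Model("GLM-5.1", 0.830, 3.31, 1_000_000, 16_384),
    Model("GLM-4.6", 0.500, 2.00, 128_000, 16_000),
    Model("GLM-4.5", 0.300, 1.20, 128_000, 16_000),
    Model("GLM-4-Plus", 0.200, 0.800, 128_000, 8_192),
    Model("GLM-4-Flash", 0.050, 0.200, 128_000, 8_192),
    Model("GLM-4V-Plus", 0.300, 1.20, 8_000, 4_096),
]


def cheapest_model(prompt_tokens: int, output_tokens: int) -> Optional[str]:
    """Return the name of the lowest-cost model that fits the request, or None.

    Assumption: the context window must hold prompt plus output tokens.
    Ties are broken by table order.
    """
    candidates = [
        m for m in MODELS
        if prompt_tokens + output_tokens <= m.context
        and output_tokens <= m.max_output
    ]
    if not candidates:
        return None

    def request_cost(m: Model) -> float:
        return prompt_tokens * m.input_price + output_tokens * m.output_price

    return min(candidates, key=request_cost).name


print(cheapest_model(4_000, 1_000))     # GLM-4-Flash
print(cheapest_model(100_000, 10_000))  # GLM-4.5
print(cheapest_model(200_000, 1_000))   # GLM-5.1
```

This ranks on price alone; capability requirements such as vision or fine-tuning would need extra fields that the table does not provide.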