Best LLM API Models for RAG

RAG workloads need enough context to hold the retrieved passages. Those passages are billed as input tokens, so RAG also benefits from economical input pricing. This page starts with large context windows and shows practical cost signals alongside them.
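To make the trade-off between context size and input price concrete, here is a minimal Python sketch of a RAG context budget. The chunk size, prompt size, reserved output tokens, and the $0.10 per 1M input price are hypothetical values chosen for illustration. `max_chunks` and `query_input_cost` are illustrative helpers, not part of any provider API, and real token counts depend on each model's tokenizer.

```python
# Minimal sketch of a RAG context budget. All token counts and the price
# used below are hypothetical; real counts depend on each model's tokenizer.

def max_chunks(context_tokens: int, chunk_tokens: int,
               prompt_tokens: int, reserved_output_tokens: int) -> int:
    """Whole retrieved chunks that fit after the fixed prompt (system
    prompt plus question) and the tokens reserved for the answer."""
    available = context_tokens - prompt_tokens - reserved_output_tokens
    if available <= 0:
        return 0
    return available // chunk_tokens


def query_input_cost(n_chunks: int, chunk_tokens: int, prompt_tokens: int,
                     input_price_per_million: float) -> float:
    """USD cost of one query's input side: retrieved chunks and the fixed
    prompt are both billed as input tokens."""
    input_tokens = n_chunks * chunk_tokens + prompt_tokens
    return input_tokens * input_price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical setup: 512-token chunks, a 2,000-token prompt,
    # 4,000 tokens reserved for the answer, and a 1M-token window.
    print(max_chunks(1_000_000, 512, 2_000, 4_000))  # → 1941
    # Sending 20 chunks per query at a hypothetical $0.10 per 1M input tokens.
    print(query_input_cost(20, 512, 2_000, 0.10))
```

A large window only caps how much you can send; every chunk you actually send is paid for at the input rate, which is why both columns matter for RAG.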

Models listed: 50

Cost example: 1M input + 500K output tokens

Prices: normalized to USD per 1M tokens

Quick shortlist

Start with Llama 4 Scout.

This guide is sorted by context window, so the first rows leave the most room for retrieved passages, long documents, and large codebase context.

Lead model: Llama 4 Scout

Provider: Meta

Sample cost: $0.25

Context: 10M tokens

The ranking is a discovery aid, not a final recommendation. Always compare the model against your workload and verify provider pricing before production use.

How to read this ranking

Models are sorted by context window size. Use this page when your workflow needs long documents, large retrieval payloads, or multi-file context.
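A ranking like this one can be sketched in a few lines of Python: filter on a minimum context window and a maximum input price, then sort by context window, largest first. This is an illustrative sketch, not the page's actual ranking code. The `Model` class, the `rank_for_rag` helper, and the input-price tie-break are assumptions, and the context sizes are rounded from the table (1.05M is treated as 1,050,000).

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    context: int         # context window in tokens (rounded from the table)


def rank_for_rag(models, min_context, max_input_price):
    """Keep models with enough context and affordable input, then sort by
    context window (largest first), breaking ties by cheaper input price.
    The tie-break is an assumption made for this sketch."""
    eligible = [m for m in models
                if m.context >= min_context and m.input_price <= max_input_price]
    return sorted(eligible, key=lambda m: (-m.context, m.input_price))


# A few rows copied from the ranking table.
MODELS = [
    Model("Llama 4 Scout", 0.1, 0.3, 10_000_000),
    Model("Grok 4.20", 1.25, 2.5, 2_000_000),
    Model("DeepSeek V4 Flash", 0.09, 0.18, 1_050_000),
    Model("Gemini 2.5 Flash Lite", 0.1, 0.4, 1_050_000),
    Model("Claude Sonnet 4.6", 3.0, 15.0, 1_000_000),
]

if __name__ == "__main__":
    for m in rank_for_rag(MODELS, min_context=1_000_000, max_input_price=0.5):
        print(m.name)
```

With a $0.50 input-price cap, Grok 4.20 and Claude Sonnet 4.6 drop out, and the two 1.05M-context models are ordered by input price.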

Model Ranking

Model	Provider	Input (per 1M tokens)	Output (per 1M tokens)	Sample cost (1M in + 500K out)	Context (tokens)	Popularity rank	Release date
Llama 4 Scout	Meta	$0.1	$0.3	$0.25	10M		2025-04-05
Grok 4.20	xAI	$1.25	$2.5	$2.5	2M	#88	2026-03-31
Grok 4.20 Multi-Agent	xAI	$1.25	$2.5	$2.5	2M	#155	2026-03-31
GPT-5.5	OpenAI	$5	$30	$20	1.05M	#19	2026-04-24
GPT-5.4	OpenAI	$2.5	$15	$10	1.05M	#30	2026-03-05
GPT-5.5 Pro	OpenAI	$30	$180	$120	1.05M	#161	2026-04-24
GPT-5.4 Pro	OpenAI	$30	$180	$120	1.05M	#251	2026-03-05
OpenAI GPT Latest	OpenAI	$5	$30	$20	1.05M		2026-04-27
Owl Alpha	OpenRouter	$0	$0	$0	1.05M	#7	2026-04-28
Gemini 3.1 Pro Preview Custom Tools	Google	$2	$12	$8	1.05M	#120	2026-02-25
DeepSeek V4 Flash	DeepSeek	$0.09	$0.18	$0.18	1.05M	#1	2026-04-24
MiMo-V2.5	Xiaomi	$0.105	$0.28	$0.24	1.05M	#3	2026-04-22
MiniMax M3	MiniMax	$0.3	$1.2	$0.9	1.05M	#4	2026-05-31
DeepSeek V4 Pro	DeepSeek	$0.435	$0.87	$0.87	1.05M	#5	2026-04-24
Gemini 3 Flash Preview	Google	$0.5	$3	$2	1.05M	#10	2025-12-17
Gemini 2.5 Flash	Google	$0.3	$2.5	$1.55	1.05M	#13	2025-06-17
Gemini 2.5 Flash Lite	Google	$0.1	$0.4	$0.3	1.05M	#15	2025-07-22
Gemini 3.5 Flash	Google	$1.5	$9	$6	1.05M	#16	2026-05-19
MiMo-V2.5-Pro	Xiaomi	$0.435	$0.87	$0.87	1.05M	#17	2026-04-22
Gemini 3.1 Flash Lite	Google	$0.25	$1.5	$1	1.05M	#20	2026-05-07
Gemini 3.1 Pro Preview	Google	$2	$12	$8	1.05M	#31	2026-02-19
Gemini 3.1 Flash Lite Preview	Google	$0.25	$1.5	$1	1.05M	#34	2026-03-03
Gemini 2.5 Pro	Google	$1.25	$10	$6.25	1.05M	#57	2025-06-17
Qwen3 Coder 480B A35B	Qwen	$0.22	$1.8	$1.12	1.05M	#59	2025-07-23
Gemini 2.5 Flash Lite Preview 09-2025	Google	$0.1	$0.4	$0.3	1.05M	#81	2025-09-25
Lyria 3 Pro Preview	Google	$0	$0	$0	1.05M	#283	2026-03-30
Lyria 3 Clip Preview	Google	$0	$0	$0	1.05M	#291	2026-03-30
GLM 5.2	Z.ai	$0.95	$3	$2.45	1.05M		2026-06-16
Google Gemini Pro Latest	Google	$2	$12	$8	1.05M		2026-04-27
Google Gemini Flash Latest	Google	$1.5	$9	$6	1.05M		2026-04-27
DeepSeek V4 Flash (free)	DeepSeek	$0	$0	$0	1.05M		2026-04-24
MiMo-V2-Pro	Xiaomi	$1	$3	$2.5	1.05M		2026-03-18
Qwen3 Coder 480B A35B (free)	Qwen	$0	$0	$0	1.05M		2025-07-23
Gemini 2.5 Pro Preview 06-05	Google	$1.25	$10	$6.25	1.05M		2025-06-05
Gemini 2.5 Pro Preview 05-06	Google	$1.25	$10	$6.25	1.05M		2025-05-07
Llama 4 Maverick	Meta	$0.15	$0.6	$0.45	1.05M		2025-04-05
Gemini 2.0 Flash Lite	Google	$0.075	$0.3	$0.22	1.05M		2025-02-25
Gemini 2.0 Flash	Google	$0.1	$0.4	$0.3	1.05M		2025-02-05
GPT-4.1 Mini	OpenAI	$0.4	$1.6	$1.2	1.05M	#52	2025-04-14
GPT-4.1 Nano	OpenAI	$0.1	$0.4	$0.3	1.05M	#58	2025-04-14
GPT-4.1	OpenAI	$2	$8	$6	1.05M	#66	2025-04-14
Palmyra X5	Writer	$0.6	$6	$3.6	1.04M	#264	2026-01-21
MiniMax-01	MiniMax	$0.2	$1.1	$0.75	1M	#215	2025-01-15
Claude Sonnet 4.6	Anthropic	$3	$15	$10.5	1M	#6	2026-02-17
Claude Opus 4.7	Anthropic	$5	$25	$17.5	1M	#8	2026-04-16
Nemotron 3 Ultra (free)	NVIDIA	$0	$0	$0	1M	#12	2026-06-04
Claude Opus 4.6	Anthropic	$5	$25	$17.5	1M	#18	2026-02-04
Nemotron 3 Super (free)	NVIDIA	$0	$0	$0	1M	#23	2026-03-11
Qwen3.7 Max	Qwen	$1.25	$3.75	$3.12	1M	#35	2026-05-21
Qwen3.7 Plus	Qwen	$0.32	$1.28	$0.96	1M	#51	2026-06-03

Pricing FAQ

How is the sample workload cost calculated?

The sample workload uses 1,000,000 input tokens plus 500,000 output tokens, then applies each model's normalized USD price per 1 million tokens.
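The calculation can be written out directly. This is a minimal Python sketch, assuming prices are given in USD per 1 million tokens as in the table; `sample_cost` is an illustrative helper name.

```python
SAMPLE_INPUT_TOKENS = 1_000_000
SAMPLE_OUTPUT_TOKENS = 500_000


def sample_cost(input_price_per_million: float, output_price_per_million: float,
                input_tokens: int = SAMPLE_INPUT_TOKENS,
                output_tokens: int = SAMPLE_OUTPUT_TOKENS) -> float:
    """Workload cost in USD: each token count times its per-1M price."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


if __name__ == "__main__":
    print(sample_cost(0.1, 0.3))   # Llama 4 Scout → 0.25
    print(sample_cost(3.0, 15.0))  # Claude Sonnet 4.6 → 10.5
```

The table lists these results with two decimals, so a value such as 3.125 for Qwen3.7 Max ($1.25 input, $3.75 output) appears as $3.12.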

Why do input and output token prices matter separately?

Many applications are output-token heavy, while retrieval and classification workloads may be input-token heavy. Comparing both prices helps avoid picking a model that is cheap for the wrong workload shape.
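The sketch below compares two models from the ranking table under two workload shapes. The per-1M prices are copied from the table; the token volumes and the `workload_cost` and `cheapest` helpers are hypothetical and exist only for illustration.

```python
def workload_cost(input_tokens, output_tokens, input_price, output_price):
    """USD cost for a workload, with prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cheapest(input_tokens, output_tokens, prices):
    """Name of the model with the lowest cost for this workload shape."""
    costs = {name: workload_cost(input_tokens, output_tokens, i, o)
             for name, (i, o) in prices.items()}
    return min(costs, key=costs.get)


# (input price, output price) per 1M tokens, from the ranking table.
PRICES = {
    "DeepSeek V4 Pro": (0.435, 0.87),
    "Gemini 2.5 Flash": (0.3, 2.5),
}

if __name__ == "__main__":
    # Hypothetical input-heavy RAG shape: 10M input, 100K output tokens.
    print("input-heavy:", cheapest(10_000_000, 100_000, PRICES))
    # Hypothetical output-heavy chat shape: 1M input, 5M output tokens.
    print("output-heavy:", cheapest(1_000_000, 5_000_000, PRICES))
```

With these numbers, Gemini 2.5 Flash (cheaper input) wins the input-heavy shape at $3.25 versus $4.437, while DeepSeek V4 Pro (cheaper output) wins the output-heavy shape at $4.785 versus $12.80.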

Should I verify prices before production use?

Yes. AI Model Matrix normalizes public pricing metadata for comparison, but provider availability, limits, and prices can change. Always verify the final contract or provider dashboard before production use.

Related Guides

Cheapest LLM APIs

Sort models by estimated workload cost and normalized token prices.

Largest Context Windows

Find models for long documents, retrieval, and codebase context.

Coding Models

Compare code-oriented models by cost, context, and practical popularity signals.

Free Models

Browse zero-price models for prototypes and evaluation.

RAG Models

Start from large context windows and practical input-cost constraints.

Chatbot Costs

Find budget-sensitive models for output-heavy assistant traffic.

Cost Calculator

Enter your own input and output token volume before narrowing the shortlist.

Alternatives

Find cheaper candidates around popular model anchors.
