Models are sorted by context window size. Use this page when your workflow needs long documents, large retrieval payloads, or multi-file context.
Best LLM API Models for RAG
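RAG workloads need a context window large enough to hold the retrieved passages alongside the prompt. They also need economical input pricing, because retrieved text is billed as input tokens on every request. This page therefore starts with the largest context windows and shows the cost signals that matter in practice.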
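To make "enough context" concrete, here is a minimal Python sketch of a token-budget check. It assumes you already have retrieved passages ranked best-first, with token counts from the target model's own tokenizer. The function name `pack_passages` and the passage sizes are hypothetical. The function subtracts the prompt and a reserved output allowance from the context window, then keeps each passage that still fits.

```python
from typing import List, Tuple


def pack_passages(
    passages: List[Tuple[str, int]],
    context_window: int,
    prompt_tokens: int,
    reserved_output_tokens: int,
) -> List[str]:
    """Keep ranked passages that fit the remaining input budget (hypothetical helper).

    `passages` holds (text, token_count) pairs sorted by retrieval score, best first.
    """
    # Tokens left for retrieved text after the prompt and the reserved output space.
    budget = context_window - prompt_tokens - reserved_output_tokens
    if budget <= 0:
        return []
    selected: List[str] = []
    used = 0
    for text, tokens in passages:
        if used + tokens > budget:
            continue  # too large for what is left; a smaller passage may still fit
        selected.append(text)
        used += tokens
    return selected


# Hypothetical passage sizes, chosen to show the effect of window size.
ranked = [("passage A", 600_000), ("passage B", 500_000), ("passage C", 300_000)]
print(pack_passages(ranked, context_window=1_000_000, prompt_tokens=2_000, reserved_output_tokens=8_000))
# ['passage A', 'passage C']
print(pack_passages(ranked, context_window=2_000_000, prompt_tokens=2_000, reserved_output_tokens=8_000))
# ['passage A', 'passage B', 'passage C']
```

With the same passages, a 1M-token window drops passage B, while a 2M-token window keeps all three. Skipping an oversized passage instead of stopping lets a smaller, lower-ranked passage use the leftover budget. Stop at the first overflow instead if strict rank order matters more to you.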
Quick shortlist
Start with Grok 4.1 Fast: it pairs a 2M-token context window with low prices of $0.2 input and $0.5 output per 1M tokens.
This guide is sorted by context window, so the first rows offer the most room for retrieved passages, long documents, and large codebase context.
The ranking is a discovery aid, not a final recommendation. Always compare the model against your workload and verify provider pricing before production use.
Model Ranking
| Model | Provider | Input ($/1M tokens) | Output ($/1M tokens) | Sample Workload Cost | Context | Rank | Release |
|---|---|---|---|---|---|---|---|
| 🔥Grok 4.1 Fast | xAI | $0.2 | $0.5 | $0.45 | 2M | #18 | 2025-11-19 |
| Grok 4.20 Multi-Agent | xAI | $2 | $6 | $5 | 2M | Unranked | 2026-03-31 |
| Grok 4.20 | xAI | $1.25 | $2.5 | $2.5 | 2M | Unranked | 2026-03-31 |
| Grok 4 Fast | xAI | $0.2 | $0.5 | $0.45 | 2M | Unranked | 2025-09-19 |
| 🔥GPT-5.5 | OpenAI | $5 | $30 | $20 | 1.05M | #19 | 2026-04-24 |
| OpenAI GPT Latest | OpenAI | $5 | $30 | $20 | 1.05M | Unranked | 2026-04-27 |
| GPT-5.5 Pro | OpenAI | $30 | $180 | $120 | 1.05M | Unranked | 2026-04-24 |
| GPT-5.4 Pro | OpenAI | $30 | $180 | $120 | 1.05M | Unranked | 2026-03-05 |
| GPT-5.4 | OpenAI | $2.5 | $15 | $10 | 1.05M | Unranked | 2026-03-05 |
| 🔥Owl Alpha | OpenRouter | $0 | $0 | $0 | 1.05M | #17 | 2026-04-28 |
| 🔥DeepSeek V4 Flash | DeepSeek | $0.126 | $0.252 | $0.25 | 1.05M | #4 | 2026-04-24 |
| 🔥Gemini 3 Flash Preview | Google | $0.5 | $3 | $2 | 1.05M | #6 | 2025-12-17 |
| 🔥DeepSeek V4 Pro | DeepSeek | $0.435 | $0.87 | $0.87 | 1.05M | #8 | 2026-04-24 |
| 🔥Gemini 2.5 Flash Lite | Google | $0.1 | $0.4 | $0.3 | 1.05M | #11 | 2025-07-22 |
| 🔥Gemini 2.5 Flash | Google | $0.3 | $2.5 | $1.55 | 1.05M | #13 | 2025-06-17 |
|  |  | $0.25 | $1.5 | $1 | 1.05M | Unranked | 2026-05-07 |
| Google Gemini Pro Latest | Google | $2 | $12 | $8 | 1.05M | Unranked | 2026-04-27 |
| Google Gemini Flash Latest | Google | $0.5 | $3 | $2 | 1.05M | Unranked | 2026-04-27 |
| MiMo-V2.5-Pro | Xiaomi | $1 | $3 | $2.5 | 1.05M | Unranked | 2026-04-22 |
| MiMo-V2.5 | Xiaomi | $0.4 | $2 | $1.4 | 1.05M | Unranked | 2026-04-22 |
| Lyria 3 Pro Preview | Google | $0 | $0 | $0 | 1.05M | Unranked | 2026-03-30 |
| Lyria 3 Clip Preview | Google | $0 | $0 | $0 | 1.05M | Unranked | 2026-03-30 |
| MiMo-V2-Pro | Xiaomi | $1 | $3 | $2.5 | 1.05M | Unranked | 2026-03-18 |
| Gemini 3.1 Flash Lite Preview | Google | $0.25 | $1.5 | $1 | 1.05M | Unranked | 2026-03-03 |
| Gemini 3.1 Pro Preview Custom Tools | Google | $2 | $12 | $8 | 1.05M | Unranked | 2026-02-25 |
| Gemini 3.1 Pro Preview | Google | $2 | $12 | $8 | 1.05M | Unranked | 2026-02-19 |
| Gemini 2.5 Flash Lite Preview 09-2025 | Google | $0.1 | $0.4 | $0.3 | 1.05M | Unranked | 2025-09-25 |
| Gemini 2.5 Pro | Google | $1.25 | $10 | $6.25 | 1.05M | Unranked | 2025-06-17 |
| Gemini 2.5 Pro Preview 06-05 | Google | $1.25 | $10 | $6.25 | 1.05M | Unranked | 2025-06-05 |
| Gemini 2.5 Pro Preview 05-06 | Google | $1.25 | $10 | $6.25 | 1.05M | Unranked | 2025-05-07 |
| Llama 4 Maverick | Meta | $0.15 | $0.6 | $0.45 | 1.05M | Unranked | 2025-04-05 |
| Gemini 2.0 Flash Lite | Google | $0.075 | $0.3 | $0.22 | 1.05M | Unranked | 2025-02-25 |
| Gemini 2.0 Flash | Google | $0.1 | $0.4 | $0.3 | 1.05M | Unranked | 2025-02-05 |
| GPT-4.1 | OpenAI | $2 | $8 | $6 | 1.05M | Unranked | 2025-04-14 |
| GPT-4.1 Mini | OpenAI | $0.4 | $1.6 | $1.2 | 1.05M | Unranked | 2025-04-14 |
| GPT-4.1 Nano | OpenAI | $0.1 | $0.4 | $0.3 | 1.05M | Unranked | 2025-04-14 |
| Palmyra X5 | Writer | $0.6 | $6 | $3.6 | 1.04M | Unranked | 2026-01-21 |
| MiniMax-01 | MiniMax | $0.2 | $1.1 | $0.75 | 1M | Unranked | 2025-01-15 |
| 🔥Claude Opus 4.7 | Anthropic | $5 | $25 | $17.5 | 1M | #2 | 2026-04-16 |
| 🔥Claude Sonnet 4.6 | Anthropic | $3 | $15 | $10.5 | 1M | #3 | 2026-02-17 |
|  | Anthropic | $30 | $150 | $105 | 1M | Unranked | 2026-05-12 |
| Grok 4.3 | xAI | $1.25 | $2.5 | $2.5 | 1M | Unranked | 2026-04-30 |
| Anthropic Claude Sonnet Latest | Anthropic | $3 | $15 | $10.5 | 1M | Unranked | 2026-04-27 |
| Qwen3.5 Plus 2026-04-20 | Qwen | $0.4 | $2.4 | $1.6 | 1M | Unranked | 2026-04-27 |
| Qwen3.6 Flash | Qwen | $0.25 | $1.5 | $1 | 1M | Unranked | 2026-04-27 |
| Claude Opus Latest | Anthropic | $5 | $25 | $17.5 | 1M | Unranked | 2026-04-21 |
| Claude Opus 4.6 (Fast) | Anthropic | $30 | $150 | $105 | 1M | Unranked | 2026-04-07 |
| Qwen3.6 Plus | Qwen | $0.325 | $1.95 | $1.3 | 1M | Unranked | 2026-04-02 |
| Qwen3.5-Flash | Qwen | $0.065 | $0.26 | $0.2 | 1M | Unranked | 2026-02-25 |
| Qwen3.5 Plus 2026-02-15 | Qwen | $0.26 | $1.56 | $1.04 | 1M | Unranked | 2026-02-16 |
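One way to apply the table programmatically is to filter rows by a minimum context window and a maximum input price, then sort the remaining models by input price. The following is a minimal sketch under stated assumptions: the `Model` class and `shortlist` helper are hypothetical, the rows are copied by hand from a few table entries, and "1.05M" is approximated as 1,050,000 tokens.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    context_tokens: int  # approximate context window size


def shortlist(models: List[Model], min_context: int, max_input_price: float) -> List[Model]:
    """Keep models with enough context and affordable input, cheapest input first."""
    fits = [
        m for m in models
        if m.context_tokens >= min_context and m.input_price <= max_input_price
    ]
    # Ties on input price fall back to output price.
    return sorted(fits, key=lambda m: (m.input_price, m.output_price))


# A few rows from the table; context sizes are rounded approximations.
MODELS = [
    Model("Grok 4.1 Fast", 0.2, 0.5, 2_000_000),
    Model("DeepSeek V4 Flash", 0.126, 0.252, 1_050_000),
    Model("Gemini 2.5 Flash Lite", 0.1, 0.4, 1_050_000),
    Model("GPT-4.1 Mini", 0.4, 1.6, 1_050_000),
    Model("Claude Sonnet 4.6", 3.0, 15.0, 1_000_000),
]

print([m.name for m in shortlist(MODELS, min_context=1_000_000, max_input_price=0.5)])
# ['Gemini 2.5 Flash Lite', 'DeepSeek V4 Flash', 'Grok 4.1 Fast', 'GPT-4.1 Mini']
```

Raising `min_context` to 1,500,000 leaves only Grok 4.1 Fast from this subset, which shows how a hard context requirement can narrow the field before price is even considered.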
Pricing FAQ
How is the sample workload cost calculated?
The sample workload uses 1 million input tokens plus 500 thousand output tokens and applies each model's normalized USD price per 1 million tokens, so the sample cost equals the input price plus half the output price.
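The same calculation generalizes to any workload size. The following is a minimal Python sketch in which `workload_cost` is a hypothetical helper and prices are in USD per 1M tokens, as in the table.

```python
def workload_cost(
    input_price: float,
    output_price: float,
    input_tokens: int = 1_000_000,
    output_tokens: int = 500_000,
) -> float:
    """USD cost of a workload, given prices in USD per 1M tokens (hypothetical helper)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Defaults reproduce the sample workload, e.g. Grok 4.1 Fast at $0.2 / $0.5.
print(round(workload_cost(0.2, 0.5), 2))  # 0.45

# A hypothetical RAG-style job: 10M input tokens, 200K output tokens.
print(round(workload_cost(0.2, 0.5, input_tokens=10_000_000, output_tokens=200_000), 2))  # 2.1
```

Passing your own token counts gives an estimate closer to your real traffic than the fixed sample workload.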
Why do input and output token prices matter separately?
Assistant and chat applications are often output-token heavy, while retrieval and classification workloads are often input-token heavy. Comparing both prices helps you avoid a model that is cheap only for a workload shape you don't have.
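To see where two models swap places, you can solve for the input-to-output token ratio at which their costs are equal. Setting in_A × I + out_A × O = in_B × I + out_B × O gives I / O = (out_B - out_A) / (in_A - in_B). A minimal sketch follows. `break_even_ratio` is a hypothetical helper, and the prices come from the table.

```python
def break_even_ratio(in_a: float, out_a: float, in_b: float, out_b: float) -> float:
    """Input:output token ratio at which models A and B cost the same.

    Solves in_a*I + out_a*O == in_b*I + out_b*O for I/O.
    A negative result means one model is cheaper at every ratio.
    """
    if in_a == in_b:
        raise ValueError("equal input prices: no finite break-even ratio")
    return (out_b - out_a) / (in_a - in_b)


# Model A: Gemini 2.5 Flash ($0.3 in, $2.5 out); model B: GPT-4.1 Mini ($0.4 in, $1.6 out).
ratio = break_even_ratio(0.3, 2.5, 0.4, 1.6)
print(round(ratio, 2))  # 9.0
```

Above a 9:1 input-to-output ratio, such as a prompt filled with retrieved passages, Gemini 2.5 Flash is cheaper. Below it, GPT-4.1 Mini is cheaper. This includes the table's 2:1 sample workload, where GPT-4.1 Mini costs $1.2 and Gemini 2.5 Flash costs $1.55.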
Should I verify prices before production use?
Yes. AI Model Matrix normalizes public pricing metadata for comparison, but provider availability, limits, and prices can change. Always verify the final contract or provider dashboard before production use.
Related Guides
Cheapest LLM APIs
Sort models by estimated workload cost and normalized token prices.
Largest Context Windows
Find models for long documents, retrieval, and codebase context.
Coding Models
Compare code-oriented models by cost, context, and popularity rank.
Free Models
Browse zero-price models for prototypes and evaluation.
RAG Models
Start from large context windows and practical input-cost constraints.
Chatbot Costs
Find budget-sensitive models for output-heavy assistant traffic.