Start with the lowest viable price band for your workload, then compare context window, provider fit, popularity signal, and output-token cost before choosing an API.
LLM API Pricing Guide
This guide explains practical LLM API pricing trade-offs and ranks models by a standardized workload estimate so teams can start from budget constraints.
Quick shortlist
Start with Owl Alpha.
This guide is sorted by standard workload cost, so the first rows are the strongest budget shortlist before model-quality testing.
The ranking is a discovery aid, not a final recommendation. Always compare the model against your workload and verify provider pricing before production use.
LLM API Price Bands
Assistant and chatbot products often spend more on output than input. Use the calculator when output tokens are the main driver of monthly cost.
Models are sorted by estimated cost for 1,000,000 input tokens and 500,000 output tokens. Use this page when your first constraint is API spend.
Estimate your workload cost
This estimate uses normalized public API pricing per 1M tokens. It is a planning aid, not a billing quote. Verify provider pricing, limits, and terms before production use.
Model Ranking
| Model | Provider | Prompt ($/1M tokens) | Output ($/1M tokens) | Sample cost | Your Cost | Context | Popularity | Release |
|---|---|---|---|---|---|---|---|---|
| 🔥 Owl Alpha | OpenRouter | $0 | $0 | $0 | $0 | 1.05M | #7 | |
| New 🔥 Nemotron 3 Ultra (free) | NVIDIA | $0 | $0 | $0 | $0 | 1M | #12 | |
| 🔥 Laguna M.1 (free) | Poolside | $0 | $0 | $0 | $0 | 262.14K | #14 | |
| Nemotron 3 Super (free) | NVIDIA | $0 | $0 | $0 | $0 | 1M | #23 | |
| gpt-oss-120b (free) | OpenAI | $0 | $0 | $0 | $0 | 131.07K | #33 | |
| Laguna XS.2 (free) | Poolside | $0 | $0 | $0 | $0 | 262.14K | #47 | |
| GLM 4.5 Air (free) | Z.ai | $0 | $0 | $0 | $0 | 131.07K | #50 | |
| gpt-oss-20b (free) | OpenAI | $0 | $0 | $0 | $0 | 131.07K | #67 | |
| Gemma 4 31B (free) | | $0 | $0 | $0 | $0 | 262.14K | #68 | |
| Nemotron 3 Nano 30B A3B (free) | NVIDIA | $0 | $0 | $0 | $0 | 256K | #75 | |
| Kimi K2.6 (free) | MoonshotAI | $0 | $0 | $0 | $0 | 262.14K | #83 | |
| Nemotron 3 Nano Omni (free) | NVIDIA | $0 | $0 | $0 | $0 | 256K | #94 | |
| Nemotron Nano 9B V2 (free) | NVIDIA | $0 | $0 | $0 | $0 | 128K | #105 | |
| Nemotron Nano 12B 2 VL (free) | NVIDIA | $0 | $0 | $0 | $0 | 128K | #107 | |
| Gemma 4 26B A4B (free) | | $0 | $0 | $0 | $0 | 262.14K | #140 | |
| New Nemotron 3.5 Content Safety (free) | NVIDIA | $0 | $0 | $0 | $0 | 128K | #181 | |
| LFM2.5-1.2B-Thinking (free) | LiquidAI | $0 | $0 | $0 | $0 | 32.77K | #184 | |
| LFM2.5-1.2B-Instruct (free) | LiquidAI | $0 | $0 | $0 | $0 | 32.77K | #195 | |
| Qwen3 Next 80B A3B Instruct (free) | Qwen | $0 | $0 | $0 | $0 | 262.14K | #210 | |
| Llama 3.3 70B Instruct (free) | Meta | $0 | $0 | $0 | $0 | 131.07K | #213 | |
| Uncensored (free) | Venice | $0 | $0 | $0 | $0 | 32.77K | #242 | |
| Hermes 3 405B Instruct (free) | Nous | $0 | $0 | $0 | $0 | 131.07K | #257 | |
| Llama 3.2 3B Instruct (free) | Meta | $0 | $0 | $0 | $0 | 131.07K | #258 | |
| Lyria 3 Pro Preview | | $0 | $0 | $0 | $0 | 1.05M | #283 | |
| Lyria 3 Clip Preview | | $0 | $0 | $0 | $0 | 1.05M | #291 | |
| New North Mini Code (free) | Cohere | $0 | $0 | $0 | $0 | 256K | ||
| New Kimi K2.7 Code (free) | MoonshotAI | $0 | $0 | $0 | $0 | 262.14K | ||
| New Nex-N2-Pro (free) | Nex AGI | $0 | $0 | $0 | $0 | 262.14K | ||
| CoBuddy (free) | Baidu Qianfan | $0 | $0 | $0 | $0 | 131.07K | ||
| DeepSeek V4 Flash (free) | DeepSeek | $0 | $0 | $0 | $0 | 1.05M | ||
| Trinity Large Thinking (free) | Arcee AI | $0 | $0 | $0 | $0 | 262.14K | ||
| MiniMax M2.5 (free) | MiniMax | $0 | $0 | $0 | $0 | 204.8K | ||
| Free Models Router | OpenRouter | $0 | $0 | $0 | $0 | 200K | ||
| Qwen3 Coder 480B A35B (free) | Qwen | $0 | $0 | $0 | $0 | 1.05M | ||
| Ling-2.6-flash | inclusionAI | $0.01 | $0.03 | $0.03 | $0.03 | 262.14K | #43 | |
| Mistral Nemo | Mistral | $0.02 | $0.03 | $0.04 | $0.04 | 131.07K | #39 | |
| Llama 3.1 8B Instruct | Meta | $0.02 | $0.03 | $0.04 | $0.04 | 131.07K | #44 | |
| Llama 3 8B Lunaris | Sao10K | $0.04 | $0.05 | $0.07 | $0.07 | 8.19K | #127 | |
| Granite 4.0 Micro | IBM | $0.017 | $0.112 | $0.07 | $0.07 | 131K | #225 | |
| Qwen2.5 7B Instruct | Qwen | $0.04 | $0.1 | $0.09 | $0.09 | 131.07K | #106 | |
| LFM2-24B-A2B | LiquidAI | $0.03 | $0.12 | $0.09 | $0.09 | 128K | #119 | |
| Mistral Small 3 | Mistral | $0.05 | $0.08 | $0.09 | $0.09 | 32.77K | #138 | |
| MythoMax 13B | gryphe | $0.06 | $0.06 | $0.09 | $0.09 | 4.1K | #193 | |
| gpt-oss-20b | OpenAI | $0.029 | $0.14 | $0.1 | $0.1 | 131.07K | #61 | |
| Granite 4.1 8B | IBM | $0.05 | $0.1 | $0.1 | $0.1 | 131.07K | #156 | |
| Gemma 3 4B | | $0.05 | $0.1 | $0.1 | $0.1 | 131.07K | #158 | |
| gpt-oss-120b | OpenAI | $0.03 | $0.15 | $0.1 | $0.1 | 131.07K | #22 | |
| Nova Micro 1.0 | Amazon | $0.035 | $0.14 | $0.11 | $0.11 | 128K | #110 | |
| Command R7B (12-2024) | Cohere | $0.0375 | $0.15 | $0.11 | $0.11 | 128K | #228 | |
| Trinity Mini | Arcee AI | $0.045 | $0.15 | $0.12 | $0.12 | 131.07K | #179 | |
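One way to turn the ranking into a shortlist is to filter by a minimum context window first and then sort the survivors by sample workload cost. The sketch below is illustrative only: the `Model` class, the `shortlist` helper, and the 100,000-token threshold are hypothetical, and the context sizes are the rounded figures shown in the table rather than exact provider limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    prompt_price: float   # USD per 1M input tokens
    output_price: float   # USD per 1M output tokens
    context_tokens: int   # rounded context size as displayed in the table


def sample_cost(m, input_tokens=1_000_000, output_tokens=500_000):
    """Estimated USD cost for the guide's sample workload."""
    return (input_tokens * m.prompt_price + output_tokens * m.output_price) / 1_000_000


def shortlist(models, min_context):
    """Keep models that meet the context requirement, cheapest first."""
    eligible = [m for m in models if m.context_tokens >= min_context]
    return sorted(eligible, key=sample_cost)


# A few paid rows copied from the table above.
MODELS = [
    Model("Mistral Nemo", 0.02, 0.03, 131_070),
    Model("MythoMax 13B", 0.06, 0.06, 4_100),
    Model("Llama 3 8B Lunaris", 0.04, 0.05, 8_190),
    Model("Ling-2.6-flash", 0.01, 0.03, 262_140),
]

# Hypothetical requirement: at least 100,000 tokens of context.
for m in shortlist(MODELS, min_context=100_000):
    print(f"{m.name}: ${sample_cost(m):.3f}")
```

With this threshold, the two small-context models drop out before price is considered, which is why context window is worth checking right after the price band.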
Pricing FAQ
How is the sample workload cost calculated?
The sample workload uses 1,000,000 input tokens plus 500,000 output tokens, then applies each model's normalized USD price per 1 million tokens.
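As a minimal sketch of that calculation, assuming prices are given in USD per 1 million tokens (the function name `workload_cost` is hypothetical, and the prices are the Mistral Nemo row from the table):

```python
TOKENS_PER_MILLION = 1_000_000


def workload_cost(input_tokens, output_tokens, prompt_price, output_price):
    """Estimated USD cost; both prices are USD per 1M tokens."""
    input_cost = input_tokens / TOKENS_PER_MILLION * prompt_price
    output_cost = output_tokens / TOKENS_PER_MILLION * output_price
    return input_cost + output_cost


# Sample workload: 1,000,000 input tokens plus 500,000 output tokens.
# Mistral Nemo row: $0.02 prompt, $0.03 output, so 0.02 + 0.015 = 0.035.
sample = workload_cost(1_000_000, 500_000, prompt_price=0.02, output_price=0.03)
print(f"${sample:.3f}")
```

The table displays this figure rounded to the nearest cent, so small differences between rows can disappear in the Sample cost column.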
Why do input and output token prices matter separately?
Many applications are output-token heavy, while retrieval and classification workloads may be input-token heavy. Comparing both prices helps avoid picking a model that is cheap for the wrong workload shape.
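To illustrate, the sketch below compares two rows from the table under two workload shapes. The token volumes are hypothetical, chosen only to show that the cheaper model can change with the input-to-output ratio, and the helper names are assumptions rather than part of any provider API.

```python
def workload_cost(input_tokens, output_tokens, prompt_price, output_price):
    """Estimated USD cost; both prices are USD per 1M tokens."""
    return (input_tokens * prompt_price + output_tokens * output_price) / 1_000_000


# (prompt price, output price) in USD per 1M tokens, from the table.
MODELS = {
    "Granite 4.0 Micro": (0.017, 0.112),
    "Mistral Small 3": (0.05, 0.08),
}

# Hypothetical monthly volumes: (input tokens, output tokens).
WORKLOADS = {
    "input-heavy": (10_000_000, 1_000_000),
    "output-heavy": (1_000_000, 10_000_000),
}


def cheapest(models, input_tokens, output_tokens):
    """Return the name of the model with the lowest estimated cost."""
    return min(
        models,
        key=lambda name: workload_cost(input_tokens, output_tokens, *models[name]),
    )


for label, (inp, out) in WORKLOADS.items():
    print(label, "->", cheapest(MODELS, inp, out))
```

Under these assumed volumes, Granite 4.0 Micro is cheaper for the input-heavy workload and Mistral Small 3 is cheaper for the output-heavy one, even though both show similar sample costs in the table.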
Should I verify prices before production use?
Yes. AI Model Matrix normalizes public pricing metadata for comparison, but provider availability, limits, and prices can change. Always verify the final contract or provider dashboard before production use.
Related Guides
Cheapest LLM APIs
Sort models by estimated workload cost and normalized token prices.
Largest Context Windows
Find models for long documents, retrieval, and codebase context.
Coding Models
Compare code-oriented models by cost, context, and practical popularity signals.
Free Models
Browse zero-price models for prototypes and evaluation.
RAG Models
Start from large context windows and practical input-cost constraints.
Chatbot Costs
Find budget-sensitive models for output-heavy assistant traffic.
Cost Calculator
Enter your own input and output token volume before narrowing the shortlist.
Alternatives
Find cheaper candidates around popular model anchors.