LLM API Pricing Guide

This guide explains practical LLM API pricing trade-offs and ranks models by a standardized workload estimate so teams can start from budget constraints.

Models listed: 50
Cost example tokens: 1M input + 500K output
Normalized prices: USD / 1M tokens

Quick shortlist

Start with Owl Alpha.

This guide is sorted by standard workload cost, so the first rows are the strongest budget shortlist before model-quality testing.

Lead model: 🔥 Owl Alpha
Provider: OpenRouter
Sample cost: $0
Context: 1.05M

The ranking is a discovery aid, not a final recommendation. Always compare the model against your workload and verify provider pricing before production use.

LLM API Price Bands

Free or zero-price models: 34
Under $1 sample workload: 155
$1 to $5 sample workload: 100
$5+ sample workload: 87

Use price bands before shortlisting

Start with the lowest viable price band for your workload, then compare context window, provider fit, popularity signal, and output-token cost before choosing an API.
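
As a rough illustration of the banding step, the sketch below maps a sample-workload cost to one of the four bands listed above. The function name `price_band` is hypothetical, and the handling of costs that land exactly on $1 or $5 is an assumption, since the guide does not say which band owns the boundary.

```python
def price_band(sample_cost_usd: float) -> str:
    """Map a sample-workload cost in USD to one of the guide's price bands.

    Assumption: a cost of exactly $1 falls in "$1 to $5" and a cost of
    exactly $5 falls in "$5+". The guide does not specify the boundaries.
    """
    if sample_cost_usd < 0:
        raise ValueError("cost cannot be negative")
    if sample_cost_usd == 0:
        return "Free or zero-price"
    if sample_cost_usd < 1:
        return "Under $1"
    if sample_cost_usd < 5:
        return "$1 to $5"
    return "$5+"


# Sample costs taken from the ranking table: Owl Alpha and Trinity Mini.
print(price_band(0.0))
print(price_band(0.12))
```

Banding first keeps the shortlist small; context window, provider fit, and output-token cost can then be compared within one band.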

Watch output-heavy workloads

Assistant and chatbot products often spend more on output than input. Use the calculator when output tokens are the main driver of monthly cost.

How to read this ranking

Models are sorted by estimated cost for 1,000,000 input tokens and 500,000 output tokens. Use this page when your first constraint is API spend.
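
The ordering described above might be reproduced along these lines. This is a minimal sketch: `ModelPrice` and `sample_cost` are hypothetical names, not part of any provider API, and the three prices are copied from rows of the ranking table.

```python
from dataclasses import dataclass

# Sample workload from this guide: 1,000,000 input + 500,000 output tokens.
SAMPLE_INPUT_TOKENS = 1_000_000
SAMPLE_OUTPUT_TOKENS = 500_000


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical container for one table row (prices in USD per 1M tokens)."""
    name: str
    prompt_usd_per_1m: float
    output_usd_per_1m: float


def sample_cost(model: ModelPrice) -> float:
    """Cost in USD of the sample workload at the model's per-1M-token prices."""
    input_cost = SAMPLE_INPUT_TOKENS / 1_000_000 * model.prompt_usd_per_1m
    output_cost = SAMPLE_OUTPUT_TOKENS / 1_000_000 * model.output_usd_per_1m
    return input_cost + output_cost


# Prices copied from three rows of the ranking table, deliberately unsorted.
models = [
    ModelPrice("Trinity Mini", 0.045, 0.15),
    ModelPrice("Mistral Nemo", 0.02, 0.03),
    ModelPrice("Granite 4.0 Micro", 0.017, 0.112),
]

# Cheapest sample workload first, as in the ranking.
ranked = sorted(models, key=sample_cost)
for model in ranked:
    print(f"{model.name}: ${sample_cost(model):.3f}")
```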

Estimate your workload cost

This estimate uses normalized public API pricing in USD per 1M tokens. It is a planning aid, not a billing quote. Verify provider pricing, limits, and terms before production use.

Model Ranking

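| Model | Provider | Prompt (USD/1M) | Output (USD/1M) | Sample cost | Your cost | Context | Popularity | Release |
|---|---|---|---|---|---|---|---|---|
| 🔥 Owl Alpha | OpenRouter | $0 | $0 | $0 | $0 | 1.05M | #7 | - |
| [New] 🔥 Nemotron 3 Ultra (free) | NVIDIA | $0 | $0 | $0 | $0 | 1M | #12 | - |
| 🔥 Laguna M.1 (free) | Poolside | $0 | $0 | $0 | $0 | 262.14K | #14 | - |
| Nemotron 3 Super (free) | NVIDIA | $0 | $0 | $0 | $0 | 1M | #23 | - |
| gpt-oss-120b (free) | OpenAI | $0 | $0 | $0 | $0 | 131.07K | #33 | - |
| Laguna XS.2 (free) | Poolside | $0 | $0 | $0 | $0 | 262.14K | #47 | - |
| GLM 4.5 Air (free) | Z.ai | $0 | $0 | $0 | $0 | 131.07K | #50 | - |
| gpt-oss-20b (free) | OpenAI | $0 | $0 | $0 | $0 | 131.07K | #67 | - |
| Gemma 4 31B (free) | Google | $0 | $0 | $0 | $0 | 262.14K | #68 | - |
| Nemotron 3 Nano 30B A3B (free) | NVIDIA | $0 | $0 | $0 | $0 | 256K | #75 | - |
| Kimi K2.6 (free) | MoonshotAI | $0 | $0 | $0 | $0 | 262.14K | #83 | - |
| Nemotron 3 Nano Omni (free) | NVIDIA | $0 | $0 | $0 | $0 | 256K | #94 | - |
| Nemotron Nano 9B V2 (free) | NVIDIA | $0 | $0 | $0 | $0 | 128K | #105 | - |
| Nemotron Nano 12B 2 VL (free) | NVIDIA | $0 | $0 | $0 | $0 | 128K | #107 | - |
| Gemma 4 26B A4B (free) | Google | $0 | $0 | $0 | $0 | 262.14K | #140 | - |
| [New] Nemotron 3.5 Content Safety (free) | NVIDIA | $0 | $0 | $0 | $0 | 128K | #181 | - |
| LFM2.5-1.2B-Thinking (free) | LiquidAI | $0 | $0 | $0 | $0 | 32.77K | #184 | - |
| LFM2.5-1.2B-Instruct (free) | LiquidAI | $0 | $0 | $0 | $0 | 32.77K | #195 | - |
| Qwen3 Next 80B A3B Instruct (free) | Qwen | $0 | $0 | $0 | $0 | 262.14K | #210 | - |
| Llama 3.3 70B Instruct (free) | Meta | $0 | $0 | $0 | $0 | 131.07K | #213 | - |
| Uncensored (free) | Venice | $0 | $0 | $0 | $0 | 32.77K | #242 | - |
| Hermes 3 405B Instruct (free) | Nous | $0 | $0 | $0 | $0 | 131.07K | #257 | - |
| Llama 3.2 3B Instruct (free) | Meta | $0 | $0 | $0 | $0 | 131.07K | #258 | - |
| Lyria 3 Pro Preview | Google | $0 | $0 | $0 | $0 | 1.05M | #283 | - |
| Lyria 3 Clip Preview | Google | $0 | $0 | $0 | $0 | 1.05M | #291 | - |
| [New] North Mini Code (free) | Cohere | $0 | $0 | $0 | $0 | 256K | - | - |
| [New] Kimi K2.7 Code (free) | MoonshotAI | $0 | $0 | $0 | $0 | 262.14K | - | - |
| [New] Nex-N2-Pro (free) | Nex AGI | $0 | $0 | $0 | $0 | 262.14K | - | - |
| CoBuddy (free) | Baidu Qianfan | $0 | $0 | $0 | $0 | 131.07K | - | - |
| DeepSeek V4 Flash (free) | DeepSeek | $0 | $0 | $0 | $0 | 1.05M | - | - |
| Trinity Large Thinking (free) | Arcee AI | $0 | $0 | $0 | $0 | 262.14K | - | - |
| MiniMax M2.5 (free) | MiniMax | $0 | $0 | $0 | $0 | 204.8K | - | - |
| Free Models Router | OpenRouter | $0 | $0 | $0 | $0 | 200K | - | - |
| Qwen3 Coder 480B A35B (free) | Qwen | $0 | $0 | $0 | $0 | 1.05M | - | - |
| Ling-2.6-flash | inclusionAI | $0.01 | $0.03 | $0.03 | $0.03 | 262.14K | #43 | - |
| Mistral Nemo | Mistral | $0.02 | $0.03 | $0.04 | $0.04 | 131.07K | #39 | - |
| Llama 3.1 8B Instruct | Meta | $0.02 | $0.03 | $0.04 | $0.04 | 131.07K | #44 | - |
| Llama 3 8B Lunaris | Sao10K | $0.04 | $0.05 | $0.07 | $0.07 | 8.19K | #127 | - |
| Granite 4.0 Micro | IBM | $0.017 | $0.112 | $0.07 | $0.07 | 131K | #225 | - |
| Qwen2.5 7B Instruct | Qwen | $0.04 | $0.1 | $0.09 | $0.09 | 131.07K | #106 | - |
| LFM2-24B-A2B | LiquidAI | $0.03 | $0.12 | $0.09 | $0.09 | 128K | #119 | - |
| Mistral Small 3 | Mistral | $0.05 | $0.08 | $0.09 | $0.09 | 32.77K | #138 | - |
| MythoMax 13B | gryphe | $0.06 | $0.06 | $0.09 | $0.09 | 4.1K | #193 | - |
| gpt-oss-20b | OpenAI | $0.029 | $0.14 | $0.1 | $0.1 | 131.07K | #61 | - |
| Granite 4.1 8B | IBM | $0.05 | $0.1 | $0.1 | $0.1 | 131.07K | #156 | - |
| Gemma 3 4B | Google | $0.05 | $0.1 | $0.1 | $0.1 | 131.07K | #158 | - |
| gpt-oss-120b | OpenAI | $0.03 | $0.15 | $0.1 | $0.1 | 131.07K | #22 | - |
| Nova Micro 1.0 | Amazon | $0.035 | $0.14 | $0.11 | $0.11 | 128K | #110 | - |
| Command R7B (12-2024) | Cohere | $0.0375 | $0.15 | $0.11 | $0.11 | 128K | #228 | - |
| Trinity Mini | Arcee AI | $0.045 | $0.15 | $0.12 | $0.12 | 131.07K | #179 | - |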

Pricing FAQ

How is the sample workload cost calculated?

The sample workload uses 1,000,000 input tokens plus 500,000 output tokens, then applies each model's normalized USD price per 1 million tokens.
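
Written out as a formula, with p_in and p_out standing for a model's normalized prompt and output prices in USD per 1M tokens (symbols introduced here for illustration only), the calculation looks like this. The worked line uses the Mistral Nemo row from the ranking.

```latex
\[
C_{\text{sample}}
  = \frac{1{,}000{,}000}{10^{6}}\, p_{\text{in}}
  + \frac{500{,}000}{10^{6}}\, p_{\text{out}}
  = p_{\text{in}} + 0.5\, p_{\text{out}}
\]

% Worked example: Mistral Nemo, prompt $0.02 and output $0.03 per 1M tokens.
\[
C_{\text{sample}} = 0.02 + 0.5 \times 0.03 = 0.035 \ \text{USD}
\]
```

The table shows $0.04 for this row, which is consistent with the result being rounded to the nearest cent. That rounding rule is an inference from the listed values, not something the guide states.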

Why do input and output token prices matter separately?

Many applications are output-token heavy, while retrieval and classification workloads may be input-token heavy. Comparing both prices helps avoid picking a model that is cheap for the wrong workload shape.
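
To make the workload-shape point concrete, the sketch below prices two hypothetical monthly workloads against two rows from the ranking table (Granite 4.0 Micro and Mistral Small 3). The token volumes and the helper names `workload_cost` and `cheapest` are assumptions chosen for illustration.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  prompt_price: float, output_price: float) -> float:
    """Cost in USD, with both prices given in USD per 1M tokens."""
    return (input_tokens / 1_000_000 * prompt_price
            + output_tokens / 1_000_000 * output_price)


# (prompt price, output price) in USD per 1M tokens, from the ranking table.
PRICES = {
    "Granite 4.0 Micro": (0.017, 0.112),
    "Mistral Small 3": (0.05, 0.08),
}

# Hypothetical workload shapes: (input tokens, output tokens).
WORKLOADS = {
    "input-heavy": (10_000_000, 1_000_000),
    "output-heavy": (1_000_000, 10_000_000),
}


def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Name of the model in PRICES with the lowest cost for this workload."""
    return min(
        PRICES,
        key=lambda name: workload_cost(input_tokens, output_tokens, *PRICES[name]),
    )


for shape, (tokens_in, tokens_out) in WORKLOADS.items():
    print(shape, "->", cheapest(tokens_in, tokens_out))
```

Under these assumed volumes the cheaper model flips between the two shapes: the lower prompt price wins when input dominates, and the lower output price wins when output dominates.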

Should I verify prices before production use?

Yes. AI Model Matrix normalizes public pricing metadata for comparison, but provider availability, limits, and prices can change. Always verify the final contract or provider dashboard before production use.

Related Guides

Cheapest LLM APIs

Sort models by estimated workload cost and normalized token prices.

Largest Context Windows

Find models for long documents, retrieval, and codebase context.

Coding Models

Compare code-oriented models by cost, context, and practical popularity signals.

Free Models

Browse zero-price models for prototypes and evaluation.

RAG Models

Start from large context windows and practical input-cost constraints.

Chatbot Costs

Find budget-sensitive models for output-heavy assistant traffic.

Cost Calculator

Enter your own input and output token volume before narrowing the shortlist.
