Cheapest LLM API Models

This page ranks models by a standardized workload estimate: 1 million input tokens plus 500 thousand output tokens. It is built for developers comparing API cost before choosing a model or provider.

Models listed: 50
Cost example tokens: 1M + 500K
Normalized prices: USD / 1M

Quick shortlist

Start with Ring-2.6-1T (free).

This guide is sorted by standard workload cost, so the first rows are the cheapest candidates to shortlist before you test model quality.

Lead model: New 🔥 Ring-2.6-1T (free)
Provider: inclusionAI
Sample cost: $0
Context: 262.14K

The ranking is a discovery aid, not a final recommendation. Always compare the model against your workload and verify provider pricing before production use.

How to read this ranking

Models are sorted by estimated cost for 1 million input tokens and 500 thousand output tokens. Use this page when your first constraint is API spend.

Estimate your workload cost

This estimate uses public API pricing normalized to USD per 1M tokens. It is a planning aid, not a billing quote. Verify provider pricing, limits, and terms before production use.
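
As a rough illustration of a custom estimate, the sketch below scales the normalized per-1M prices to a monthly traffic profile. The function name `monthly_cost` and the traffic figures are hypothetical; only the Mistral Nemo prices come from the ranking table.

```python
def monthly_cost(requests_per_month, avg_input_tokens, avg_output_tokens,
                 prompt_price, output_price):
    """Estimate monthly USD spend from request volume.

    prompt_price and output_price are USD per 1 million tokens.
    """
    total_input = requests_per_month * avg_input_tokens
    total_output = requests_per_month * avg_output_tokens
    return (total_input * prompt_price + total_output * output_price) / 1_000_000


# Hypothetical traffic: 200,000 requests per month,
# averaging 1,500 input tokens and 300 output tokens each.
# Prices are the Mistral Nemo row: $0.02 prompt, $0.03 output.
estimate = monthly_cost(200_000, 1_500, 300, 0.02, 0.03)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

Like the page's own numbers, this is a planning figure. It ignores rate limits, caching discounts, and any provider-specific billing rules.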

Model Ranking

| Model | Provider | Prompt ($/1M) | Output ($/1M) | Example Cost | Your Cost | Context | Rank | Release |
|---|---|---|---|---|---|---|---|---|
| New 🔥 Ring-2.6-1T (free) | inclusionAI | $0 | $0 | $0 | $0 | 262.14K | #10 | 2026-05-08 |
| 🔥 Nemotron 3 Super (free) | NVIDIA | $0 | $0 | $0 | $0 | 262.14K | #12 | 2026-03-11 |
| 🔥 Owl Alpha | OpenRouter | $0 | $0 | $0 | $0 | 1.05M | #17 | 2026-04-28 |
| CoBuddy (free) | Baidu Qianfan | $0 | $0 | $0 | $0 | 131.07K | Unranked | 2026-05-06 |
| Nemotron 3 Nano Omni (free) | NVIDIA | $0 | $0 | $0 | $0 | 256K | Unranked | 2026-04-28 |
| Laguna XS.2 (free) | Poolside | $0 | $0 | $0 | $0 | 131.07K | Unranked | 2026-04-28 |
| Laguna M.1 (free) | Poolside | $0 | $0 | $0 | $0 | 131.07K | Unranked | 2026-04-28 |
| Qianfan-OCR-Fast (free) | Baidu | $0 | $0 | $0 | $0 | 65.54K | Unranked | 2026-04-20 |
| Gemma 4 26B A4B (free) | Google | $0 | $0 | $0 | $0 | 262.14K | Unranked | 2026-04-03 |
| Gemma 4 31B (free) | Google | $0 | $0 | $0 | $0 | 262.14K | Unranked | 2026-04-02 |
| Trinity Large Thinking (free) | Arcee AI | $0 | $0 | $0 | $0 | 262.14K | Unranked | 2026-04-01 |
| Lyria 3 Pro Preview | Google | $0 | $0 | $0 | $0 | 1.05M | Unranked | 2026-03-30 |
| Lyria 3 Clip Preview | Google | $0 | $0 | $0 | $0 | 1.05M | Unranked | 2026-03-30 |
| MiniMax M2.5 (free) | MiniMax | $0 | $0 | $0 | $0 | 196.61K | Unranked | 2026-02-12 |
| Free Models Router | OpenRouter | $0 | $0 | $0 | $0 | 200K | Unranked | 2026-02-01 |
| LFM2.5-1.2B-Thinking (free) | LiquidAI | $0 | $0 | $0 | $0 | 32.77K | Unranked | 2026-01-20 |
| LFM2.5-1.2B-Instruct (free) | LiquidAI | $0 | $0 | $0 | $0 | 32.77K | Unranked | 2026-01-20 |
| Nemotron 3 Nano 30B A3B (free) | NVIDIA | $0 | $0 | $0 | $0 | 256K | Unranked | 2025-12-14 |
| Nemotron Nano 12B 2 VL (free) | NVIDIA | $0 | $0 | $0 | $0 | 128K | Unranked | 2025-10-28 |
| Qwen3 Next 80B A3B Instruct (free) | Qwen | $0 | $0 | $0 | $0 | 262.14K | Unranked | 2025-09-11 |
| Nemotron Nano 9B V2 (free) | NVIDIA | $0 | $0 | $0 | $0 | 128K | Unranked | 2025-09-05 |
| gpt-oss-120b (free) | OpenAI | $0 | $0 | $0 | $0 | 131.07K | Unranked | 2025-08-05 |
| gpt-oss-20b (free) | OpenAI | $0 | $0 | $0 | $0 | 131.07K | Unranked | 2025-08-05 |
| GLM 4.5 Air (free) | Z.ai | $0 | $0 | $0 | $0 | 131.07K | Unranked | 2025-07-25 |
| Qwen3 Coder 480B A35B (free) | Qwen | $0 | $0 | $0 | $0 | 262K | Unranked | 2025-07-23 |
| Uncensored (free) | Venice | $0 | $0 | $0 | $0 | 32.77K | Unranked | 2025-07-09 |
| Llama 3.3 70B Instruct (free) | Meta | $0 | $0 | $0 | $0 | 65.54K | Unranked | 2024-12-06 |
| Llama 3.2 3B Instruct (free) | Meta | $0 | $0 | $0 | $0 | 131.07K | Unranked | 2024-09-25 |
| Hermes 3 405B Instruct (free) | Nous | $0 | $0 | $0 | $0 | 131.07K | Unranked | 2024-08-16 |
| Mistral Nemo | Mistral | $0.02 | $0.03 | $0.04 | $0.04 | 131.07K | Unranked | 2024-07-19 |
| Llama 3.1 8B Instruct | Meta | $0.02 | $0.05 | $0.04 | $0.04 | 16.38K | Unranked | 2024-07-23 |
| Llama 3 8B Instruct | Meta | $0.04 | $0.04 | $0.06 | $0.06 | 8.19K | Unranked | 2024-04-18 |
| Llama 3 8B Lunaris | Sao10K | $0.04 | $0.05 | $0.07 | $0.07 | 8.19K | Unranked | 2024-08-13 |
| Granite 4.0 Micro | IBM | $0.017 | $0.11 | $0.07 | $0.07 | 131K | Unranked | 2025-10-20 |
| Gemma 3 4B | Google | $0.04 | $0.08 | $0.08 | $0.08 | 131.07K | Unranked | 2025-03-13 |
| LFM2-24B-A2B | LiquidAI | $0.03 | $0.12 | $0.09 | $0.09 | 32.77K | Unranked | 2026-02-25 |
| Mistral Small 3 | Mistral | $0.05 | $0.08 | $0.09 | $0.09 | 32.77K | Unranked | 2025-01-30 |
| Qwen2.5 7B Instruct | Qwen | $0.04 | $0.1 | $0.09 | $0.09 | 32.77K | Unranked | 2024-10-16 |
| MythoMax 13B | gryphe | $0.06 | $0.06 | $0.09 | $0.09 | 4.1K | Unranked | 2023-07-02 |
| Granite 4.1 8B | IBM | $0.05 | $0.1 | $0.1 | $0.1 | 131.07K | Unranked | 2026-04-30 |
| gpt-oss-20b | OpenAI | $0.03 | $0.14 | $0.1 | $0.1 | 131.07K | Unranked | 2025-08-05 |
| Gemma 3 12B | Google | $0.04 | $0.13 | $0.11 | $0.11 | 131.07K | Unranked | 2025-03-13 |
| Nova Micro 1.0 | Amazon | $0.035 | $0.14 | $0.11 | $0.11 | 128K | Unranked | 2024-12-05 |
| Command R7B (12-2024) | Cohere | $0.0375 | $0.15 | $0.11 | $0.11 | 128K | Unranked | 2024-12-14 |
| Qwen3.5-9B | Qwen | $0.04 | $0.15 | $0.11 | $0.11 | 262.14K | Unranked | 2026-03-10 |
| Trinity Mini | Arcee AI | $0.045 | $0.15 | $0.12 | $0.12 | 131.07K | Unranked | 2025-12-01 |
| Nemotron Nano 9B V2 | NVIDIA | $0.04 | $0.16 | $0.12 | $0.12 | 131.07K | Unranked | 2025-09-05 |
| Gemma 3n 4B | Google | $0.06 | $0.12 | $0.12 | $0.12 | 32.77K | Unranked | 2025-05-20 |
| Qwen3 235B A22B Instruct 2507 | Qwen | $0.071 | $0.1 | $0.12 | $0.12 | 262.14K | Unranked | 2025-07-21 |
| Llama 3.2 1B Instruct | Meta | $0.027 | $0.2 | $0.13 | $0.13 | 60K | Unranked | 2024-09-25 |

Pricing FAQ

How is the sample workload cost calculated?

The sample workload uses 1 million input tokens plus 500 thousand output tokens, then applies each model's normalized USD price per 1 million tokens.
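
The calculation described above can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation: the function name `sample_workload_cost` is hypothetical, and the table presumably rounds the result for display.

```python
def sample_workload_cost(prompt_price, output_price,
                         input_tokens=1_000_000, output_tokens=500_000):
    """Return the estimated USD cost of a workload.

    prompt_price and output_price are USD per 1 million tokens.
    The defaults match the page's sample workload (1M input, 500K output).
    """
    per_million = 1_000_000
    input_cost = (input_tokens / per_million) * prompt_price
    output_cost = (output_tokens / per_million) * output_price
    return input_cost + output_cost


# Granite 4.0 Micro row: $0.017 prompt, $0.11 output.
# 1.0 * 0.017 + 0.5 * 0.11 = 0.072, shown as $0.07 in the table.
granite = sample_workload_cost(0.017, 0.11)
print(f"${granite:.3f}")
```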

Why do input and output token prices matter separately?

Many applications are output-token heavy, while retrieval and classification workloads may be input-token heavy. Comparing both prices helps avoid picking a model that is cheap for the wrong workload shape.
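
To see how workload shape can change which model is cheaper, the sketch below compares two rows from the table under two hypothetical token mixes. The workload sizes and the helper names are assumptions made for illustration; the prices are taken from the ranking.

```python
def workload_cost(prompt_price, output_price, input_tokens, output_tokens):
    """USD cost given prices in USD per 1 million tokens."""
    return (input_tokens * prompt_price + output_tokens * output_price) / 1_000_000


# (prompt price, output price) in USD per 1M tokens, from the ranking table.
models = {
    "Llama 3.2 1B Instruct": (0.027, 0.20),
    "Gemma 3n 4B": (0.06, 0.12),
}

# Hypothetical workload shapes: (input tokens, output tokens).
workloads = {
    "input-heavy": (10_000_000, 100_000),
    "output-heavy": (100_000, 10_000_000),
}


def cheapest(workload_name):
    """Return the name of the cheapest model for the named workload."""
    input_tokens, output_tokens = workloads[workload_name]
    return min(
        models,
        key=lambda name: workload_cost(*models[name], input_tokens, output_tokens),
    )


for name in workloads:
    print(name, "->", cheapest(name))
```

With these figures, the model with the lower prompt price is cheaper for the input-heavy mix, while the model with the lower output price is cheaper for the output-heavy mix.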

Should I verify prices before production use?

Yes. AI Model Matrix normalizes public pricing metadata for comparison, but provider availability, limits, and prices can change. Always verify the final contract or provider dashboard before production use.

Related Guides

Cheapest LLM APIs

Sort models by estimated workload cost and normalized token prices.

Largest Context Windows

Find models for long documents, retrieval, and codebase context.

Coding Models

Compare code-oriented models by cost, context, and popularity rank.

Free Models

Browse zero-price models for prototypes and evaluation.

RAG Models

Start from large context windows and practical input-cost constraints.

Chatbot Costs

Find budget-sensitive models for output-heavy assistant traffic.