AI Gateway is a proxy service from Vercel designed to work with AI SDK 5. It makes it fast and easy to use a wide range of AI models across providers. Use the AI Gateway provider with any model available in the library below and Vercel handles the rest.
Note: AI Gateway is currently in alpha release and not yet ready for production use. Usage is subject to rate limits based on your Vercel plan, with limits refreshing every 24 hours. Your use of this early preview is subject to Vercel's Public Beta Agreement and AI Policy.
Follow the instructions in one of the demo apps (Next.js Demo, Svelte Demo), or install AI SDK 5 Alpha with the AI Gateway provider module in a fresh project by creating a new Next.js app:
pnpm dlx create-next-app@latest my-ai-app
Then install AI SDK 5 Alpha and the AI Gateway provider:
pnpm add ai@alpha @vercel/ai-sdk-gateway
Your Vercel project has an OIDC token associated with it. This token is used for authenticating with the AI Gateway service. During local development, run your dev server with the following command for auto-refreshing authentication tokens:
vc dev
Alternatively, you can run your dev server as you normally do and just pull your environment variables when the token expires (every 12 hours).
vc env pull
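If you want to confirm that the token is available during local development, a minimal sanity check might look like this (this assumes the token is exposed to your process as the VERCEL_OIDC_TOKEN environment variable, which is how Vercel surfaces OIDC tokens):

// Hypothetical sanity check: warn if the OIDC token is missing,
// e.g. because vc env pull has not been run or the token has expired.
if (!process.env.VERCEL_OIDC_TOKEN) {
  console.warn(
    "VERCEL_OIDC_TOKEN is not set; AI Gateway requests will fail to authenticate."
  );
}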
In a Vercel deployment, the project's OIDC token is automatically available. Just deploy your project as normal and you can use your desired AI models.
You can use the AI Gateway provider in your app as you would any other AI SDK provider. Simple streaming looks something like:
import { gateway } from "@vercel/ai-sdk-gateway";
import { streamText } from "ai";

const result = streamText({
  model: gateway("xai/grok-3-beta"),
  prompt: "Tell me the history of the San Francisco Mission-style burrito.",
  onError: (error: unknown) => {
    console.error(error);
  },
});
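Once you have the result, you can read the response as it arrives. Here is a minimal sketch, assuming the textStream async iterable that streamText results expose in recent AI SDK releases (the exact shape may differ in the 5 Alpha):

// Stream the generated text to stdout as chunks arrive.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart);
}

To target a different model, swap the model string for any entry in the library below, for example gateway("anthropic/claude-3.7-sonnet").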
We'd love to hear your feedback with the Feedback button above, or on X.
anthropic/claude-v3-opus
Claude 3 Opus is Anthropic's most intelligent model, with best-in-market performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Opus shows us the outer limits of what's possible with generative AI.
anthropic/claude-v3.5-sonnet
Claude 3.5 Sonnet strikes the ideal balance between intelligence and speed—particularly for enterprise workloads. It delivers strong performance at a lower cost compared to its peers, and is engineered for high endurance in large-scale AI deployments.
anthropic/claude-v3-haiku
Claude 3 Haiku is Anthropic's fastest model yet, designed for enterprise workloads which often involve longer prompts. Use Haiku to quickly analyze large volumes of documents, such as quarterly filings, contracts, or legal cases, for half the cost of other models in its performance tier.
anthropic/claude-3.5-haiku
Claude 3.5 Haiku is the next generation of our fastest model. For a similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks.
anthropic/claude-3.7-sonnet
Claude 3.7 Sonnet is the first hybrid reasoning model and Anthropic's most intelligent model to date. It delivers state-of-the-art performance for coding, content generation, data analysis, and planning tasks, building upon its predecessor Claude 3.5 Sonnet's capabilities in software engineering and computer use.
anthropic/claude-3.7-sonnet-reasoning
Claude 3.7 Sonnet is the first hybrid reasoning model and Anthropic's most intelligent model to date. This variant runs the model with extended thinking enabled, producing step-by-step reasoning before the final answer, and delivers state-of-the-art performance for coding, content generation, data analysis, and planning tasks.
bedrock/amazon.nova-micro-v1:0
bedrock/claude-3-7-sonnet-20250219
Claude 3.7 Sonnet is Anthropic's most intelligent model to date and the first Claude model to offer extended thinking—the ability to solve complex problems with careful, step-by-step reasoning. Anthropic is the first AI lab to introduce a single model where users can balance speed and quality by choosing between standard thinking for near-instant responses or extended thinking for advanced reasoning. Claude 3.7 Sonnet is state-of-the-art for coding, and delivers advancements in computer use, agentic capabilities, complex reasoning, and content generation. With frontier performance and more control over speed, Claude 3.7 Sonnet is the ideal choice for powering AI agents, especially customer-facing agents, and complex AI workflows.
bedrock/claude-3-5-haiku-20241022
Claude 3.5 Haiku is the next generation of Anthropic's fastest model. For a similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses Claude 3 Opus, the largest model in Anthropic's previous generation, on many intelligence benchmarks.
bedrock/claude-3-5-sonnet-20241022-v2
The upgraded Claude 3.5 Sonnet is now state-of-the-art for a variety of tasks including real-world software engineering, agentic capabilities and computer use. The new Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor.
bedrock/claude-3-5-sonnet-20240620-v1
bedrock/claude-3-haiku-20240307-v1
Claude 3 Haiku is Anthropic's fastest, most compact model for near-instant responsiveness. It answers simple queries and requests with speed. Customers will be able to build seamless AI experiences that mimic human interactions. Claude 3 Haiku can process images and return text outputs, and features a 200K context window.
bedrock/meta.llama4-maverick-17b-instruct-v1
bedrock/meta.llama4-scout-17b-instruct-v1
Llama 4 Scout is the best multimodal model in the world in its class and is more powerful than our Llama 3 models, while fitting in a single H100 GPU. Additionally, Llama 4 Scout supports an industry-leading context window of up to 10M tokens.
bedrock/meta.llama3-3-70b-instruct-v1
Where performance meets efficiency. This model supports high-performance conversational AI designed for content creation, enterprise applications, and research, offering advanced language understanding capabilities, including text summarization, classification, sentiment analysis, and code generation.
bedrock/meta.llama3-2-11b-instruct-v1
bedrock/meta.llama3-2-1b-instruct-v1
bedrock/meta.llama3-2-3b-instruct-v1
bedrock/meta.llama3-2-90b-instruct-v1
bedrock/meta.llama3-1-70b-instruct-v1
bedrock/meta.llama3-1-8b-instruct-v1
bedrock/deepseek.r1-v1
cerebras/llama-4-scout-17b-16e-instruct
The Llama-4-Scout-17B-16E-Instruct model is a state-of-the-art, instruction-tuned, multimodal AI model developed by Meta as part of the Llama 4 family. It is designed to handle both text and image inputs, making it suitable for a wide range of applications, including conversational AI, code generation, and visual reasoning.
cerebras/llama3.1-8b
Llama 3.1 8B brings powerful performance in a smaller, more efficient package. With improved multilingual support, tool use, and a 128K context length, it enables sophisticated use cases like interactive agents and compact coding assistants while remaining lightweight and accessible.
cerebras/llama-3.3-70b
The upgraded Llama 3.3 70B model features enhanced reasoning, tool use, and multilingual abilities, along with a significantly expanded 128K context window. These improvements make it well-suited for demanding tasks such as long-form summarization, multilingual conversations, and coding assistance.
cerebras/deepseek-r1-distill-llama-70b
DeepSeek-R1 is a state-of-the-art reasoning model trained with reinforcement learning and cold-start data, delivering strong performance across math, code, and complex reasoning tasks. It offers improved stability, readability, and multilingual handling compared to earlier versions, and is available alongside several high-quality distilled variants.
cerebras/qwen-3-32b
Qwen3-32B is a world-class model with comparable quality to DeepSeek R1 while outperforming GPT-4.1 and Claude Sonnet 3.7. It excels in code-gen, tool-calling, and advanced reasoning, making it an exceptional model for a wide range of production use cases.
Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.
cohere/command-r-plus
Command R+ is Cohere's newest large language model, optimized for conversational interaction and long-context tasks. It aims to be extremely performant, enabling companies to move beyond proof of concept and into production.
cohere/command-r
Command R is a large language model optimized for conversational interaction and long context tasks. It targets the "scalable" category of models that balance high performance with strong accuracy, enabling companies to move beyond proof of concept and into production.
deepinfra/llama-4-maverick-17b-128e-instruct-fp8
The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick is a 17 billion parameter model with 128 experts. Served by DeepInfra.
deepinfra/llama-4-scout-17b-16e-instruct
The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17 billion parameter model with 16 experts. Served by DeepInfra.
deepinfra/qwen3-235b-a22b
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
deepinfra/qwen3-30b-a3b
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
deepseek/chat
DeepSeek-V3 is an open-source mixture-of-experts large language model from DeepSeek that enables versatile functionalities such as text generation, code completion, and more, served by Fireworks AI.
deepseek/deepseek-r1
DeepSeek Reasoner is a specialized model developed by DeepSeek that uses Chain of Thought (CoT) reasoning to improve response accuracy. Before providing a final answer, it generates detailed reasoning steps that are accessible through the API, allowing users to examine and leverage the model's thought process, served by Fireworks AI.
fireworks/firefunction-v1
fireworks/mixtral-8x22b-instruct
fireworks/mixtral-8x7b-instruct
fireworks/qwen3-235b-a22b
groq/llama-3.2-1b
The Llama 3.2 1 billion parameter model is a multilingual, text-only model made by Meta. It is lightweight enough to run on mobile and edge devices. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
groq/llama-3.3-70b-versatile
The Meta Llama 3.3 multilingual model is a pretrained and instruction tuned generative model with 70B parameters. Optimized for multilingual dialogue use cases, it outperforms many of the available open source and closed chat models on common industry benchmarks. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
groq/llama-3.1-8b
Llama 3.1 8B supports a 128K context window, making it ideal for real-time conversational interfaces and data analysis while offering significant cost savings compared to larger models. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
groq/llama-3-8b-instruct
Llama 3 8B is an 8 billion parameter open source model by Meta, fine-tuned for instruction following. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
groq/llama-3-70b-instruct
Llama 3 70B is a 70 billion parameter open source model by Meta, fine-tuned for instruction following. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
groq/gemma2-9b-it
groq/deepseek-r1-distill-llama-70b
DeepSeek-R1-Distill-Llama-70B is a distilled, more efficient variant of the 70B Llama model. It preserves strong performance across text-generation tasks, reducing computational overhead for easier deployment and research. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
groq/mistral-saba-24b
Mistral Saba 24B is a 24 billion parameter open source model by Mistral.ai. Saba is a specialized model trained to excel in Arabic, Farsi, Urdu, Hebrew, and Indic languages. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
Qwen QWQ-32B is a powerful large language model with strong reasoning capabilities and versatile applications across various tasks. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
groq/llama-4-scout-17b-16e-instruct
Llama 4 Scout is Meta's natively multimodal model with a 17B parameter mixture-of-experts architecture (16 experts), offering exceptional performance across text and image understanding with support for 12 languages, optimized for assistant-like chat, image recognition, and coding tasks. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
inception/mercury-coder-small
mistral/mistral-large
mistral/mistral-small
mistral/codestral-2501
Mistral Codestral 25.01 is a state-of-the-art coding model optimized for low-latency, high-frequency use cases. Proficient in over 80 programming languages, it excels at tasks like fill-in-the-middle (FIM), code correction, and test generation.
mistral/pixtral-12b-2409
mistral/ministral-3b-latest
mistral/ministral-8b-latest
mistral/pixtral-large-latest
Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. Particularly, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.
mistral/mistral-small-2503
Mistral Small 3.1 is a state-of-the-art multimodal and multilingual model with excellent benchmark performance, delivering inference speeds of 150 tokens per second and supporting a context window of up to 128k tokens.
openai/o3
OpenAI's o3 is their most powerful reasoning model, setting new state-of-the-art benchmarks in coding, math, science, and visual perception. It excels at complex queries requiring multi-faceted analysis, with particular strength in analyzing images, charts, and graphics.
openai/gpt-4.1-mini
openai/gpt-4.1-nano
openai/gpt-4o
GPT-4o from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It matches GPT-4 Turbo performance with a faster and cheaper API.
openai/gpt-4o-mini
GPT-4o mini from OpenAI is their most advanced and cost-efficient small model. It is multi-modal (accepting text or image inputs and outputting text) and has higher intelligence than gpt-3.5-turbo but is just as fast.
openai/gpt-4-turbo
GPT-4 Turbo from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It has a knowledge cutoff of April 2023 and a 128,000 token context window.
openai/gpt-3.5-turbo
openai/gpt-3.5-turbo-instruct
perplexity/sonar
perplexity/sonar-pro
perplexity/sonar-reasoning
perplexity/sonar-reasoning-pro
vertex/claude-3-7-sonnet-20250219
vertex/claude-3-5-sonnet-v2-20241022
vertex/claude-3-5-haiku-20241022
vertex/claude-3-opus-20240229
vertex/claude-3-haiku-20240307
vertex/claude-3-5-sonnet-20240620
vertex/gemini-2.0-flash-001
vertex/gemini-2.0-flash-lite-001
vertex/llama-4-scout-17b-16e-instruct-maas
vertex/llama-4-maverick-17b-128e-instruct-maas
xai/grok-2-1212
Grok 2 is a frontier language model with state-of-the-art reasoning capabilities. It features advanced capabilities in chat, coding, and reasoning, outperforming both Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard.
xai/grok-2-vision-1212
Grok 2 vision model excels in vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and document-based question answering (DocVQA). It can process a wide variety of visual information including documents, diagrams, charts, screenshots, and photographs.
xai/grok-3-beta
xai/grok-3-fast-beta
xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard variant. The increased speed comes at a higher cost per output token.
xai/grok-3-mini-beta
xai/grok-3-mini-fast-beta
xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard variant. The increased speed comes at a higher cost per output token.