Что такое WuProj?

WuProj — это AI API и LLM-прокси: один OpenAI-совместимый ключ к 300+ моделям от Anthropic, OpenAI, Google, xAI, DeepSeek. Оплата за токены из единого баланса wu-gold через эндпоинт https://api.wuproj.com/v1.

Какие модели доступны?

300+ моделей: Claude, GPT, Gemini, Grok, DeepSeek и другие — через один ключ и один эндпоинт. Модель указывается в поле model (например claude-opus-4.6, gpt-5.4, gemini-3.1-pro, grok-4.20).

Сколько стоит WuProj?

Оплата за использованные токены из баланса wu-gold: 1 gold = 110 ₽, без обязательной подписки, пополнение в рублях. Цена каждой модели за запрос видна в каталоге.

WuProj совместим с OpenAI API?

Да, drop-in замена: base_url https://api.wuproj.com/v1, ключ wu-… и модель из каталога. Тот же формат, что у OpenAI (chat/completions, streaming, tools, vision). Работает с OpenAI SDK, SillyTavern, Janitor, Cursor.

Можно использовать для агентов, кода и ролеплея?

Да: топовые reasoning-модели (Claude, GPT, Gemini, Grok) с function-calling и стримингом для агентов и кода; те же модели по оплате за токены для ролеплея, плюс RollApi для Gemini по дневному лимиту.

Чем WuProj отличается от прямого доступа к провайдерам?

Один ключ и один баланс вместо аккаунтов у каждого провайдера, оплата в рублях, 300+ моделей сразу, единый OpenAI-формат и автоматический фейловер. Модель меняется одной строкой в поле model.

Каталог LLM-моделей — 300+ в WuProj AI API

OOpenAI63 модели

GPT-3.5 Turbo

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for c

ctx 16K~0.025ggeneral

GPT-3.5 Turbo (older v0613)

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for c

ctx 4K~0.047ggeneral

GPT-3.5 Turbo 16k

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text

ctx 16K~0.133ggeneral

GPT-3.5 Turbo Instruct

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Traini

ctx 4K~0.066ggeneral

GPT-4

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with gre

ctx 8K~1.40greasoning

GPT-4 (older v0314)

GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 1

ctx 8K~1.40ggeneral

GPT-4 Turbo (older v1106)

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Train

ctx 128K~0.507ggeneral

GPT-4 Turbo

multimodal

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Train

ctx 128K~0.507gvision

GPT-4 Turbo Preview

The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling,

ctx 128K~0.507ggeneral

GPT-4.1

multimodal

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering

ctx 1.0M~0.109greasoning

GPT-4.1 Mini

multimodal

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost

ctx 1.0M~0.022gvision

GPT-4.1 Nano

multimodal

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exc

ctx 1.0M~0.0055gvision

GPT-4o

multimodal

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintai

ctx 128K~0.137gvision

GPT-4o (2024-05-13)

multimodal

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintai

ctx 128K~0.254gvision

GPT-4o (2024-08-06)

multimodal

The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON sc

ctx 128K~0.137gvision

GPT-4o (2024-11-20)

multimodal

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored

ctx 128K~0.137gvision

GPT-4o Audio

The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nua

ctx 128K~0.137gvision

GPT-4o-mini

multimodal

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their

ctx 128K~0.0082gvision

GPT-4o-mini (2024-07-18)

multimodal

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their

ctx 128K~0.0082gvision

GPT-4o-mini Search Preview

GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and ex

ctx 128K~0.0082gvision

GPT-4o Search Preview

GPT-4o Search Previewis a specialized model for web search in Chat Completions. It is trained to understand and execute

ctx 128K~0.137gvision

GPT-5

multimodal

GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It i

ctx 400K~0.088greasoning

GPT-5 Chat

multimodal

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications

ctx 128K~0.088greasoning

GPT-5 Codex

multimodal

GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed fo

ctx 400K~0.088greasoning

GPT-5 Image

multimodal

GPT-5 Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improveme

ctx 400K~0.429greasoning

GPT-5 Image Mini

multimodal

GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by GPT-5 Mini, with GPT Image 1 Mini for effi

ctx 400K~0.105greasoning

GPT-5 Mini

multimodal

GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instru

ctx 400K~0.018greasoning

GPT-5 Nano

multimodal

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, a

ctx 400K~0.0035greasoning

GPT-5 Pro

multimodal

GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience.

ctx 400K~1.05greasoning

GPT-5.1

multimodal

GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved in

ctx 400K~0.088greasoning

GPT-5.1 Chat

multimodal

GPT-5.1 Chat (AKA Instant is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retain

ctx 128K~0.088greasoning

GPT-5.1-Codex

multimodal

GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designe

ctx 400K~0.088greasoning

GPT-5.1-Codex-Max

multimodal

GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development

ctx 400K~0.088greasoning

GPT-5.1-Codex-Mini

multimodal

GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex

ctx 400K~0.018greasoning

GPT-5.2

multimodal

GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context perfomance co

ctx 400K~0.123greasoning

GPT-5.2 Chat

multimodal

GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retai

ctx 128K~0.123greasoning

GPT-5.2-Codex

multimodal

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is des

ctx 400K~0.123greasoning

GPT-5.2 Pro

multimodal

GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance

ctx 400K~1.47greasoning

GPT-5.3 Chat

multimodal

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more

ctx 128K~0.123greasoning

GPT-5.3-Codex

multimodal

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of

ctx 400K~0.123greasoning

GPT-5.4

multimodal

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ toke

ctx 1.1M~0.156greasoning

GPT-5.4 Image 2

multimodal

GPT-5.4 Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2. It

ctx 272K~0.370greasoning

GPT-5.4 Mini

multimodal

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput wor

ctx 400K~0.047greasoning

GPT-5.4 Nano

multimodal

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and

ctx 400K~0.013greasoning

GPT-5.4 Pro

multimodal

GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabili

ctx 1.1M~1.87gflagship

GPT-5.5

multimodal

GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reason

ctx 1.1M~0.312gflagship

GPT-5.5 Pro

multimodal

GPT-5.5 Pro is OpenAI’s high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workload

ctx 1.1M~1.87gflagship

GPT Audio

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for

ctx 128K~0.137ggeneral

GPT Audio Mini

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices an

ctx 128K~0.033ggeneral

GPT Chat Latest

multimodal

GPT Chat Latest points to OpenAI's stable API alias `chat-latest` that always resolves to the latest Instant chat model

ctx 400K~0.312gvision

gpt-oss-120b

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-rea

ctx 131K~0.0022greasoning

gpt-oss-20b

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-

ctx 131K~0.0017ggeneral

gpt-oss-safeguard-20b

gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mi

ctx 131K~0.0041greasoning

o1

multimodal

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1

ctx 200K~0.819greasoning

o1-pro

multimodal

The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasonin

ctx 200K~8.19greasoning

o3

multimodal

o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual rea

ctx 200K~0.109greasoning

o3 Deep Research

multimodal

o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex, multi-step research tasks. No

ctx 200K~0.546greasoning

o3 Mini

OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science,

ctx 200K~0.060greasoning

o3 Mini High

OpenAI o3-mini-high is the same model as o3-mini with reasoning_effort set to high. o3-mini is a cost-efficient language

ctx 200K~0.060greasoning

o3 Pro

multimodal

The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning

ctx 200K~1.09greasoning

o4 Mini

multimodal

OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retain

ctx 200K~0.060greasoning

o4 Mini Deep Research

multimodal

o4-mini-deep-research is OpenAI's faster, more affordable deep research model—ideal for tackling complex, multi-step res

ctx 200K~0.109greasoning

o4 Mini High

multimodal

OpenAI o4-mini-high is the same model as o4-mini with reasoning_effort set to high. OpenAI o4-mini is a compact reasonin

ctx 200K~0.060greasoning

QQwen45 моделей

Qwen2.5 72B Instruct

Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: -

ctx 131K~0.016ggeneral

Qwen2.5 7B Instruct

Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - S

ctx 131K~0.0020ggeneral

Qwen2.5 Coder 32B Instruct

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Cod

ctx 128K~0.030greasoning

Qwen-Plus

Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost c

ctx 1M~0.013ggeneral

Qwen Plus 0728

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced perfo

ctx 1M~0.013greasoning

Qwen Plus 0728 (thinking)

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced perfo

ctx 1M~0.013greasoning

Qwen2.5 VL 72B Instruct

multimodal

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capa

ctx 131K~0.013gvision

Qwen3 14B

Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning an

ctx 132K~0.0048greasoning

Qwen3 235B A22B

Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forw

ctx 131K~0.025greasoning

Qwen3 235B A22B Instruct 2507

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-

ctx 262K~0.0032ggeneral

Qwen3 235B A22B Thinking 2507

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for c

ctx 262K~0.012greasoning

Qwen3 30B A3B

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) a

ctx 131K~0.0053greasoning

Qwen3 30B A3B Instruct 2507

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameter

ctx 262K~0.0047greasoning

Qwen3 30B A3B Thinking 2507

Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring

ctx 131K~0.0047greasoning

Qwen3 32B

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning a

ctx 131K~0.0042greasoning

Qwen3 8B

Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks

ctx 131K~0.0035greasoning

Qwen3 Coder 480B A35B

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt

ctx 1.0M~0.016greasoning

Qwen3 Coder 30B A3B Instruct

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward

ctx 160K~0.0038gcoding

Qwen3 Coder Flash

Qwen3 Coder Flash is Alibaba's fast and cost efficient version of their proprietary Qwen3 Coder Plus. It is a powerful c

ctx 1M~0.011gcoding

Qwen3 Coder Next

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It

ctx 262K~0.0074gcoding

Qwen3 Coder Plus

Qwen3 Coder Plus is Alibaba's proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agen

ctx 1M~0.038gcoding

Qwen3 Max

Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction followi

ctx 262K~0.046greasoning

Qwen3 Max Thinking

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that re

ctx 262K~0.046greasoning

Qwen3 Next 80B A3B Instruct

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable respo

ctx 262K~0.0078greasoning

Qwen3 Next 80B A3B Thinking

Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” tr

ctx 262K~0.0068greasoning

Qwen3 VL 235B A22B Instruct

multimodal

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understan

ctx 262K~0.011gvision

Qwen3 VL 235B A22B Thinking

multimodal

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across i

ctx 131K~0.020greasoning

Qwen3 VL 30B A3B Instruct

multimodal

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images

ctx 262K~0.0071gvision

Qwen3 VL 30B A3B Thinking

multimodal

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images

ctx 131K~0.011greasoning

Qwen3 VL 32B Instruct

multimodal

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and re

ctx 262K~0.0057greasoning

Qwen3 VL 8B Instruct

multimodal

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understandi

ctx 256K~0.0051greasoning

Qwen3 VL 8B Thinking

multimodal

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visua

ctx 256K~0.0099greasoning

Qwen3.5-122B-A10B

multimodal

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention

ctx 262K~0.018gvision

Qwen3.5-27B

multimodal

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response t

ctx 262K~0.014gvision

Qwen3.5-35B-A3B

multimodal

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear

ctx 262K~0.0093gvision

Qwen3.5 397B A17B

multimodal

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear att

ctx 262K~0.024gvision

Qwen3.5-9B

multimodal

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and v

ctx 262K~0.0021greasoning

Qwen3.5-Flash

multimodal

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention me

ctx 1M~0.0035gvision

Qwen3.5 Plus 2026-02-15

multimodal

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attentio

ctx 1M~0.016gvision

Qwen3.5 Plus 2026-04-20

multimodal

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video inp

ctx 1M~0.019gvision

Qwen3.6 27B

multimodal

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It fea

ctx 262K~0.025gvision

Qwen3.6 35B A3B

multimodal

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion act

ctx 262K~0.0097gvision

Qwen3.6 Flash

multimodal

Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video inp

ctx 1M~0.012gvision

Qwen3.6 Max Preview

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture

ctx 262K~0.065gflagship

Qwen3.6 Plus

multimodal

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts ro

ctx 1M~0.020gvision

GGoogle26 моделей

Gemini 2.0 Flash

multimodal

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintainin

ctx 1.0M~0.0040gvision

Gemini 2.0 Flash Lite

multimodal

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maint

ctx 1.0M~0.0030gvision

Gemini 2.5 Flash

multimodal

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mat

ctx 1.0M~0.016greasoning

Nano Banana (Gemini 2.5 Flash Image)

multimodal

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation mode

ctx 33K~0.016gvision

Gemini 2.5 Flash Lite

multimodal

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cos

ctx 1.0M~0.0040greasoning

Gemini 2.5 Flash Lite Preview 09-2025

multimodal

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cos

ctx 1.0M~0.0040greasoning

Gemini 2.5 Pro

multimodal

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientifi

ctx 1.0M~0.064greasoning

Gemini 2.5 Pro Preview 06-05

multimodal

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientifi

ctx 1.0M~0.064gflagship

Gemini 2.5 Pro Preview 05-06

multimodal

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientifi

ctx 1.0M~0.064gflagship

Gemini 3 Flash Preview

multimodal

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and c

ctx 1.0M~0.023greasoning

Nano Banana Pro (Gemini 3 Pro Image Preview)

multimodal

Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the orig

ctx 66K~0.091greasoning

Nano Banana 2 (Gemini 3.1 Flash Image Preview)

multimodal

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing

ctx 131K~0.023greasoning

Gemini 3.1 Flash Lite

multimodal

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads.

ctx 1.0M~0.011greasoning

Gemini 3.1 Flash Lite Preview

multimodal

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemi

ctx 1.0M~0.011greasoning

Gemini 3.1 Pro Preview

multimodal

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, impro

ctx 1.0M~0.091gflagship

Gemini 3.1 Pro Preview Custom Tools

multimodal

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing o

ctx 1.0M~0.091gflagship

Gemini 3.5 Flash

multimodal

Gemini 3.5 Flash is Google's high-efficiency multimodal model, bringing near-Pro level coding and reasoning at Flash-tie

ctx 1.0M~0.068greasoning

Gemma 2 27B

Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gem

ctx 8K~0.020ggeneral

Gemma 3 12B

multimodal

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 12

ctx 131K~0.0015greasoning

Gemma 3 27B

multimodal

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 12

ctx 131K~0.0027greasoning

Gemma 3 4B

multimodal

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 12

ctx 131K~0.0014greasoning

Gemma 3n 4B

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and ta

ctx 33K~0.0020ggeneral

Gemma 4 26B A4B

multimodal

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total para

ctx 262K~0.0026gvision

Gemma 4 31B

multimodal

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output.

ctx 262K~0.0045greasoning

Lyria 3 Clip Preview

multimodal

30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available

ctx 1.0M—vision

Lyria 3 Pro Preview

multimodal

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through

ctx 1.0M—flagship

MMistral24 модели

Codestral 2508

Mistral's cutting-edge language model for coding released end of July 2025. Codestral specializes in low-latency, high-f

ctx 256K~0.015gcoding

Devstral 2 2512

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter

ctx 262K~0.023gcoding

Devstral Medium

Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and Al

ctx 131K~0.023greasoning

Devstral Small 1.1

Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral A

ctx 131K~0.0051gcoding

Ministral 3 14B 2512

multimodal

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to

ctx 262K~0.0086gvision

Ministral 3 3B 2512

multimodal

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision ca

ctx 131K~0.0043gvision

Ministral 3 8B 2512

multimodal

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capa

ctx 262K~0.0064gvision

Mistral 7B Instruct v0.1

A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length

ctx 4K~0.0050ggeneral

Mistral Large

This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-availabl

ctx 128K~0.101greasoning

Mistral Large 2407

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available

ctx 131K~0.101greasoning

Mistral Large 2411

Mistral Large 2 2411 is an update of Mistral Large 2 released together with Pixtral Large 2411 It provides a significant

ctx 131K~0.101ggeneral

Mistral Large 3 2512

multimodal

Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41

ctx 262K~0.025gflagship

Mistral Medium 3

multimodal

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities a

ctx 131K~0.023greasoning

Mistral Medium 3.5

multimodal

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with t

ctx 262K~0.088gvision

Mistral Medium 3.1

multimodal

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language mode

ctx 131K~0.023gvision

Mistral Nemo

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multi

ctx 131K~0.0009ggeneral

Saba

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accu

ctx 33K~0.010ggeneral

Mistral Small 3

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released

ctx 33K~0.0023ggeneral

Mistral Small 4

multimodal

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mis

ctx 262K~0.0082greasoning

Mistral Small 3.1 24B

multimodal

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with ad

ctx 128K~0.016greasoning

Mistral Small 3.2 24B

multimodal

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following,

ctx 128K~0.0037gvision

Mixtral 8x22B Instruct

Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unp

ctx 66K~0.101ggeneral

Pixtral Large 2411

multimodal

Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of Mistral Large 2. The model is able to u

ctx 131K~0.101gvision

Voxtral Small 24B 2507

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retain

ctx 32K~0.0051ggeneral

AAnthropic15 моделей

Claude 3 Haiku

multimodal

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targete

ctx 200K~0.015gvision

Claude 3.5 Haiku

multimodal

Claude 3.5 Haiku features offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in r

ctx 200K~0.047gvision

Claude Haiku 4.5

multimodal

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of

ctx 200K~0.059gvision

Claude Opus 4

multimodal

Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on com

ctx 200K~0.877gflagship

Claude Opus 4.1

multimodal

Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning,

ctx 200K~0.877gflagship

Claude Opus 4.5

multimodal

Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, a

ctx 200K~0.292gflagship

Claude Opus 4.6

multimodal

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that oper

ctx 1M~0.292gflagship

Claude Opus 4.6 (Fast)

multimodal

Fast-mode variant of Opus 4.6 - identical capabilities with higher output speed at premium 6x pricing. Learn more in Ant

ctx 1M~1.75gflagship

Claude Opus 4.7

multimodal

Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the

ctx 1M~0.292gflagship

Claude Opus 4.7 (Fast)

multimodal

Fast-mode variant of Opus 4.7 - identical capabilities with higher output speed at premium 6x pricing. Learn more in Ant

ctx 1M~1.75gflagship

Claude Opus 4.8

multimodal

Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and f

ctx 1M~0.292gflagship

Claude Opus 4.8 (Fast)

multimodal

Fast-mode variant of Opus 4.8 - identical capabilities with higher output speed at 2x pricing relative to regular Opus 4

ctx 1M~0.585gflagship

Claude Sonnet 4

multimodal

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and rea

ctx 1M~0.175greasoning

Claude Sonnet 4.5

multimodal

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflow

ctx 1M~0.175greasoning

Claude Sonnet 4.6

multimodal

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and prof

ctx 1M~0.175greasoning

DDeepSeek13 моделей

DeepSeek V3

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of

ctx 164K~0.016ggeneral

DeepSeek V3 0324

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from

ctx 164K~0.011ggeneral

DeepSeek V3.1

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinki

ctx 164K~0.011greasoning

R1

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B

ctx 164K~0.037greasoning

R1 0528

May 28th update to the original DeepSeek R1 Performance on par with OpenAI o1, but open-sourced and with fully open reas

ctx 164K~0.028greasoning

R1 Distill Llama 70B

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from De

ctx 131K~0.030greasoning

R1 Distill Qwen 32B

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1.

ctx 128K~0.012greasoning

DeepSeek V3.1 Terminus

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing i

ctx 164K~0.014ggeneral

DeepSeek V3.2

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and ag

ctx 131K~0.011greasoning

DeepSeek V3.2 Exp

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and

ctx 164K~0.012ggeneral

DeepSeek V3.2 Speciale

DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performanc

ctx 164K~0.013greasoning

DeepSeek V4 Flash

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B a

ctx 1.0M~0.0052ggeneral

DeepSeek V4 Pro

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated par

ctx 1.0M~0.020greasoning

MMeta12 моделей

Llama 3 70B Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was o

ctx 8K~0.023ggeneral

Llama 3 8B Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was op

ctx 8K~0.0017ggeneral

Llama 3.1 70B Instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is

ctx 131K~0.017ggeneral

Llama 3.1 8B Instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is f

ctx 131K~0.0010ggeneral

Llama 3.2 11B Vision Instruct

multimodal

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and tex

ctx 131K~0.011gvision

Llama 3.2 1B Instruct

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as s

ctx 131K~0.0018ggeneral

Llama 3.2 3B Instruct

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language process

ctx 131K~0.0033greasoning

Llama 3.3 70B Instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B

ctx 131K~0.0051ggeneral

Llama 4 Maverick

multimodal

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-exper

ctx 1.0M~0.0082gvision

Llama 4 Scout

multimodal

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion p

ctx 10M~0.0043gvision

Llama Guard 3 8B

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous vers

ctx 131K~0.019ggeneral

Llama Guard 4 12B

multimodal

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Simi

ctx 164K~0.0077gvision

ZZ.AI12 моделей

GLM 4 32B

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly

ctx 128K~0.0043ggeneral

GLM 4.5

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-E

ctx 131K~0.032ggeneral

GLM 4.5 Air

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applica

ctx 131K~0.0084ggeneral

GLM 4.5V

multimodal

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) ar

ctx 66K~0.030gvision

GLM 4.6

Compared with GLM-4.5, this generation brings several key improvements: Longer context window: The context window has be

ctx 203K~0.024ggeneral

GLM 4.6V

multimodal

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across i

ctx 131K~0.015greasoning

GLM 4.7

GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more

ctx 203K~0.022greasoning

GLM 4.7 Flash

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further opt

ctx 203K~0.0039ggeneral

GLM 5

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workf

ctx 203K~0.031ggeneral

GLM 5 Turbo

GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments suc

ctx 203K~0.062ggeneral

GLM 5.1

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks.

ctx 203K—general

GLM 5V Turbo

multimodal

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven ta

ctx 203K~0.062gvision

AArcee7 моделей

Coder Large

Coder‑Large is a 32 B‑parameter offspring of Qwen 2.5‑Instruct that has been further trained on permissively‑licensed Gi

ctx 33K~0.023gcoding

Maestro Reasoning

Maestro Reasoning is Arcee's flagship analysis model: a 32 B‑parameter derivative of Qwen 2.5‑32 B tuned with DPO and ch

ctx 131K~0.048greasoning

Spotlight

multimodal

Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight i

ctx 131K~0.0077gvision

Trinity Large Preview

Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixtur

ctx 131K~0.0076ggeneral

Trinity Large Thinking

Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. It shows strong performance

ctx 262K~0.012greasoning

Trinity Mini

Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active

ctx 131K~0.0023greasoning

Virtuoso Large

Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tuned to tackle cross‑domain reasoning, creat

ctx 131K~0.034greasoning

MMiniMax7 моделей

MiniMax-01

multimodal

MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billi

ctx 1.0M~0.012gvision

MiniMax M1

MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It

ctx 1M~0.024greasoning

MiniMax M2

MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. Wit

ctx 205K~0.014greasoning

MiniMax M2-her

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and express

ctx 66K~0.016ggeneral

MiniMax M2.1

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern

ctx 205K~0.015ggeneral

MiniMax M2.5

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex

ctx 205K~0.010ggeneral

MiniMax M2.7

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous i

ctx 205K~0.016ggeneral

BBaidu6 моделей

ERNIE 4.5 21B A3B

A sophisticated text-based Mixture-of-Experts (MoE) model featuring 21B total parameters with 3B activated per token, de

ctx 131K~0.0038ggeneral

ERNIE 4.5 21B A3B Thinking

ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for t

ctx 131K~0.0038greasoning

ERNIE 4.5 300B A47B

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE

ctx 131K~0.015ggeneral

ERNIE 4.5 VL 28B A3B

multimodal

A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, deliveri

ctx 131K~0.0076gvision

ERNIE 4.5 VL 424B A47B

multimodal

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B tota

ctx 131K~0.021gvision

Qianfan-OCR-Fast

multimodal

Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR. By leveraging specialized OCR traini

ctx 66K~0.037gvision

AAmazon5 моделей

Nova 2 Lite

multimodal

Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos t

ctx 1M~0.021greasoning

Nova Lite 1.0

multimodal

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focused on fast processing of image, video, an

ctx 300K~0.0033gvision

Nova Micro 1.0

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of model

ctx 128K~0.0019ggeneral

Nova Premier 1.0

multimodal

Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the bes

ctx 1M~0.146greasoning

Nova Pro 1.0

multimodal

Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and

ctx 300K~0.044gvision

MMoonshot5 моделей

Kimi K2 0711

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion

ctx 131K~0.031ggeneral

Kimi K2 0905

Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model develo

ctx 262K~0.033ggeneral

Kimi K2 Thinking

Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long

ctx 262K~0.033greasoning

Kimi K2.5

multimodal

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-dire

ctx 262K~0.023gvision

Kimi K2.6

multimodal

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX gener

ctx 262K~0.042gvision

NNousresearch5 моделей

Hermes 2 Pro - Llama-3 8B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the Ope

ctx 8K~0.0060ggeneral

Hermes 3 405B Instruct

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, m

ctx 131K~0.043greasoning

Hermes 3 70B Instruct

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, m

ctx 131K~0.013greasoning

Hermes 4 405B

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hy

ctx 131K~0.051greasoning

Hermes 4 70B

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid

ctx 131K~0.0066greasoning

PPerplexity5 моделей

Sonar

multimodal

Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources

ctx 127K~0.043gvision

Sonar Deep Research

Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across compl

ctx 128K~0.109greasoning

Sonar Pro

multimodal

Note: Sonar Pro pricing includes Perplexity search pricing. See details here For enterprises seeking more advanced capab

ctx 200K~0.175greasoning

Sonar Pro Search

multimodal

Sonar Pro's new Pro Search mode is Perplexity's most advanced agentic search system. It is designed for deeper reasoning

ctx 200K~0.175greasoning

Sonar Reasoning Pro

multimodal

Note: Sonar Pro pricing includes Perplexity search pricing. See details here Sonar Reasoning Pro is a premier reasoning

ctx 128K~0.109greasoning

SSao10K5 моделей

Llama 3 Euryale 70B v2.1

Euryale 70B v2.1 is a model focused on creative roleplay from Sao10k. - Better prompt adherence. - Better anatomy / spat

ctx 8K~0.063greasoning

Llama 3 8B Lunaris

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models,

ctx 8K~0.0018greasoning

Llama 3.1 70B Hanami x1

This is Sao10K's experiment over Euryale v2.2

ctx 16K~0.129greasoning

Llama 3.1 Euryale 70B v2.2

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.1

ctx 131K~0.036greasoning

Llama 3.3 Euryale 70B

Euryale L3.3 70B is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.2

ctx 131K~0.028greasoning

XXiaomi5 моделей

MiMo-V2-Flash

MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309

ctx 262K~0.0051ggeneral

MiMo-V2-Omni

multimodal

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified arch

ctx 262K~0.023gvision

MiMo-V2-Pro

MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply op

ctx 1.0M~0.051ggeneral

MiMo-V2.5

multimodal

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference

ctx 1.0M~0.023gvision

MiMo-V2.5-Pro

MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in general agentic capabilities, complex softwar

ctx 1.0M~0.051ggeneral

AAion4 модели

Aion-1.0

Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It

ctx 131K~0.187greasoning

Aion-1.0-Mini

Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in re

ctx 131K~0.033greasoning

Aion-2.0

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong a

ctx 131K~0.037ggeneral

Aion-RP 1.0 (8B)

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-

ctx 33K~0.037ggeneral

BBytedance Seed4 модели

Seed 1.6

multimodal

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and ada

ctx 262K~0.018greasoning

Seed 1.6 Flash

multimodal

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual unders

ctx 262K~0.0041greasoning

Seed-2.0-Lite

multimodal

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities

ctx 262K~0.018gvision

Seed-2.0-Mini

multimodal

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and f

ctx 262K~0.0055greasoning

CCohere4 модели

Command A

Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance acr

ctx 256K~0.137ggeneral

Command R (08-2024)

command-r-08-2024 is an update of the Command R with improved performance for multilingual retrieval-augmented generatio

ctx 128K~0.0082greasoning

Command R+ (08-2024)

command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher throughput and 25% lower latencies as comp

ctx 128K~0.137ggeneral

Command R7B (12-2024)

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, too

ctx 128K~0.0020greasoning

NNVIDIA4 модели

Llama 3.3 Nemotron Super 49B V1.5

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3

ctx 131K~0.0055greasoning

Nemotron 3 Nano 30B A3B

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers

ctx 262K~0.0027greasoning

Nemotron 3 Super

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute ef

ctx 1M~0.0053greasoning

Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified mod

ctx 131K~0.0022greasoning

TThedrummer4 модели

Cydonia 24B V4.1

Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligenc

ctx 131K~0.014ggeneral

Rocinante 12B

Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary w

ctx 33K~0.0083ggeneral

Skyfall 36B V2

Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced

ctx 33K~0.025ggeneral

UnslopNemo 12B

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scena

ctx 33K~0.017ggeneral

~~Anthropic3 модели

Anthropic Claude Haiku Latest

multimodal

This model always redirects to the latest model in the Anthropic Claude Haiku family

ctx 200K~0.059gvision

Claude Opus Latest

multimodal

This model always redirects to the latest model in the Claude Opus family

ctx 1M~0.292gflagship

Anthropic Claude Sonnet Latest

multimodal

This model always redirects to the latest model in the Anthropic Claude Sonnet family

ctx 1M~0.175gvision

IInclusionai3 модели

Ling-2.6-1T

Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-parameter flagship, designed for

ctx 262K~0.021ggeneral

Ling-2.6-flash

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, de

ctx 262K~0.0005ggeneral

Ring-2.6-1T

Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that

ctx 262K~0.0054greasoning

MMicrosoft3 модели

Phi 4

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situation

ctx 16K~0.0031greasoning

Phi 4 Mini Instruct

Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites - wit

ctx 131K~0.0045greasoning

WizardLM-2 8x22B

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared t

ctx 66K~0.027ggeneral

XxAI3 модели

Grok 4.20

multimodal

Grok 4.20 is a reasoning model from xAI with industry-leading speed and agentic tool calling capabilities. It combines t

ctx 2M~0.059gflagship

Grok 4.20 Multi-Agent

multimodal

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents

ctx 2M~0.101gflagship

Grok 4.3

multimodal

Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic wor

ctx 1M~0.059greasoning

~~Google2 модели

Google Gemini Flash Latest

multimodal

This model always redirects to the latest model in the Google Gemini Flash family

ctx 1.0M~0.094gvision

Google Gemini Pro Latest

multimodal

This model always redirects to the latest model in the Google Gemini Pro family

ctx 1.0M~0.125gvision

~~Openai2 модели

OpenAI GPT Latest

multimodal

This model always redirects to the latest model in the OpenAI GPT family

ctx 1.1M~0.312gvision

OpenAI GPT Mini Latest

multimodal

This model always redirects to the latest model in the OpenAI GPT Mini family

ctx 400K~0.047gvision

IIbm Granite2 модели

Granite 4.0 Micro

Granite-4.0-H-Micro is a 3B parameter from the Granite 4 family of models. These models are the latest in a series of mo

ctx 131K~0.0011ggeneral

Granite 4.1 8B

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It

ctx 131K~0.0023ggeneral

IInflection2 модели

Inflection 3 Pi

Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional intelligence, productivity, and safety. I

ctx 8K~0.137ggeneral

Inflection 3 Productivity

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or preci

ctx 8K~0.137ggeneral

MMorph2 модели

Morph V3 Fast

Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The mod

ctx 82K~0.036ggeneral

Morph V3 Large

Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transform

ctx 262K~0.043ggeneral

RRekaai2 модели

Reka Edge

multimodal

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generat

ctx 16K~0.0043gvision

Reka Flash 3

Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka.

ctx 66K~0.0047ggeneral

RRelace2 модели

Relace Apply 3

Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can

ctx 256K~0.038ggeneral

Relace Search

The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant fil

ctx 256K~0.051ggeneral

TTencent2 модели

Hunyuan A13B Instruct

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parame

ctx 131K~0.0077greasoning

Hy3 preview

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use

ctx 262K~0.0036greasoning

~~Moonshotai1 модель

MoonshotAI Kimi Latest

multimodal

This model always redirects to the latest model in the MoonshotAI Kimi family

ctx 262K~0.042gvision

AAi21 модель

Olmo 3 32B Think

Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and

ctx 66K~0.0078greasoning

AAI211 модель

Jamba Large 1.7

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following,

ctx 256K~0.109ggeneral

AAlfredpros1 модель

CodeLLaMa 7B Instruct Solidity

A finetuned 7 billion parameters Code LLaMA - Instruct model to generate Solidity smart contract using 4-bit QLoRA finet

ctx 4K~0.036gcoding

AAlibaba1 модель

Tongyi DeepResearch 30B A3B

Tongyi DeepResearch is an agentic large language model developed by Tongyi Lab, with 30 billion total parameters activat

ctx 131K~0.0053ggeneral

AAnthracite Org1 модель

Magnum v4 72B

This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet( and Opus

ctx 33K~0.137ggeneral

BBlack Forest Labs1 модель

FLUX.2 Flex

image

FLUX.2 Flex image generation (Black Forest Labs). Used by the arbi subproduct for ad creatives.

—general

BByteDance1 модель

UI-TARS 7B

multimodal

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, we

ctx 128K~0.0047gvision

DDeepcogito1 модель

Cogito v2.1 671B

Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance of frontier closed and o

ctx 128K~0.054ggeneral

EEssentialai1 модель

Rnj 1 Instruct

Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focu

ctx 33K~0.0064greasoning

GGryphe1 модель

MythoMax 13B

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

ctx 4K~0.0026ggeneral

IInception1 модель

Mercury 2

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens

ctx 128K~0.013greasoning

KKwaipilot1 модель

KAT-Coder-Pro V2

KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-gra

ctx 256K~0.016gcoding

LLiquid1 модель

LFM2-24B-A2B

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment

ctx 128K~0.0016ggeneral

MMancer1 модель

Weaver (alpha)

An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in

ctx 8K~0.033ggeneral

NNex Agi1 модель

DeepSeek V3.1 Nex N1

DeepSeek V3.1 Nex-N1 is the flagship release of the Nex-N1 series — a post-trained model designed to highlight agent aut

ctx 131K~0.0072ggeneral

PPerceptron1 модель

Perceptron Mk1

multimodal

Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning.** It a

ctx 33K~0.012greasoning

PPrime Intellect1 модель

INTELLECT-3

INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (12B active) post-trained from GLM-4.5-Air-Base using supervise

ctx 131K~0.012ggeneral

SStepfun1 модель

Step 3.5 Flash

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) archit

ctx 262K~0.0051ggeneral

SSwitchpoint1 модель

Switchpoint Router

Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library.

ctx 131K~0.046ggeneral

UUndi951 модель

ReMM SLERP 13B

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge

ctx 6K~0.020ggeneral

UUpstage1 модель

Solar Pro 3

Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active par

ctx 128K~0.0082ggeneral

WWriter1 модель

Palmyra X5

Palmyra X5 is Writer's most advanced model, purpose-built for building and scaling AI agents across the enterprise. It d

ctx 1.0M~0.047ggeneral