Status: Systems Operational

An open marketplace for AI inference endpoints.

ForgeRouter gives developers one OpenAI-compatible API to route requests across ForgeSoftware-hosted models, community providers, and future verified European providers — with transparent pricing, visible infrastructure, and explicit control over where inference runs.

Start Routing arrow_forward Become a Provider Read the docs

api OpenAI-Compatible API

group Community providers

verified Verified tiers

euro Euro billing

visibility Provider-level transparency

Choose the provider tier that matches your workload.

Community

• Lower-cost community hosts
• Explicit opt-in only
• Hardware/location declared
• Reliability measured
• Best effort

Observed

• Uptime/latency tracked
• Error rate monitored
• Better routing confidence
• Transparent limitations

RECOMMENDED

Platform-operated

• Operated by ForgeSoftware
• Clear infrastructure
• Example: Qwen3.6-27B on 2x RTX A5000 in France

Verified

• Identity checked
• Hosting/policy documented
• Stronger trust signals
• Future enterprise policies

Small providers can serve models. They just can't easily sell them.

Many people can run vLLM, llama.cpp or SGLang. Fewer can build billing, token accounting, dashboards, monitoring, and trust signals.

terminal

For Developers

close Managing dozens of API keys and different SDKs.
close Hard to compare real providers and actual hardware used.
close Opaque routing—not knowing exactly where user data is processed.

dns

For Providers

close Struggling to gain visibility and get listed on aggregators.
close Complex billing, invoicing, and cross-border payment collection.
close Idle GPU capacity that is difficult to monetise effectively.

Know the deployment behind the model.

ForgeRouter does not hide every backend behind a single model name. Users can inspect the actual deployment: provider, country, runtime, hardware, quantization, context, pricing, uptime and logging policy.

Model: forge-1/Qwen3.6-27B

Provider: ForgeSoftware

Tier: Platform-operated beta

Location: France

Runtime: vLLM

Hardware: 2x RTX A5000

Quantization: AWQ INT4 / Marlin

Context: up to 131k

Policy: No prompt logging

Status: Best effort

Route by price, trust and performance.

Exact deployment

model="forge-1/Qwen3.6-27B"

Balanced

model="qwen-27b",
strategy="balanced"

Economy community

model="qwen-27b",
strategy="cost",
allow_community=true

ForgeNode: from local endpoint to marketplace deployment.

ForgeNode is the lightweight Python agent providers install next to their inference runtime.

✓ Detects GPU & VRAM
✓ Detects Runtime (vLLM, SGLang, etc.)
✓ Detects Model & Quantization
✓ Reports Health & Metrics

$ pipx install forgenode
$ forgenode enroll --token <YOUR_TOKEN>
$ forgenode detect
$ forgenode publish

Transparent marketplace pricing.

Provider inference price + ForgeRouter platform fee = Effective customer cost.

euroEuro billing

account_balance_walletPrepaid wallet

visibilityNo hidden markup

Community providers unlock the long tail of open inference.

Better prices

Access unused capacity at lower rates.

More model variety

Find niche fine-tunes and specialized models.

Transparent risk

Opt-in only, with clear tracking of reliability.

Upgrade path

Good community nodes can become Verified.

Private beta

Join the ForgeRouter beta.

Request early access to the OpenAI-compatible gateway, provider marketplace previews, and routing policy experiments.

Create account Become a provider