An open marketplace for AI inference endpoints.
ForgeRouter gives developers one OpenAI-compatible API to route requests across ForgeSoftware-hosted models, community providers, and future verified European providers — with transparent pricing, visible infrastructure, and explicit control over where inference runs.
Choose the provider tier that matches your workload.
Community
- • Lower-cost community hosts
- • Explicit opt-in only
- • Hardware/location declared
- • Reliability measured
- • Best effort
Observed
- • Uptime/latency tracked
- • Error rate monitored
- • Better routing confidence
- • Transparent limitations
Platform-operated
- • Operated by ForgeSoftware
- • Clear infrastructure
- • Example: Qwen3.6-27B on 2x RTX A5000 in France
Verified
- • Identity checked
- • Hosting/policy documented
- • Stronger trust signals
- • Future enterprise policies
Small providers can serve models. They just can't easily sell them.
Many people can run vLLM, llama.cpp or SGLang. Fewer can build billing, token accounting, dashboards, monitoring, and trust signals.
For Developers
- close Managing dozens of API keys and different SDKs.
- close Hard to compare real providers and actual hardware used.
- close Opaque routing—not knowing exactly where user data is processed.
For Providers
- close Struggling to gain visibility and get listed on aggregators.
- close Complex billing, invoicing, and cross-border payment collection.
- close Idle GPU capacity that is difficult to monetise effectively.
Know the deployment behind the model.
ForgeRouter does not hide every backend behind a single model name. Users can inspect the actual deployment: provider, country, runtime, hardware, quantization, context, pricing, uptime and logging policy.
Route by price, trust and performance.
Exact deployment
model="forge-1/Qwen3.6-27B"
Balanced
model="qwen-27b",
strategy="balanced"
Economy community
model="qwen-27b",
strategy="cost",
allow_community=true
ForgeNode: from local endpoint to marketplace deployment.
ForgeNode is the lightweight Python agent providers install next to their inference runtime.
- ✓ Detects GPU & VRAM
- ✓ Detects Runtime (vLLM, SGLang, etc.)
- ✓ Detects Model & Quantization
- ✓ Reports Health & Metrics
$ pipx install forgenode
$ forgenode enroll --token <YOUR_TOKEN>
$ forgenode detect
$ forgenode publish
Transparent marketplace pricing.
Provider inference price + ForgeRouter platform fee = Effective customer cost.
Community providers unlock the long tail of open inference.
Better prices
Access unused capacity at lower rates.
More model variety
Find niche fine-tunes and specialized models.
Transparent risk
Opt-in only, with clear tracking of reliability.
Upgrade path
Good community nodes can become Verified.
Private beta
Join the ForgeRouter beta.
Request early access to the OpenAI-compatible gateway, provider marketplace previews, and routing policy experiments.