The Home for Open Source Model Inference
Run any open source model — use our platform models instantly, deploy your own on dedicated GPUs, or build and fine-tune custom models. One API, multi-region, production-ready.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

Drop-in OpenAI SDK compatible. Switch with one line.
Everything you need to build with AI
From instant API access to dedicated GPU infrastructure
What you can build
One API for every AI workload — text, vision, image, speech, video
Built for production AI
Enterprise-grade infrastructure with developer-friendly APIs
Intelligent Workload-Aware Routing
Every request automatically routes to the fastest available GPU across regions. Real-time load balancing ensures optimal latency and throughput.
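To make the routing idea concrete, here is a minimal sketch of latency-aware selection. It is an illustration only, not the platform's actual router: the `RegionStats` fields, the `pick_region` helper, and the region names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RegionStats:
    name: str          # hypothetical region label
    latency_ms: float  # recently observed latency to this region
    free_gpus: int     # GPUs currently able to accept work


def pick_region(regions: Sequence[RegionStats]) -> Optional[RegionStats]:
    """Return the lowest-latency region that still has free capacity."""
    candidates = [r for r in regions if r.free_gpus > 0]
    if not candidates:
        return None  # nothing available; the caller decides whether to queue or fail
    return min(candidates, key=lambda r: r.latency_ms)


regions = [
    RegionStats("region-a", latency_ms=40.0, free_gpus=0),  # fastest, but full
    RegionStats("region-b", latency_ms=55.0, free_gpus=3),
    RegionStats("region-c", latency_ms=90.0, free_gpus=8),
]
chosen = pick_region(regions)
print(chosen.name if chosen else "no capacity")
```

A real router would likely weigh more signals (queue depth, model placement, request size); the sketch only shows the "fastest available" rule described above.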
Multi-Region Failover
Inference endpoints deploy across multiple regions with automatic DNS failover. If one region goes down, traffic seamlessly routes to the next.
OpenAI-Compatible API
Drop-in replacement for the OpenAI SDK. Same endpoints, same request format. Switch providers with one line of code.
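As a rough illustration of what "same request format" implies, the sketch below builds (without sending) an OpenAI-style chat completions request using only the Python standard library. It assumes the `/chat/completions` path under the base URL from the quickstart snippet and Bearer-token authentication, as in OpenAI's API; the `build_chat_request` helper is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.ecohash.com/v1"  # base URL from the quickstart snippet


def build_chat_request(api_key, model, messages, base_url=BASE_URL):
    """Build, but do not send, an OpenAI-style chat completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",  # assumed path, mirroring OpenAI's API
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    api_key="eco_your_api_key",
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(req.get_method(), req.full_url)
```

Because only `base_url` and the key differ between providers in this shape, switching is a one-line change in client configuration.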
Automatic Fallback & Retry
Built-in retry with intelligent fallback across GPU clusters. Failed requests automatically route to healthy alternatives.
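The general retry-then-fallback pattern can be sketched as follows. This is a generic client-side illustration under stated assumptions, not the platform's built-in mechanism: `call_with_fallback` and the two stand-in clusters are hypothetical names.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(targets: Sequence[Callable[[], T]],
                       retries_per_target: int = 2) -> T:
    """Try each target in order, retrying each one before moving to the next."""
    last_error: Optional[Exception] = None
    for target in targets:
        for _ in range(retries_per_target):
            try:
                return target()
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
    raise RuntimeError("all targets failed") from last_error


calls = []


def flaky_cluster():
    calls.append("a")
    raise ConnectionError("cluster a is down")


def healthy_cluster():
    calls.append("b")
    return "ok"


result = call_with_fallback([flaky_cluster, healthy_cluster], retries_per_target=2)
print(result, calls)
```

Production code would usually add backoff between attempts and retry only on errors that are safe to repeat.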
Adaptive GPU Scheduling
Multi-tier priority system ensures your inference endpoints stay running. Autoscaling adjusts capacity based on live request volume.
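One simple way to think about autoscaling on live request volume is a clamped capacity calculation. The sketch below is an assumption-laden illustration, not the platform's scheduler: `desired_replicas`, its parameters, and the default limits are hypothetical.

```python
import math


def desired_replicas(requests_per_sec: float,
                     capacity_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 8) -> int:
    """Replicas needed to serve the current load, clamped to configured bounds."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


# 45 req/s at 10 req/s per replica needs 5 replicas.
print(desired_replicas(45, 10))
```

The lower bound keeps an endpoint warm when traffic drops to zero; the upper bound caps spend during spikes.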
Built for Multimodal AI & Rendering
Professional server GPUs with up to 96GB memory — ideal for multimodal AI, real-time rendering, ray tracing, and single-GPU model fine-tuning.
Model Marketplace
Production-ready platform models and community-published models
Powered by EcoLink
EcoLink is our end-to-end inference platform — unifying GPU cloud, model serving, and intelligent workload distribution across distributed physical infrastructure. It keeps inference fast and always-on so your AI pipelines run without you managing the orchestration underneath.
Workspace with GPU
On-demand GPU environments for building, fine-tuning, and experimentation. Full root access with shared filesystems and up to 96GB memory.
Model Fine-Tuning
Fine-tune language and multimodal models on high-memory GPUs.
Supports LoRA, QLoRA, and full fine-tuning on up to 96GB VRAM.
Rendering & Ray Tracing
Render scenes and assets with professional-grade GPU instances.
Run Blender, Unreal Engine, or custom rendering pipelines.
AI Research & Development
Experiment freely with root access, terminal, and shared storage.
Use your preferred framework with JupyterLab-ready environments.
Available across multiple global regions with low-latency access.
Simple, transparent pricing
Pay only for what you use. $1 free credit to get started.
View full pricing
Inference API
Pay per token across all platform models. No minimum commitment.
- Text, vision, image, speech, video
- OpenAI-compatible
- Multi-region routing
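Per-token billing reduces to simple arithmetic. The sketch below estimates the cost of one request; the rate used is a placeholder for illustration only, not an actual platform price, and `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(prompt_tokens: int,
                      completion_tokens: int,
                      usd_per_million_tokens: float) -> float:
    """Cost of one request when every token is billed at a single flat rate."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1_000_000 * usd_per_million_tokens


PLACEHOLDER_RATE = 0.20  # USD per million tokens; illustrative, not a real price

cost = estimate_cost_usd(prompt_tokens=1200,
                         completion_tokens=300,
                         usd_per_million_tokens=PLACEHOLDER_RATE)
print(f"${cost:.6f} per request")
```

Providers often price input and output tokens differently, so check the pricing page for the real rates before budgeting.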