Skip to main content
$1 free credit on signup

The Home for Open Source Model Inference

Run any open source model — use our platform models instantly, deploy your own on dedicated GPUs, or build and fine-tune custom models. One API, multi-region, production-ready.

quickstart
from openai import OpenAIclient = OpenAI(    base_url="https://api.ecohash.com/v1",    api_key="eco_your_api_key",)response = client.chat.completions.create(    model="llama-3.1-8b-instruct",    messages=[{"role": "user", "content": "Hello!"}],)print(response.choices[0].message.content)

Drop-in OpenAI SDK compatible. Switch with one line.

Built for production AI

Enterprise-grade infrastructure with developer-friendly APIs

Intelligent Workload-Aware Routing

Every request automatically routes to the fastest available GPU across regions. Real-time load balancing ensures optimal latency and throughput.

Multi-Region Failover

Inference endpoints deploy across multiple regions with automatic DNS failover. If one region goes down, traffic seamlessly routes to the next.

OpenAI-Compatible API

Drop-in replacement for the OpenAI SDK. Same endpoints, same request format. Switch providers with one line of code.

Automatic Fallback & Retry

Built-in retry with intelligent fallback across GPU clusters. Failed requests automatically route to healthy alternatives.

Adaptive GPU Scheduling

Multi-tier priority system ensures your inference endpoints stay running. Autoscaling adjusts capacity based on live request volume.

Built for Multimodal AI & Rendering

Professional server GPUs with up to 96GB memory — ideal for multimodal AI, real-time rendering, ray tracing, and single-GPU model fine-tuning.

Model Marketplace

Production-ready platform models and community-published models

Workspace with GPU

On-demand GPU environments for building, fine-tuning, and experimentation. Full root access with shared filesystems and up to 96GB memory.

Model Fine-Tuning

Fine-tune language and multimodal models on high-memory GPUs.

Support LoRA, QLoRA, and full fine-tuning on up to 96GB VRAM.

Rendering & Ray Tracing

Render scenes and assets with professional-grade GPU instances.

Run Blender, Unreal Engine, or custom rendering pipelines.

AI Research & Development

Experiment freely with root access, terminal, and shared storage.

Use your preferred framework with JupyterLab-ready environments.

Available across multiple global regions with low-latency access.

Americas

Simple, transparent pricing

Pay only for what you use. $1 free credit to get started.

View full pricing

Inference API

Pay per token across all platform models. No minimum commitment.

From $0.03/1M tokens
  • Text, vision, image, speech, video
  • OpenAI-compatible
  • Multi-region routing
Popular

Dedicated Inference

Your own model endpoint. Same API, same SDK - just change the model name in your code.

Per GPU-hour · varies by GPU
  • Multi-region health failover
  • Any HuggingFace model
  • Unified API - one endpoint for all

Building Models with GPUs

On-demand GPU environments for fine-tuning, training, and building custom models.

Per GPU-hour · varies by GPU
  • Web terminal access
  • Shared filesystems
  • Rendering & ray tracing