Inference API

One API for every model type

Run open-source models instantly through a single OpenAI-compatible endpoint — text, vision, embeddings, reranking, image, speech, and video. No infrastructure to manage, just an API key and a wallet.

Why developers choose it

Built to drop into your stack

Keep the SDK and tools you already use. Point the base URL at EcoHash and start sending requests.

OpenAI-compatible endpoints

A drop-in replacement for the OpenAI API, with the same request and response shapes. Keep the OpenAI SDK and switch providers with one line of code.

Multi-model gateway

Reach text, vision, embedding, reranking, image, speech, and video models through a single API key and one base URL.

Workload-aware routing

Every request is routed to the fastest healthy GPU across regions in real time, for low latency and steady throughput.

Pay per token

Usage-based pricing with no infrastructure to manage and no idle GPU costs to carry.

Every modality

One endpoint, every model type

Text, vision, embeddings, reranking, images, audio, and video — all behind the same authentication and billing.

Chat & reasoning

Open LLMs like Llama 3.1, Qwen2.5, and Gemma for multi-turn chat with long context.

Vision

Multimodal understanding — send images alongside text in the standard messages format.
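As a rough sketch of what "images alongside text" looks like in the standard messages format, the helper below builds a user message with one text part and one inline image part. The helper name `build_vision_messages` is hypothetical, and sending the image as a base64 data URL is an assumption; a given vision model may expect a hosted image URL instead.

```python
import base64


def build_vision_messages(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> list:
    """Return an OpenAI-style messages list with one text part and one image part."""
    # Encode the image as a data URL so it travels inside the JSON request body.
    # Assumption: the target model accepts inline data URLs.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]


# Stand-in bytes; in practice, read them from a real image file.
image_bytes = b"\x89PNG placeholder bytes"
messages = build_vision_messages("What is in this picture?", image_bytes)
```

The resulting list can be passed as `messages` to a chat completions call in place of a plain-text message.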

Embeddings

Vectorize text for semantic search, RAG, and clustering via /v1/embeddings.
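To make the semantic-search use case concrete, here is a minimal sketch of ranking documents by cosine similarity once vectors come back from /v1/embeddings. It assumes the response follows the OpenAI embeddings shape (a `data` list of items with `index` and `embedding`); the helper names are hypothetical and the vectors are made-up toy values, not real model output.

```python
import math


def extract_vectors(response: dict) -> list:
    """Pull embedding vectors out of an OpenAI-shaped embeddings response."""
    # Sort by "index" so the vectors line up with the order of the inputs.
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy response: input 0 is the query, inputs 1 and 2 are candidate documents.
toy_response = {
    "data": [
        {"index": 1, "embedding": [0.0, 1.0]},
        {"index": 0, "embedding": [1.0, 0.0]},
        {"index": 2, "embedding": [0.9, 0.1]},
    ]
}

query_vec, *doc_vecs = extract_vectors(toy_response)
scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
best = max(range(len(scores)), key=scores.__getitem__)  # position of the closest document
```

In a real pipeline the query and the documents would be embedded by the same model, and the top-scoring documents would feed a RAG prompt or a reranker.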

Reranking

Re-score retrieved candidates with a cross-encoder reranker for higher precision.

Image generation

Fast text-to-image with FLUX.1, priced per image.

Speech-to-text

Transcribe audio with Whisper, billed per minute.

Text-to-speech

Low-latency voice synthesis with Kokoro.

Video generation

Asynchronous text-to-video — submit a job and poll for the result.

Quickstart

Switch with one line

Use the official OpenAI client. Point the base URL at EcoHash and pass your EcoHash API key; the rest of your code stays the same.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
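For readers who want to see the request the SDK sends, the sketch below builds the same chat call with only the Python standard library. It relies on the standard OpenAI wire format (a POST to `/chat/completions` under the base URL, with a Bearer token); the helper names `build_chat_request` and `extract_reply` are hypothetical, and the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.ecohash.com/v1"  # same base URL as the quickstart


def build_chat_request(api_key: str, model: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Read the assistant text from an OpenAI-style chat completions response."""
    return response_json["choices"][0]["message"]["content"]


request = build_chat_request("eco_your_api_key", "llama-3.1-8b-instruct", "Hello!")
# To send it: open the request with urllib.request.urlopen, parse the JSON body,
# and pass the resulting dict to extract_reply.
```

Because the shapes match, any HTTP client or OpenAI-compatible tool that can set a base URL and an API key should work the same way.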

Video generation

Asynchronous by design

Video runs as a background job: submit a prompt, then poll for the finished result. You're billed per generation, not per second of compute.

# 1. Submit a generation job
POST /v1/video/generations
{ "model": "wan21-t2v-1-3b", "prompt": "a calm ocean at sunrise" }
# -> { "id": "vid_123", "status": "queued" }

# 2. Poll until the job is ready
GET /v1/video/jobs/vid_123
# -> { "status": "succeeded", "url": "https://..." }
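The submit-then-poll flow above can be wrapped in a small loop. This is a sketch under stated assumptions: `wait_for_video` and its `fetch_job` callback are hypothetical names, the `"failed"` status is assumed (only `"queued"` and `"succeeded"` appear above), and the poll interval and timeout are arbitrary defaults. The callback is injected so the loop can be exercised without a network call.

```python
import time
from typing import Callable


def wait_for_video(
    job_id: str,
    fetch_job: Callable[[str], dict],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a video job until it succeeds, then return the result URL."""
    deadline = time.monotonic() + timeout
    while True:
        # fetch_job should perform GET /v1/video/jobs/{job_id} and return the JSON body.
        job = fetch_job(job_id)
        status = job["status"]
        if status == "succeeded":
            return job["url"]
        if status == "failed":  # assumed failure status, not confirmed above
            raise RuntimeError(f"video job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video job {job_id} not ready after {timeout} seconds")
        sleep(poll_interval)


# Simulated job responses standing in for real API calls.
_responses = iter([
    {"status": "queued"},
    {"status": "queued"},
    {"status": "succeeded", "url": "https://example.com/video.mp4"},
])
video_url = wait_for_video("vid_123", lambda _job_id: next(_responses), sleep=lambda _s: None)
```

In real use, `fetch_job` would issue the authenticated GET request, and a longer interval or backoff keeps polling traffic low while the job runs.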

Start building in minutes

Create an API key, drop in your base URL, and ship. Pay only for what you use.