What you get
Endpoints built for production
Dedicated capacity with the resilience and flexibility real workloads need.
Same unified API
Call your instance through the same OpenAI-compatible API as platform models: set the model ID to served_name:instance_id. No new base URL.
HuggingFace or custom containers
Deploy a managed vLLM server from any HuggingFace model, or bring your own container image and start command.
Multi-region routing
Run across regions and let the unified API route each request to a healthy one. Your endpoint is usable as soon as the first region is up.
Multiple LoRA adapters
Serve many fine-tuned adapters on one instance and address each by its own alias (e.g. ft10:1). Adding or swapping adapters requires no restart.
Sandboxed & isolated
Every instance runs in a security-hardened, network-isolated sandbox.
Adaptive scaling
Autoscaling adjusts capacity to live request volume. Pay per GPU-hour.
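To make per-GPU-hour billing under autoscaling concrete, here is a rough estimate sketch. It assumes capacity scales in whole replicas and that usage is billed as GPUs multiplied by hours. The ScalingInterval type, the replica counts, and the rate are all hypothetical placeholders, not published figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingInterval:
    """A stretch of time during which capacity was constant (hypothetical model)."""
    hours: float               # length of the interval in hours
    replicas: int              # replicas running during the interval
    gpus_per_replica: int = 1  # GPUs attached to each replica


def gpu_hours(intervals):
    """Sum GPU-hours across intervals: hours x replicas x GPUs per replica."""
    return sum(i.hours * i.replicas * i.gpus_per_replica for i in intervals)


def estimate_cost(intervals, rate_per_gpu_hour):
    """Multiply total GPU-hours by a per-GPU-hour rate."""
    return gpu_hours(intervals) * rate_per_gpu_hour


# Example day: one replica for 16 quiet hours, three replicas for 8 busy hours.
day = [
    ScalingInterval(hours=16, replicas=1),
    ScalingInterval(hours=8, replicas=3),
]

HYPOTHETICAL_RATE = 2.0  # placeholder price per GPU-hour, not a real quote

print(gpu_hours(day))                         # total GPU-hours for the day
print(estimate_cost(day, HYPOTHETICAL_RATE))  # estimated cost at the placeholder rate
```

Substitute your instance's actual GPU count and the rate shown for your plan to get a usable estimate.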
How you call it
One API, addressed by model ID
Your instance is reachable through the same endpoint as platform models. Set the model field to served_name:instance_id and keep your existing SDK.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

# Address your instance with served_name:instance_id
response = client.chat.completions.create(
    model="qwen3.5-35b-a3b:144",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

At a glance
The essentials
- Base URL: https://api.ecohash.com/v1
- Model ID: served_name:instance_id
- Billing: per GPU-hour
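Because only the model field changes between a base instance and a LoRA adapter, request construction can be factored into small helpers. This is a minimal sketch: instance_model_id and chat_request are hypothetical names, and passing an adapter alias such as ft10:1 directly as the model value is an assumption based on the alias description in the feature list.

```python
def instance_model_id(served_name: str, instance_id: int) -> str:
    """Compose a model ID in the served_name:instance_id form."""
    if not served_name:
        raise ValueError("served_name must be a non-empty string")
    return f"{served_name}:{instance_id}"


def chat_request(model_id: str, prompt: str) -> dict:
    """Build keyword arguments for an OpenAI-style chat completion call."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }


# Base model served on instance 144 (values taken from the example above).
base_request = chat_request(instance_model_id("qwen3.5-35b-a3b", 144), "Hello!")

# Assumption: an adapter alias is passed as the model ID unchanged.
adapter_request = chat_request("ft10:1", "Hello!")

print(base_request["model"])
print(adapter_request["model"])
```

With a configured OpenAI client, either dictionary can be unpacked into the call, for example client.chat.completions.create(**base_request).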