Documentation
Get started with RigRun
Everything you need to deploy, configure, and integrate.
Quick Start
Up and running in three steps
System Requirements
Minimum: NVIDIA GPU with 24GB+ VRAM, 64GB system RAM, Ubuntu 22.04+, RHEL 8+ / Rocky Linux 8+, or Windows 11.
Recommended: RTX PRO 6000 Blackwell (96GB), 128GB RAM, Ubuntu 24.04.
Installation
Single static binary. Download it, edit config.toml, and run. No Docker required for the base deployment; Docker is used optionally for the code-interpreter sandbox.
Companion apps available: RigRun Desktop (Electron, pre-release) and RigRun Mobile (Flutter, not yet released).
$ ./rigrun serve
First Query
OpenAI-compatible API at /v1/chat/completions.
Drop-in replacement — change your base URL, keep your existing code. Supports streaming (SSE), tool calling, and structured output.
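Streamed responses arrive in the standard OpenAI SSE wire format: `data: {json}` lines carrying delta fragments, terminated by `data: [DONE]`. A minimal sketch of consuming such a stream; the sample payload below is illustrative, not captured from a live RigRun server:

```python
import json

def collect_stream(sse_lines):
    """Concatenate delta fragments from OpenAI-style SSE chat chunks."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # role-only deltas add ""
    return "".join(parts)

# Illustrative stream, shaped like an OpenAI-compatible response:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # Hello
```

The same loop works against any of the OpenAI-compatible endpoints in this document once the lines are read from the HTTP response body.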
Topology
What's running on the box
A full RigRun deployment is several cooperating services on different ports. The Go server is the front door; the rest are optional but recommended. Auxiliary models run on CPU to keep VRAM free for the main inference engine.
Service ports: :8787 (RigRun API), :3100 (Pyros Safety Engine), :8096 (Rolling Memory), plus :8081, :8082, :8083, :8084 (auxiliary services).
Memory layer
Rolling Memory v4 — infinite context proxy
A drop-in OpenAI/Anthropic-compatible reverse proxy on port 8096 that gives any LLM effectively unlimited context via per-session disk-backed verbatim, summary, and embedding tiers. Latency stays flat with depth because retrieval keeps each forward pass roughly the same size regardless of total context length.
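Why latency stays flat: each forward pass is assembled from a fixed budget drawn across the three tiers, so prompt size is bounded regardless of session length. A toy sketch of that idea, with made-up tier contents and a simple fill-in-priority-order budget policy (not RigRun's actual retrieval logic):

```python
def build_context(verbatim, summaries, retrieved, budget_chars=400):
    """Assemble a bounded prompt context from three memory tiers.

    verbatim:  most recent turns, kept word-for-word (highest priority)
    summaries: rolling summaries of older spans
    retrieved: embedding-search hits relevant to the current query
    The budget is fixed, so the forward pass stays the same size no
    matter how long the full session history grows.
    """
    context, used = [], 0
    for tier in (verbatim, summaries, retrieved):
        for item in tier:
            if used + len(item) > budget_chars:
                return context  # budget exhausted: stop filling
            context.append(item)
            used += len(item)
    return context

ctx = build_context(
    verbatim=["user: deploy question", "assistant: use the binary"],
    summaries=["earlier: discussed GPU sizing"],
    retrieved=["fact: server listens on 8096"],
    budget_chars=120,
)
```

With a larger history, only the retrieval step sees more candidates; the assembled context, and therefore per-request latency, stays roughly constant.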
Verified 2026-04-11: needle-in-haystack 100% pass through 10M chars (8.9–9.1s), 5/5 fact consistency, 10/10 multi-turn retrieval, 17/17 smoke tests passed. Includes per-CWD session isolation, dashboard at /memory/ui, Prometheus metrics, and a built-in /search endpoint backed by SearxNG.
Reference
API Reference
Endpoints listen on different ports per service. Authentication via Authorization: Bearer header. RigRun has ~260 routes total — selected groups shown below.
RigRun (http://localhost:8787)
  /v1/chat/completions      Chat inference (OpenAI-compatible)
  /v1/messages              Chat inference (Anthropic-compatible)
  /v1/models                List available models
  /v1/audio/transcribe      Audio transcription
  /v1/tools/{name}/execute  Tool execution with approval handshake
  /v1/memory/search         Semantic memory search
  /v1/research/sessions     Agentic research session (4-phase A-RAG)
  /metrics                  Prometheus-format metrics

Pyros Safety Engine (http://localhost:3100)
  /v1/health    Engine health + backend probe
  /v1/layers    Pillar status and per-layer state
  /v1/stats     Engine statistics, vitals, throughput
  /v1/signals   Active signal blackboard (gate signals)
  /v1/breakers  Circuit breaker states
  /metrics      Prometheus-format metrics

Rolling Memory v4 (http://localhost:8096)
  /v1/chat/completions            OpenAI-compatible chat with infinite context
  /v1/messages                    Anthropic-compatible chat with infinite context
  /memory/session/{id}/summaries  Per-session summary tier
  /memory/session/{id}/export     Export full session for training/audit
  /memory/session/{id}            Wipe a session (cache + disk)
  /memory/ui                      Live dashboard with semantic search
  /search                         Built-in web search (SearxNG-backed)

Example Request
curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RIGRUN_API_KEY" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Setup
Configuration
All configuration lives in a single config.toml file. Sensible defaults ship out of the box — most deployments only need to set the local backend URL and model name.
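A minimal illustrative sketch of such a config.toml. Only the section names and the offline_mode/paranoid_mode flags come from this document; the backend_url and model keys and all values are assumptions, not the shipped defaults:

```toml
[routing]
offline_mode = true        # documented flag: enforce zero external traffic
# paranoid_mode = true     # documented flag: full request/response auditing

[local]
# Illustrative key names and values -- point these at your own backend.
backend_url = "http://localhost:11434"   # e.g. an Ollama server
model = "default"
```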
[routing] Controls how requests are routed between local models and optional cloud fallback. Set offline_mode = true to enforce zero external traffic. paranoid_mode enables full request/response auditing.
[local] Points to your local inference backend. Supports Ollama, llama.cpp (OpenAI-compatible server), and vLLM. The model name here is what the router resolves when a request arrives.
[cloud] Optional cloud fallback for when local inference is unavailable or the request exceeds local model capability. Disabled by default. Only activates if explicitly configured and offline_mode is false.
[security] Set the baseline classification level (UNCLASSIFIED, CUI, SECRET). When audit_enabled is true, every request and response is logged to an append-only audit trail. MFA gates the desktop app and API key management.
[training] Controls the self-improving overnight pipeline. The schedule accepts cron syntax. Eleven training methods are supported (DPO, SimPO, ORPO, KTO, CPO, GRPO, IPO, SPPO, RPO, AERO, and iterative DPO-VP), plus ZO2 Ultimate, a zeroth-order optimizer with 12 advanced techniques (sparse MeZO, HiZOO Hessian, curriculum, cosine epsilon annealing, cross-night EMA, etc.). Preference pairs are dual-sourced from rolling-memory and legacy sessions, then scored three ways: by a Skywork reward model (runs locally, no external service dependency), by a sandboxed CodeRL+ execution scorer, and by the 122B model itself as judge.
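Of the methods above, DPO is the simplest to state: maximize the margin by which the policy prefers the chosen response over the rejected one, relative to a frozen reference model. An illustrative scalar sketch of the DPO loss with toy log-probabilities; this is not pipeline code:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(beta * margin): drives the margin positive
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss drops below log 2:
loss = dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
                ref_chosen=-11.0, ref_rejected=-12.0)
```

At zero margin the loss is exactly log 2, the usual starting point before any preference signal has been learned.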
Internals
Pyros pipeline architecture
Pyros is the standalone safety engine that runs alongside RigRun (or any other LLM server) on port 3100. Every request flows through 7 pillars in middleware order — first listed is outermost wrapper. The pipeline is variable-width: easy queries skip expensive pillars based on Oracle's gate signals.
Pyros 7-pillar pipeline
Note: Pyros runs as a separate process. RigRun has its own 11-layer Go middleware stack (CORS → Auth → SessionTimeout → JWT → Trace → Metrics → RateLimit → Logging → MaxBody → SecurityHeaders → Recovery) that runs before the request is forwarded to Pyros. The two engines are independent — either can run without the other.
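"First listed is outermost" can be made concrete: a middleware chain wraps the handler in reverse registration order, so the first middleware registered sees the request first. A small Python sketch of the general pattern (the middleware names here are borrowed from the stack above for illustration; this is not RigRun's Go code):

```python
def chain(middlewares, handler):
    """Wrap handler so the FIRST middleware listed is the OUTERMOST."""
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler

def tag(name):
    """Middleware factory: records entry order, then calls inward."""
    def middleware(next_handler):
        def wrapped(req):
            req["trace"].append(name)
            return next_handler(req)
        return wrapped
    return middleware

app = chain([tag("CORS"), tag("Auth"), tag("RateLimit")],
            lambda req: "ok")
req = {"trace": []}
app(req)
# req["trace"] is ["CORS", "Auth", "RateLimit"]: first listed ran first.
```

The same reversed-wrap construction is what makes CORS the outermost layer and Recovery the innermost in an eleven-deep stack.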
Extensibility
Creating Custom Agents
Agents are created through the Agent Factory — describe what you need in plain English, and the 14-step pipeline handles design, testing, and deployment.
System prompt synthesis
Multi-pass generation with independent critique. A second model reviews and hardens the prompt before deployment.
Tool selection from typed registry
The factory auto-selects from a typed tool registry based on the agent's domain requirements. Each tool has input/output schemas and usage examples.
Knowledge base assembly
Vector store with semantic chunking. Documents are ingested, chunked by semantic boundaries (not token count), embedded, and indexed in ChromaDB.
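Chunking by semantic boundaries rather than token count can be approximated by splitting on sentence breaks and packing whole sentences up to a size cap, so no unit is ever cut mid-thought. A simplified sketch of that approach (the real ingestion pipeline and ChromaDB indexing are not shown):

```python
import re

def semantic_chunks(text, max_chars=200):
    """Split text at sentence boundaries, packing whole sentences per chunk.

    A sentence is never cut mid-way, which keeps each chunk a coherent
    unit for embedding; a single oversized sentence becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

doc = ("RigRun serves an OpenAI-compatible API. "
       "Rolling Memory keeps context bounded. "
       "Pyros screens every request.")
chunks = semantic_chunks(doc, max_chars=60)
```

Each resulting chunk is a complete sentence (or run of sentences) under the cap, ready to be embedded and indexed.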
Adversarial stress testing
Five probe types: prompt injection, hallucination induction, scope violation, confidentiality extraction, and output format corruption.
Quality gate
Human approval required before any agent goes live. Automated scoring requires a 100% happy-path pass rate and at least a 50% edge-case pass rate.
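The two thresholds can be stated directly in code: all happy-path probes and at least half of the edge-case probes must pass before an agent even reaches human review. A sketch of that scoring gate (the function and argument names are illustrative):

```python
def quality_gate(happy_results, edge_results):
    """Return True if automated scoring permits human review.

    happy_results / edge_results: lists of booleans, one per test case.
    Gate: 100% happy-path pass rate AND >= 50% edge-case pass rate.
    """
    happy_rate = sum(happy_results) / len(happy_results)
    edge_rate = sum(edge_results) / len(edge_results)
    return happy_rate == 1.0 and edge_rate >= 0.5

print(quality_gate([True] * 8, [True, True, False, True]))   # True
print(quality_gate([True] * 7 + [False], [True] * 4))        # False
```

A single happy-path failure blocks the gate outright, while edge-case tolerance leaves room for genuinely hard adversarial probes.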
Help
Support
Founding Access
Direct line to the builder. Three founding seats, hand-selected. Roadmap input, locked-in pricing when paid tiers open, and your conversations preserved for your own eventual local copy.
License Customers
Pricing finalizes after the Founding Access program closes. Email support included with every RigRun license. Configuration assistance and upgrade guidance.
Enterprise
Dedicated technical support. On-premise installation assistance. Custom SLA available. Multi-node and classified environment design experience.