Documentation

Get started with RigRun

Everything you need to deploy, configure, and integrate.

Quick Start

Up and running in three steps

01

System Requirements

Minimum (small models)

NVIDIA GPU with 24GB+ VRAM, 64GB system RAM, Ubuntu 22.04+, RHEL 8+ / Rocky Linux 8+, or Windows 11.

Recommended (122B model + KV compression)

RTX PRO 6000 Blackwell (96GB), 128GB RAM, Ubuntu 24.04.

Supported backends: vLLM, llama.cpp, Ollama.
02

Installation

Single static binary. Download, configure config.toml, run. No Docker is required for the base deployment; Docker is optional and used only for the code-interpreter sandbox.

Companion apps available: RigRun Desktop (Electron, pre-release) and RigRun Mobile (Flutter, not yet released).

$ ./rigrun serve
03

First Query

OpenAI-compatible API at /v1/chat/completions.

Drop-in replacement — change your base URL, keep your existing code. Supports streaming (SSE), tool calling, and structured output.


Topology

What's running on the box

A full RigRun deployment is several cooperating services on different ports. The Go server is the front door; the rest are optional but recommended. Auxiliary models run on CPU to keep VRAM free for the main inference engine.

:8787  RigRun — Go server: chat, tools, research, memory, audio, training control
:3100  Pyros — pure-Go safety engine: 7-pillar pipeline, optional but recommended
:8096  Rolling Memory — infinite-context proxy: speaks OpenAI + Anthropic, per-session disk store
:8081  llama-embed — CPU embedding service (Qwen3-Embedding-4B)
:8082  llama-rerank — CPU reranker (Qwen3-Reranker-0.6B)
:8083  llama-classify — CPU classifier for routing (Qwen3-0.6B)
:8084  vision — vision model (Qwen3-VL-4B), idle-unloaded to free VRAM

Memory layer

Rolling Memory v4 — infinite context proxy

A drop-in OpenAI/Anthropic-compatible reverse proxy on port 8096 that gives any LLM effectively unlimited context via per-session disk-backed verbatim, summary, and embedding tiers. Latency stays flat with depth because retrieval keeps each forward pass roughly the same size regardless of total context length.

10M characters tested · 2.5M tokens tested · 100% needle recall · ~10s latency at depth

Verified 2026-04-11: needle-in-haystack 100% pass through 10M chars (8.9–9.1s), 5/5 fact consistency, 10/10 multi-turn retrieval, 17/17 smoke tests passed. Includes per-CWD session isolation, dashboard at /memory/ui, Prometheus metrics, and a built-in /search endpoint backed by SearxNG.
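The tiering idea can be sketched as a toy model. This is an illustration of the concept only, not Rolling Memory's actual implementation: the budgets and the summary placeholders below are arbitrary stand-ins for the real per-session disk-backed tiers. The point is that each forward pass packs a fixed budget of recent verbatim turns plus retrieved summaries, so prompt size stays constant however long the session grows.

```python
# Toy sketch of tiered context assembly: constant prompt size at any depth.
# Illustrative only -- budgets and tier policy are arbitrary, not RigRun's values.

VERBATIM_BUDGET = 4    # most recent turns kept word-for-word
SUMMARY_BUDGET = 3     # older turns represented by summaries

def assemble_prompt(history: list[str]) -> list[str]:
    """Pack a bounded prompt from an unbounded history."""
    verbatim = history[-VERBATIM_BUDGET:]
    older = history[:-VERBATIM_BUDGET]
    # Stand-in for the summary/embedding tiers: in the real proxy these are
    # retrieved from a per-session disk store by relevance, not taken by recency.
    summaries = [f"[summary of turn {i}]" for i in range(len(older))][-SUMMARY_BUDGET:]
    return summaries + verbatim

short = assemble_prompt([f"turn {i}" for i in range(5)])
long = assemble_prompt([f"turn {i}" for i in range(5000)])
assert len(long) <= VERBATIM_BUDGET + SUMMARY_BUDGET
print(len(short), len(long))
```

Because the assembled prompt is bounded, the backend model does roughly the same amount of work per turn, which is why latency stays flat with depth.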

Reference

API Reference

Each service listens on its own port. Authentication uses an Authorization: Bearer header. RigRun exposes ~260 routes in total; selected groups are shown below.

RigRun

http://localhost:8787
POST /v1/chat/completions Chat inference (OpenAI-compatible)
POST /v1/messages Chat inference (Anthropic-compatible)
GET /v1/models List available models
POST /v1/audio/transcribe Audio transcription
POST /v1/tools/{name}/execute Tool execution with approval handshake
GET /v1/memory/search Semantic memory search
POST /v1/research/sessions Agentic research session (4-phase A-RAG)
GET /metrics Prometheus-format metrics

Pyros Safety Engine

http://localhost:3100
GET /v1/health Engine health + backend probe
GET /v1/layers Pillar status and per-layer state
GET /v1/stats Engine statistics, vitals, throughput
GET /v1/signals Active signal blackboard (gate signals)
GET /v1/breakers Circuit breaker states
GET /metrics Prometheus-format metrics

Rolling Memory v4

http://localhost:8096
POST /v1/chat/completions OpenAI-compatible chat with infinite context
POST /v1/messages Anthropic-compatible chat with infinite context
GET /memory/session/{id}/summaries Per-session summary tier
GET /memory/session/{id}/export Export full session for training/audit
DELETE /memory/session/{id} Wipe a session (cache + disk)
GET /memory/ui Live dashboard with semantic search
GET /search Built-in web search (SearxNG-backed)

Example Request

Streaming chat completion:
curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RIGRUN_API_KEY" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
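With stream: true, the response arrives as Server-Sent Events, one JSON chunk per data: line, terminated by a [DONE] sentinel. A minimal parser, assuming the standard OpenAI chunk shape (which the drop-in compatibility claim above implies, though the sample stream here is constructed, not captured from RigRun):

```python
import json

def parse_sse(stream_text: str) -> str:
    """Collect content deltas from an OpenAI-style SSE stream into one string."""
    out = []
    for line in stream_text.splitlines():
        if not line.startswith("data: "):
            continue                      # skip blank keep-alives and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":           # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        out.append(delta.get("content", ""))  # first chunk may carry only a role
    return "".join(out)

# Constructed sample stream in the standard OpenAI chunk format:
sample = "\n".join([
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
])
print(parse_sse(sample))  # Hello
```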

Setup

Configuration

All configuration lives in a single config.toml file. Sensible defaults ship out of the box — most deployments only need to set the local backend URL and model name.

[routing]
default_mode, max_tier, paranoid_mode, offline_mode

Controls how requests are routed between local models and optional cloud fallback. Set offline_mode = true to enforce zero external traffic. paranoid_mode enables full request/response auditing.

[local]
ollama_url, ollama_model

Points to your local inference backend. Supports Ollama, llama.cpp (OpenAI-compatible server), and vLLM. The model name here is what the router resolves when a request arrives.

[cloud]
openrouter_key, default_model

Optional cloud fallback for when local inference is unavailable or the request exceeds local model capability. Disabled by default. Only activates if explicitly configured and offline_mode is false.

[security]
classification, audit_enabled, session_timeout, mfa

Set the baseline classification level (UNCLASSIFIED, CUI, SECRET). When audit_enabled is true, every request and response is logged to an append-only audit trail. MFA gates the desktop app and API key management.

[training]
enabled, schedule, methods, scoring

Controls the self-improving overnight pipeline. Schedule accepts cron syntax. 11 training methods are supported (DPO, SimPO, ORPO, KTO, CPO, GRPO, IPO, SPPO, RPO, AERO, iterative DPO-VP), plus ZO2 Ultimate, a zeroth-order optimizer with 12 advanced techniques (sparse MeZO, HiZOO Hessian, curriculum, cosine epsilon annealing, cross-night EMA, etc.). Training pairs are dual-sourced from rolling-memory and legacy sessions, then scored three ways: by a Skywork reward model (runs locally, no external service dependency), by a sandboxed CodeRL+ execution scorer, and by the 122B model itself acting as judge.
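Putting the sections above together, a minimal config.toml might look like the following. The values are illustrative only: the key names come from the section summaries above, but the value formats, defaults, and mode names are assumptions.

```toml
[routing]
default_mode = "local"    # mode names are assumed; see your shipped defaults
offline_mode = true       # enforce zero external traffic
paranoid_mode = false     # full request/response auditing off

[local]
ollama_url = "http://localhost:11434"
ollama_model = "my-local-model"   # placeholder; use your pulled model name

[security]
classification = "UNCLASSIFIED"
audit_enabled = true

[training]
enabled = true
schedule = "0 2 * * *"    # cron: nightly at 02:00
```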

Internals

Pyros pipeline architecture

Pyros is the standalone safety engine that runs alongside RigRun (or any other LLM server) on port 3100. Every request flows through 7 pillars in middleware order — first listed is outermost wrapper. The pipeline is variable-width: easy queries skip expensive pillars based on Oracle's gate signals.

Pyros 7-pillar pipeline

I Oracle Predictive intel — UCB1 cascade routing, gate signals to skip easy work
II Tribunal Adversarial verification — 6-feature hallucination probe, courtroom debate
III Fortress Pre-inference defense — prompt injection blocks at conf ≥ 0.7, SmoothLLM, AIS detectors
IV MindEye Metacognition — knowledge graph injection, isotonic calibration, QualityScore
V Forge Self-improvement — EvoPrompt evolution, MDL prompt compression, DICE exemplar capture
VI Crucible Operational safety — circuit breakers, OTel spans, stigmergic blackboard
VII Singularity Homeostatic regulation — PID over runtime vitals, superposition, digital twin
LLM Inference Token generation via the configured backend (RigRun, vLLM, llama.cpp, OpenAI, Anthropic)
Response passes back through all pillars in reverse for post-processing

Note: Pyros runs as a separate process. RigRun has its own 11-layer Go middleware stack (CORS → Auth → SessionTimeout → JWT → Trace → Metrics → RateLimit → Logging → MaxBody → SecurityHeaders → Recovery) that runs before the request is forwarded to Pyros. The two engines are independent — either can run without the other.
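The onion ordering used by both stacks (request enters the first-listed layer, response unwinds back through the layers in reverse) can be sketched generically. The layer names below are illustrative; this is not RigRun's actual Go middleware, just the wrapping pattern it describes:

```python
# Generic onion-middleware sketch: the request passes through layers
# outermost-first, and the response passes back through them in reverse.
# Layer names are hypothetical, not RigRun's actual middleware.

calls = []

def make_layer(name, inner):
    def layer(req):
        calls.append(f"{name}:pre")      # request path (outer -> inner)
        resp = inner(req)
        calls.append(f"{name}:post")     # response path (inner -> outer)
        return resp
    return layer

def inference(req):
    calls.append("inference")            # token generation happens at the core
    return f"response to {req}"

handler = inference
for name in reversed(["auth", "ratelimit", "logging"]):  # first listed = outermost
    handler = make_layer(name, handler)

handler("hello")
print(calls)
```

Running it shows auth seeing the request first and the response last, which is exactly the "response passes back through all pillars in reverse" behavior described above.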

7 Pyros pillars · 11 RigRun middleware layers · ≥99% adversarial block rate · 0 external dependencies

Extensibility

Creating Custom Agents

Agents are created through the Agent Factory: describe what you need in plain English, and a 14-step pipeline handles design, testing, and deployment. Key steps include:

01

System prompt synthesis

Multi-pass generation with independent critique. A second model reviews and hardens the prompt before deployment.

02

Tool selection from typed registry

The factory auto-selects from a typed tool registry based on the agent's domain requirements. Each tool has input/output schemas and usage examples.

03

Knowledge base assembly

Vector store with semantic chunking. Documents are ingested, chunked by semantic boundaries (not token count), embedded, and indexed in ChromaDB.
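Chunking by semantic boundaries rather than token count can be sketched as splitting on paragraph breaks and merging undersized fragments forward. This is a simplified stand-in for whatever boundary detector the factory actually uses; real semantic chunking would also score topic shifts and headings:

```python
def semantic_chunks(text: str, min_chars: int = 40) -> list[str]:
    """Split on paragraph boundaries, merging fragments under min_chars.

    Simplified stand-in: boundaries here are blank lines, not learned
    semantic breaks, and min_chars is an arbitrary threshold.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    for para in paragraphs:
        if chunks and len(chunks[-1]) < min_chars:
            chunks[-1] = chunks[-1] + "\n\n" + para   # merge tiny fragment forward
        else:
            chunks.append(para)
    return chunks

doc = ("Short intro.\n\n"
       "A much longer paragraph that stands on its own as a chunk because "
       "it exceeds the minimum size.\n\n"
       "Another standalone paragraph with enough content to be indexed separately.")
for c in semantic_chunks(doc):
    print(len(c), repr(c[:30]))
```

Each resulting chunk is then embedded and indexed; keeping chunks aligned to meaning rather than a fixed token window is what makes retrieval return coherent passages.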

04

Adversarial stress testing

Five probe types: prompt injection, hallucination induction, scope violation, confidentiality extraction, and output format corruption.

05

Quality gate

Human approval is required before any agent goes live. Automated scoring requires a 100% happy-path pass rate and at least a 50% edge-case pass rate.
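The automated portion of the gate reduces to a simple check. A sketch with hypothetical field names, using the thresholds stated above (100% happy-path, at least 50% edge-case):

```python
def quality_gate(happy_passed: int, happy_total: int,
                 edge_passed: int, edge_total: int) -> bool:
    """Automated gate: 100% happy-path pass rate, >= 50% edge-case pass rate.

    Hypothetical signature. Human approval is still required even when
    this check passes.
    """
    happy_rate = happy_passed / happy_total
    edge_rate = edge_passed / edge_total
    return happy_rate == 1.0 and edge_rate >= 0.5

assert quality_gate(20, 20, 6, 10) is True    # meets both thresholds
assert quality_gate(19, 20, 10, 10) is False  # any happy-path failure blocks
print("gate checks pass")
```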

Help

Support

Founding Access

Direct line to the builder. Three founding seats, hand-selected. Roadmap input, locked-in pricing when paid tiers open, and your conversations preserved for your own eventual local copy.

License Customers

Pricing finalizes after the Founding Access program closes. Every RigRun license includes email support, configuration assistance, and upgrade guidance.

Enterprise

Dedicated technical support, on-premise installation assistance, and custom SLAs available. Backed by experience designing multi-node and classified environments.