How it works

What you get under the hood

Four systems that turn a single GPU (or a room full of laptops) into a private AI stack your team controls end-to-end. No cloud, no per-token billing, no data leaving your building.

RigRun

Unlimited inference on your hardware

A frontier-class model running in your server room, priced like electricity instead of tokens. One GPU, one binary, unlimited queries. Your sensitive code and documents never leave your network.

RigRun ships with everything your team needs on day one: a safety stack that blocks adversarial prompts, a fast-and-slow dual-model cascade that keeps latency down, and an overnight self-improvement loop so the model gets better on your workload without ever going offline.

Fit more context into the same GPU

TurboQuant compresses the KV cache 5.2× (paper submitted, under review) so you get longer conversations and larger codebases in memory without buying a second GPU. Combined with Rolling Memory v4, effective context becomes unlimited.

Model learns your workflow overnight

ZO2 training pipeline improves the model from your real usage while inference keeps serving. Wake up to a model that knows your codebase better than yesterday. Zero downtime.

Safety stack you can defend in an audit

Five layers — action gating, learned classifier, trajectory anomaly detection, declarative policy DSL, and classification spillage detection — stop prompt-injection and data exfiltration before they reach the model.

Fast by default, deep when it matters

Cascade routes between the 122B MoE fast-path (17B active parameters at 184 tok/s) and full expert activation (105 tok/s). Single model, dual speed. Your users get answers fast; full expert depth engages only when it matters.

Specs

Model Qwen3.5-122B-A10B

Parameters 397B total / 17B active (MoE)

Generation 105 tok/s (122B)

Fast mode 184 tok/s (35B)

Context 1M tokens (YaRN)

Effective context Unlimited (Rolling Memory)

KV Compression 5.2x (TurboQuant)

VRAM 96GB (98% utilized)

Hardware RTX PRO 6000 Blackwell

Per-token fee $0 — ever

Data egress None (fully on-prem)

Architecture Model-agnostic. Any GGUF, GPTQ, or AWQ model is hot-swappable.

Pyros

A safety perimeter for any LLM

A 17,000-line pure-Go engine that wraps any large language model in a 7-pillar safety pipeline. Most LLM safety stacks are Python wrappers around a moderation API call. Pyros is a separate process that implements the actual algorithms from the literature — SmoothLLM, isotonic calibration, EvoPrompt, negative-selection AIS, PID homeostasis — instead of forwarding the question to a hosted classifier.

Pyros doesn't require RigRun. Drop it in front of OpenAI, Anthropic, vLLM, llama.cpp, Ollama, or anything else that speaks the OpenAI-compatible chat completions format.

Real prompt-injection blocking, not vibes

Multi-signal injection scanner blocks at confidence ≥ 0.7 before the request reaches the model. SmoothLLM perturbation + supermajority vote layered on top. Negative-selection AIS detectors learn what your traffic normally looks like and flag what does not.

Hallucination probe with measured weights

6 weighted features: 4-gram repetition (0.25), entity consistency (0.20), numeric density (0.10), hedge density (-0.15), entropy variance (0.25), confidence gap (0.20). Negative hedge weight is intentional — hedging is a signal of calibration, not failure.

Pure Go, ships as one binary

Reimplements PAVA isotonic regression, UCB1 contextual bandit, cosine vector store, discrete-event simulator, MDL compressor, and EvoPrompt crossover from primary sources. No Python runtime, no CGO, no transitive C library surface. Drops onto an air-gapped system without an installer.

Hardened and fuzz-tested

Constant-time bearer auth (subtle.ConstantTimeCompare), admission semaphore, in-flight WaitGroup tracked, async PostWorker drained before persistence, HMAC-chained audit log, Prometheus metrics. Fuzz tests across the safety surface, race tests across all 28 packages.

The 7 Pillars

I Oracle Predictive intel

II Tribunal Adversarial verification

III Fortress Pre-inference defense

IV MindEye Metacognition

V Forge Self-improvement

VI Crucible Operational safety

VII Singularity Homeostatic regulation

17K

LOC pure Go

Python deps

Device Tiers

Phone 0.6-1.7B 15-30 tok/s

Laptop 4-30B 10-40 tok/s

Workstation 30-72B 20-50 tok/s

Server 122B+ 105+ tok/s

Cluster 400B+ distributed Variable

Mycelium

Scale out without a data center

Turn any collection of devices — phones, laptops, vehicles, servers — into a single AI cluster that heals itself. Need more capacity? Plug in another node. Lose a node in the field? The mesh routes around it. Works offline, survives network partitions, and has no single point of failure to defend.

Mycelium runs over the network you already have. No InfiniBand. No specialized hardware. No data center buildout.

Works over the WiFi you already have

Inference distributes across devices by routing only the small subset of MoE experts each token actually needs. No InfiniBand, no specialized interconnects, no datacenter-grade switches required.

Trust nodes you do not own

Every node proves it actually ran the work — through behavioral spot checks, signed execution traces, and reputation scoring. A compromised or lazy node gets caught and kicked out. No trusted hardware chips required.

One consistent voice across devices

A query that lands on a phone and one that lands on your server return answers in the same style and format. Your users never know (or care) which device answered.

Classification stays put

UNCLASSIFIED routes anywhere. CUI stays on your mesh. SECRET never leaves the device it originated on. Enforced at the routing layer — not as an afterthought.

Agent Factory

Custom AI agents in days, not months

Tell us what you need your AI to do, in plain English. You get back a tested agent — with the right tools, the right knowledge base, the right guardrails, and an evaluation rubric your team can audit. Every agent goes through adversarial stress tests and a human approval gate before it ever touches your users.

The result: a bespoke AI specialist tuned to your workflow, delivered in days instead of months of prompt engineering. And every agent keeps getting better from real usage.

See the agents →

Pipeline
steps

Tools in
registry

Adversarial
probes

Compliance

Security & Compliance

Controls implemented in source. No formal security assessment, third-party audit, or accreditation has been completed. CAC/PKI is implemented but not yet validated for deployment.

Implemented Controls

NIST 800-53 Rev 5

AC-2 Account Management

AC-4 Information Flow

AU-2 Audit Events

AU-9 Audit Protection

IA-2 Multi-Factor Auth

IA-5 Authenticator Mgmt

IR-4 Incident Handling

SC-8 Transmission Confidentiality

SC-13 Cryptographic Protection

SC-28 Data at Rest

PS-6 Access Agreements

Classification enforcement (UNCLASS through TS)

AES-256-GCM encryption at rest with PBKDF2 key derivation

HMAC-chained audit logs with tamper detection

JWT/OAuth/OIDC authentication with MFA (TOTP)

Certification Roadmap

SOC 2 Type II

In planning

CMMC Level 2

Controls implemented, assessment pending

FedRAMP

Not yet pursued (requires sponsoring agency)

ATO

SSP in development, pen test pending

FIPS 140-3

Runtime verification implemented, formal validation pending

Thornveil is transparent about what is implemented and what is in progress. No certification is claimed until formally granted.