How it works

What you get under the hood

Four systems that turn a single GPU (or a room full of laptops) into a private AI stack your team controls end-to-end. No cloud, no per-token billing, no data leaving your building.

RigRun

Unlimited inference on your hardware

A frontier-class model running in your server room, priced like electricity instead of tokens. One GPU, one binary, unlimited queries. Your sensitive code and documents never leave your network.

RigRun ships with everything your team needs on day one: a safety stack that blocks adversarial prompts, a fast-and-slow dual-model cascade that keeps latency down, and an overnight self-improvement loop so the model gets better on your workload without ever going offline.

Fit more context into the same GPU
TurboQuant compresses the KV cache 5.2× (paper submitted, under review) so you get longer conversations and larger codebases in memory without buying a second GPU. Combined with Rolling Memory v4, effective context becomes unlimited.
Model learns your workflow overnight
ZO2 training pipeline improves the model from your real usage while inference keeps serving. Wake up to a model that knows your codebase better than yesterday. Zero downtime.
Safety stack you can defend in an audit
Five layers — action gating, learned classifier, trajectory anomaly detection, declarative policy DSL, and classification spillage detection — stop prompt-injection and data exfiltration before they reach the model.
Fast by default, deep when it matters
Cascade routes between the 122B MoE fast-path (17B active parameters at 184 tok/s) and full expert activation (105 tok/s). Single model, dual speed. Your users get answers fast; full expert depth engages only when it matters.

Specs

Model Qwen3.5-122B-A10B
Parameters 397B total / 17B active (MoE)
Generation 105 tok/s (122B)
Fast mode 184 tok/s (35B)
Context 1M tokens (YaRN)
Effective context Unlimited (Rolling Memory)
KV Compression 5.2x (TurboQuant)
VRAM 96GB (98% utilized)
Hardware RTX PRO 6000 Blackwell
Per-token fee $0 — ever
Data egress None (fully on-prem)
Architecture Model-agnostic. Any GGUF, GPTQ, or AWQ model is hot-swappable.

Pyros

A safety perimeter for any LLM

A 17,000-line pure-Go engine that wraps any large language model in a 7-pillar safety pipeline. Most LLM safety stacks are Python wrappers around a moderation API call. Pyros is a separate process that implements the actual algorithms from the literature — SmoothLLM, isotonic calibration, EvoPrompt, negative-selection AIS, PID homeostasis — instead of forwarding the question to a hosted classifier.

Pyros doesn't require RigRun. Drop it in front of OpenAI, Anthropic, vLLM, llama.cpp, Ollama, or anything else that speaks the OpenAI-compatible chat completions format.

Real prompt-injection blocking, not vibes
Multi-signal injection scanner blocks at confidence ≥ 0.7 before the request reaches the model. SmoothLLM perturbation + supermajority vote layered on top. Negative-selection AIS detectors learn what your traffic normally looks like and flag what does not.
Hallucination probe with measured weights
6 weighted features: 4-gram repetition (0.25), entity consistency (0.20), numeric density (0.10), hedge density (-0.15), entropy variance (0.25), confidence gap (0.20). Negative hedge weight is intentional — hedging is a signal of calibration, not failure.
Pure Go, ships as one binary
Reimplements PAVA isotonic regression, UCB1 contextual bandit, cosine vector store, discrete-event simulator, MDL compressor, and EvoPrompt crossover from primary sources. No Python runtime, no CGO, no transitive C library surface. Drops onto an air-gapped system without an installer.
Hardened and fuzz-tested
Constant-time bearer auth (subtle.ConstantTimeCompare), admission semaphore, in-flight WaitGroup tracked, async PostWorker drained before persistence, HMAC-chained audit log, Prometheus metrics. Fuzz tests across the safety surface, race tests across all 28 packages.

The 7 Pillars

I Oracle Predictive intel
II Tribunal Adversarial verification
III Fortress Pre-inference defense
IV MindEye Metacognition
V Forge Self-improvement
VI Crucible Operational safety
VII Singularity Homeostatic regulation
17K
LOC pure Go
0
Python deps

Device Tiers

Phone 0.6-1.7B 15-30 tok/s
Laptop 4-30B 10-40 tok/s
Workstation 30-72B 20-50 tok/s
Server 122B+ 105+ tok/s
Cluster 400B+ distributed Variable

Mycelium

Scale out without a data center

Turn any collection of devices — phones, laptops, vehicles, servers — into a single AI cluster that heals itself. Need more capacity? Plug in another node. Lose a node in the field? The mesh routes around it. Works offline, survives network partitions, and has no single point of failure to defend.

Mycelium runs over the network you already have. No InfiniBand. No specialized hardware. No data center buildout.

Works over the WiFi you already have
Inference distributes across devices by routing only the small subset of MoE experts each token actually needs. No InfiniBand, no specialized interconnects, no datacenter-grade switches required.
Trust nodes you do not own
Every node proves it actually ran the work — through behavioral spot checks, signed execution traces, and reputation scoring. A compromised or lazy node gets caught and kicked out. No trusted hardware chips required.
One consistent voice across devices
A query that lands on a phone and one that lands on your server return answers in the same style and format. Your users never know (or care) which device answered.
Classification stays put
UNCLASSIFIED routes anywhere. CUI stays on your mesh. SECRET never leaves the device it originated on. Enforced at the routing layer — not as an afterthought.

Agent Factory

Custom AI agents in days, not months

Tell us what you need your AI to do, in plain English. You get back a tested agent — with the right tools, the right knowledge base, the right guardrails, and an evaluation rubric your team can audit. Every agent goes through adversarial stress tests and a human approval gate before it ever touches your users.

The result: a bespoke AI specialist tuned to your workflow, delivered in days instead of months of prompt engineering. And every agent keeps getting better from real usage.

14
Pipeline
steps
12
Tools in
registry
5
Adversarial
probes

Compliance

Security & Compliance

Controls implemented in source. No formal security assessment, third-party audit, or accreditation has been completed. CAC/PKI is implemented but not yet validated for deployment.

Implemented Controls

NIST 800-53 Rev 5

AC-2 Account Management
AC-4 Information Flow
AU-2 Audit Events
AU-9 Audit Protection
IA-2 Multi-Factor Auth
IA-5 Authenticator Mgmt
IR-4 Incident Handling
SC-8 Transmission Confidentiality
SC-13 Cryptographic Protection
SC-28 Data at Rest
PS-6 Access Agreements
Classification enforcement (UNCLASS through TS)
AES-256-GCM encryption at rest with PBKDF2 key derivation
HMAC-chained audit logs with tamper detection
JWT/OAuth/OIDC authentication with MFA (TOTP)

Certification Roadmap

SOC 2 Type II
In planning
CMMC Level 2
Controls implemented, assessment pending
FedRAMP
Not yet pursued (requires sponsoring agency)
ATO
SSP in development, pen test pending
FIPS 140-3
Runtime verification implemented, formal validation pending

Thornveil is transparent about what is implemented and what is in progress. No certification is claimed until formally granted.