Products

Five ways to put AI to work

Local AI infrastructure, tested agents, custom ML models trained on your data, a standalone safety engine for any LLM, and a P2P mesh that scales across whatever devices you already have. Everything runs on your hardware. Nothing touches the cloud.

RigRun

Complete AI server. One GPU. Zero cloud dependency.

122B Model Inference

Run a 122-billion parameter model on a single GPU. 105 tokens/second, 1M-token context window (YaRN-scaled). Zero per-token costs.

5-Layer Safety Stack

Prompt injection detection, action gating, trajectory anomaly detection, learned classification, and spillage prevention. Adversarial prompt injection blocked at confidence ≥ 0.7 across all test vectors.

Self-Improving

Overnight training pipeline learns from every conversation. DPO preference optimization on your own data. The model gets better at your specific workflows.

Rolling Memory — Effectively Infinite Context

Verified 2026-04-11

Beyond the 1M-token native window, Rolling Memory v4 extends effective context indefinitely through per-session disk-backed verbatim, summary, and embedding tiers. Latency stays flat with depth because retrieval keeps each forward pass roughly the same size regardless of total context length.
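As a sketch of the tiering idea, here is a minimal context assembler that fills a fixed token budget from verbatim, summary, and embedding tiers in priority order. The tier names come from above; the budget split, data shapes, and helper names are illustrative, not RigRun's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// A memory item from one of the three tiers described above.
type item struct {
	tier string // "verbatim", "summary", or "embedding"
	text string
	toks int // token cost (approximated here as a word count)
}

// assemble fills a fixed budget in tier priority order, so the
// context handed to each forward pass stays roughly constant-size
// no matter how long the total session history grows.
func assemble(items []item, budget int) []item {
	priority := []string{"verbatim", "summary", "embedding"}
	var out []item
	used := 0
	for _, tier := range priority {
		for _, it := range items {
			if it.tier == tier && used+it.toks <= budget {
				out = append(out, it)
				used += it.toks
			}
		}
	}
	return out
}

func main() {
	history := []item{
		{"verbatim", "user: deploy status?", 3},
		{"verbatim", "assistant: build green", 3},
		{"summary", "earlier: discussed rollout plan", 4},
		{"embedding", "retrieved: rollback runbook excerpt", 4},
		{"embedding", "retrieved: old incident notes", 4},
	}
	ctx := assemble(history, 14)
	var tiers []string
	for _, it := range ctx {
		tiers = append(tiers, it.tier)
	}
	fmt.Println(strings.Join(tiers, ","))
}
```

Because the budget is fixed, retrieval cost stays flat whether the session holds ten turns or ten million characters.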

10M
Characters tested
2.5M
Tokens tested
100%
Needle recall
~10s
Latency at any depth

Test rig: needle-in-haystack 100% pass through 10M chars (8.9–9.1s), 5/5 fact consistency, 10/10 multi-turn retrieval, 17/17 smoke tests passed. Drop-in OpenAI/Anthropic-compatible reverse proxy on port 8096 with per-session isolation, dashboard, and Prometheus metrics.

RTX PRO 6000 Blackwell  ///  Qwen3.5-122B  ///  105 tok/s  ///  1M token context  ///  Runs on your hardware

Companion Apps

Talk to your stack from any surface

Native desktop and mobile clients that speak directly to your RigRun server. Both ship with the same OpenAI-compatible API contract auto-generated from the Go server's OpenAPI spec.

RigRun Desktop

Pre-release

Electron + Next.js 14 + React. Bundles its own RigRun Go backend so a single installer ships the server and the UI together. WebAssembly LLM in the renderer (via @mlc-ai/web-llm) provides a second inference path independent of the Go backend.

Electron · Next.js 14 · React · Tiptap · Radix UI · Vitest + Playwright
~24,000 lines  TypeScript / TSX  ·  Windows / macOS / Linux

RigRun Mobile

Not yet released

Flutter native app for iOS and Android. Riverpod state, Hive storage, Dio HTTP, freezed models, flutter_secure_storage for credentials. Connects to your RigRun server over Tailscale or any private mesh — your data never touches a third-party cloud.

Flutter · Riverpod · Hive · Dio · Maestro E2E · Firebase Test Lab
~50,000 lines  Dart  ·  iOS / Android  ·  ~170 test files

Both apps are in active development. Desktop pre-release builds available on request. Mobile public release timing TBD.

Pricing

Technical preview now. Production licensing soon.

3 seats — Technical Preview

Founding Access

Invite Only

Three founding seats. 90 days of unlimited inference on the same stack we use to ship Thornveil's products. Hand-selected previewers who shape the roadmap and lock in pricing when paid access opens.

  • Unlimited inference on Qwen3.5-122B (no metering)
  • All 6 domain agents included
  • OpenAI-compatible API endpoint
  • Direct line to the builder
  • Roadmap input
  • Locked-in pricing when paid tiers open
  • Your conversations are preserved — when you bring RigRun in-house, your local copy ships pre-trained on your own usage
Apply for Founding Access
Coming Soon

RigRun License

TBD

Deploy on your own hardware. Your data never leaves your building. Pricing finalizes after the Founding Access program closes.

  • Full server binary + desktop app
  • Unlimited local inference
  • 5-layer safety proxy + routing
  • Self-training pipeline
  • Agent Factory (unlimited agents)
  • Self-regulating inference engine (7 autonomous optimization layers)
  • 1 year of updates + support
Contact for Inquiries

Enterprise

Custom

Multi-node deployments, classified environments, mesh networking.

  • Everything in RigRun License
  • Mycelium mesh networking
  • Classification routing (CUI–TS)
  • On-premise installation support
  • Custom agent development
  • Dedicated technical support
Contact Us

Domain Expert Agents

Built by the Agent Factory. 14-step pipeline. Included with every Founding Access seat. Custom agents available as standalone projects — contact us for scoping.

Code Reviewer

9 tools
Senior Principal Security Engineer

Finds real bugs, security vulnerabilities, and performance bottlenecks. Produces structured JSON reviews with severity ratings, file:line locations, impact analysis, and concrete fix suggestions.

Output Format

Structured JSON with verdict, severity, file:line, impact, fix suggestion

Security · Bug Detection · Performance · Static Analysis

Security Auditor

11 tools
Principal Application Security Engineer

Comprehensive code audits mapped to CWE, OWASP Top 10, and CVSS 3.1 scores. Traces data flow from source to sink. Checks auth, authz, crypto, and dependencies.

Output Format

Audit report with vulnerability table, data flow maps, attack paths, remediation

CWE · OWASP · CVSS · Data Flow Tracing

Documentation Writer

10 tools
Senior Technical Documentation Architect

Reads actual code before writing anything. Produces READMEs, API docs, architecture overviews, and setup guides. Every claim traceable to specific file:line references.

Output Format

Markdown documentation with file:line citations and verification steps

README · API Docs · Architecture · Setup Guides

Sprint Planner

10 tools
Senior Engineering Manager

Analyzes codebases and git history to create actionable sprint plans. P0/P1/P2 priority. Each task 30-120 minutes with concrete steps and verifiable done-when criteria.

Output Format

JSON sprint plan with priority, duration, steps, and completion criteria

Agile · Git Analysis · Task Decomposition · Delegation

SBIR Proposal Writer

7 tools
Senior Federal Grant Strategist

Writes SBIR/STTR proposals for DoD, DHS, and NSF. Leads with the agency problem. Quantifies with benchmarks. TAM/SAM/SOM analysis. Work plans with milestones.

Output Format

Complete proposal sections with compliance matrix and evaluation alignment

DoD · DHS · NSF · TAM/SAM/SOM · Compliance

Patent Drafter

10 tools
Software AI/ML Patent Attorney

Drafts provisional patent applications for software and AI/ML inventions. 15-25 claims with proper legal language. Analyzes prior art. Handles Alice Corp. eligibility.

Output Format

Complete provisional application with claims, specification, and abstract

USPTO · Claims · Alice Corp. · Prior Art

Powered by Agent Factory

How agents are made

Every agent goes through a 14-step manufacturing pipeline. No manual prompt engineering. No guesswork.

Design
01
Requirements decomposition
Natural language in, structured spec out
02
Multi-pass prompt synthesis
Draft by 122B, critique by independent model
03
Tool selection
Auto-selects from 12-tool registry
04
Knowledge base assembly
ChromaDB with semantic chunking
05
Few-shot example generation
2-3 gold examples per agent
06
Tool choreography
Frontier model shows how to use tools
Test
07
Frontier calibration
Compare output against Claude Opus
08
Gap analysis
Identify and fix quality delta
09
Adversarial stress test
5 probes: injection, hallucination, scope
10
Prompt hardening
Auto-patch for discovered failure modes
Deploy
11
Functional testing
Keyword + structural validation
12
Quality gate
100% happy-path, 50% edge-case pass rate
13
Human approval
Pending until explicitly activated
14
Continuous improvement
DPO pairs from real usage

Need something specific?

The Agent Factory can build agents for any domain. Legal research, financial analysis, medical literature review, real estate, HR policy, marketing copy. Describe what you need.

HawkStack

Custom ML models trained on your data. Thornveil predicts performance before training, builds sub-2M-parameter models that beat 50M+-parameter competitors, and deploys them to edge hardware. Six domains validated. One architecture.

Proven Domains

ThermalHawk · Thermal drone detection · 82.95% mAP · 1.77M params
SentryHawk · IRST / missile warning · 92.29% IoU · 798K params
DepthHawk · Sonar target detection · 88.0% mAP · 1.2M params
CardioHawk · ECG arrhythmia (NSV classes; F/Q in development) · 94.1% F1 · 8.9K params
ForgeHawk · PCB defect inspection · 97.63% mAP · 82K params
WildHawk · Wildlife camera trap ID · Active R&D · < 2M params

Predict Before Training

Topology analysis predicts the performance ceiling, optimal architecture, and training recipe before a single model is trained. You know what's achievable before paying for compute.

Tiny Models Beat Giants

Custom WEM (Weighted Expert Mixture) backbone with domain-adaptive receptive field branches. The result: models under 2M parameters that run on microcontrollers and SBCs (Jetson, Kria, COTS-ruggedized) while matching models 30x their size.

Performance Commitment

Deliverable acceptance is tied to the predicted performance ceiling: if topology analysis predicts 92% mAP, the delivered model is measured against that number, not a vague aspiration. Thornveil only commits to delivering what the topology math supports.

6 domains  ///  4 modalities  ///  1 architecture  ///  Patent pending

Service Tiers

Your data. Your model. Your hardware.

Topology Audit

$2-5K

Know what's achievable before you commit. Send us your dataset characteristics — Thornveil predicts the ceiling.

  • 3-parameter topology analysis
  • Predicted performance ceiling
  • Recommended architecture
  • Estimated training time
  • Dataset sufficiency assessment
Request Audit
Recommended

Custom Model Build

$10-25K

Full model design, training, and validation. Deliverable: trained model (usually <100KB), training code, deployment guide.

  • Everything in Topology Audit
  • WEM backbone with custom RF branches
  • SGDR cosine-restart training with loss surface basin analysis
  • Prototypical network heads for imbalanced classes
  • Trained model + training code
  • Performance commitment vs ceiling
Start Build

Edge Deployment

$25-50K

Full build plus production deployment. ONNX/TensorRT/CoreML export, latency benchmarking, drift detection.

  • Everything in Custom Model Build
  • ONNX / TensorRT / CoreML export
  • Target hardware benchmarking
  • Adaptive inference pipeline
  • Monitoring + drift detection
  • Integration support
Contact Us

Target markets: defense/intelligence, medical devices, industrial inspection, wildlife conservation.
ROI story: a $15K model that runs on a COTS MCU saves $500K/year in GPU deployment costs across your fleet.

Pyros Safety Engine

A 17,000-line pure-Go engine that wraps any LLM in a 7-pillar safety pipeline. Drop it in front of RigRun, OpenAI, Anthropic, vLLM, llama.cpp, or Ollama — Pyros doesn't care which model is downstream.

17K LOC
pure Go
7 pillars
consolidating 23+ packages
144 files
36 test files
Zero CGO
one binary
Zero Python
no runtime deps

Why this exists

Most LLM safety stacks are Python wrappers around a moderation API call. Pyros is a separate process you put between your application and the model, written in pure Go with no Python dependency. Instead of forwarding the question to a hosted classifier, it implements the actual safety algorithms from the literature — SmoothLLM, isotonic calibration, EvoPrompt, negative-selection AIS, PID homeostasis.

Pyros does not require RigRun. It runs as a standalone HTTP service on port 3100 and accepts any backend that speaks the OpenAI-compatible chat completions format.

The seven pillars

Each pillar consolidates 2–5 algorithmic packages into a single layer of the pipeline. The pipeline is variable-width — easy queries skip expensive pillars.

I

Oracle

Predictive intel

Forecasts request load via time-series, UCB1-routes by difficulty, emits skip signals so downstream pillars can short-circuit easy queries.
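UCB1 itself is a textbook rule: pick the arm whose mean reward plus exploration bonus is highest. A generic sketch (not Oracle's actual code; the rewards below stand in for routing outcomes):

```go
package main

import (
	"fmt"
	"math"
)

// ucb1 picks the arm maximizing mean reward plus the exploration
// bonus sqrt(2 ln N / n), the classic UCB1 rule (Auer et al. 2002).
// Arms never tried are selected first.
func ucb1(sums []float64, counts []int, total int) int {
	best, bestScore := 0, math.Inf(-1)
	for i := range sums {
		if counts[i] == 0 {
			return i // try every arm once before exploiting
		}
		mean := sums[i] / float64(counts[i])
		bonus := math.Sqrt(2 * math.Log(float64(total)) / float64(counts[i]))
		if s := mean + bonus; s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	// Two routes: arm 0 (small model) pays 0.2, arm 1 (big model) pays 0.8.
	sums := []float64{0, 0}
	counts := []int{0, 0}
	rewards := []float64{0.2, 0.8}
	for t := 1; t <= 200; t++ {
		a := ucb1(sums, counts, t)
		sums[a] += rewards[a]
		counts[a]++
	}
	fmt.Println("picks:", counts[0], counts[1]) // arm 1 should dominate
}
```

The exploration bonus shrinks as an arm accumulates trials, so the router keeps occasionally re-testing the cheap route without abandoning the better one.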

II

Tribunal

Adversarial verification

6-feature hallucination probe with weighted scoring. Optional reward-model integration. Courtroom-style structured verification workflow.

III

Fortress

Pre-inference defense

Prompt injection BLOCKS at confidence ≥ 0.7. SmoothLLM stability check (Robey 2023). Negative-selection AIS detectors learn your traffic's baseline.

IV

MindEye

Metacognition

Knowledge-graph context injection. Epigenetic temperature/token suggestions. Pure-Go isotonic calibration writing a QualityScore on every response.

V

Forge

Self-improvement

EvoPrompt crossover for prompt evolution. MDL prompt compression. DICE training-exemplar capture. The pipeline gets sharper on your workload over time.

VI

Crucible

Operational safety

Circuit breakers, OpenTelemetry trace spans, Shannon-entropy criticality scoring, pheromone-style stigmergic blackboard for inter-agent signaling.

VII

Singularity

Homeostatic regulation

PID controller over runtime vitals with anti-windup. Superposition activation for parallel candidates. Discrete-event digital-twin simulator.
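A minimal PID loop with integral clamping shows the anti-windup idea; the gains, clamp bounds, and toy plant below are illustrative, not Singularity's tuning:

```go
package main

import "fmt"

// pid with integral clamping (anti-windup): the integral term is
// bounded so a long saturation period cannot wind it up.
type pid struct {
	kp, ki, kd   float64
	integ, prevE float64
	integMin     float64
	integMax     float64
}

func (c *pid) step(setpoint, measured, dt float64) float64 {
	e := setpoint - measured
	c.integ += e * dt
	if c.integ > c.integMax { // anti-windup clamp
		c.integ = c.integMax
	} else if c.integ < c.integMin {
		c.integ = c.integMin
	}
	d := (e - c.prevE) / dt
	c.prevE = e
	return c.kp*e + c.ki*c.integ + c.kd*d
}

func main() {
	c := &pid{kp: 0.8, ki: 0.4, kd: 0.05, integMin: -5, integMax: 5}
	v := 0.0 // toy first-order "vital" being regulated toward 1.0
	for i := 0; i < 100; i++ {
		u := c.step(1.0, v, 0.1)
		v += 0.1 * (u - 0.2*v) // simple plant dynamics
	}
	fmt.Printf("settled near %.2f\n", v)
}
```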

Every feature has a file:line citation

Pyros doesn't have marketing claims. It has code. Each feature below points at the specific file and line in the source where the algorithm lives.

Prompt injection blocking

pillar/fortress.go:47

Blocks at confidence ≥ 0.7 before the request reaches the model. SmoothLLM perturbation + supermajority vote layered on top.

Hallucination probe

hallucination/probe.go:70

6 weighted features: 4-gram repetition, entity consistency, numeric density, hedge density, entropy variance, confidence gap.

Isotonic calibration

calibration/calibration.go:226

Pure-Go PAVA implementation, validated against sklearn 1.4. Honestly labeled "QualityScore" — not a calibrated probability.
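PAVA itself is compact. A unit-weight sketch of the pool-adjacent-violators idea (the cited production implementation handles weights and edge cases):

```go
package main

import "fmt"

// pava: pool-adjacent-violators for isotonic regression with unit
// weights. Each block holds a running sum and count; adjacent
// blocks merge until the block means are nondecreasing.
func pava(y []float64) []float64 {
	type block struct {
		sum float64
		n   int
	}
	var blocks []block
	for _, v := range y {
		blocks = append(blocks, block{v, 1})
		// Merge backwards while monotonicity is violated.
		for len(blocks) > 1 {
			a, b := blocks[len(blocks)-2], blocks[len(blocks)-1]
			if a.sum/float64(a.n) <= b.sum/float64(b.n) {
				break
			}
			blocks = blocks[:len(blocks)-2]
			blocks = append(blocks, block{a.sum + b.sum, a.n + b.n})
		}
	}
	out := make([]float64, 0, len(y))
	for _, b := range blocks {
		m := b.sum / float64(b.n)
		for i := 0; i < b.n; i++ {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	fmt.Println(pava([]float64{3, 1, 2})) // violators pooled to a flat fit
}
```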

Constant-time auth

server_auth.go:53

Mandatory bearer token for non-localhost binds. Compared via subtle.ConstantTimeCompare. Timing-attack hardened by default.
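The pattern is the standard library's subtle.ConstantTimeCompare; a minimal check looks like this (token values are examples):

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// checkToken compares a presented bearer token against the expected
// one in constant time, so response timing leaks nothing about how
// many leading bytes matched. Differing lengths also return false.
func checkToken(presented, expected string) bool {
	return subtle.ConstantTimeCompare([]byte(presented), []byte(expected)) == 1
}

func main() {
	fmt.Println(checkToken("s3cret-token", "s3cret-token"))
	fmt.Println(checkToken("s3cret-tokem", "s3cret-token"))
}
```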

Admission + graceful shutdown

pyros.go:138

Sized inbound semaphore, in-flight WaitGroup, async PostWorker drained before persistence so bookkeeping writes never get lost.

Audit chain

audit/file_sink.go

JSON-Lines audit sink with mutex serialization. Pluggable backend interface — file sink ships, syslog/network sinks slot in.

Why pure Go

No Python.
No CGO.
One binary.

Pyros reimplements the algorithms it needs from primary sources rather than importing sklearn, numpy, chromadb, or simpy. PAVA isotonic regression. UCB1 bandit. Cosine vector store. Discrete-event simulator. MDL compressor. EvoPrompt crossover.

Ships as a single statically-linked artifact. Drops onto an air-gapped system without an installer. PAVA implementation is bit-for-bit validated against sklearn 1.4 in calibration/pava_reference_test.go.

Research grounding

Built on the literature, not on vibes

SmoothLLM
Fortress stability check
Robey, Wong, Hassani, Kolter (2023)
EvoPrompt
Forge prompt evolution
Guo et al. (2023)
PAVA / isotonic regression
MindEye calibration
Zadrozny & Elkan (2002)
UCB1 contextual bandit
Oracle cascade routing
Auer, Cesa-Bianchi, Fischer (2002)
Negative Selection (AIS)
Fortress immune detectors
Forrest, Perelson, Allen (1994)
PID with anti-windup
Singularity homeostasis
Åström & Hägglund (2006)
Minimum Description Length
Forge prompt compression
Rissanen (1978)

Works with anything that speaks chat completions

Provider packages ship for Ollama, vLLM, and llama.cpp. Anything else that speaks the OpenAI-compatible chat completions format slots in through the same adapter.

Ollama
vLLM
llama.cpp
OpenAI
Anthropic
Azure
Groq
OpenRouter

Want to put Pyros in front of your stack?

Pyros is in early access. If you're running an LLM in production and want a real safety perimeter — not a moderation API call — get in touch.

github.com/jeranaias/pyros  ·  v0.1.0

Mycelium Mesh

A peer-to-peer mesh network that turns heterogeneous devices — phones, laptops, workstations, servers — into nodes in a distributed AI inference system. Built for defense, enterprise, and edge environments where centralized cloud AI is not viable. No center, no config, no exfiltration, no connectivity requirement, no capability floor.

65K LOC
Go server
8 layers
P2P stack
5 phases
complete
40 claims
THRN-022
Phase 6
in progress

Why this exists

Most distributed inference systems assume a datacenter — InfiniBand, RDMA, dedicated edge servers, or trusted hardware enclaves. Mycelium assumes none of that. It's built for the WiFi you already have, the laptops your team already owns, and nodes you don't fully trust. Designed for DoD contested communications, disaster response, privacy-sensitive environments, and developing regions where the centralized cloud is unreliable, expensive, or hostile.

Three core innovations are backed by USPTO provisional THRN-022 (40 claims): distributed expert routing with predictive prefetching, post-generation style normalization for cross-model consistency, and a 3-layer proof-of-inference protocol that catches lazy or compromised nodes.

Three core innovations

Each is implemented in source. Each has a USPTO provisional claim attached. Each one solves a problem that has stopped prior distributed-MoE work from running on consumer hardware.

Distributed MoE Expert Routing

The mesh expert system tracks which peers host which model experts via the gossip protocol. When a query arrives, the expert registry locates the best peer for each required expert computation by load and latency. An expert proxy dispatches computation requests with predictive prefetching — firing async requests during the attention computation window to hide network latency.

mesh/expert_proxy.go

Cross-Model Style Consistency

Different nodes may run different model sizes (0.6B to 122B parameters). The style normalizer applies post-generation processing to ensure consistent tone, formatting, and verbosity regardless of which node generated the response. Hedging removal, preamble stripping, code-block normalization, verbosity control. Seamless escalation uses entropy monitoring to detect when a small model is uncertain and transparently hand off mid-generation to a larger peer.

mesh/style.go
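Entropy-triggered escalation can be sketched with Shannon entropy over a next-token distribution; the 2.0-bit threshold and the distributions below are illustrative, not Mycelium's actual values:

```go
package main

import (
	"fmt"
	"math"
)

// entropy: Shannon entropy (bits) of a next-token distribution.
func entropy(p []float64) float64 {
	h := 0.0
	for _, x := range p {
		if x > 0 {
			h -= x * math.Log2(x)
		}
	}
	return h
}

// escalate fires when the small model's next-token entropy exceeds
// a threshold, the kind of uncertainty signal that triggers a
// hand-off to a larger peer mid-generation.
func escalate(p []float64) bool {
	return entropy(p) > 2.0
}

func main() {
	confident := []float64{0.90, 0.05, 0.03, 0.02}
	uncertain := []float64{0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125}
	fmt.Println(escalate(confident), escalate(uncertain))
}
```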

Proof-of-Inference (PoI)

Three-layer cryptographic verification ensuring mesh nodes perform honest computation. Behavioral fingerprinting catches model substitution. Merkle execution traces catch computation shortcuts. Economic reputation with 1% spot-check rate makes persistent dishonesty unprofitable. Combined verdict: 0.3·fingerprint + 0.3·trace + 0.4·reputation.

mesh/poi.go
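The combined verdict is a straight weighted sum. A sketch using the quoted weights (the sample scores are illustrative):

```go
package main

import "fmt"

// verdict combines the three PoI layers with the weights quoted
// above: 0.3*fingerprint + 0.3*trace + 0.4*reputation.
func verdict(fingerprint, trace, reputation float64) float64 {
	return 0.3*fingerprint + 0.3*trace + 0.4*reputation
}

func main() {
	honest := verdict(0.95, 0.90, 0.88) // all three layers check out
	shady := verdict(0.95, 0.20, 0.40)  // trace and reputation fail
	fmt.Printf("honest=%.3f shady=%.3f\n", honest, shady)
}
```

Weighting reputation highest means a node can't pass on a good fingerprint alone; persistent dishonesty drags the score down across every future verdict.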

Architecture

An 8-layer P2P stack

Each layer is a standalone Go package. The full stack runs as a single self-contained inference server on every node. Hardware detection at the bottom assigns roles automatically; the application layer at the top exposes a drop-in OpenAI-compatible API.

The mesh layer alone is 19 Go files / 11K LOC covering DHT, gossip, expert proxy, NAT traversal, load balancing, proof-of-inference, and style normalization. Phase 1 reuses 32 packages extracted from RigRun (auth, router, backend, security, training, memory, RAG).

8
Application
OpenAI-compatible API, chat, files
7
Routing & Verification
Cascade router, PoI, reputation
6
Style Consistency
Post-generation normalization
5
Expert Orchestrator
Registry, proxy, prefetching
4
Transport
HTTP/SSE forwarding, NAT traversal
3
Network Topology
Kademlia DHT, gossip, mDNS
2
Backend Abstraction
Ollama, llama.cpp, SGLang
1
Hardware Detection
GPU/CPU/NPU discovery & tiering

Hardware tiers

Mycelium auto-detects hardware capability and assigns each node a role. A phone running a 1B model is just as much a mesh participant as a multi-GPU fortress server — they just route different workloads.

Micro
Raspberry Pi, Mobile
0.6B – 3B
Edge routing, simple queries
Edge
Laptop (8GB), Jetson
3B – 8B
Token gen, classification
Standard
Desktop GPU (RTX 3060+)
8B – 14B
General inference
Power
Workstation (24GB+ VRAM)
14B – 70B
Complex reasoning, expert hosting
Fortress
Multi-GPU server
70B+
Heavy inference, training, aggregation

Discovery + propagation protocols

How nodes find each other, share trained adapters, and distribute model files across the mesh.

Discovery (mDNS + DHT + STUN)

LAN: mDNS advertisement under _mycelium._tcp.local. with capability metadata. DHT: Kademlia 160-bit node IDs, k=20 buckets. WAN: UPnP port mapping + STUN (RFC 5389) for NAT traversal. Manual peer registration available for cross-subnet connectivity.
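Kademlia's routing rests on XOR distance and k-buckets indexed by the highest differing bit. A sketch on 32-bit IDs (Mycelium uses 160-bit IDs; names here are illustrative):

```go
package main

import (
	"fmt"
	"math/bits"
)

// xorDistance: Kademlia orders peers by the XOR of their node IDs.
func xorDistance(a, b uint32) uint32 { return a ^ b }

// bucketIndex: which k-bucket a peer falls into, i.e. the position
// of the highest bit where the two IDs differ.
func bucketIndex(self, peer uint32) int {
	d := xorDistance(self, peer)
	if d == 0 {
		return -1 // same ID
	}
	return 31 - bits.LeadingZeros32(d)
}

func main() {
	self := uint32(0b1011_0000)
	near := uint32(0b1011_0001) // differs in bit 0 -> bucket 0
	far := uint32(0b0011_0000)  // differs in bit 7 -> bucket 7
	fmt.Println(bucketIndex(self, near), bucketIndex(self, far))
}
```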

Adapter Gossip Protocol (AGP)

Nodes share locally-trained LoRA/QLoRA adapters via epidemic gossip. Announcements propagate with fanout=3 every 2 minutes. Adapters carry loss metrics, dataset size, and weight hashes. Receiving nodes evaluate against local test sets. BitTorrent-style chunked transfer (64MB chunks, SHA-256 verified).

Model Distribution Protocol (MDP)

Full model files distributed across the mesh. Chunked transfer with parallel downloads (4 concurrent, 64MB per chunk), SHA-256 per-chunk and full-file verification, resume support for interrupted transfers.
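Per-chunk SHA-256 verification can be sketched in a few lines; chunk size is shrunk from 64MB to 4 bytes for the demo, and the function names are illustrative:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// split cuts a payload into fixed-size chunks and records each
// chunk's SHA-256, mirroring the per-chunk verification above.
func split(data []byte, size int) ([][]byte, [][32]byte) {
	var chunks [][]byte
	var sums [][32]byte
	for off := 0; off < len(data); off += size {
		end := off + size
		if end > len(data) {
			end = len(data)
		}
		c := data[off:end]
		chunks = append(chunks, c)
		sums = append(sums, sha256.Sum256(c))
	}
	return chunks, sums
}

// verify recomputes each chunk hash on the receiving side, so a
// corrupted chunk can be re-requested without restarting the file.
func verify(chunks [][]byte, sums [][32]byte) bool {
	for i, c := range chunks {
		if sha256.Sum256(c) != sums[i] {
			return false
		}
	}
	return true
}

func main() {
	payload := []byte("model-weights-go-here")
	chunks, sums := split(payload, 4)
	fmt.Println(verify(chunks, sums)) // clean transfer

	chunks[2] = []byte("XXXX") // corrupt one chunk in flight
	fmt.Println(verify(chunks, sums))
}
```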

Project status

Five of six phases complete. Phase 6 (production hardening) is in progress — end-to-end integration tests across multi-node deployments.

1

Core Protocol

Complete
mDNS discovery, DHT, peer registry, health checking
2

Query Routing

Complete
7-stage cascade, SSE forwarding, load balancing, gossip
3

Distributed Inference

Complete
Expert registry, computation proxy, prefetching, PoI
4

Consistency

Complete
Style normalization, seamless escalation, adapter sharing
5

WAN Scaling

Complete
NAT traversal (UPnP + STUN), DHT-based WAN discovery
6

Production Hardening

In Progress
End-to-end integration tests, multi-node stress testing

Security baseline

Auth   JWT, OAuth/OIDC, API keys, MFA (TOTP)
Authz   Classification-based (UNCLASS → TS)
Audit   HMAC-chained tamper-evident log
Crypto   TLS 1.3 in transit, AES-256-GCM at rest
Compliance   NIST 800-53 Rev 5 controls
Verification   Cryptographic proof-of-inference
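An HMAC chain is what makes the audit log tamper-evident: each entry's tag covers the previous tag, so editing any entry invalidates everything after it. A sketch (the key, entries, and helper names are illustrative):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// chain computes each entry's tag as HMAC(key, prevTag || entry),
// so altering any entry breaks every tag downstream of it.
func chain(key []byte, entries []string) [][]byte {
	tags := make([][]byte, len(entries))
	prev := make([]byte, sha256.Size) // zero tag anchors the chain
	for i, e := range entries {
		m := hmac.New(sha256.New, key)
		m.Write(prev)
		m.Write([]byte(e))
		tags[i] = m.Sum(nil)
		prev = tags[i]
	}
	return tags
}

// verify re-derives the chain and compares tag-for-tag.
func verify(key []byte, entries []string, tags [][]byte) bool {
	want := chain(key, entries)
	for i := range tags {
		if !hmac.Equal(tags[i], want[i]) {
			return false
		}
	}
	return true
}

func main() {
	key := []byte("audit-key") // illustrative; use a real secret
	entries := []string{"login admin", "read file A", "export report"}
	tags := chain(key, entries)
	fmt.Println(verify(key, entries, tags))

	entries[1] = "read file B" // tamper with the middle entry
	fmt.Println(verify(key, entries, tags))
}
```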

Want to deploy a mesh?

Mycelium licensing is available for defense, enterprise, and research deployments. Tell us what you want to put on the mesh and we'll set up a scoping conversation.

github.com/jeranaias/mycelium  ·  private  ·  THRN-022