Products
Five ways to put AI to work
Local AI infrastructure, tested agents, custom ML models trained on your data, a standalone safety engine for any LLM, and a P2P mesh that scales across whatever devices you already have. Everything runs on your hardware. Nothing touches the cloud.
RigRun
Complete AI server. One GPU. Zero cloud dependency.
122B Model Inference
Run a 122-billion parameter model on a single GPU. 105 tokens/second, 1M-token context window (YaRN-scaled). Zero per-token costs.
5-Layer Safety Stack
Prompt injection detection, action gating, trajectory anomaly detection, learned classification, and spillage prevention. Adversarial prompt injection blocked at confidence ≥ 0.7 across all test vectors.
Self-Improving
Overnight training pipeline learns from every conversation. DPO preference optimization on your own data. The model gets better at your specific workflows.
Rolling Memory — Effectively Infinite Context
Verified 2026-04-11
Beyond the 1M-token native window, Rolling Memory v4 extends effective context indefinitely through per-session disk-backed verbatim, summary, and embedding tiers. Latency stays flat with depth because retrieval keeps each forward pass roughly the same size regardless of total context length.
Test rig: needle-in-haystack 100% pass through 10M chars (8.9–9.1s), 5/5 fact consistency, 10/10 multi-turn retrieval, 17/17 smoke tests passed. Drop-in OpenAI/Anthropic-compatible reverse proxy on port 8096 with per-session isolation, dashboard, and Prometheus metrics.
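For orientation, a minimal Go client call through the proxy, assuming the standard OpenAI-compatible /v1/chat/completions route. The model identifier and session header are illustrative, not RigRun's documented contract.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Standard OpenAI-style chat completion payload; the model name
	// here is a hypothetical identifier.
	body, _ := json.Marshal(map[string]any{
		"model": "qwen3.5-122b",
		"messages": []map[string]string{
			{"role": "user", "content": "Summarize yesterday's design discussion."},
		},
	})

	req, _ := http.NewRequest("POST",
		"http://localhost:8096/v1/chat/completions", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-Session-ID", "design-review") // hypothetical per-session isolation key

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Choices []struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
		} `json:"choices"`
	}
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(out.Choices[0].Message.Content)
}
```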
RTX PRO 6000 Blackwell /// Qwen3.5-122B /// 105 tok/s /// 1M token context /// Runs on your hardware
Companion Apps
Talk to your stack from any surface
Native desktop and mobile clients that speak directly to your RigRun server. Both ship with the same OpenAI-compatible API contract auto-generated from the Go server's OpenAPI spec.
RigRun Desktop
Pre-release
Electron + Next.js 14 + React. Bundles its own RigRun Go backend so a single installer ships the server and the UI together. WebAssembly LLM in the renderer (via @mlc-ai/web-llm) provides a second inference path independent of the Go backend.
RigRun Mobile
Not yet released
Flutter native app for iOS and Android. Riverpod state, Hive storage, Dio HTTP, freezed models, flutter_secure_storage for credentials. Connects to your RigRun server over Tailscale or any private mesh — your data never touches a third-party cloud.
Both apps are in active development. Desktop pre-release builds available on request. Mobile public release timing TBD.
Pricing
Technical preview now. Production licensing soon.
Founding Access
Three founding seats. 90 days of unlimited inference on the same stack we use to ship Thornveil's products. Hand-selected previewers who shape the roadmap and lock in pricing when paid access opens.
- Unlimited inference on Qwen3.5-122B (no metering)
- All 6 domain agents included
- OpenAI-compatible API endpoint
- Direct line to the builder
- Roadmap input
- Locked-in pricing when paid tiers open
- Your conversations are preserved — when you bring RigRun in-house, your local copy ships pre-trained on your own usage
RigRun License
Deploy on your own hardware. Your data never leaves your building. Pricing finalizes after the Founding Access program closes.
- Full server binary + desktop app
- Unlimited local inference
- 5-layer safety proxy + routing
- Self-training pipeline
- Agent Factory (unlimited agents)
- Self-regulating inference engine (7 autonomous optimization layers)
- 1 year of updates + support
Enterprise
Multi-node, classified environments, mesh networking.
- Everything in RigRun License
- Mycelium mesh networking
- Classification routing (CUI–TS)
- On-premise installation support
- Custom agent development
- Dedicated technical support
Domain Expert Agents
Built by the Agent Factory. 14-step pipeline. Included with every Founding Access seat. Custom agents available as standalone projects — contact us for scoping.
Code Reviewer
9 tools
Finds real bugs, security vulnerabilities, and performance bottlenecks. Produces structured JSON reviews with severity ratings, file:line locations, impact analysis, and concrete fix suggestions.
Structured JSON with verdict, severity, file:line, impact, fix suggestion
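To make that shape concrete, a hypothetical Go struct mirroring the output; field names are illustrative, not the agent's actual schema.

```go
package review

// ReviewFinding is a hypothetical shape for one finding; the real
// agent's JSON schema may differ.
type ReviewFinding struct {
	Severity string `json:"severity"` // e.g. "critical", "major", "minor"
	Location string `json:"location"` // file:line, e.g. "auth/session.go:88"
	Impact   string `json:"impact"`   // what goes wrong if unfixed
	Fix      string `json:"fix"`      // concrete suggested change
}

type Review struct {
	Verdict  string          `json:"verdict"` // e.g. "approve", "request_changes"
	Findings []ReviewFinding `json:"findings"`
}
```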
Security Auditor
11 tools
Comprehensive code audits mapped to CWE, OWASP Top 10, and CVSS 3.1 scores. Traces data flow from source to sink. Checks auth, authz, crypto, and dependencies.
Audit report with vulnerability table, data flow maps, attack paths, remediation
Documentation Writer
10 tools
Reads actual code before writing anything. Produces READMEs, API docs, architecture overviews, and setup guides. Every claim traceable to specific file:line references.
Markdown documentation with file:line citations and verification steps
Sprint Planner
10 tools
Analyzes codebases and git history to create actionable sprint plans. P0/P1/P2 priority. Each task 30-120 minutes with concrete steps and verifiable done-when criteria.
JSON sprint plan with priority, duration, steps, and completion criteria
SBIR Proposal Writer
7 tools
Writes SBIR/STTR proposals for DoD, DHS, and NSF. Leads with the agency problem. Quantifies with benchmarks. TAM/SAM/SOM analysis. Work plans with milestones.
Complete proposal sections with compliance matrix and evaluation alignment
Patent Drafter
10 tools
Drafts provisional patent applications for software and AI/ML inventions. 15-25 claims with proper legal language. Analyzes prior art. Handles Alice Corp. eligibility.
Complete provisional application with claims, specification, and abstract
Powered by Agent Factory
How agents are made
Every agent goes through a 14-step manufacturing pipeline. No manual prompt engineering. No guesswork.
Need something specific?
The Agent Factory can build agents for any domain. Legal research, financial analysis, medical literature review, real estate, HR policy, marketing copy. Describe what you need.
HawkStack
Custom ML models trained on your data. Thornveil predicts performance before training, builds sub-2M-parameter models that beat 50M+ competitors, and deploys to edge hardware. Six domains validated. One architecture.
Proven Domains
Predict Before Training
Topology analysis predicts the performance ceiling, optimal architecture, and training recipe before a single model is trained. You know what's achievable before paying for compute.
Tiny Models Beat Giants
Custom WEM (Weighted Expert Mixture) backbone with domain-adaptive receptive field branches. The result: models under 2M parameters that run on microcontrollers and SBCs (Jetson, Kria, COTS-ruggedized) while matching models 30x their size.
Performance Commitment
Deliverable acceptance is tied to the predicted performance ceiling: if the predicted ceiling is 92% mAP and the delivered model achieves 85%, it has not met the bar. Thornveil only commits to delivering what the topology math supports.
6 domains /// 4 modalities /// 1 architecture /// Patent pending
Service Tiers
Your data. Your model. Your hardware.
Topology Audit
Know what's achievable before you commit. Send us your dataset characteristics — Thornveil predicts the ceiling.
- 3-parameter topology analysis
- Predicted performance ceiling
- Recommended architecture
- Estimated training time
- Dataset sufficiency assessment
Custom Model Build
Full model design, training, and validation. Deliverable: trained model (usually <100KB), training code, deployment guide.
- Everything in Topology Audit
- WEM backbone with custom RF branches
- SGDR cosine-restart training with loss surface basin analysis
- Prototypical network heads for imbalanced classes
- Trained model + training code
- Performance commitment vs ceiling
Edge Deployment
Full build plus production deployment. ONNX/TensorRT/CoreML export, latency benchmarking, drift detection.
- Everything in Custom Model Build
- ONNX / TensorRT / CoreML export
- Target hardware benchmarking
- Adaptive inference pipeline
- Monitoring + drift detection
- Integration support
Target markets: defense/intelligence, medical devices, industrial inspection, wildlife conservation.
ROI story: a $15K model that runs on a COTS MCU saves $500K/year in GPU deployment costs across your fleet.
Pyros Safety Engine
A 17,000-line pure-Go engine that wraps any LLM in a 7-pillar safety pipeline. Drop it in front of RigRun, OpenAI, Anthropic, vLLM, llama.cpp, or Ollama — Pyros doesn't care which model is downstream.
Why this exists
Most LLM safety stacks are Python wrappers around a moderation API call. Pyros is a separate process that sits between your application and the model, written in pure Go with no Python dependency. It implements the actual safety algorithms from the literature — SmoothLLM, isotonic calibration, EvoPrompt, negative-selection AIS, PID homeostasis — instead of forwarding the question to a hosted classifier.
Pyros does not require RigRun. It runs as a standalone HTTP service on port 3100 and accepts any backend that speaks the OpenAI-compatible chat completions format.
The seven pillars
Each pillar consolidates 2–5 algorithmic packages into a single layer in the pipeline. The pipeline is variable-width — easy queries skip expensive pillars; a sketch of the short-circuit mechanic follows the pillar list.
Oracle
Predictive intel
Forecasts request load via time-series, UCB1-routes by difficulty, emits skip signals so downstream pillars can short-circuit easy queries.
Tribunal
Adversarial verification
6-feature hallucination probe with weighted scoring. Optional reward-model integration. Courtroom-style structured verification workflow.
Fortress
Pre-inference defense
Prompt injection blocked at confidence ≥ 0.7. SmoothLLM stability check (Robey 2023). Negative-selection AIS detectors learn your traffic's baseline.
MindEye
Metacognition
Knowledge-graph context injection. Epigenetic temperature/token suggestions. Pure-Go isotonic calibration writing a QualityScore on every response.
Forge
Self-improvement
EvoPrompt crossover for prompt evolution. MDL prompt compression. DICE training-exemplar capture. The pipeline gets sharper on your workload over time.
Crucible
Operational safety
Circuit breakers, OpenTelemetry trace spans, Shannon-entropy criticality scoring, pheromone-style stigmergic blackboard for inter-agent signaling.
Singularity
Homeostatic regulation
PID controller over runtime vitals with anti-windup. Superposition activation for parallel candidates. Discrete-event digital-twin simulator.
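What variable-width execution means in practice: each pillar can emit a skip signal that short-circuits everything downstream. A minimal sketch assuming a simplified pillar interface; the types are illustrative, not Pyros's actual internals.

```go
package pipeline

import "fmt"

// Pillar is a hypothetical simplified interface; the real pipeline
// carries far more state per request.
type Pillar interface {
	Name() string
	Run(req *Request) (skipRest bool, err error)
}

type Request struct {
	Prompt     string
	Difficulty float64 // e.g. set by Oracle's routing estimate
}

// runPipeline walks the pillars in order and stops early when one
// signals that the remaining (expensive) pillars can be skipped.
func runPipeline(pillars []Pillar, req *Request) error {
	for _, p := range pillars {
		skipRest, err := p.Run(req)
		if err != nil {
			return fmt.Errorf("%s: %w", p.Name(), err)
		}
		if skipRest {
			break // easy query: short-circuit downstream pillars
		}
	}
	return nil
}
```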
Every feature has a file:line citation
Pyros doesn't have marketing claims. It has code. Each feature below points at the specific file and line in the source where the algorithm lives.
Prompt injection blocking
pillar/fortress.go:47
Blocks at confidence ≥ 0.7 before the request reaches the model. SmoothLLM perturbation + supermajority vote layered on top.
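The SmoothLLM recipe is perturb, score, vote. A minimal sketch against the stated 0.7 threshold; the classifier callback, perturbation rate, and vote margin are assumptions, not Fortress's actual code.

```go
package fortress

import "math/rand"

const blockThreshold = 0.7

// perturb randomly replaces a small fraction of characters,
// following the SmoothLLM idea that injection payloads are
// brittle to noise. The 5% rate is an assumption.
func perturb(prompt string, q float64) string {
	b := []byte(prompt)
	for i := range b {
		if rand.Float64() < q {
			b[i] = byte('a' + rand.Intn(26))
		}
	}
	return string(b)
}

// Block returns true when a supermajority of perturbed copies
// score at or above the 0.7 confidence threshold. score is an
// assumed injection classifier returning a value in [0,1].
func Block(prompt string, score func(string) float64, n int) bool {
	hits := 0
	for i := 0; i < n; i++ {
		if score(perturb(prompt, 0.05)) >= blockThreshold {
			hits++
		}
	}
	return hits*3 >= n*2 // two-thirds supermajority (assumed margin)
}
```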
Hallucination probe
hallucination/probe.go:70
6 weighted features: 4-gram repetition, entity consistency, numeric density, hedge density, entropy variance, confidence gap.
Isotonic calibration
calibration/calibration.go:226
Pure-Go PAVA implementation, validated against sklearn 1.4. Honestly labeled "QualityScore" — not a calibrated probability.
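PAVA (pool adjacent violators) is compact enough to sketch. A minimal unweighted version, for illustration only:

```go
package calibration

// PAVA fits the least-squares non-decreasing sequence to y by
// merging adjacent blocks whenever monotonicity is violated.
func PAVA(y []float64) []float64 {
	n := len(y)
	vals := make([]float64, 0, n) // block means
	sizes := make([]int, 0, n)    // block sizes
	for _, v := range y {
		vals = append(vals, v)
		sizes = append(sizes, 1)
		// Merge backwards while the previous block's mean exceeds
		// the current one.
		for len(vals) > 1 && vals[len(vals)-2] > vals[len(vals)-1] {
			m, s := len(vals)-2, len(vals)-1
			total := vals[m]*float64(sizes[m]) + vals[s]*float64(sizes[s])
			sizes[m] += sizes[s]
			vals[m] = total / float64(sizes[m])
			vals, sizes = vals[:s], sizes[:s]
		}
	}
	// Expand block means back to per-point fitted values.
	out := make([]float64, 0, n)
	for i, v := range vals {
		for j := 0; j < sizes[i]; j++ {
			out = append(out, v)
		}
	}
	return out
}
```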
Constant-time auth
server_auth.go:53
Mandatory bearer token for non-localhost binds. Compared via subtle.ConstantTimeCompare. Timing-attack hardened by default.
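The standard-library primitive is crypto/subtle. A sketch of the middleware pattern; the handler wiring is illustrative. Note that ConstantTimeCompare returns immediately on a length mismatch, so only the token bytes themselves are timing-protected.

```go
package middleware

import (
	"crypto/subtle"
	"net/http"
	"strings"
)

// requireBearer rejects requests whose Authorization header does not
// carry the expected token, comparing in constant time so an
// attacker cannot learn the token byte-by-byte from response timing.
func requireBearer(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		if subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```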
Admission + graceful shutdown
pyros.go:138
Sized inbound semaphore, in-flight WaitGroup, async PostWorker drained before persistence so bookkeeping writes never get lost.
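A minimal sketch of the admission-plus-drain pattern using a channel semaphore and sync.WaitGroup; Pyros's actual bookkeeping is richer than this.

```go
package admission

import "sync"

type Admission struct {
	slots chan struct{}
	wg    sync.WaitGroup
}

func New(max int) *Admission {
	return &Admission{slots: make(chan struct{}, max)}
}

// Acquire blocks until an inbound slot is free, then registers the
// request as in-flight.
func (a *Admission) Acquire() {
	a.slots <- struct{}{}
	a.wg.Add(1)
}

func (a *Admission) Release() {
	a.wg.Done()
	<-a.slots
}

// Drain waits for every in-flight request before shutdown proceeds,
// so post-processing writes are never lost.
func (a *Admission) Drain() {
	a.wg.Wait()
}
```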
Audit chain
audit/file_sink.go
JSON-Lines audit sink with mutex serialization. Pluggable backend interface — file sink ships, syslog/network sinks slot in.
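The core of a JSON-Lines sink is small. A sketch; the method set shown here is illustrative, not the shipped interface.

```go
package audit

import (
	"encoding/json"
	"os"
	"sync"
)

type FileSink struct {
	mu sync.Mutex
	f  *os.File
}

func NewFileSink(path string) (*FileSink, error) {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return nil, err
	}
	return &FileSink{f: f}, nil
}

// Write appends one event per line; the mutex keeps concurrent
// writers from interleaving partial records.
func (s *FileSink) Write(event any) error {
	line, err := json.Marshal(event)
	if err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	_, err = s.f.Write(append(line, '\n'))
	return err
}
```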
Why pure Go
No Python.
No CGO.
One binary.
Pyros reimplements the algorithms it needs from primary sources rather than importing sklearn, numpy, chromadb, or simpy. PAVA isotonic regression. UCB1 bandit. Cosine vector store. Discrete-event simulator. MDL compressor. EvoPrompt crossover.
Ships as a single statically-linked artifact. Drops onto an air-gapped system without an installer. PAVA implementation is bit-for-bit validated against sklearn 1.4 in calibration/pava_reference_test.go.
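UCB1 is a fair example of how small these in-tree reimplementations can be. A textbook sketch, not Pyros's actual code:

```go
package bandit

import "math"

// PickArm returns the arm maximizing mean reward plus the UCB1
// exploration bonus sqrt(2 ln N / n_i). rewards holds cumulative
// reward per arm; total is the total number of plays so far.
func PickArm(counts []int, rewards []float64, total int) int {
	best, bestScore := 0, math.Inf(-1)
	for i, n := range counts {
		if n == 0 {
			return i // play every arm once before comparing
		}
		score := rewards[i]/float64(n) +
			math.Sqrt(2*math.Log(float64(total))/float64(n))
		if score > bestScore {
			best, bestScore = i, score
		}
	}
	return best
}
```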
Research grounding
Built on the literature, not on vibes
Works with anything that speaks chat completions
Provider packages ship for Ollama, vLLM, and llama.cpp. Anything else that speaks the OpenAI-compatible chat completions format slots in through the same adapter.
Want to put Pyros in front of your stack?
Pyros is in early access. If you're running an LLM in production and want a real safety perimeter — not a moderation API call — get in touch.
Mycelium Mesh
A peer-to-peer mesh network that turns heterogeneous devices — phones, laptops, workstations, servers — into nodes in a distributed AI inference system. Built for defense, enterprise, and edge environments where centralized cloud AI is not viable. No center, no config, no exfiltration, no connectivity requirement, no capability floor.
Why this exists
Most distributed inference systems assume a datacenter — InfiniBand, RDMA, dedicated edge servers, or trusted hardware enclaves. Mycelium assumes none of that. It's built for the WiFi you already have, the laptops your team already owns, and nodes you don't fully trust. Designed for DoD contested communications, disaster response, privacy-sensitive environments, and developing regions where the centralized cloud is unreliable, expensive, or hostile.
Three core innovations are backed by USPTO provisional THRN-022 (40 claims): distributed expert routing with predictive prefetching, post-generation style normalization for cross-model consistency, and a 3-layer proof-of-inference protocol that catches lazy or compromised nodes.
Three core innovations
Each is implemented in source. Each has a USPTO provisional claim attached. Each one solves a problem that has stopped prior distributed-MoE work from running on consumer hardware.
Distributed MoE Expert Routing
The mesh expert system tracks which peers host which model experts via the gossip protocol. When a query arrives, the expert registry locates the best peer for each required expert computation by load and latency. An expert proxy dispatches computation requests with predictive prefetching — firing async requests during the attention computation window to hide network latency.
mesh/expert_proxy.go
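A sketch of the prefetching idea: fire each expert request on its own goroutine, do the attention computation, then collect. The callback and result types are illustrative; the real proxy's scheduling is more involved.

```go
package mesh

type ExpertResult struct {
	ExpertID string
	Output   []float32
}

// Prefetch dispatches each predicted expert call asynchronously and
// returns a channel to collect results after the attention
// computation finishes, so network latency overlaps with compute.
func Prefetch(experts []string, call func(id string) ExpertResult) <-chan ExpertResult {
	out := make(chan ExpertResult, len(experts))
	for _, id := range experts {
		go func(id string) { out <- call(id) }(id)
	}
	return out
}

// Typical flow (names hypothetical):
//   results := Prefetch(predictedExperts, callPeer)
//   runAttention() // network latency hides behind this compute
//   for range predictedExperts { merge(<-results) }
```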
Cross-Model Style Consistency
Different nodes may run different model sizes (0.6B to 122B parameters). The style normalizer applies post-generation processing to ensure consistent tone, formatting, and verbosity regardless of which node generated the response. Hedging removal, preamble stripping, code-block normalization, verbosity control. Seamless escalation uses entropy monitoring to detect when a small model is uncertain and transparently hands off mid-generation to a larger peer.
mesh/style.go
Proof-of-Inference (PoI)
Three-layer cryptographic verification ensuring mesh nodes perform honest computation. Behavioral fingerprinting catches model substitution. Merkle execution traces catch computation shortcuts. Economic reputation with 1% spot-check rate makes persistent dishonesty unprofitable. Combined verdict: 0.3·fingerprint + 0.3·trace + 0.4·reputation.
mesh/poi.go
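The combined verdict is simple enough to state in code. A sketch using the weights from the text; the score fields and the spot-check wiring are illustrative.

```go
package mesh

import "math/rand"

type PoIScores struct {
	Fingerprint float64 // behavioral fingerprint match, in [0,1]
	Trace       float64 // Merkle execution-trace validity, in [0,1]
	Reputation  float64 // economic reputation, in [0,1]
}

// Verdict applies the stated 0.3/0.3/0.4 weighting.
func Verdict(s PoIScores) float64 {
	return 0.3*s.Fingerprint + 0.3*s.Trace + 0.4*s.Reputation
}

// SpotCheck fires on ~1% of responses, triggering full
// re-verification so persistent dishonesty stays unprofitable.
func SpotCheck() bool {
	return rand.Float64() < 0.01
}
```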
Architecture
An 8-layer P2P stack
Each layer is a standalone Go package. The full stack runs as a single self-contained inference server on every node. Hardware detection at the bottom assigns roles automatically; the application layer at the top exposes a drop-in OpenAI-compatible API.
The mesh layer alone is 19 Go files / 11K LOC covering DHT, gossip, expert proxy, NAT traversal, load balancing, proof-of-inference, and style normalization. Phase 1 reuses 32 packages extracted from RigRun (auth, router, backend, security, training, memory, RAG).
Hardware tiers
Mycelium auto-detects hardware capability and assigns each node a role. A phone running a 1B model is just as much a mesh participant as a multi-GPU fortress server — they just route different workloads.
Discovery + propagation protocols
How nodes find each other, share trained adapters, and distribute model files across the mesh.
Discovery (mDNS + DHT + STUN)
LAN: mDNS advertisement under _mycelium._tcp.local. with capability metadata. DHT: Kademlia 160-bit node IDs, k=20 buckets. WAN: UPnP port mapping + STUN (RFC 5389) for NAT traversal. Manual peer registration available for cross-subnet connectivity.
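The Kademlia piece is textbook. A sketch of XOR-distance bucket selection over 160-bit IDs (k=20 is the per-bucket capacity); not Mycelium's actual routing-table code.

```go
package dht

import "math/bits"

const IDBytes = 20 // 160-bit node IDs

type NodeID [IDBytes]byte

// BucketIndex returns which k-bucket peer b belongs to relative to
// a: the position of the highest differing bit of the XOR distance.
// Returns -1 when the IDs are identical.
func BucketIndex(a, b NodeID) int {
	for i := 0; i < IDBytes; i++ {
		if x := a[i] ^ b[i]; x != 0 {
			return (IDBytes-1-i)*8 + bits.Len8(x) - 1
		}
	}
	return -1
}
```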
Adapter Gossip Protocol (AGP)
Nodes share locally-trained LoRA/QLoRA adapters via epidemic gossip. Announcements propagate with fanout=3 every 2 minutes. Adapters carry loss metrics, dataset size, and weight hashes. Receiving nodes evaluate against local test sets. BitTorrent-style chunked transfer (64MB chunks, SHA-256 verified).
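One gossip round under the stated parameters looks roughly like this; peer selection and the announce callback are illustrative.

```go
package mesh

import (
	"math/rand"
	"time"
)

const (
	fanout   = 3
	interval = 2 * time.Minute
)

// GossipLoop announces to fanout random peers every interval.
// Receivers re-gossip, giving epidemic propagation across the mesh.
func GossipLoop(peers func() []string, announce func(peer string)) {
	for range time.Tick(interval) {
		ps := peers()
		rand.Shuffle(len(ps), func(i, j int) { ps[i], ps[j] = ps[j], ps[i] })
		for i := 0; i < fanout && i < len(ps); i++ {
			go announce(ps[i])
		}
	}
}
```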
Model Distribution Protocol (MDP)
Full model files distributed across the mesh. Chunked transfer with parallel downloads (4 concurrent, 64MB per chunk), SHA-256 per-chunk and full-file verification, resume support for interrupted transfers.
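A sketch of the stated transfer parameters: 4 concurrent workers, 64 MB chunks, per-chunk SHA-256. The fetch callback stands in for the real wire protocol.

```go
package mesh

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

const chunkSize = 64 << 20 // 64 MB per chunk

// FetchModel downloads nchunks chunks with at most 4 in flight,
// verifying each against its expected SHA-256 hex digest.
func FetchModel(nchunks int, want []string, fetch func(i int) []byte) ([][]byte, error) {
	chunks := make([][]byte, nchunks)
	errs := make([]error, nchunks)
	sem := make(chan struct{}, 4) // 4 parallel downloads
	var wg sync.WaitGroup
	for i := 0; i < nchunks; i++ {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			data := fetch(i)
			sum := sha256.Sum256(data)
			if hex.EncodeToString(sum[:]) != want[i] {
				errs[i] = fmt.Errorf("chunk %d: hash mismatch", i)
				return
			}
			chunks[i] = data
		}(i)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return chunks, nil
}
```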
Project status
Five of six phases complete. Phase 6 (production hardening) is in progress — end-to-end integration tests across multi-node deployments.
Core Protocol
Complete
Query Routing
Complete
Distributed Inference
Complete
Consistency
Complete
WAN Scaling
Complete
Production Hardening
In Progress (security baseline)
Want to deploy a mesh?
Mycelium licensing is available for defense, enterprise, and research deployments. Tell Thornveil what you're trying to put on the mesh and we'll set up a scoping conversation.