Sovereign HPC Cluster for Rare Earth Research
An Air-Gapped, Deterministic Scientific Computing Platform for Strategic Mineral Research
The rare-earth research mission demands a computing environment that is secure, largely offline, deterministic, and auditable. The platform is shaped around the physics and evidence chain of the mission (CPU density, memory capacity, air-gapped ingest) rather than around fashionable GPU-heavy AI cluster designs.
Local Language Model
Helps scientists interrogate results and navigate a large scientific corpus without sending sensitive data to an external cloud.
~5 Million PDF Library
Starting corpus covering magnetic fields, materials science, and related technical literature — fully indexed on-cluster.
Field Telemetry Ingestion
Stream of electromagnetic and multi-modal signals gathered during mining operations — physically transported and air-gapped on import.
Molecular-Level Modeling
Atomic and molecular simulation that moves the problem beyond data analytics into real scientific computing.
Competing Model Portfolio
Parallel evaluation of multiple analytical approaches to assess processing value and determine which methods are economically meaningful.
Controlled Offline Ingest
Field data arrives via removable media through a formal quarantine boundary — chain-of-custody is part of scientific defensibility.
Five Principles. One Sovereign Platform.
Determinism Before Novelty
Analytical outputs must be explainable and reproducible. Default tools are numerical transforms, explicit statistical methods, and solver-based approaches. ML may assist discovery but never becomes the sole source of critical conclusions.
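To make the principle concrete, here is a minimal sketch (the function and record fields are illustrative assumptions, not part of the platform specification) of how an analysis step can be wrapped so any run is reproducible and auditable: the seed, library version, and input/output hashes are captured alongside the result.

```python
# Hypothetical sketch: wrapping an analysis step so every run is reproducible
# and auditable. Function and field names are illustrative, not a real API.
import hashlib
import json
import platform

import numpy as np


def sha256_of(data: bytes) -> str:
    """Content hash used to tie inputs and outputs to the audit record."""
    return hashlib.sha256(data).hexdigest()


def deterministic_run(samples: np.ndarray, seed: int = 0) -> dict:
    """Run a simple spectral transform and emit a reproducibility record."""
    rng = np.random.default_rng(seed)                    # explicit, recorded seed
    noise_floor = rng.normal(0.0, 1e-9, samples.shape)   # example stochastic step
    spectrum = np.abs(np.fft.rfft(samples + noise_floor))
    return {
        "input_sha256": sha256_of(samples.tobytes()),
        "output_sha256": sha256_of(spectrum.tobytes()),
        "seed": seed,
        "numpy_version": np.__version__,
        "platform": platform.platform(),
    }


record = deterministic_run(np.sin(np.linspace(0.0, 8.0, 4096)))
print(json.dumps(record, indent=2))  # identical inputs + seed -> identical hashes
```

Identical inputs and seed yield identical hashes, so a reviewer can re-run the step and confirm the recorded output byte for byte.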
Air-Gapped by Operating Model
Field data is physically transported. The cluster has a formal ingest boundary — quarantine nodes, signed checksums, media handling procedures, and release workflows are architectural components, not operational afterthoughts.
CPU-Dominant Compute
Electromagnetic analysis, waveform processing, inverse modeling, and model portfolio testing scale better with CPU cores and memory than with GPUs. Capital is spent where the real compute pressure lies.
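As a rough illustration of why these workloads reward cores over accelerators, the sketch below fans a portfolio of candidate methods across CPU processes with Python's standard multiprocessing pool; the candidate methods themselves are placeholders, not the mission's actual models.

```python
# Illustrative sketch of CPU-parallel model-portfolio evaluation; the candidate
# "models" here are placeholder scoring functions, not the mission's real methods.
from multiprocessing import Pool

import numpy as np

CANDIDATES = {
    "median_filter": lambda x: float(np.median(x)),
    "robust_mean": lambda x: float(np.mean(np.clip(x, *np.percentile(x, [5, 95])))),
    "rms": lambda x: float(np.sqrt(np.mean(x ** 2))),
}


def evaluate(name_and_data):
    """Score one candidate method on the shared dataset."""
    name, data = name_and_data
    return name, CANDIDATES[name](data)


if __name__ == "__main__":
    data = np.random.default_rng(42).normal(size=1_000_000)
    with Pool() as pool:  # one process per core; scales with CPU count, no GPU
        results = dict(pool.map(evaluate, [(n, data) for n in CANDIDATES]))
    print(results)
```

Throughput scales with the number of worker processes, which is exactly the scaling behavior the CPU-dominant tier is sized for.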
Storage as a First-Class Capability
Storage is part of the scientific method. Researchers need rapid access to active datasets, economical capacity for historical telemetry, and preserved snapshots that allow any run to be reconstructed.
Small but High-Quality AI Tier
A local language model helps scientists navigate the evidence base and query structured repositories — clearly bounded by provenance and human oversight. Not a black box. Not the center of the architecture.
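A minimal sketch of what "bounded by provenance" can mean in practice: the assistant answers only from retrieved passages that carry a signed ingest record, and every answer returns its citations. `vector_index` and `local_llm` are assumed interfaces here, not named products.

```python
# Hypothetical sketch of the provenance boundary around the local LLM:
# the assistant may only answer from retrieved passages that carry signed
# ingest metadata. `vector_index` and `local_llm` are assumed interfaces.
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    source_doc: str       # document ID in the corpus
    ingest_record: str    # signed ingest record ID from the quarantine workflow


def answer(question: str, vector_index, local_llm, k: int = 5) -> dict:
    passages = vector_index.search(question, k)          # assumed API
    cited = [p for p in passages if p.ingest_record]     # provenance required
    if not cited:
        return {"answer": None, "reason": "no provenanced sources found"}
    context = "\n\n".join(p.text for p in cited)
    prompt = (
        "Answer strictly from the context below; say 'unknown' otherwise.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return {
        "answer": local_llm.generate(prompt),            # assumed API
        "citations": [(p.source_doc, p.ingest_record) for p in cited],
    }
```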
28 Servers · 4 Racks · 7 Functional Tiers
Each tier is purpose-matched to its dominant workload, so every dollar of capex is aligned with the compute it actually performs.
| Tier | Count | CPU | Memory | GPU / Usable Capacity |
|---|---|---|---|---|
| Management / Orchestration: scheduler, RBAC, provenance logging, package control, immutable audit trail | 2 nodes · 1U | 🔒 Confidential | 128–256 GB ECC RAM | None |
| Air-Gap Ingest / Quarantine: offline import, checksum validation, malware scanning, metadata normalization, signed ingest records | 2 nodes · 1U | 🔒 Confidential | Standard | None |
| Deterministic CPU HPC: signal processing, inversion, statistics, EM analysis, model portfolio runs | 12 nodes · 2U | 🔒 Confidential | 1 TB ECC RAM / node | None |
| High-Memory Science: materials and molecular simulation, large in-memory jobs, matrix operations | 4 nodes · 2U | 🔒 Confidential | 2 TB ECC RAM / node | Optional (deferred) |
| GPU Inference: local LLM, embeddings, scientific literature navigation, limited multimodal assistance | 2 nodes · 4U | 🔒 Confidential | 512 GB ECC RAM | 🔒 Confidential |
| Hot NVMe Storage: active datasets, vector index, project scratch, current telemetry | 2 nodes · 2U | — | 256 GB RAM | ~250–280 TB usable |
| Warm Storage: PDF corpus, telemetry history, processed data, reproducibility snapshots | 4 nodes · 4U | — | NVMe cache | ~1.2–1.5 PB usable |
Storage as a Scientific Instrument
Five million documents become raw PDFs, normalized text, vector embeddings, graph relationships, extracted tables, and cross-run artifacts. Field telemetry multiplies similarly. Day 1 targets 1.5 PB usable online, split into fast and economical tiers so that capacity is not exhausted just as the research team comes to rely on the platform.
Hot Tier: active projects, vector index, fast scratch, current telemetry. All NVMe with 256 GB RAM and dual high-speed links.
Warm Tier: PDF corpus, telemetry history, processed datasets, reproducibility snapshots. Dense HDD with an NVMe cache tier.
Archive Tier: long-term retention and campaign archive. Object or tape storage. Critical for multi-year research continuity.
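A simple sketch of the tier-routing policy these three tiers imply; the thresholds and tier names are assumptions for illustration, not committed retention rules.

```python
# Illustrative tier-routing policy; thresholds and tier names are assumptions,
# not measured values from the proposal.
from datetime import datetime, timedelta, timezone


def route_tier(last_access: datetime, active_project: bool) -> str:
    """Decide where a dataset should live among hot NVMe, warm HDD, and archive."""
    age = datetime.now(timezone.utc) - last_access
    if active_project or age < timedelta(days=30):
        return "hot-nvme"        # active datasets, scratch, current telemetry
    if age < timedelta(days=365):
        return "warm-hdd"        # corpus, telemetry history, snapshots
    return "archive"             # multi-year retention, object or tape


print(route_tier(datetime.now(timezone.utc) - timedelta(days=90), False))  # warm-hdd
```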
Chain-of-Custody Is Part of the Science
Without a formal ingest boundary, the organization may later be unable to prove that an analytical result came from a specific field acquisition — or that data were not altered or mixed with another campaign. For strategic mineral research, that is too large a weakness to accept.
1. Physical Media Arrives
Removable media from field operations lands on quarantine nodes, isolated from the research fabric.
2. Checksum Verification
Integrity verified against field-recorded checksums. Operator context and physical media identity are recorded.
3. Payload Scanning
Malicious content scan and metadata normalization performed in the quarantine environment.
4. Signed Ingest Record
An immutable, signed ingest record is created. Only then is the approved dataset released into the research fabric.
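The sketch below illustrates the checksum-verification and signed-record steps, assuming field teams ship a SHA-256 manifest with each piece of media; the HMAC signature is a stand-in for whatever signing mechanism (for example, HSM-backed asymmetric keys) the production workflow would mandate.

```python
# Minimal sketch of the quarantine-side checks, assuming field teams ship a
# manifest of SHA-256 checksums; the HMAC signature stands in for whatever
# signing mechanism (e.g. asymmetric keys) the real workflow would mandate.
import hashlib
import hmac
import json
import time
from pathlib import Path

SIGNING_KEY = b"replace-with-hsm-backed-key"  # placeholder only


def verify_media(media_root: Path, manifest: dict[str, str]) -> list[str]:
    """Compare on-media files against field-recorded SHA-256 checksums."""
    failures = []
    for rel_path, expected in manifest.items():
        digest = hashlib.sha256((media_root / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            failures.append(rel_path)
    return failures


def signed_ingest_record(media_id: str, operator: str, manifest: dict) -> str:
    """Emit an append-only, signed record; release happens only after this."""
    record = {
        "media_id": media_id,
        "operator": operator,
        "files": manifest,
        "timestamp": time.time(),
    }
    body = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return json.dumps(record)
```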
Intentionally Modest
Capital Investment
Day 1 IT Stack
Installed into the existing secure room. ROM (rough order of magnitude) estimate.
Fully Deployed
Includes IT stack, facility hardening, power redundancy, digital twin, installation, and 36-month support.
Physical Footprint
Three Paths to Sovereign Compute
Each architecture reflects a different philosophy of performance density, operational control, and capital efficiency. The right choice depends on timeline, budget envelope, and long-term strategic positioning.
Modular Practical
Balanced · scalable · operationally efficient
Dense, modular HPC cluster optimized for CPU-dominant workloads with strong price-to-performance efficiency. Designed for rapid deployment, predictable scaling, and straightforward operations.
Configuration
- Multi-node dense compute chassis
- 12 standard compute nodes
- 4 high-memory science nodes
- 2 GPU-assisted nodes (limited scope)
- NVMe hot storage + high-capacity warm storage
- High-speed fabric (200 Gb class)
Strengths
- ✓ Best balance of cost, performance, and deployability
- ✓ Flexible expansion without architectural lock-in
- ✓ Easier maintenance and component replacement
- ✓ Suitable for field-adjacent deployments
Limitations
- — Slightly lower peak density than Premium HPC
- — Less integrated cooling and fabric optimization
IT Stack Only
$1.7M – $2.9M
Fully Deployed
$2.1M – $3.5M
Best fit: National lab pilots · sovereign edge · scalability-first programs
Premium HPC
Maximum density · integrated system · national infrastructure
Fully integrated supercomputing-class system designed for maximum compute density, tightly coupled workloads, and long-term expansion into large-scale national infrastructure.
Configuration
- Fully integrated HPC cabinets
- Advanced cooling (air or liquid-assisted)
- High-density compute nodes
- High-memory nodes integrated into fabric
- GPU capability (expandable)
- Tightly integrated high-performance interconnect
Strengths
- ✓ Highest performance ceiling
- ✓ Best suited for large-scale scientific modeling
- ✓ Superior thermal efficiency at high densities
- ✓ Strong long-term scalability to multi-megawatt scale
Limitations
- — Higher capital cost
- — More complex deployment and integration
- — Vendor ecosystem dependency
- — Over-provisioned for smaller clusters
IT Stack Only
$2.8M – $4.8M
Fully Deployed
$3.5M – $5.5M
Best fit: National flagship research · long-term sovereign infrastructure
Retail Option
Component-based · lowest upfront cost · highest operational burden
Individually assembled server components sourced from standard enterprise or workstation-grade suppliers. Emphasizes low upfront cost but sacrifices system-level optimization.
Configuration
- Individually racked servers
- Standard enterprise motherboards and chassis
- Mixed storage nodes
- Conventional networking (Ethernet or entry HPC fabric)
- Minimal system-level integration
Strengths
- ✓ Lowest initial capital expenditure
- ✓ Maximum flexibility in component sourcing
- ✓ Rapid procurement in constrained environments
Limitations
- — Higher failure rates over time
- — Increased operational complexity
- — Lower density, higher power per compute unit
- — Difficult to manage at scale
- — Weak deterministic performance consistency
IT Stack Only
$1.2M – $2.0M
Fully Deployed
$1.5M – $2.5M
Best fit: Early-stage experimentation · prototyping phases only
| Attribute | Modular Practical | Premium HPC | Retail Option |
|---|---|---|---|
| Performance Density | High | Very High | Moderate |
| Deterministic Behavior | Strong | Very Strong | Variable |
| Scalability | Excellent | Excellent (large-scale) | Limited |
| Deployment Speed | Fast | Moderate | Fast |
| Operational Complexity | Moderate | High | High |
| Cost Efficiency | Best | Lowest | Lowest upfront cost |
| Long-term Viability | High | Very High | Low |
| IT Stack Price Range | $1.7M – $2.9M | $2.8M – $4.8M | $1.2M – $2.0M |
End-to-End. From Financing to Operations.
Your platform includes C-suite-level consultation and hardware selection. We walk with you through the entire process: financing, commissioning, digital twin deployment, and long-term hardware support.
Access to Financing
- Subject to approval — 75% CAPEX financing available
- Over five years, equivalent cloud capacity typically costs about twice as much
- Full data sovereignty — no ongoing per-core cloud costs
- Structured as infrastructure program financing
Operational Digital Twin
- Real-time monitoring of cluster health and workload
- Power, thermal, and GPU utilization telemetry
- Maintenance alerts and workload management tools
- Centralized dashboard for all rack and node status
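As a flavor of what the monitoring layer evaluates, here is a minimal sketch of a per-node threshold check; the node names, limits, and telemetry shape are illustrative only.

```python
# Sketch of the kind of threshold check the digital twin dashboard would run;
# node names, limits, and the telemetry dict shape are all illustrative.
THERMAL_LIMIT_C = 85.0
POWER_LIMIT_W = 900.0


def check_node(node: dict) -> list[str]:
    """Return maintenance alerts for one node's telemetry sample."""
    alerts = []
    if node["cpu_temp_c"] > THERMAL_LIMIT_C:
        alerts.append(f"{node['name']}: CPU temperature {node['cpu_temp_c']} C")
    if node["power_w"] > POWER_LIMIT_W:
        alerts.append(f"{node['name']}: power draw {node['power_w']} W")
    return alerts


sample = {"name": "hpc-node-07", "cpu_temp_c": 91.2, "power_w": 640.0}
print(check_node(sample))  # -> thermal alert for hpc-node-07
```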
Mobilization & Installation
- Installation services — 5 days on-site
- Technician labor: 2 technicians × 6 days
- Engineering oversight: site scan, remote prep and validation
- Full power-on testing and handover
Digital Twin & Monitoring (36 months)
- Digital twin modeling and deployment
- Integration and telemetry mapping
- Monitoring platform license included (36 months)
- Remote support included (36 months)
Hardware Support & Preventive Maintenance
- Next business day on-site response upon monitoring alert
- Physical hardware diagnostics and fault isolation
- Component removal and installation: GPU, CPU, RAM, SSD, PSU, NIC, cables
- Coordination for replacement parts procurement
- Post-swap hardware verification and power-on testing
- Quarterly preventive maintenance: thermal checks, firmware review
- Spare parts inventory maintained — repairs within 48 hours of issue
- Handoff to remote monitoring team for software validation
Building a Sovereign Research Environment?
We design, finance, and govern sovereign HPC clusters for national-interest scientific missions — air-gapped, auditable, and purpose-built.