Strategic Research · National Interest

Sovereign HPC Cluster for Rare Earth Research

Proposed Compute and Storage Cluster for Rare Earth Research

Air-Gapped, Deterministic Scientific Computing Platform for Strategic Mineral Research

2,300–3,000
CPU Cores
~20 TB
Aggregate RAM
~1.5 PB
Online Storage
$2.3–2.9M
Day 1 Estimate
Mission Context

A National-Interest Scientific Computing Platform

The rare-earth research mission demands a computing environment that is secure, largely offline, deterministic, and auditable. The platform is shaped around the physics and evidence chain of the mission (CPU density, memory capacity, and air-gapped ingest), not around the current fashion for GPU-heavy AI clusters.

Local Language Model

Helps scientists interrogate results and navigate a large scientific corpus without sending sensitive data to an external cloud.

~5 Million PDF Library

Starting corpus covering magnetic fields, materials science, and related technical literature — fully indexed on-cluster.

Field Telemetry Ingestion

Stream of electromagnetic and multi-modal signals gathered during mining operations — physically transported and air-gapped on import.

Molecular-Level Modeling

Atomic and molecular simulation that moves the problem beyond data analytics into real scientific computing.

Competing Model Portfolio

Parallel evaluation of multiple analytical approaches to assess processing value and determine which methods are economically meaningful. A minimal evaluation sketch appears after these mission elements.

Controlled Offline Ingest

Field data arrives via removable media through a formal quarantine boundary — chain-of-custody is part of scientific defensibility.
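To make the model-portfolio idea concrete, the sketch below runs two hypothetical candidate models against a synthetic telemetry trace and ranks them with a single deterministic metric. The candidates, the data, and the RMSE scoring are assumptions for illustration, not the platform's actual evaluation harness.

```python
# Illustrative sketch only: compares hypothetical candidate models on a
# synthetic telemetry trace and ranks them by a simple error metric.
# The models, data, and metric are assumptions, not the real harness.
import numpy as np

rng = np.random.default_rng(seed=42)          # fixed seed: the run is reproducible
t = np.linspace(0.0, 10.0, 500)
signal = np.sin(2.0 * np.pi * 0.4 * t) + 0.1 * rng.standard_normal(t.size)

def moving_average(y, window=25):
    """Candidate A: simple smoothing baseline."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")

def polynomial_fit(y, degree=5):
    """Candidate B: global polynomial trend model."""
    coeffs = np.polyfit(t, y, degree)
    return np.polyval(coeffs, t)

portfolio = {"moving_average": moving_average, "polynomial_fit": polynomial_fit}

# Score every candidate with the same deterministic metric (RMSE here).
scores = {name: float(np.sqrt(np.mean((model(signal) - signal) ** 2)))
          for name, model in portfolio.items()}

for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>16}: RMSE = {rmse:.4f}")
```

In practice the portfolio would hold the mission's real analytical methods and the scoring would reflect economic value, but the pattern is the same: identical data, identical metric, ranked candidates.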

Design Principles

Five Principles. One Sovereign Platform.

🔬

Determinism Before Novelty

Analytical outputs must be explainable and reproducible. Default tools are numerical transforms, explicit statistical methods, and solver-based approaches. ML may assist discovery but never becomes the sole source of critical conclusions. A minimal reproducibility sketch follows these principles.

🔒

Air-Gapped by Operating Model

Field data is physically transported. The cluster has a formal ingest boundary — quarantine nodes, signed checksums, media handling procedures, and release workflows are architectural components, not operational afterthoughts.

⚙️

CPU-Dominant Compute

Electromagnetic analysis, waveform processing, inverse modeling, and model portfolio testing scale better with CPU cores and memory than with GPUs. Capital is spent where the real compute pressure lies.

💾

Storage as a First-Class Capability

Storage is part of the scientific method. Researchers need rapid access to active datasets, economical capacity for historical telemetry, and preserved snapshots that allow any run to be reconstructed.

🤖

Small but High-Quality AI Tier

A local language model helps scientists navigate the evidence base and query structured repositories — clearly bounded by provenance and human oversight. Not a black box. Not the center of the architecture.
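As a concrete illustration of the determinism principle, here is a minimal reproducibility sketch. The FFT-based transform and the provenance record layout are assumptions for illustration; the point is that identical inputs and parameters must yield bit-identical, hash-verifiable outputs.

```python
# Minimal reproducibility sketch, not the platform's actual provenance tooling.
# Idea: every analysis run records its input hash, exact parameters, and output
# hash, so any result can be re-derived and checked bit-for-bit later.
import hashlib
import json
import numpy as np

def run_analysis(samples: np.ndarray, params: dict) -> dict:
    """Deterministic transform (an FFT magnitude spectrum here) plus a provenance record."""
    spectrum = np.abs(np.fft.rfft(samples * params["gain"]))
    record = {
        "input_sha256": hashlib.sha256(samples.tobytes()).hexdigest(),
        "parameters": params,
        "output_sha256": hashlib.sha256(spectrum.tobytes()).hexdigest(),
    }
    return {"spectrum": spectrum, "provenance": record}

samples = np.sin(np.linspace(0.0, 20.0 * np.pi, 4096))   # stand-in for field telemetry
first = run_analysis(samples, {"gain": 1.0})
second = run_analysis(samples, {"gain": 1.0})

# Same inputs and parameters must yield the identical output hash.
assert first["provenance"]["output_sha256"] == second["provenance"]["output_sha256"]
print(json.dumps(first["provenance"], indent=2))
```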

Day 1 Architecture

28 Servers · 4 Racks · 7 Functional Tiers

Each tier is purpose-matched to its dominant workload. Every dollar of capex is aligned with the compute it actually buys.

Tier | Count | CPU / Storage | Memory | GPU

Management / Orchestration | 2 nodes · 1U | 🔒 Confidential | 128–256 GB ECC RAM | None
Scheduler, RBAC, provenance logging, package control, immutable audit trail.

Air-Gap Ingest / Quarantine | 2 nodes · 1U | 🔒 Confidential | Standard | None
Offline import, checksum validation, malware scanning, metadata normalization, signed ingest records.

Deterministic CPU HPC | 12 nodes · 2U | 🔒 Confidential | 1 TB ECC RAM / node | None
Signal processing, inversion, statistics, EM analysis, model portfolio runs.

High-Memory Science | 4 nodes · 2U | 🔒 Confidential | 2 TB ECC RAM / node | Optional (deferred)
Materials and molecular simulation, large in-memory jobs, matrix operations.

GPU Inference | 2 nodes · 4U | 🔒 Confidential | 512 GB ECC RAM | 🔒 Confidential
Local LLM, embeddings, scientific literature navigation, limited multimodal assistance.

Hot NVMe Storage | 2 nodes · 2U | ~250–280 TB usable | 256 GB RAM | —
Active datasets, vector index, project scratch, current telemetry.

Warm Storage | 4 nodes · 4U | ~1.2–1.5 PB usable | NVMe cache | —
PDF corpus, telemetry history, processed data, reproducibility snapshots.
Storage Architecture

Storage as a Scientific Instrument

Five million documents become raw PDFs, normalized text, vector embeddings, graph relationships, extracted tables, and cross-run artifacts. Field telemetry multiplies similarly. Day 1 targets ~1.5 PB usable online, split into fast and economical tiers so that capacity is not exhausted just as the research team comes to rely on the platform.
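As a rough illustration of why one document fans out into several stored artifacts, the sketch below derives a normalized-text copy, a placeholder embedding, and a provenance hash from a single input. The extract_text() and embed() helpers are hypothetical stand-ins, not the platform's actual extraction or embedding pipeline.

```python
# Illustrative sketch only: how one ingested PDF fans out into the derived
# artifacts described above. extract_text() and embed() are hypothetical
# stand-ins; the real pipeline and vector index are not specified here.
import hashlib
import numpy as np

def extract_text(pdf_bytes: bytes) -> str:
    """Placeholder for real PDF text extraction."""
    return pdf_bytes.decode("latin-1", errors="ignore")

def embed(text: str, dim: int = 384) -> np.ndarray:
    """Placeholder embedding: deterministic hash-seeded vector, not a real model."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    return np.random.default_rng(seed).standard_normal(dim).astype(np.float32)

def build_artifacts(doc_id: str, pdf_bytes: bytes) -> dict:
    text = extract_text(pdf_bytes)
    return {
        "doc_id": doc_id,
        "raw_sha256": hashlib.sha256(pdf_bytes).hexdigest(),   # provenance anchor
        "normalized_text": text,
        "embedding": embed(text),                               # would feed the vector index
        "tables": [],                                           # extracted tables would land here
    }

artifacts = build_artifacts("corpus/0000001", b"%PDF-1.7 example payload")
print(artifacts["doc_id"], artifacts["raw_sha256"][:16], artifacts["embedding"].shape)
```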

Hot NVMe · 2 nodes
~250–280 TB usable

Active projects, vector index, fast scratch, current telemetry. All NVMe with 256 GB RAM and dual high-speed links.

Warm Storage · 4 nodes
~1.2–1.5 PB usable

PDF corpus, telemetry history, processed datasets, reproducibility snapshots. Dense HDD with NVMe cache tier.

Cold Archive · External tier
3–5 PB (optional)

Long-term retention and campaign archive. Object or tape storage. Critical for multi-year research continuity.
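One way to reason about the three tiers is as a placement policy keyed on how a dataset is used. The sketch below is a toy policy with assumed age and activity thresholds; the real data-management rules would be agreed with the research team.

```python
# Toy tier-placement policy. The age/activity thresholds are assumptions for
# illustration; tier names mirror the hot / warm / cold tiers described above.
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    days_since_last_read: int
    active_project: bool
    reproducibility_snapshot: bool

def place(ds: Dataset) -> str:
    if ds.active_project or ds.days_since_last_read <= 30:
        return "hot-nvme"          # active datasets, scratch, current telemetry
    if ds.reproducibility_snapshot or ds.days_since_last_read <= 365:
        return "warm-hdd"          # corpus, telemetry history, snapshots
    return "cold-archive"          # multi-year campaign retention

for ds in [
    Dataset("campaign-07-telemetry", 3, True, False),
    Dataset("run-2214-snapshot", 120, False, True),
    Dataset("campaign-01-raw", 900, False, False),
]:
    print(f"{ds.name:>24} -> {place(ds)}")
```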

Secure Offline Ingest

Chain-of-Custody Is Part of the Science

Without a formal ingest boundary, the organization may later be unable to prove that an analytical result came from a specific field acquisition — or that data were not altered or mixed with another campaign. For strategic mineral research, that is too large a weakness to accept.

01

Physical Media Arrives

Removable media from field operations lands on quarantine nodes — isolated from the research fabric.

02

Checksum Verification

Integrity verified against field-recorded checksums. Operator context and physical media identity are recorded.

03

Payload Scanning

Malicious content scan and metadata normalization performed in the quarantine environment.

04

Signed Ingest Record

An immutable, signed ingest record is created. Only then is the approved dataset released into the research fabric.
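To make the boundary concrete, here is a minimal sketch of steps 02 through 04: verify the field-recorded checksum, then emit a tamper-evident ingest record. An HMAC stands in for a proper digital signature and the manifest fields are assumptions; malware scanning and media-handling controls are not shown.

```python
# Minimal sketch of the quarantine steps above: checksum verification plus a
# tamper-evident ingest record. HMAC-SHA256 stands in for a real digital
# signature, and the record layout is hypothetical; scanning is not shown.
import hashlib
import hmac
import json
from datetime import datetime, timezone

SIGNING_KEY = b"quarantine-node-key"          # in production: an HSM-backed key

def sha256_file(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def ingest(path: str, field_checksum: str, operator: str, media_id: str) -> dict:
    observed = sha256_file(path)
    if not hmac.compare_digest(observed, field_checksum):
        raise ValueError(f"checksum mismatch for {path}: refuse release")
    record = {
        "path": path,
        "sha256": observed,
        "operator": operator,
        "media_id": media_id,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record                              # only signed records are released
```

Only datasets whose checksums verify and whose records are signed would leave quarantine for the research fabric.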

Power Envelope

Intentionally Modest

Average IT Load: ~32.5 kW
Realistic Peak IT Load: ~43 kW
IT Design Envelope: 50 kW
Facility Envelope (inc. cooling + UPS): 70–85 kW
CPU HPC Tier (avg / peak): 14.4 kW / 19.2 kW
GPU Inference Nodes: 4.0 kW avg
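The quoted tier figures reconcile with the node counts under simple per-node draws. The sketch below shows that arithmetic for the two tiers itemized above; the per-node values are implied by the quoted totals, and the remaining tiers are not guessed here.

```python
# Back-of-envelope check on the quoted tier figures. Per-node draws are implied
# by the totals above (14.4 kW / 12 nodes, 19.2 kW / 12 nodes, 4.0 kW / 2 nodes).
cpu_nodes, gpu_nodes = 12, 2

cpu_avg_per_node, cpu_peak_per_node = 1.2, 1.6   # kW, implied by 14.4 / 19.2 kW
gpu_avg_per_node = 2.0                            # kW, implied by 4.0 kW

print(f"CPU HPC tier: {cpu_nodes * cpu_avg_per_node:.1f} kW avg / "
      f"{cpu_nodes * cpu_peak_per_node:.1f} kW peak")
print(f"GPU inference: {gpu_nodes * gpu_avg_per_node:.1f} kW avg")
# Management, ingest, high-memory, and storage tiers make up the balance of the
# ~32.5 kW average IT load quoted above.
```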
Budget

Capital Investment

Day 1 IT Stack

Installed into an existing secure room. Rough order of magnitude (ROM) estimate.

$2.3–2.9M

Fully Deployed

Includes IT stack, facility hardening, power redundancy, digital twin, installation, and 36-month support.

$6.9M – $8.7M

Physical Footprint

Rack 1: 6 CPU HPC nodes, 2 management nodes, IB switch, networking
Rack 2: 6 CPU HPC nodes, 4 high-memory nodes, 2 ingest nodes
Rack 3: 2 GPU nodes, 2 hot NVMe nodes, 2 warm storage nodes
Rack 4: 2 warm storage nodes, UPS/power, future expansion space
Deployment Architecture Options

Three Paths to Sovereign Compute

Each architecture reflects a different philosophy of performance density, operational control, and capital efficiency. The right choice depends on timeline, budget envelope, and long-term strategic positioning.

Recommended
Option 1

Modular Practical

Balanced · scalable · operationally efficient

Dense, modular HPC cluster optimized for CPU-dominant workloads with strong price-to-performance efficiency. Designed for rapid deployment, predictable scaling, and straightforward operations.

Configuration

  • Multi-node dense compute chassis
  • 12 standard compute nodes
  • 4 high-memory science nodes
  • 2 GPU-assisted nodes (limited scope)
  • NVMe hot storage + high-capacity warm storage
  • High-speed fabric (200 Gb class)

Strengths

  • Best balance of cost, performance, deployability
  • Flexible expansion without architectural lock-in
  • Easier maintenance and component replacement
  • Suitable for field-adjacent deployments

Limitations

  • Slightly lower peak density vs Premium HPC
  • Less integrated cooling and fabric optimization

IT Stack Only

$1.7M – $2.9M

Fully Deployed

$2.1M – $3.5M

Best fit: National lab pilots · sovereign edge · scalability-first programs

Option 2

Premium HPC

Maximum density · integrated system · national infrastructure

Fully integrated supercomputing-class system designed for maximum compute density, tightly coupled workloads, and long-term expansion into large-scale national infrastructure.

Configuration

  • Fully integrated HPC cabinets
  • Advanced cooling (air or liquid-assisted)
  • High-density compute nodes
  • High-memory nodes integrated into fabric
  • GPU capability (expandable)
  • Tightly integrated high-performance interconnect

Strengths

  • Highest performance ceiling
  • Best suited for large-scale scientific modeling
  • Superior thermal efficiency at high densities
  • Strong long-term scalability to multi-megawatt

Limitations

  • Higher capital cost
  • More complex deployment and integration
  • Vendor ecosystem dependency
  • Over-provisioned for smaller clusters

IT Stack Only

$2.8M – $4.8M

Fully Deployed

$3.5M – $5.5M

Best fit: National flagship research · long-term sovereign infrastructure

Option 3

Retail Option

Component-based · lowest upfront cost · highest operational burden

Individually assembled server components sourced from standard enterprise or workstation-grade suppliers. Emphasizes low upfront cost but sacrifices system-level optimization.

Configuration

  • Individually racked servers
  • Standard enterprise motherboards and chassis
  • Mixed storage nodes
  • Conventional networking (Ethernet or entry HPC fabric)
  • Minimal system-level integration

Strengths

  • Lowest initial capital expenditure
  • Maximum flexibility in component sourcing
  • Rapid procurement in constrained environments

Limitations

  • Higher failure rates over time
  • Increased operational complexity
  • Lower density, higher power per compute unit
  • Difficult to manage at scale
  • Weak deterministic performance consistency

IT Stack Only

$1.2M – $2.0M

Fully Deployed

$1.5M – $2.5M

Best fit: Early-stage experimentation · prototyping phases only

Attribute | Modular Practical | Premium HPC | Retail Option
Performance Density | High | Very High | Moderate
Deterministic Behavior | Strong | Very Strong | Variable
Scalability | Excellent | Excellent (large-scale) | Limited
Deployment Speed | Fast | Moderate | Fast
Operational Complexity | Moderate | High | High
Cost Efficiency | Best | Lowest efficiency | Lowest upfront
Long-term Viability | High | Very High | Low
IT Stack Price Range | $1.7M – $2.9M | $2.8M – $4.8M | $1.2M – $2.0M
What Apex Foundry Delivers

End-to-End. From Financing to Operations.

Your platform includes C-suite-level consultation and hardware selection. We walk with you through the entire process: financing, commissioning, digital twin deployment, and long-term hardware support.

A

Access to Financing

  • Subject to approval — 75% CAPEX financing available
  • Your savings over 5 years are typically double what you would have paid a cloud provider
  • Full data sovereignty — no ongoing per-core cloud costs
  • Structured as infrastructure program financing
B

Operational Digital Twin

  • Real-time monitoring of cluster health and workload
  • Power, thermal, and GPU utilization telemetry
  • Maintenance alerts and workload management tools
  • Centralized dashboard for all rack and node status
C

Mobilization & Installation

  • Installation services — 5 days on-site
  • Technician labor: 2 technicians × 6 days
  • Engineering oversight: site scan, remote prep and validation
  • Full power-on testing and handover
D

Digital Twin & Monitoring (36 months)

  • Digital twin modeling and deployment
  • Integration and telemetry mapping
  • Monitoring platform license included (36 months)
  • Remote support included (36 months)
E

Hardware Support & Preventive Maintenance

  • Next business day on-site response upon monitoring alert
  • Physical hardware diagnostics and fault isolation
  • Component removal and installation: GPU, CPU, RAM, SSD, PSU, NIC, cables
  • Coordination for replacement parts procurement
  • Post-swap hardware verification and power-on testing
  • Quarterly preventive maintenance: thermal checks, firmware review
  • Spare parts inventory maintained — repairs within 48 hours of issue
  • Handoff to remote monitoring team for software validation

Building a Sovereign Research Environment?

We design, finance, and govern sovereign HPC clusters for national-interest scientific missions — air-gapped, auditable, and purpose-built.