Architecture
NodeFoundry is a bare-metal Ceph cluster management platform. It automates the entire lifecycle — from PXE booting raw servers to running a production storage cluster.
Components
Sentinel (Master Node)
The Sentinel is the control plane running on the master node. It handles:
- iPXE boot server — serves boot images to new nodes over HTTP
- DHCP server — assigns IPs and boot parameters to nodes
- Task engine — schedules and tracks background operations (OSD creation, image downloads, etc.)
- REST API — exposes all functionality via HTTP endpoints
- Web dashboard — browser-based UI for cluster management
- SSE event stream — real-time updates for UI and monitoring
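As an illustration of how a client might consume the SSE event stream, here is a minimal Python sketch that parses the standard `text/event-stream` framing (field lines terminated by a blank line). The event names (`task.progress`, `node.registered`) and the JSON payload fields are hypothetical placeholders, not the Sentinel's documented schema.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of text lines.

    Uses the standard SSE framing: "field: value" lines, a blank line
    ends the event, and lines starting with ":" are comments.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the event if any data was collected.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream content, for illustration only.
sample = [
    ": keep-alive\n",
    "event: task.progress\n",
    'data: {"task_id": 7, "percent": 40}\n',
    "\n",
    "event: node.registered\n",
    'data: {"hostname": "worker-01"}\n',
    "\n",
]
events = list(parse_sse(sample))
payloads = [json.loads(e["data"]) for e in events]
```

In a real client the `lines` argument would be the decoded lines of a long-lived HTTP response rather than an in-memory list.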
Worker Nodes
Worker nodes are bare-metal servers that run Ceph daemons. Each worker:
- PXE boots from the Sentinel and registers automatically
- Reports hardware inventory (disks, NICs, CPU, RAM) to the Sentinel
- Runs Ceph daemons assigned to it: MON, MGR, OSD, MDS, RGW
- Executes tasks sent by the Sentinel’s task engine
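To make the hardware inventory report more concrete, the sketch below models one possible payload in Python. Every field name here (`hostname`, `cpu_cores`, `ram_gb`, `nics`, `disks`, and the `Disk` attributes) is an assumption chosen for illustration; the actual report format is not specified in this document.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Disk:
    device: str        # hypothetical field, e.g. "/dev/sdb"
    size_gb: int
    rotational: bool   # True for spinning disks, False for SSD/NVMe


@dataclass
class Inventory:
    """Hypothetical shape of the inventory a worker reports after booting."""
    hostname: str
    cpu_cores: int
    ram_gb: int
    nics: List[str] = field(default_factory=list)
    disks: List[Disk] = field(default_factory=list)

    def total_disk_gb(self) -> int:
        # Raw capacity across all reported disks.
        return sum(d.size_gb for d in self.disks)

    def to_json(self) -> str:
        # asdict() recurses into the nested Disk dataclasses.
        return json.dumps(asdict(self), sort_keys=True)


inv = Inventory(
    hostname="worker-01",
    cpu_cores=16,
    ram_gb=64,
    nics=["eth0", "eth1"],
    disks=[Disk("/dev/sda", 480, False), Disk("/dev/sdb", 4000, True)],
)
body = inv.to_json()  # what a worker could send in a registration request
```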
nf CLI
The nf command-line tool communicates with the Sentinel’s REST API. It provides a human-friendly interface for all operations — from listing nodes to deploying Ceph daemons.
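The relationship between a CLI and a REST API can be sketched as a thin mapping from subcommands to HTTP requests. The Python example below is an assumption-laden illustration, not the real nf implementation: the subcommand names (`list-nodes`, `show-node`), the `/api/nodes` paths, and the `sentinel.example:8080` address are all hypothetical.

```python
import argparse
import urllib.request


def build_request(argv, base_url="http://sentinel.example:8080"):
    """Translate CLI arguments into an HTTP request object (nothing is sent)."""
    parser = argparse.ArgumentParser(prog="nf-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list-nodes")             # hypothetical subcommand
    show = sub.add_parser("show-node")       # hypothetical subcommand
    show.add_argument("node_id")
    args = parser.parse_args(argv)

    if args.command == "list-nodes":
        path = "/api/nodes"                  # hypothetical endpoint
    else:
        path = f"/api/nodes/{args.node_id}"  # hypothetical endpoint

    return urllib.request.Request(
        base_url + path,
        headers={"Accept": "application/json"},
        method="GET",
    )


req = build_request(["show-node", "worker-01"])
# urllib.request.urlopen(req) would perform the call against a live Sentinel.
```

Keeping the request construction separate from the network call makes the mapping easy to test without a running Sentinel.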
How it fits together
                    ┌─────────────────────────┐
                    │        Sentinel         │
                    │      (Master Node)      │
                    │                         │
                    │  ┌─────┐  ┌──────────┐  │
  nf CLI ──────────▶│  │ API │  │   Task   │  │
                    │  └──┬──┘  │  Engine  │  │
  Dashboard ───────▶│     │     └─────┬────┘  │
                    │     │           │       │
                    │  ┌──▼───────────▼──┐    │
                    │  │   iPXE / DHCP   │    │
                    │  └────────┬────────┘    │
                    └───────────┼─────────────┘
                                │
              ┌─────────────────┼─────────────────┐
              │                 │                 │
        ┌─────▼─────┐     ┌─────▼─────┐    ┌──────▼────┐
        │ Worker 01 │     │ Worker 02 │    │ Worker 03 │
        │           │     │           │    │           │
        │ MON  OSD  │     │ MON  OSD  │    │ MON  OSD  │
        │ MGR  OSD  │     │ OSD       │    │ OSD       │
        └───────────┘     └───────────┘    └───────────┘
Node lifecycle
- Discovery: the node PXE boots, receives an IP address, and pulls the bootstrap image
- Registration: the node reports its hardware to the Sentinel and enters pending status
- Provisioning: an admin deploys Ceph daemons onto the node via the CLI or API
- Active: the node runs its assigned daemons and participates in the cluster
- Maintenance: the node can be drained for hardware work without affecting cluster health
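The lifecycle above can be read as a small state machine. The following Python sketch encodes it under stated assumptions: only the pending status is named in this document, so the other state identifiers are illustrative, and the transition from maintenance back to active is assumed rather than documented.

```python
from enum import Enum


class NodeState(Enum):
    DISCOVERED = "discovered"      # assumed name
    PENDING = "pending"            # named in the lifecycle description
    PROVISIONING = "provisioning"  # assumed name
    ACTIVE = "active"              # assumed name
    MAINTENANCE = "maintenance"    # assumed name


# Allowed transitions, following the order of the lifecycle list.
# MAINTENANCE -> ACTIVE is an assumption: a drained node presumably
# rejoins the cluster once hardware work is done.
ALLOWED = {
    NodeState.DISCOVERED: {NodeState.PENDING},
    NodeState.PENDING: {NodeState.PROVISIONING},
    NodeState.PROVISIONING: {NodeState.ACTIVE},
    NodeState.ACTIVE: {NodeState.MAINTENANCE},
    NodeState.MAINTENANCE: {NodeState.ACTIVE},
}


def advance(current: NodeState, target: NodeState) -> NodeState:
    """Return the new state, or raise ValueError for a disallowed transition."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = NodeState.DISCOVERED
for step in (NodeState.PENDING, NodeState.PROVISIONING, NodeState.ACTIVE):
    state = advance(state, step)
```

Rejecting disallowed transitions explicitly, for example pending straight to active, keeps a node from running daemons before it has been provisioned.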