Architecture

NodeFoundry is a bare-metal Ceph cluster management platform. It automates the entire lifecycle — from PXE booting raw servers to running a production storage cluster.


Components

Sentinel (Master Node)

The Sentinel is the control plane running on the master node. It handles:

  • iPXE boot server — serves boot images to new nodes over HTTP
  • DHCP server — assigns IPs and boot parameters to nodes
  • Task engine — schedules and tracks background operations (OSD creation, image downloads, etc.)
  • REST API — exposes all functionality via HTTP endpoints
  • Web dashboard — browser-based UI for cluster management
  • SSE event stream — real-time updates for UI and monitoring
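
To illustrate how a monitoring client might consume the SSE event stream, here is a minimal sketch of a parser for the standard Server-Sent Events wire format. The event name `task.updated` and the JSON payload are hypothetical and used only for illustration; the actual events emitted by the Sentinel are not described here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Event:
    event: str  # event type; SSE defaults to "message" when no "event:" field is sent
    data: str   # payload; multiple "data:" lines are joined with newlines


def parse_sse(lines: Iterable[str]) -> Iterator[Event]:
    """Yield one Event per blank-line-terminated block of an SSE stream."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far (if it has data).
            if data:
                yield Event(event_type, "\n".join(data))
            event_type, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space after the colon is stripped
            if field == "event":
                event_type = value
            elif field == "data":
                data.append(value)
            # other fields ("id", "retry") are ignored in this sketch


# Hypothetical stream content, for illustration only.
sample = [
    "event: task.updated\n",
    'data: {"task": "osd-create", "state": "running"}\n',
    "\n",
    ": keep-alive\n",
    "data: hello\n",
    "\n",
]
events = list(parse_sse(sample))
```

In practice the `lines` argument would be the line iterator of a streaming HTTP response rather than a list, which lets the same parser serve both the dashboard-style and monitoring use cases.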

Worker Nodes

Worker nodes are bare-metal servers that run Ceph daemons. Each worker:

  • PXE boots from the Sentinel and registers automatically
  • Reports hardware inventory (disks, NICs, CPU, RAM) to the Sentinel
  • Runs Ceph daemons assigned to it: MON, MGR, OSD, MDS, RGW
  • Executes tasks sent by the Sentinel’s task engine
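
The hardware inventory a worker reports could be modeled along the following lines. This is a sketch only: the class names, field names, and JSON layout are assumptions made for illustration, not the actual NodeFoundry schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Disk:
    device: str       # e.g. "/dev/sda"
    size_bytes: int
    rotational: bool  # True for spinning disks, False for SSD/NVMe


@dataclass
class Nic:
    name: str
    mac: str


@dataclass
class Inventory:
    # All field names here are hypothetical.
    hostname: str
    cpu_cores: int
    ram_bytes: int
    disks: List[Disk] = field(default_factory=list)
    nics: List[Nic] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the report as the JSON body a worker might send."""
        return json.dumps(asdict(self), sort_keys=True)


# Made-up example values.
report = Inventory(
    hostname="worker-01",
    cpu_cores=16,
    ram_bytes=64 * 1024**3,
    disks=[Disk("/dev/sda", 4 * 1000**4, rotational=True)],
    nics=[Nic("eth0", "52:54:00:12:34:56")],
)
payload = report.to_json()
```

Keeping disks and NICs as separate typed records makes it straightforward for the control plane to decide, for example, which disks are candidates for OSDs.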

nf CLI

The nf command-line tool communicates with the Sentinel’s REST API. It provides a human-friendly interface for all operations — from listing nodes to deploying Ceph daemons.

How it fits together

                    ┌─────────────────────────┐
                    │       Sentinel          │
                    │   (Master Node)         │
                    │                         │
                    │  ┌─────┐  ┌──────────┐  │
  nf CLI ──────────▶│  │ API │  │ Task     │  │
                    │  └──┬──┘  │ Engine   │  │
  Dashboard ───────▶│     │     └─────┬────┘  │
                    │     │           │       │
                    │  ┌──▼───────────▼──┐    │
                    │  │  iPXE / DHCP    │    │
                    │  └────────┬────────┘    │
                    └───────────┼─────────────┘
                                │
              ┌─────────────────┼─────────────────┐
              │                 │                 │
        ┌─────▼─────┐     ┌─────▼─────┐     ┌─────▼─────┐
        │ Worker 01 │     │ Worker 02 │     │ Worker 03 │
        │           │     │           │     │           │
        │ MON  OSD  │     │ MON  OSD  │     │ MON  OSD  │
        │ MGR  OSD  │     │      OSD  │     │      OSD  │
        └───────────┘     └───────────┘     └───────────┘

Node lifecycle

  1. Discovery — node PXE boots, gets an IP, pulls the bootstrap image
  2. Registration — node reports its hardware to the Sentinel and enters pending status
  3. Provisioning — admin deploys Ceph daemons onto the node via CLI or API
  4. Active — node runs its assigned daemons and participates in the cluster
  5. Maintenance — node can be drained for hardware work without affecting cluster health
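
The lifecycle above can be read as a small state machine. The sketch below encodes the five stages and the transitions between them. The forward order comes from the list; the transition from maintenance back to active is an assumption, as are all identifiers (`NodeState`, `ALLOWED`, `advance`).

```python
from enum import Enum


class NodeState(Enum):
    DISCOVERY = "discovery"
    REGISTRATION = "registration"  # node has reported hardware and is pending
    PROVISIONING = "provisioning"
    ACTIVE = "active"
    MAINTENANCE = "maintenance"


# Allowed transitions between lifecycle stages.
ALLOWED = {
    NodeState.DISCOVERY: {NodeState.REGISTRATION},
    NodeState.REGISTRATION: {NodeState.PROVISIONING},
    NodeState.PROVISIONING: {NodeState.ACTIVE},
    NodeState.ACTIVE: {NodeState.MAINTENANCE},
    # Assumption: a drained node returns to service once hardware work is done.
    NodeState.MAINTENANCE: {NodeState.ACTIVE},
}


def advance(current: NodeState, target: NodeState) -> NodeState:
    """Return the target state, or raise ValueError if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


# Walk a node through the normal path up to active.
state = NodeState.DISCOVERY
for nxt in (NodeState.REGISTRATION, NodeState.PROVISIONING, NodeState.ACTIVE):
    state = advance(state, nxt)
```

Rejecting out-of-order transitions, such as provisioning a node that has not yet registered, is the main thing an explicit transition table buys over a plain status string.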