Troubleshooting
Common issues and how to resolve them.
Node Won’t PXE Boot
| Check | Fix |
|---|---|
| DHCP not reaching node | Verify switch DHCP options 66/67 point to master IP |
| Wrong boot mode | Ensure node firmware matches (BIOS or UEFI) |
| No boot image set | Run nf images default --id <id> to set a default image |
| Firewall blocking | Ensure master node ports are accessible from the boot network |
Node Registers but Shows “pending”
This is normal. Nodes enter pending status after registration and stay there until you deploy daemons to them.
Task Failed
Check the task logs:
$ nf tasks list # find the task ID
Then check the logs via the API: GET /task/run/{runId}/logs
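As a rough sketch of that request, the snippet below builds the log URL and fetches it with Python's standard library. Only the path /task/run/{runId}/logs comes from this page. The base URL `http://master:8080` is a hypothetical placeholder, and the sketch assumes the endpoint needs no authentication and returns text.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical base URL; replace with the address of your master node's API.
API_BASE = "http://master:8080"


def task_log_url(run_id: str, base: str = API_BASE) -> str:
    """Build the URL for GET /task/run/{runId}/logs."""
    return f"{base.rstrip('/')}/task/run/{quote(run_id, safe='')}/logs"


def fetch_task_logs(run_id: str, base: str = API_BASE, timeout: float = 10.0) -> str:
    """Fetch the logs for one task run and return the response body as text.

    Assumes no authentication is required and the body is UTF-8 text.
    """
    with urlopen(task_log_url(run_id, base), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Call `fetch_task_logs("<runId>")` with the ID found via nf tasks list. If your deployment requires a token or a different port, adjust the request accordingly.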
Common failures:
- Disk already in use — the disk has an existing filesystem or OSD
- Node unreachable — network connectivity issue between master and worker
- Insufficient resources — not enough disk space or memory
OSD Won’t Start
- Check that the disk is not already formatted (hasFilesystem: true in nf disks list)
- Ensure the node has network connectivity to the monitors
- Check task logs for specific error messages
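The disk check above could be scripted roughly as follows. This sketch assumes that nf disks list accepts the --json flag and prints a JSON array with one object per disk. Only the hasFilesystem field is taken from this page; the `name` field in the sample is hypothetical.

```python
import json


def formatted_disks(disks_json: str) -> list[dict]:
    """Return the disk entries whose hasFilesystem field is true.

    Assumes the input is a JSON array of objects, one per disk.
    """
    disks = json.loads(disks_json)
    return [d for d in disks if d.get("hasFilesystem") is True]


# Sample input in the assumed shape; the "name" field is hypothetical.
sample = '[{"name": "sda", "hasFilesystem": true}, {"name": "sdb", "hasFilesystem": false}]'
print([d["name"] for d in formatted_disks(sample)])  # prints ['sda']
```

Any disk returned by this filter already carries a filesystem and is a likely reason for an OSD failing to start on it.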
CephFS Not Accessible
- Verify at least one MDS is running: check nf node list for the MDS column
- Ensure the CephFS was created: check that nf ceph fs was run
- Verify the client has the correct Ceph keyring
S3/RGW Not Working
- Verify at least one RGW instance is running: nf ceph rgw list
- Check that an RGW user exists with valid credentials
- Test with nf s3 --accessKey ... --secretKey ... bucket list
- Ensure the RGW port (default 7480) is accessible
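To confirm the last point from a client machine, a plain TCP connection test is usually enough. The sketch below uses only Python's standard library. The host name in the usage note is a hypothetical placeholder, and a successful connection shows only that the port is reachable, not that RGW is healthy.

```python
import socket


def port_open(host: str, port: int = 7480, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_open("rgw-node.example")` returns False when a firewall drops the traffic or nothing is listening on port 7480.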
Getting Help
- Check task logs for detailed error messages
- Review the Architecture page to understand component relationships
- Use the --json flag on CLI commands to get machine-readable output for debugging
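When collecting debugging output in a script, the --json flag can be combined with a small wrapper such as the one sketched here. It assumes the flag is accepted at the end of the command line and that the command prints a single JSON document on stdout. Neither detail is confirmed by this page.

```python
import json
import subprocess


def nf_command(*args: str) -> list[str]:
    """Build an nf command line with the --json flag appended."""
    return ["nf", *args, "--json"]


def run_nf_json(*args: str):
    """Run an nf subcommand with --json and return the parsed output.

    Assumes stdout holds one JSON document. Raises
    subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        nf_command(*args), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

For example, `run_nf_json("tasks", "list")` would run nf tasks list --json and return the decoded result.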