A practical guide for sysadmins on using cloud-init in OpenCHAMI for predictable boots and easier day-2 operations.
Bringing up HPC nodes should be boring. You want the same outcome every time, with no hand edits, and a clear record of what changed. OpenCHAMI uses cloud-init to do that. It gives you a simple way to describe how a node should boot and configure itself. Then it applies the same steps across the fleet.
In this post, you will see how cloud-init fits into OpenCHAMI, where configs live, and how to test a change on one node before rolling it out. The goal is to help you do less manual work and get to a steady state faster.
Cloud-init has been battle-tested at cloud providers for years. It runs early in boot, reads a small config, and applies the steps you define. In OpenCHAMI, we reuse the same idea for HPC. That means you get a standard flow, small configs, and fewer surprises.
OpenCHAMI services provide data and templates for node boots. The Boot Script Service (BSS) hands the node the right blob at the right time. Cloud-init on the node reads that blob and does the work. Other services, like SMD (State Management Database), keep track of system state and inventory.
This example installs packages, writes an agent config, and starts a service. Keep the file small and readable.
#cloud-config
package_update: true
packages:
  - htop
  - jq
write_files:
  - path: /etc/myagent/config.yaml
    permissions: '0644'
    content: |
      cluster: ochami-prod
      node_role: compute
      metrics: true
runcmd:
  - systemctl enable --now myagent
Register the config
In most clusters, you keep configs in Git and push a reference into BSS. Here is a safe, four-step workflow you can adapt. Replace names to match your environment.
# 1) Clone your infra repo (holds cloud-init blobs/templates)
git clone https://github.com/OpenCHAMI/cloud-init-configs.git && cd cloud-init-configs
# 2) Add or update the config for a group (e.g., compute-default.yaml)
$EDITOR groups/compute-default.yaml
# 3) Commit and push so others can review/audit
git add groups/compute-default.yaml && git commit -m "feat(cloud-init): enable myagent on compute defaults" && git push
# 4) Tell BSS to use the new version for the group
curl -X POST http://bss.api.cluster/v1/groups/compute-default/refresh
For things like NTP servers, sysctls, or agent toggles, use small templates with variables for rack or site. Keep the template simple, and fill variables from inventory in SMD. This keeps the logic in one place and avoids diverging copies.
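As a rough illustration of the template idea, here is a minimal sketch using Python's standard `string.Template`. The variable names (`site`, `rack`, `ntp_server`), the file path, and the inventory dictionary are assumptions made for the example; they are not an OpenCHAMI or SMD schema, and in practice you would fetch the records from SMD rather than hard-code them.

```python
from string import Template

# Hypothetical template; the $variables are illustrative, not an OpenCHAMI schema.
TEMPLATE = Template("""\
#cloud-config
write_files:
  - path: /etc/myagent/site.yaml
    permissions: '0644'
    content: |
      site: $site
      rack: $rack
      ntp_server: $ntp_server
""")


def render(node: dict) -> str:
    """Fill the template from one node's inventory record.

    substitute() raises KeyError when a variable is missing, so an
    incomplete record fails loudly instead of producing a partial config.
    """
    return TEMPLATE.substitute(node)


# Stand-in for records you would look up in SMD; the fields are assumptions.
inventory = {
    "x1000c0s0b0n0": {"site": "east", "rack": "x1000", "ntp_server": "10.0.0.1"},
    "x2000c0s0b0n0": {"site": "west", "rack": "x2000", "ntp_server": "10.1.0.1"},
}

# One template, one rendered config per node.
rendered = {xname: render(record) for xname, record in inventory.items()}
print(rendered["x1000c0s0b0n0"])
```

Using `substitute()` rather than `safe_substitute()` is deliberate here: a node with a missing inventory field stops the render instead of booting with a half-filled config.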
Once cloud-init sets the baseline, your day-2 work gets easier. You have a repeatable starting point and can push small, safe changes when you need them. You also spend less time chasing drift between nodes.
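One way to keep changes small and safe is to run a quick sanity check on a config before you push it. The sketch below is a heuristic, standard-library-only check, not a YAML parser and not a replacement for cloud-init's own validation. The function name `lint_cloud_config` and the `EXPECTED_KEYS` allow-list are hypothetical; adjust the list to the modules your site actually uses.

```python
import re

# Top-level keys this site expects; a hypothetical allow-list, extend as needed.
EXPECTED_KEYS = {"package_update", "packages", "write_files", "runcmd"}


def lint_cloud_config(text: str) -> list:
    """Return a list of human-readable problems found in a cloud-config file."""
    problems = []
    lines = text.splitlines()

    # cloud-init only treats the file as cloud-config if it starts with this header.
    if not lines or lines[0].rstrip() != "#cloud-config":
        problems.append("first line must be '#cloud-config'")

    for number, line in enumerate(lines, start=1):
        # YAML does not allow tabs for indentation.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {number}: tab used for indentation")

        # Only keys that start in column 0 are checked against the allow-list.
        match = re.match(r"([A-Za-z_][A-Za-z0-9_-]*):", line)
        if match and match.group(1) not in EXPECTED_KEYS:
            problems.append(f"line {number}: unexpected top-level key '{match.group(1)}'")

    return problems


good = "#cloud-config\npackages:\n  - jq\n"
bad = "packages:\n\t- jq\nruncmds:\n  - true\n"

print(lint_cloud_config(good))
for problem in lint_cloud_config(bad):
    print(problem)
```

A check like this catches the common slips (a missing header, a stray tab, a misspelled module name) before a node ever boots with the change.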
Ready to try OpenCHAMI?