Ansible is a tool that automates the configuration of systems.
Install Ansible on any system that can reach the target nodes over SSH, such as a laptop, a small virtual machine, or a cluster management server. These docs call this system the provisioning node; it is also referred to below as the control machine.
Ansible is:
- Agentless (there’s nothing that needs to be installed on other nodes in the cluster)
- Idempotent (you can run the same playbook or task repeatedly without side effects; when a target node is already in the desired state, Ansible makes no change for that task)
- Easy to maintain & scale (compared with custom scripts)
- Easy to read & use (via YAML playbooks, roles, and tasks)
Requirements
- Control machine with a supported OS to run Ansible
- Passwordless (SSH key) access from the Ansible system to the Universal GPU servers
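To make the idempotency property from the list above concrete, here is a minimal shell sketch of the same idea. The `ensure_line` function is a hypothetical helper written for this illustration; it is not part of Ansible and does not reflect how Ansible is implemented. It only mirrors the behaviour of reporting "changed" on the first run and "ok" once the desired state already holds.

```shell
#!/usr/bin/env bash
# ensure_line FILE LINE (hypothetical helper, for illustration only):
# append LINE to FILE only if an identical line is not already present.
# Prints "changed" when the file was modified and "ok" when it was not.
ensure_line() {
  local file="$1" line="$2"
  if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
    echo "ok"
  else
    printf '%s\n' "$line" >> "$file"
    echo "changed"
  fi
}
```

Calling `ensure_line` twice with the same arguments modifies the file only once, which is the behaviour an idempotent Ansible task is expected to show across repeated playbook runs.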
A script is provided to install Ansible on Ubuntu and RHEL/CentOS machines. Ansible can also be installed on macOS and Windows (WSL).
# Install Ansible and required roles from Ansible Galaxy
./scripts/setup.sh
See the official Ansible documentation for more detailed installation information.
Systems are easier to manage with Ansible if you don't have to type passwords. To configure passwordless SSH access using SSH keys, run the following commands on the control machine where Ansible is installed:
# Generate an SSH keypair for the current user (hit enter to accept defaults)
ssh-keygen
# Copy the new SSH public key to each system that Ansible will configure
# where <username> is the remote username and <host> is the IP or hostname of the remote system
ssh-copy-id <username>@<host>
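When there are several target systems, the copy step above can be repeated in a loop. The sketch below assumes the same remote username on every host; `copy_key_to_hosts` and the `DRY_RUN` variable are hypothetical names introduced for this example, and the host names in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# copy_key_to_hosts USER HOST... (hypothetical helper):
# run ssh-copy-id once per host. With DRY_RUN=1 the commands are only
# printed, which is useful for checking the host list first.
copy_key_to_hosts() {
  local user="$1"
  shift
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ssh-copy-id ${user}@${host}"
    else
      ssh-copy-id "${user}@${host}"
    fi
  done
}
```

For example, setting `DRY_RUN=1` and running `copy_key_to_hosts ubuntu node1 node2` prints the two `ssh-copy-id` commands without contacting either host.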
To use Ansible without SSH keys, you can add flags to have Ansible prompt for a password:
- If SSH requires a password, add the -k flag
- If sudo requires a password, add the -K flag
Ansible playbooks are files which manage the configuration of remote machines. When a playbook runs, the output of each task is color-coded:
- Green indicates nothing changed as a result of the task
- Yellow indicates something changed as a result of the task
- Blue indicates the task was skipped
- Red indicates the task failed
For more verbose output, add the -v, -vv, -vvv, etc. flags
A successful ansible-playbook run should provide a list of hosts and changes and indicate no failures, for example:
PLAY RECAP ************************************************************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0
node1 : ok=401 changed=121 unreachable=0 failed=0
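The exit status of ansible-playbook is normally the first thing to check, since it is nonzero when a run fails. If only saved output is available, the recap can also be checked directly. The following is a rough sketch that assumes the recap format shown above; `recap_ok` is a hypothetical helper name, not an Ansible command.

```shell
#!/usr/bin/env bash
# recap_ok (hypothetical helper): read ansible-playbook output on stdin and
# succeed only if a PLAY RECAP is present and every host line in it
# reports unreachable=0 and failed=0.
recap_ok() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && /unreachable=/ {
      hosts++
      if ($0 !~ /unreachable=0([^0-9]|$)/ || $0 !~ /failed=0([^0-9]|$)/) bad++
    }
    END {
      if (hosts > 0 && bad == 0) exit 0
      exit 1
    }
  '
}
```

Assuming the run was saved to a file such as `run.log` (a placeholder name), `recap_ok < run.log && echo "no failures"` prints the message only when every host finished cleanly.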
Create server inventory
# Copy the default configuration
cp -r config.example config
# Review and edit the inventory file to set IPs/hostnames for servers
cat config/inventory
# Review and edit configuration under config/group_vars/*.yml
cat config/group_vars/all.yml
cat config/group_vars/gpu-servers.yml
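As a rough illustration of how an inventory groups hosts, the sketch below writes a small INI-style inventory and lists the hosts in one group. It assumes the inventory uses Ansible's INI format; the host names and addresses are made up, and `write_example_inventory` and `group_hosts` are hypothetical helpers. The group names `management` and `gpu-servers` are taken from the commands and file names in this document.

```shell
#!/usr/bin/env bash
# write_example_inventory PATH (hypothetical helper):
# write a tiny INI-style inventory with invented hosts and addresses.
write_example_inventory() {
  cat > "$1" <<'EOF'
[management]
mgmt01 ansible_host=10.0.0.1

[gpu-servers]
gpu01 ansible_host=10.0.0.11
gpu02 ansible_host=10.0.0.12
EOF
}

# group_hosts INVENTORY GROUP (hypothetical helper):
# print the host names listed directly under [GROUP].
group_hosts() {
  awk -v group="[$2]" '
    /^\[/ { current = $1; next }
    current == group && NF > 0 && $0 !~ /^[#;]/ { print $1 }
  ' "$1"
}
```

This only handles the simple layout shown here. For a real inventory, `ansible <host-group> --list-hosts` reports how Ansible itself resolves a group.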
Run Commands
To run arbitrary commands in parallel across nodes in the cluster, you can use ansible and the groups or hosts defined in the inventory file, for example:
# ansible <host-group> -a hostname
ansible management -a hostname
Run Playbooks
To run playbooks, use the ansible-playbook command:
# If sudo requires a password, add the -K flag
# ansible-playbook -l <host-group> playbooks/<playbook>.yml
ansible-playbook -l management,localhost -b playbooks/k8s-cluster.yml
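For readers who have not seen a playbook before, the sketch below writes a minimal one to disk. It is an illustrative example only and is not one of the playbooks shipped in this repository; the `chrony` package, the output path, and the `write_example_playbook` helper are assumptions made for the example.

```shell
#!/usr/bin/env bash
# write_example_playbook PATH (hypothetical helper):
# write a one-task playbook that ensures a package is installed.
write_example_playbook() {
  cat > "$1" <<'EOF'
---
- hosts: all
  become: true
  tasks:
    - name: Ensure chrony is installed
      ansible.builtin.package:
        name: chrony
        state: present
EOF
}
```

After `write_example_playbook example.yml`, running `ansible-playbook --syntax-check example.yml` validates the file, and `ansible-playbook -l management --check example.yml` previews changes without applying them. Because the task declares a desired state, a second real run should report the task as unchanged.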
Debugging
Show host vars: ansible all -m debug -a 'var=hostvars'
Inventory reference: https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html
Variable configuration reference: https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html