Ansible
We are moving towards using Ansible for both the initial configuration of machines and deployment.
The main repository for this is here; it contains our configuration, inventories, playbooks, roles and tasks.
Ansible playbooks are used for remotely automating tasks on NDX machines and other machines we look after.
These run on your own computer (the control node) and orchestrate steps on remote machines.
Playbooks should be idempotent, i.e. they can be run multiple times and will always leave the machine in the same end state. Ansible reports this per task: ok if nothing needed to change, changed if the task modified something, and skipped if the task's condition was not met. For example, a playbook can check that a dependency is already at the required version and skip the set of tasks that would update it (which may remove the previous version). This means that the “deploy” playbook can be run absolutely everywhere, and machines that are already up to date will have most tasks skipped completely. Most Ansible modules are built to be idempotent by default.
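The idea above can be sketched as a small playbook. This is a minimal illustration rather than one of the repo's playbooks: the path /opt/example, the VERSION file and the wanted_version variable are all hypothetical, and it assumes a Linux target.

```yaml
---
- name: Idempotency sketch (illustrative only)
  hosts: all
  vars:
    wanted_version: "1.2.3"  # hypothetical version we want installed
  tasks:
    # Declarative module: reports "changed" on the first run, "ok" afterwards
    - name: Ensure the install directory exists
      ansible.builtin.file:
        path: /opt/example  # hypothetical path
        state: directory
        mode: "0755"

    # Read-only check: never reports "changed", never fails if the file is missing
    - name: Read the currently installed version
      ansible.builtin.command: cat /opt/example/VERSION
      register: installed_version
      changed_when: false
      failed_when: false

    # Reported as "skipped" on machines that are already up to date
    - name: Update only when the version differs
      ansible.builtin.debug:
        msg: "Would update from '{{ installed_version.stdout }}' to '{{ wanted_version }}'"
      when: installed_version.stdout | default('') != wanted_version
```

Running this twice against the same machine should show the first task as changed and then ok, while the last task is skipped wherever the versions already match.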
We are working towards these playbooks being the declarative steps to provision an instrument machine so it can easily be (re)created. As a developer you should keep this in mind - Ansible playbooks can be used for one-shot roll outs of a specific set of tasks, but we should make them repeatable. This repo should not become a ‘dumping ground’ of playbooks we’ve written to roll out a certain change - they should be applied to the blueprints of an instrument machine so we can use them again.
Setup
Preliminary steps to run these:
If you haven’t already, set up a keeper account with access to our group’s passwords.
Make sure your ssh public keys (which should be stored here) are deployed to instruments - see below for the playbook that does this
Set up WSL if you’re using a Windows control node - Ansible does not support running natively on a Windows control node. You will need your SSH keys registered in WSL. If you are running Linux you can install Ansible natively.
Install Ansible, including the plugins we require, by running:
sudo snap install astral-uv --classic
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
Install the galaxy collections and roles by running:
ansible-galaxy install -r requirements.yml
Add the DNS search suffix (isis.cclrc.ac.uk) to /etc/resolv.conf (to edit the file, use e.g. nano /etc/resolv.conf) by adding the following line:
...
search isis.cclrc.ac.uk
...
To test that this works, run an ad-hoc command (see the Ad-hoc commands section) with the ping module (for Linux machines) or the win_ping module (for Windows machines, i.e. NDXes) to check that the hosts are reachable.
PRs will be linted by CI. To run the linter locally, run ansible-lint - this is included in requirements.txt. Its configuration is set in .ansible-lint.
Hosts.yaml
This is the main inventory file, which groups hosts, for example by target station and/or by deploy group. It is contained in the inventory directory.
To limit a playbook run to just TS1 instruments, for example, you can use ansible-playbook playbook.yaml --limit ts1. This works because ts1 is a host group in the inventory file.
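As a rough sketch of how such groups look in a YAML inventory, the fragment below nests station groups under ndxes. The group names ts1, ts2, ndxes and muons appear in this document, but which hosts belong to which group and how the groups are nested are assumptions, and the NDXEXAMPLE host names are hypothetical.

```yaml
# Illustrative inventory layout only; host names are placeholders
all:
  children:
    ndxes:
      children:
        ts1:
          hosts:
            NDXEXAMPLE1:
        ts2:
          hosts:
            NDXEXAMPLE2:
    muons:
      hosts:
        NDXEXAMPLE3:
```

With a layout like this, --limit ts1 would select only NDXEXAMPLE1. Running ansible-inventory --graph (with -i pointing at the inventory if it is not already configured) prints the group tree that Ansible actually sees.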
Windows-only playbooks (windows/)
Notes
These playbooks will prompt for a host group to run on, equivalent to (and taking precedence over) --limit as a command line argument.
As an example, to run the playbook on all NDXes other than NDXENGINX, enter ndxes,!NDXENGINX - this syntax is documented here
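One common way to get a host prompt is vars_prompt combined with a templated hosts field, sketched below. How the playbooks in this repo actually implement the prompt is not shown here, so treat the variable name target_hosts and the overall structure as assumptions.

```yaml
---
- name: Prompt for hosts (illustrative sketch)
  hosts: "{{ target_hosts }}"
  gather_facts: false
  vars_prompt:
    # target_hosts is a hypothetical variable name
    - name: target_hosts
      prompt: Which hosts or groups should this run on? (e.g. ndxes,!NDXENGINX)
      private: false
  tasks:
    - name: Check that the selected Windows hosts are reachable
      ansible.windows.win_ping:
```

Entering ndxes,!NDXENGINX at the prompt would use the same host pattern syntax as --limit.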
truncate_databases.yaml
This performs a backup and truncation of the local databases on instruments. It will prompt for hosts, so to run it use:
ansible-playbook windows/truncate_databases.yaml
instrument_deploy.yaml
This is the main playbook for deploying software to NDXes. Currently this stops the server if it is running and installs the JDK using the jdk role.
To use this you need to run ansible-playbook windows/instrument_deploy.yaml - it should prompt for hosts.
Updating JDK version
To update the JDK version, you will need to set the vars in roles\defaults\main.yaml.
These are:
jdk_major_ver - the major version (excluding .min.patch) of the JDK.
jdk_full_ver - the full version string of the JDK.
jdk_url - the URL to download the JDK from.
jdk_checksum - the checksum for the downloaded JDK.
(optionally) jdk_force_update - whether to overwrite the current version if it already exists.
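A defaults file using these variables might look roughly like the sketch below. The variable names come from the list above; every value is a placeholder, and the value types (for example whether jdk_major_ver is a number or a string, and the checksum format) are assumptions to check against the jdk role itself.

```yaml
# Illustrative values only; replace with the real release details
jdk_major_ver: 21
jdk_full_ver: "21.0.0+0"
jdk_url: "https://example.invalid/jdk-21.0.0.zip"
jdk_checksum: "<checksum of the downloaded file>"
# Optional: set to true to overwrite an existing install of the same version
jdk_force_update: false
```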
deploy_ssh_keys.yaml
This is the Windows equivalent of the Linux workflow for deploying the group’s ssh keys and turning off password auth in favour of public key auth.
To run this on all NDXes run ansible-playbook windows/deploy_ssh_keys.yaml. To limit it to certain hosts or host groups use --limit, e.g. --limit NDXHRPD_SETUP to run only on NDXHRPD_SETUP, or --limit muons to run only on muon NDXes.
nsclient.yaml
Installs the nsclient++ monitoring program. It needs to be run with:
ansible-playbook --ask-vault-password windows/nsclient.yaml
There is a host group nsclient that is used to hold encrypted passwords and can also be used for deploying.
deploy_wincred.yaml
This is for deploying a new set of credentials to NDXes. The credentials should be updated in keeper first; this playbook reads the credentials from keeper.
To use this you need to run ansible-playbook windows/deploy_wincred.yaml --ask-vault-pass - it should prompt for a vault password (found in keeper - search for ansible) and hosts to target.
Linux-only playbooks (linux/)
Initial SSH set up
This involves:
copying over the keys in this repo
disabling OpenSSH password-authentication (ie. you can only use pub/priv keys!)
It can be run multiple times, for example to update the deployed keys when someone new joins the team or someone gets a new computer with a different public key.
Deploying ssh keys (playbook_deploy_ssh_keys.yaml)
If you already have ssh key-based auth set up, this can be run on multiple hosts with ansible-playbook playbook_deploy_ssh_keys.yaml (e.g. to update the list of keys because someone’s public key changed). If this is a new machine, use --ask-pass (to ask for the ssh password) along with --limit to restrict the run to a single host, as your ssh passwords may (and should) differ between machines.
For example, to run this on madara:
ansible-playbook playbook_deploy_ssh_keys.yaml --ask-pass --limit madara
which will prompt for madara's ssh password.
This step also prompts for a personal access token to access the keys repo, which is in Keeper.
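The core of a key-deployment step can be sketched with the authorized_key module from the ansible.posix collection. This is not the repo's playbook: it skips fetching the keys from the keys repo with the personal access token, and the local file paths are hypothetical.

```yaml
---
- name: Deploy ssh public keys (illustrative sketch)
  hosts: all
  vars:
    # Hypothetical: public key files already present on the control node
    public_key_files:
      - keys/alice.pub
      - keys/bob.pub
  tasks:
    # Idempotent: keys that are already present are reported as "ok"
    - name: Add each public key to the remote user's authorized_keys
      ansible.posix.authorized_key:
        user: "{{ ansible_user_id }}"
        key: "{{ lookup('ansible.builtin.file', item) }}"
        state: present
      loop: "{{ public_key_files }}"
```

Because the module only adds keys that are missing, rerunning it after someone's key changes touches only the new entries.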
Turning off sshd password authentication (playbook_turn_off_ssh_passwd_auth.yaml)
You must have completed the above step before doing this, otherwise you’ll lock yourself out.
To run on all hosts, use:
ansible-playbook playbook_turn_off_ssh_passwd_auth.yaml
This will prompt for the vault password, which is in keeper (ds-config ansible vault)
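For orientation, disabling password authentication usually comes down to one sshd_config setting plus a service restart, as in the sketch below. It assumes a standard OpenSSH layout (/etc/ssh/sshd_config, /usr/sbin/sshd) and a service named sshd; it is not the contents of playbook_turn_off_ssh_passwd_auth.yaml and leaves out whatever that playbook reads from the vault.

```yaml
---
- name: Turn off sshd password authentication (illustrative sketch)
  hosts: all
  become: true
  tasks:
    - name: Set PasswordAuthentication to no
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?\s*PasswordAuthentication\s'
        line: PasswordAuthentication no
        # Reject the edit if the resulting config does not parse
        validate: /usr/sbin/sshd -t -f %s
      notify: Restart sshd
  handlers:
    - name: Restart sshd
      ansible.builtin.service:
        name: sshd  # assumption: the service is called "ssh" on some distributions
        state: restarted
```

The handler only fires when the line actually changed, so repeated runs leave sshd untouched.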
General system updates
playbook_system_updates.yaml exists to update the system packages.
Kafka cluster provisioning
deploy_kafka.yaml and deploy_redpanda_console.yaml exist to provision a Kafka cluster, currently on the test machines in R55.
Notes
Setting up a windows hyper-v virtual machine for testing
To test playbooks on a local virtual machine running in Hyper-V, you need to set the following up:
A virtual machine itself. You can use the evaluation .iso images for this.
If using the Default switch on the VM, you need to forward the WSL network so it can reach the VM. To do this, run the following from an elevated PowerShell window:
Get-NetIPInterface | where {$_.InterfaceAlias -eq 'vEthernet (WSL (Hyper-V firewall))' -or $_.InterfaceAlias -eq 'vEthernet (Default Switch)'} | Set-NetIPInterface -Forwarding Enabled -Verbose
Note that by default most Windows Server images do not respond to ping requests. You can enable this by enabling the inbound rule File and Printer Sharing (Echo Request - ICMPv4-In) in the firewall settings.
OpenSSH set up on the VM.
Ad-hoc commands
To run an ad-hoc command, e.g. win_ping against TS2 machines to check they’re reachable, you can run:
ansible ts2 -m win_ping
This can also be used to perform one-line shell/bash commands remotely.
Chaining commands
There is an issue with passing environment variables on windows specifically between chained commands.
If you wanted to run config_env.bat to define MYPVPREFIX then call caput after, you need to use a batch script and copy it across instead.
This can be done with a playbook similar to the following:
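---
- name: set some PVs
  hosts: ndxes
  tasks:
    # Create a scratch directory on the remote machine for the batch script
    - name: create temp dir
      ansible.windows.win_tempfile:
        state: directory
      register: tmpdir
    # Give the full destination file name so the copy does not depend on directory handling
    - name: copy bat over
      ansible.windows.win_copy:
        src: mybat.bat
        dest: "{{ tmpdir.path }}\\mybat.bat"
    - name: call bat
      ansible.windows.win_command: "{{ tmpdir.path }}\\mybat.bat"
      register: output
    - name: print output
      ansible.builtin.debug:
        var: output
    # win_tempfile does not delete what it creates, so remove the directory explicitly
    - name: remove temp dir
      ansible.windows.win_file:
        path: "{{ tmpdir.path }}"
        state: absent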
which copies over a batch script mybat.bat to a temporary dir, calls it, then Ansible cleans up the temp dir containing the batch script.
Example batch script, which resets the power check with a for loop:
call \instrument\apps\epics\config_env.bat
@echo on
FOR /L %%v IN (1, 1, 9) DO (
echo running caput %MYPVPREFIX%MOT:DMC0%%v:PWRDET:RESET:SP 1
caput %MYPVPREFIX%MOT:DMC0%%v:PWRDET:RESET:SP 1
)
exit /B 0