Ansible

We are moving towards using Ansible for both the initial configuration of machines and deployment.

The main repository for this is here. It contains the configuration, inventories, playbooks, roles and tasks we use to automate work remotely on NDX machines and the other machines we look after.

These run on your own computer (the control node) and orchestrate steps on remote machines.

Playbooks should be idempotent, i.e. running them multiple times always leaves the machine in the same end state. Ansible reports a task as ok if the machine was already in the desired state and nothing changed, changed if the task modified something, and skipping if its condition was not met. For example, a playbook can check whether a dependency is already at the required version and skip the tasks that would update it (which may involve removing the previous version). This means the “deploy” playbook can be run absolutely everywhere: on machines that are already up to date, most tasks report ok or are skipped entirely. Most Ansible modules are idempotent by default.
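A minimal sketch of the check-then-skip pattern described above, assuming a Linux host and a hypothetical tool installed into /opt. The tool name, version and download URL are placeholders, not something in our repo:

```yaml
---
# Hypothetical example: mytool, its version and its URL are placeholders.
- name: Install mytool only when the required version is missing
  hosts: all
  become: true
  vars:
    mytool_version: "1.2.3"
  tasks:
    - name: Check whether the required version is already installed
      ansible.builtin.stat:
        path: "/opt/mytool-{{ mytool_version }}"
      register: mytool_dir

    # Assumes the archive unpacks into a mytool-<version>/ directory
    - name: Download and unpack the required version
      ansible.builtin.unarchive:
        src: "https://example.invalid/mytool-{{ mytool_version }}.tar.gz"
        dest: /opt
        remote_src: true
      when: not mytool_dir.stat.exists
```

On the first run the unpack task reports changed; on later runs the stat check finds the directory, so the unpack task is reported as skipping and the play can be re-run safely.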

We are working towards these playbooks being the declarative steps to provision an instrument machine, so that it can easily be (re)created. As a developer you should keep this in mind - Ansible playbooks can be used for one-shot roll-outs of a specific set of tasks, but we should make them repeatable. This repo should not become a ‘dumping ground’ for playbooks written to roll out a single change - those changes should be folded into the blueprint of an instrument machine so that we can apply them again.

Setup

Preliminary steps to run these:

If you haven’t already, set up a Keeper account with access to our group’s passwords.
Make sure your SSH public keys (which should be stored here) are deployed to instruments - see below for the playbook that does this.
Set up WSL if you’re using a Windows control node, as Ansible does not support running on a Windows control node natively. You will need your SSH keys available inside WSL. If you are running Linux, you can install Ansible natively.
Install Ansible, including the plugins we require, by running:
sudo snap install astral-uv --classic
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
Install the Galaxy collections and roles by running ansible-galaxy install -r requirements.yml
Add the DNS search suffix (isis.cclrc.ac.uk) to /etc/resolv.conf by adding the following line (editing the file needs root, e.g. sudo nano /etc/resolv.conf):

...
search isis.cclrc.ac.uk
...

To test that this works, run an ad-hoc command using the ping module (for Linux machines) or the win_ping module (for Windows machines, i.e. NDXes) to check that the hosts are reachable.

PRs are linted by CI. To lint locally, run ansible-lint (it is included in requirements.txt); its configuration is set in .ansible-lint.
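For reference, .ansible-lint is a plain YAML file. The following is a hypothetical example of its shape, not a copy of ours; the profile, excluded path and skipped rule are illustrative only:

```yaml
---
# Hypothetical .ansible-lint; our actual file may choose different settings.
profile: production     # built-in rule profile to enforce
exclude_paths:
  - .venv/              # the uv virtual environment created during setup
skip_list:
  - yaml[line-length]   # example of disabling a single rule
```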

`Hosts.yaml`

This is the main inventory file, located in the inventory directory. It lists hosts in groups, for example by target station and/or by deploy group.

To limit running a playbook to just TS1 instruments, for example, you can use ansible-playbook playbook.yaml --limit ts1. This works because ts1 is a host group in the inventory file.
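A minimal sketch of how a YAML inventory nests groups so that names such as ts1, ts2 and muons can be passed to --limit. The host names below are hypothetical placeholders, and our real grouping may differ:

```yaml
---
# Hypothetical inventory layout; host names are placeholders.
all:
  children:
    ndxes:
      children:
        ts1:
          hosts:
            NDXEXAMPLE1:
        ts2:
          hosts:
            NDXEXAMPLE2:
            NDXEXAMPLE3:
        muons:
          hosts:
            NDXEXAMPLE3:   # a host can belong to several groups
```

With this layout, --limit ts2 selects NDXEXAMPLE2 and NDXEXAMPLE3, while --limit ndxes selects all three hosts.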

Windows-only playbooks (`windows/`)

Notes

These playbooks prompt for the host group to run on. This is equivalent to passing --limit on the command line, and the prompted value takes precedence over --limit.

For example, to run a playbook on all NDXes other than NDXENGINX, enter ndxes,!NDXENGINX (this pattern syntax is documented here).
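The same pattern syntax can also be written directly in a play's hosts field. A minimal illustrative sketch (the play itself is hypothetical; win_ping only checks connectivity). The pattern is quoted because an unquoted leading ! has a special meaning in YAML:

```yaml
---
- name: Run on every NDX except NDXENGINX
  hosts: "ndxes,!NDXENGINX"
  gather_facts: false
  tasks:
    - name: Check the host is reachable
      ansible.windows.win_ping:
```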

`truncate_databases.yaml`

This backs up and then truncates the local databases on instruments. It prompts for hosts, so to run it use:

ansible-playbook windows/truncate_databases.yaml

`instrument_deploy.yaml`

This is the main playbook for deploying software to NDXes. Currently this stops the server if it is running and installs the JDK using the jdk role.

To use this you need to run ansible-playbook windows/instrument_deploy.yaml - it should prompt for hosts.

Updating JDK version

To update the JDK version, set the following vars in roles\defaults\main.yaml:

jdk_major_ver - the major version of the JDK (excluding the .minor.patch part).
jdk_full_ver - the full version string of the JDK.
jdk_url - the URL to download the JDK from.
jdk_checksum - the checksum for the downloaded JDK.
(optionally) jdk_force_update - whether to overwrite the current version if it already exists
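As an illustration, the defaults file might look like the sketch below. Every value is a placeholder (not a real JDK release, URL or checksum), and the exact checksum format the jdk role expects is an assumption to check against the role's tasks:

```yaml
---
# Placeholder values only; replace them with the real JDK release details.
jdk_major_ver: "21"
jdk_full_ver: "21.0.0"
jdk_url: "https://example.invalid/jdk-21.0.0_windows-x64.zip"
jdk_checksum: "<checksum of the downloaded archive>"
jdk_force_update: false   # optional; true overwrites an existing install of this version
```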

`deploy_ssh_keys.yaml`

This is the Windows equivalent of the Linux workflow for deploying the group’s SSH keys and turning off password authentication in favour of public key authentication.

To run this on all NDXes, run ansible-playbook windows/deploy_ssh_keys.yaml. To restrict it to certain hosts or host groups, use --limit: for example, --limit NDXHRPD_SETUP runs it only on NDXHRPD_SETUP, and --limit muons runs it only on muon NDXes.

`nsclient.yaml`

Installs the NSClient++ monitoring program. It needs to be run with:

ansible-playbook --ask-vault-password windows/nsclient.yaml

There is a host group, nsclient, which holds the encrypted passwords and can also be used as the target group when deploying.

`deploy_wincred.yaml`

This deploys a new set of credentials to NDXes. Update the credentials in Keeper first; this playbook reads them from Keeper.

To use it, run ansible-playbook windows/deploy_wincred.yaml --ask-vault-pass. It prompts for the vault password (found in Keeper - search for "ansible") and for the hosts to target.

Linux-only playbooks (`linux/`)

Initial SSH setup

This involves:

copying over the keys in this repo
disabling OpenSSH password authentication (i.e. only public/private key authentication is allowed)

It can be run multiple times, for example to update the deployed keys when someone new joins the team or someone gets a new computer with a different public key.

Deploying ssh keys (`playbook_deploy_ssh_keys.yaml`)

If you already have SSH key-based authentication set up (e.g. you want to update the list of keys because someone’s public key changed), you can run this on multiple hosts with ansible-playbook playbook_deploy_ssh_keys.yaml. For a new machine, use --ask-pass (to prompt for the SSH password) together with --limit to restrict the run to that single host, as your SSH passwords may (and should) differ between machines.

For example, to run this on madara:

ansible-playbook playbook_deploy_ssh_keys.yaml --ask-pass --limit madara

This will prompt for madara’s SSH password.

The playbook also prompts for a personal access token (stored in Keeper) to access the keys repo.
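A minimal sketch of a key-deployment step using the ansible.posix.authorized_key module. It assumes the ansible.posix collection is installed and that the public keys are available locally as *.pub files in a keys/ directory; the directory and the play are hypothetical and do not reflect how our playbook fetches keys from the keys repo:

```yaml
---
# Hypothetical sketch: keys/*.pub is a placeholder location for the team's public keys.
- name: Deploy team SSH public keys
  hosts: all
  tasks:
    - name: Add each public key to the connecting user's authorized_keys
      ansible.posix.authorized_key:
        user: "{{ ansible_user_id }}"   # gathered fact: the remote login user
        key: "{{ lookup('ansible.builtin.file', item) }}"
        state: present
      with_fileglob:
        - keys/*.pub
```

Because authorized_key only adds keys that are missing, re-running it after someone joins the team reports changed only on hosts that lacked the new key.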

Turning off `sshd` password authentication (`playbook_turn_off_ssh_passwd_auth.yaml`)

You must have completed the key deployment step above before running this, otherwise you will lock yourself out.

To run on all hosts, use:

ansible-playbook playbook_turn_off_ssh_passwd_auth.yaml

This will prompt for the vault password, which is in Keeper (ds-config ansible vault).
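A minimal sketch of turning off password authentication on a Linux host, assuming the SSH service is called sshd (on Debian/Ubuntu the unit is ssh, usually with sshd as an alias). It is an illustration, not a copy of playbook_turn_off_ssh_passwd_auth.yaml:

```yaml
---
- name: Turn off sshd password authentication (sketch)
  hosts: all
  become: true
  tasks:
    # Note: drop-in files in /etc/ssh/sshd_config.d/ may still override this on some distros
    - name: Set PasswordAuthentication to no
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?\s*PasswordAuthentication\s'
        line: PasswordAuthentication no
        # Refuse to write the file if sshd would reject the new config
        validate: /usr/sbin/sshd -t -f %s
      notify: Restart sshd
  handlers:
    - name: Restart sshd
      ansible.builtin.service:
        name: sshd
        state: restarted
```

The validate step is the safety net: a broken sshd_config is rejected before the service is restarted.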

General system updates

playbook_system_updates.yaml updates the system packages.
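A minimal sketch of a package-update play, assuming Debian/Ubuntu or RHEL-family hosts; the real playbook may target different distributions or restrict which packages are upgraded:

```yaml
---
- name: Update system packages (sketch)
  hosts: all
  become: true
  tasks:
    - name: Upgrade all packages (Debian/Ubuntu)
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist
      when: ansible_facts['os_family'] == 'Debian'

    - name: Upgrade all packages (RHEL family)
      ansible.builtin.dnf:
        name: "*"
        state: latest
      when: ansible_facts['os_family'] == 'RedHat'
```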

Kafka cluster provisioning

deploy_kafka.yaml and deploy_redpanda_console.yaml exist to provision a Kafka cluster, currently on the test machines in R55.

Notes

Setting up a Windows Hyper-V virtual machine for testing

To test playbooks on a local virtual machine running in Hyper-V, you need to set the following up:

A virtual machine. You can use the Windows evaluation .iso images for this.
If the VM uses the Default Switch, you need to enable forwarding for the WSL network so it can reach the VM. To do this, run Get-NetIPInterface | where {$_.InterfaceAlias -eq 'vEthernet (WSL (Hyper-V firewall))' -or $_.InterfaceAlias -eq 'vEthernet (Default Switch)'} | Set-NetIPInterface -Forwarding Enabled -Verbose from an elevated PowerShell window. Note that most Windows Server images do not respond to ping requests by default; you can enable this with the inbound firewall rule File and Printer Sharing (Echo Request - ICMPv4-In).
OpenSSH Server set up on the VM.
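Once OpenSSH is running, the VM can be added to an inventory like any other Windows host reached over SSH. A minimal sketch in which the group name, host name, IP address and user are placeholders; ansible_shell_type has to match the VM's OpenSSH default shell (cmd unless you have changed it to PowerShell):

```yaml
---
# Hypothetical test inventory; all values are placeholders.
all:
  children:
    test_vms:
      hosts:
        hyperv-test-vm:
          ansible_host: 192.0.2.10      # the VM's IP address on the Default Switch
          ansible_user: Administrator
          ansible_connection: ssh
          ansible_shell_type: cmd       # OpenSSH on Windows defaults to cmd.exe
```

An ad-hoc win_ping against the test_vms group is then a quick way to confirm the VM is reachable from WSL.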

Ad-hoc commands

To run an ad-hoc command, e.g. win_ping against TS2 machines to check that they are reachable, run:

ansible ts2 -m win_ping

Ad-hoc commands can also run one-line shell commands remotely, for example with the shell module on Linux hosts or the win_shell module on Windows hosts.
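A minimal sketch of the playbook equivalent of the ad-hoc win_ping, with one extra task that runs a single command (hostname is just an illustrative choice) and prints its output:

```yaml
---
- name: Check TS2 machines are reachable
  hosts: ts2
  gather_facts: false
  tasks:
    - name: Ping each host
      ansible.windows.win_ping:

    - name: Run a one-line command
      ansible.windows.win_command: hostname
      register: hostname_out

    - name: Show the command output
      ansible.builtin.debug:
        var: hostname_out.stdout
```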

Chaining commands

On Windows specifically, environment variables set by one command are not reliably available to commands chained after it. If you want to run config_env.bat to define MYPVPREFIX and then call caput, put both steps in a batch script and copy it across instead.

This can be done with a playbook similar to the following:


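---
- name: Set some PVs
  hosts: ndxes
  tasks:
    - name: Create temp dir
      ansible.windows.win_tempfile:
        state: directory
      register: tmpdir
    - name: Run the batch script, always cleaning up afterwards
      block:
        - name: Copy bat over
          ansible.windows.win_copy:
            src: mybat.bat
            dest: "{{ tmpdir.path }}\\mybat.bat"
        - name: Call bat
          ansible.windows.win_command: "{{ tmpdir.path }}\\mybat.bat"
          register: output
        - name: Print output
          ansible.builtin.debug:
            var: output
      always:
        # win_tempfile does not delete the directory it creates, so remove it explicitly
        - name: Remove temp dir
          ansible.windows.win_file:
            path: "{{ tmpdir.path }}"
            state: absent

which copies the batch script mybat.bat into a temporary directory, calls it and prints its output. Because win_tempfile does not clean up after itself, the always section removes the temporary directory, even if the script fails.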

Example batch script, which uses a for loop to reset the power detection (PWRDET) on controllers DMC01 to DMC09:

call \instrument\apps\epics\config_env.bat
@echo on
FOR /L %%v IN (1, 1, 9) DO (
    echo running caput %MYPVPREFIX%MOT:DMC0%%v:PWRDET:RESET:SP 1
    caput %MYPVPREFIX%MOT:DMC0%%v:PWRDET:RESET:SP 1
)

exit /B 0
