Project:Infrastructure/Developer Machines/Automation

Automation of developer machine configuration management

Problem statement

Systems that developers use directly have no consistent management. These include, but are not limited to, systems for arch-specific development/testing, release engineering/builds, and other testing.

The systems must provide full access to the developers, as some use-cases may be impossible without full root (e.g. kernel debugging).

The full-access requirement means that shared secrets that might permit lateral movement from these systems onto other Infrastructure machines must NOT be present.

At the same time, users of the systems desire configuration management to help consistently manage the fleet.

Background on prior efforts

Puppet for releng

Infra attempted to use the existing Puppet-based infra system to manage some releng build hosts that were nominally Infra systems. This required full integration with Puppet, and because the initial cases gave releng members access to the infra puppet repo, some secrets were unintentionally accessible to releng members. The change process also required releng members to email patches for review & inclusion, rather than being able to change things themselves.

Vapier's user scripts

User:Vapier built a set of scripts that created/deleted users & groups on development systems, but these scripts were not easy for other people to modify or use.

Conceptual solution

Provide a configuration management system that preserves the trust boundary between systems.
- Reduce unintentional lateral movement from any given development system onto any other systems (be it infra or other development systems)
- Provide a publicly visible configuration for any managed parts of the system
- Manage only the minimal components
- Have a way to push secrets down onto systems without those secrets being widely accessible (assume that anybody with root on a given system can read the secret there)

Implementation

Provide a MINIMALLY intrusive Ansible-based configuration management.

It should consist of:

git repo, publicly accessible
- both infra & releng can commit!
trusted ansible runner
- on push to repo, runner should fire playbook(s)
- the developers would NOT have root access at the ansible runner point
ansible code to fetch secrets from infra stores
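
As a rough illustration of the "on push, fire playbook(s)" step above, the following Python sketch shows what the trusted runner might do once it has a fresh checkout of the repo. It is only a sketch under assumptions: the function names, the `hosts.ini` inventory file, and the `site.yml` playbook are hypothetical and not part of this proposal. Only the standard `ansible-playbook` options `-i`, `--limit`, and `--check` are used. How the push is detected (hook, polling, CI) is deliberately left out.

```python
import subprocess
from pathlib import Path


def playbook_command(playbook, inventory, limit=None, check=False):
    """Build the ansible-playbook argument list for one playbook run."""
    cmd = ["ansible-playbook", "-i", str(inventory), str(playbook)]
    if limit:
        # Restrict the run to a subset of hosts (e.g. one development box).
        cmd += ["--limit", limit]
    if check:
        # Dry run: report what would change without changing it.
        cmd.append("--check")
    return cmd


def run_on_push(checkout, playbooks, inventory="hosts.ini"):
    """Run each playbook from a fresh checkout; return {playbook: exit code}.

    'checkout' is the directory the runner cloned/pulled the public repo into.
    The file names used here are hypothetical.
    """
    checkout = Path(checkout)
    results = {}
    for name in playbooks:
        cmd = playbook_command(checkout / name, checkout / inventory)
        # The runner, not the developer machine, executes Ansible, so
        # developers with root on a target never get a shell on the runner.
        results[name] = subprocess.run(cmd, cwd=checkout).returncode
    return results


if __name__ == "__main__":
    # Show the command that would be run, without executing Ansible.
    print(playbook_command("site.yml", "hosts.ini", limit="devbox1", check=True))
```

Keeping the command construction separate from execution makes it easy to log or review exactly what the runner will invoke before it touches any system.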

Things that probably should be managed in Ansible

Users: based on LDAP data & some local users (non-Gentoo-dev)
- create/delete user/group, populate SSH keys
Ensure firewall meets both releng & infra agreed requirements
- User:Dilfridge has stated that the only permitted inbound port should be SSH; I feel it might need HTTP+rsync as well.
sudo rules so that users can become root.
- should be possible to have non-root users on some development systems that CANNOT sudo
cronjob:
- keep gentoo repo up to date (emerge --sync)
- update packages affected by any GLSAs? (emerge -uv @security)
mail:
- outbound-only configuration that uses the infra relayhost, with a password, to get mail delivered
infra-inventory
- scripts that capture overall state of system, look for any changes over time that might be otherwise missed (e.g. RAM amount changes due to DIMM failure, non-critical PCI card vanishes, drive vanishes)
health monitoring?
- active checks for system problems
- disk space
- OOM conditions
- loadavg?
- SSH responsive
- RAID healthy
- SMART healthy
- temp/fan sensors
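
The infra-inventory item above (capture overall state, then look for changes over time) could be sketched roughly as follows. This is a minimal Python sketch, assuming each snapshot is a flat dictionary of facts. The key names (`ram_mib`, `disks`, `pci_devices`) and the sample values are hypothetical, and how the facts are collected on each system is not covered here.

```python
def diff_inventory(previous, current):
    """Compare two inventory snapshots and list what changed.

    Each snapshot is a dict mapping a fact name to its value.
    Returns a sorted list of (fact, old_value, new_value) tuples.
    A fact missing from one snapshot is reported with None on that side.
    """
    changes = []
    for key in sorted(previous.keys() | current.keys()):
        old = previous.get(key)
        new = current.get(key)
        if old != new:
            changes.append((key, old, new))
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots taken a week apart on the same machine.
    last_week = {
        "ram_mib": 65536,
        "disks": ["sda", "sdb"],
        "pci_devices": ["nic0", "hba0"],
    }
    today = {
        "ram_mib": 49152,          # e.g. a DIMM failed
        "disks": ["sda"],          # e.g. a drive vanished
        "pci_devices": ["nic0", "hba0"],
    }
    for fact, old, new in diff_inventory(last_week, today):
        print(f"{fact}: {old} -> {new}")
```

Storing one snapshot per run and diffing against the previous one keeps the check cheap, and it flags exactly the kind of silent hardware change the inventory item is meant to catch.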