
= ia64 Admin Notes =

These notes are mainly targeted at people administering Gentoo dev machines, although most of them are probably generally useful. They are not general "how do I administer a Gentoo box" notes.

dolphin
Host: dolphin.ia64.dev.gentoo.org

HP RX2600, CD writer. Donated by HP in 2003. This machine has been powered off for the past two years to save power and cooling resources.

iLO is accessible using port 1 of console2. I used to access it using  but console2 doesn't seem to answer now.

ttyS0 is accessible using port 2 of console2.

This machine had 4GB of RAM, 2x900MHz processors, 1x36GB HDD SCSI 80pin, 2x72GB HDD SCSI 80pin. No RAID.

This machine should still be in Gentoo's rack in OSL, on top of bender. It does not have rails.

beluga
HP RX2620, CD/DVD reader only. Donated by HP in 2012; it was previously in HP's datacenter. It is stored in OSL but not in Gentoo's rack. It was sent as-is from HP, so iLO is probably configured with the wrong parameters, and the OS will have a static IP that is also wrong. I think it had a hardware RAID5 of 72GB HDDs, probably 4 or 5 of them (I cannot remember exactly). It had 2x 1.6GHz processors and 12GB of RAM.

It was stored in case guppy failed in the future and we had no other option.

guppy
HP RX3600. DVD/CD writer IIRC. Used to be in HP's DC but was sent to OSL when HP pulled the plug in DC. iLO is accessible from port 5 in console2. Once logged in you can access the remote console too.

Hostnames
These are the current systems we have available. See machine specific notes at bottom for more details.

Console Access
iLO2 is accessible over telnet and SSH from dev.gentoo.org box (ssh needs some legacy ciphers). Ask infra@ for credentials and IP address.

You can use this to:
 * Interact with the EFI (e.g. to select recovery kernel, boot from plugged Gentoo DVD, change boot order)
 * Log in directly over ttyS1 to recover
 * Reboot machine
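
The "legacy ciphers" note can usually be handled with per-connection OpenSSH options. The following is a minimal sketch, not the exact invocation used on dev.gentoo.org: which algorithms the iLO2 really needs is an assumption (compare with what a plain `ssh -v` attempt reports), and `ILO_USER` / `ILO_HOST` are placeholders for the credentials and address that infra@ provides.

```shell
#!/bin/sh
# Print an ssh command line that re-enables legacy algorithms for one
# connection only. The algorithm list is an assumption; trim it to what
# the iLO2 actually offers.
ilo_ssh_cmd() {
    user=$1
    host=$2
    opts="-o KexAlgorithms=+diffie-hellman-group1-sha1"
    opts="$opts -o HostKeyAlgorithms=+ssh-rsa"
    opts="$opts -o Ciphers=+aes128-cbc,3des-cbc"
    printf 'ssh %s %s@%s\n' "$opts" "$user" "$host"
}

# Placeholders: ask infra@ for the real user and address.
ilo_ssh_cmd "ILO_USER" "ILO_HOST"
```

The function only prints the command, so it can be reviewed before being run. The `+` prefix appends to the client's default algorithm list instead of replacing it.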

Hardware notes
List devices over MP console as: 'CM' > 'DF'

PSU status
PSU status can be checked over MP console as: 'CM' > 'PS':

 Power supplies               State
 ---
 Power Supply 0               Fault
 Power Supply 1               Normal

Here we see that PSU-0 needs to be swapped. Tracked (and fixed) at.
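
If the 'PS' output is captured to a file (for example by logging the telnet or SSH session), faulty supplies can be picked out mechanically. This is a minimal sketch, assuming the output has one "Power Supply N  State" entry per line as in the listing above; `faulty_psus` is a hypothetical helper name, not an MP or iLO command.

```shell
#!/bin/sh
# Read captured MP 'CM' > 'PS' output on stdin and print the number and
# state of every power supply whose state is not "Normal".
faulty_psus() {
    awk '$1 == "Power" && $2 == "Supply" && $NF != "Normal" { print $3, $NF }'
}

# Example using the output from this page; prints "0 Fault".
faulty_psus <<'EOF'
Power supplies               State
---
Power Supply 0               Fault
Power Supply 1               Normal
EOF
```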

HDD status
Disk array needs to be checked from operating system:

Here we see that HDD-6 needs to be swapped. Tracked (and fixed) at. Leaving the error example here for posterity.

Batteries are also dead. I'm not sure how many batteries there are: one per controller or one per SAS I/O card. TODO: find out how to check those as well.

Common iLO commands

 * Get remote console output (ttyS1):
 * Get interactive console (to login and recover system on ttyS1):
 * Reboot main machine:
 * Power cycle main machine and RAID:
 * Manage iLO users:
 * Get builtin help:

Other stuff
There are a few concepts to keep in mind when using iLO:
 * MP (iLO): a board separate from the main machine that accepts telnet and SSH connections, issues commands to the main machine over the BMC interface, and can do I/O on ttyS1
 * BMC: an FPGA on the motherboard of the main machine that accepts commands from the MP. It can reboot the machine, list hardware parts, report health status, etc.
 * main machine itself: a few ia64 CPUs, RAM and so on.

user --- ---> MP --> [ BMC <-> ia64-machine ].

Mysterious hangups on reboot
Sometimes the BMC hangs when the main machine reboots. It is not clear why.

You can usually still access the MP in this state, but I have not figured out how to reboot the machine from there without physical help, so I end up asking infra/on-site staff to reboot it.

This makes each reboot a challenge.

Kernel Management
ia64 systems are EFI systems. guppy uses standard grub2 efi64 setup.

To update a kernel:
 * build kernel in /usr/src/linux
 * install kernel as
 * boot-test new kernel over iLO by changing path to vmlinux.
 * regenerate configs via

Needed patches/configs

 * : stack canary has to be removed as it assumes that one of stack tops is unused
 * :  has to be ignored as it breaks BPF and kernel module loading sometimes by corrupting vmalloc state.
 * kernel command (or disable ) because linear mapping is not accounted in usercopy check. Full of false positives on any buffer checks.

Recovery notes
iLO serial console runs on ,   is wired to physical(?) console.

Console is configured in EFI as.

EFI shell
In interactive EFI boot menu pick. And run the DVD kernel:

 # inspect cdrom
 fs0:\> ls fs0:\efi\boot
 Directory of: fs0:\efi\boot
 
 09/27/09 08:42p                2,048  .
 09/27/09 08:42p                2,048  ..
 09/27/09 08:42p                  698  elilo.conf
 09/27/09 08:42p            7,020,793  gentoo
 09/27/09 08:42p              374,212  bootia64.efi
 09/27/09 08:42p            6,092,363  gentoo.igz
 09/27/09 08:42p                  380  elilo.msg
 
 # run kernel with custom arguments (cdrom's defaults are not very suitable)
 fs0:\> fs0:\efi\boot\bootia64.efi -i gentoo.igz gentoo initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot console=ttyS1,115200n8
 ...
 livecd ~ # uname -r
 2.6.30-gentoo-r6

Or alternatively you can boot directly from HDD if you need non-standard arguments:

 Shell> fs1:\EFI\gentoo\elilo.efi boot\vmlinuz-4.9.72-gentoo root=/dev/cciss!c0d0p3

On newer kernels (4.19+) the devices were renamed from /dev/cciss!c0d0p${N} to /dev/sda${N}:

 Shell> fs1:\EFI\gentoo\elilo.efi boot\vmlinuz-4.19.86-gentoo root=/dev/sda3
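
The renaming rule can be written down as a small helper for picking the right root= argument. This is only a sketch built on the note above: the 4.19 cutoff is taken from this page, and `root_dev_for_kernel` is a hypothetical name.

```shell
#!/bin/sh
# Print the root= device for partition $2 when booting kernel version $1.
# Assumption (from the note above): 4.19 and newer expose the array as
# /dev/sda, older kernels as /dev/cciss!c0d0.
root_dev_for_kernel() {
    version=$1
    part=$2
    major=${version%%.*}      # "4.19.86" -> "4"
    rest=${version#*.}        # "4.19.86" -> "19.86"
    minor=${rest%%.*}         # "19.86"   -> "19"
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 19 ]; }; then
        printf '/dev/sda%s\n' "$part"
    else
        printf '/dev/cciss!c0d0p%s\n' "$part"
    fi
}

root_dev_for_kernel 4.9.72 3     # prints /dev/cciss!c0d0p3
root_dev_for_kernel 4.19.86 3    # prints /dev/sda3
```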

To get network setup just configure the addresses (see below for up-to-date setup):
 ip addr add 140.211.166.179/27 dev eth1
 ip link set up dev eth1
 ip r add default via 140.211.166.161 dev eth1

eth1 is a NIC with MAC ..:..:..:51:cf:57.
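
Before adding the default route it can help to confirm that the gateway lies inside the /27 of the address, otherwise the kernel rejects the route. The sketch below does the arithmetic in plain shell; the helper names (`ip_to_int`, `network_of`, `same_subnet`) are hypothetical, and it assumes the shell uses 64-bit arithmetic.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the network address of address $1 with prefix length $2.
network_of() {
    addr=$(ip_to_int "$1")
    mask=$(( (0xFFFFFFFF << (32 - $2)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

# Succeed if address $1 and gateway $3 are in the same /$2 network.
same_subnet() {
    [ "$(network_of "$1" "$2")" = "$(network_of "$3" "$2")" ]
}

network_of 140.211.166.179 27    # prints 140.211.166.160
same_subnet 140.211.166.179 27 140.211.166.161 && echo "gateway is on-link"
```

For the values on this page the /27 covers 140.211.166.160 through 140.211.166.191, so the gateway 140.211.166.161 is directly reachable on eth1.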

ELILO shell
Type  at   prompt to interrupt boot process.

TODO: actual syntax to load initrd

Config snippets
Config snippets on plugged Gentoo-2009 cdrom:

/etc/inittab
 ...
 # TERMINALS
 c1:12345:respawn:/sbin/agetty 38400 tty1 linux
 c2:2345:respawn:/sbin/agetty 38400 tty2 linux
 c3:2345:respawn:/sbin/agetty 38400 tty3 linux
 c4:2345:respawn:/sbin/agetty 38400 tty4 linux
 c5:2345:respawn:/sbin/agetty 38400 tty5 linux
 c6:2345:respawn:/sbin/agetty 38400 tty6 linux
 
 ...
 # SERIAL CONSOLES
 #s0:12345:respawn:/sbin/agetty 9600 ttyS0 vt100
 #s1:12345:respawn:/sbin/agetty 9600 ttyS1 vt100

elilo.conf
 prompt
 message=/efi/boot/elilo.msg
 chooser=simple
 timeout=50
 relocatable
 
 image=/efi/boot/gentoo
     label=gentoo
     append="initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot"
     initrd=/efi/boot/gentoo.igz
 
 image=/efi/boot/gentoo
     label=gentoo-serial
     append="initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot console=tty0 console=ttyS0,9600"
     initrd=/efi/boot/gentoo.igz
 
 image=/efi/boot/gentoo
     label=gentoo-sgi
     append="initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot console=tty0 console=ttySG0,115200"
     initrd=/efi/boot/gentoo.igz

/etc/conf.d/net
Useful for livecd as DHCP does not acquire data:

 config_eth1="140.211.166.179/27"
 routes_eth1="default via 140.211.166.161"