Project:Infrastructure/SPARC server recovery
This document covers how to recover from hard failure on the Sun Fire T2000 development servers, bender.sparc.dev.gentoo.org and totoro.sparc.dev.gentoo.org.
Thanks to User:Iamben for writing this document. Wikified and edited by User:Robbat2.
- SSH to ALOM
- Press Enter at SILO prompt
- Press #, . to disconnect console
Connecting to ALOM
First, log in to a machine with access to the Gentoo LAN subnet at OSUOSL (another host or the OSL VPN).
Then SSH to the ALOM (Advanced Lights Out Manager, the out-of-band management controller on these SPARC servers). You must tell SSH to use the legacy key-exchange option shown below, because the ALOM does not support the algorithms that current SSH clients offer by default.
|Host||ALOM IP, internal network|
ssh -oKexAlgorithms=diffie-hellman-group1-sha1 10.0.0.176
Copyright 2006 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Sun(tm) Advanced Lights Out Manager CMT v1.3.1
Please login: admin
Please Enter password: *********
You should now have the ALOM console, denoted by the sc> prompt.
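To avoid retyping the legacy option on every connection, the same setting can be kept in the SSH client configuration. The following is a minimal sketch, not part of the original procedure: the function name alom_ssh_config and the alias alom-example are hypothetical, and the only option it sets is the KexAlgorithms value used in the command above. Other legacy options may or may not be needed depending on your SSH client version.

```shell
#!/bin/sh
# Print an ssh_config stanza for one ALOM (hypothetical helper).
#   $1: alias to use on the ssh command line (your choice)
#   $2: ALOM IP address on the internal network
alom_ssh_config() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    # Same key-exchange algorithm as the ssh command in this document.
    printf '    KexAlgorithms diffie-hellman-group1-sha1\n'
}
```

For example, alom_ssh_config alom-example 10.0.0.176 >> ~/.ssh/config appends the stanza, after which ssh alom-example should behave like the full command.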
ALOM: Manual host poweroff
From the ALOM console, run poweroff. You will be prompted for confirmation, and then it will return to the prompt. Do not continue yet: wait until the "Host system has shut down" alert appears.
Are you sure you want to power off the system [y/n]? y
SC Alert: SC Request to Power Off Host.
sc>
SC Alert: Host system has shut down.
sc>
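If the ALOM session is being captured to a file (an assumption; the procedure itself is interactive), a small check like the one below can confirm that the shutdown alert has actually appeared before you go on to power the host back on. The function name host_has_shut_down is hypothetical, and the string it looks for is taken from the transcript above.

```shell
#!/bin/sh
# Succeed (exit 0) if the ALOM output read from stdin contains the
# shutdown confirmation alert; fail (non-zero) otherwise.
host_has_shut_down() {
    grep -q 'SC Alert: Host system has shut down'
}
```

A possible use, with alom-session.log as a placeholder name for the captured output: host_has_shut_down < alom-session.log && echo "safe to run poweron".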
ALOM: Manual host poweron
From the ALOM console, run poweron.
If you run it too early, you will get a message that the host is still shutting down. In that case, wait and try again.
SC Alert: Host System has Reset
sc>
ALOM: connect to host console
From ALOM console, run console -f. The -f option is needed in case there is a stale connection to the console, as sometimes happens if SSH is disconnected without an explicit logout. You will be prompted to disconnect the stale connection.
SC Alert: Host System has Reset
Warning: User < > currently has write permission to this console and forcibly removing them will terminate any current write actions and all work will be lost.
Would you like to continue? [y/n] y
Enter #. to return to ALOM.
Host console: POST output
Review the POST output. It may report hardware faults; this is unlikely, and the boot should pause if one is found.
Enter #. to return to ALOM.
Sun Fire T200, No Keyboard
Copyright 2006 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.25.0, 16376 MB memory available, Serial #75611764.
Ethernet address 0:14:4f:81:be:74, Host ID: 8481be74.
<dozens of lines of POST output, takes several minutes>
Host console: SILO bootloader
Press Enter at SILO prompt to boot the default Gentoo Linux kernel.
Boot device: disk File and args:
SILO Version 1.4.14_git20170829
boot:
boot:
Allocated 64 Megs of memory at 0x40000000 for kernel
Uncompressing image...
Loaded kernel version 4.20.2
[ 0.000028] PROMLIB: Sun IEEE Boot Prom 'OBP 4.25.0 2006/11/07 23:24'
[ 0.000037] PROMLIB: Root node compatible: sun4v
[ 0.000062] Linux version 4.20.2-gentoo (root@bender) (gcc version 8.2.0 (Gentoo 8.2.0-r6 p1.7)) #1 SMP Wed Jan 16 14:16:59 -00 2019
[ 1.797025] printk: bootconsole [earlyprom0] enabled
[ 2.037192] ARCH: SUN4V
...
This is bender.gentoo.osuosl.org (Linux sparc64 4.20.2-gentoo) 22:37:04
bender login:
Host console: exit from host console to ALOM prompt
Press #, then . to disconnect from the host console and return to the ALOM sc> prompt.
This is bender.gentoo.osuosl.org (Linux sparc64 4.20.2-gentoo) 22:37:04
bender login:
sc>
Log out of the ALOM console properly rather than just dropping the SSH connection, so that no stale session is left behind.
Connection to 10.0.0.176 closed.