Project:Infrastructure/SPARC server recovery


This document covers how to recover from a hard failure on the Sun Fire T2000 development servers, bender.sparc.dev.gentoo.org and totoro.sparc.dev.gentoo.org.

Thanks to User:Iamben for writing this document. Wikified and edited by User:Robbat2.

Short version

  1. SSH to ALOM
  2. sc>poweroff
  3. sc>poweron
  4. sc>console -f
  5. Press Enter at SILO prompt:
    boot: boot: 
  6. Type #. to disconnect from the console
  7. sc>logout


Long version

Connecting to ALOM

First, log in to a machine with access to the Gentoo LAN subnet at OSUOSL (another host or the OSL VPN).

Then SSH to the ALOM (Advanced Lights Out Manager, the out-of-band management system on these SPARC servers). Tell SSH to allow a legacy key-exchange algorithm, because the ALOM does not support the algorithms that newer SSH clients use by default.

Host                          ALOM IP (internal network)
bender.sparc.dev.gentoo.org   10.0.0.176
totoro.sparc.dev.gentoo.org   10.0.0.32

user $ ssh -oKexAlgorithms=diffie-hellman-group1-sha1 10.0.0.176
  
Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.

Sun(tm) Advanced Lights Out Manager CMT v1.3.1

Please login: admin
Please Enter password: *********
 

You should now have the ALOM console, denoted by sc>:

sc>  
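The connection step can be wrapped in a small shell helper so the legacy option and the ALOM addresses do not have to be retyped. This is only a sketch: the function names alom_ip and alom_ssh and the short aliases bender and totoro are hypothetical, while the IP addresses and the SSH option are copied from this page. Newer SSH clients may need further legacy options that are not covered here.

```shell
# Map a server name to its ALOM IP on the internal network.
# Addresses come from the table on this page; the short aliases
# ("bender", "totoro") are an assumption added for convenience.
alom_ip() {
    case "$1" in
        bender|bender.sparc.dev.gentoo.org) echo "10.0.0.176" ;;
        totoro|totoro.sparc.dev.gentoo.org) echo "10.0.0.32" ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Open an SSH session to the ALOM of the given server, using the
# legacy key-exchange option shown on this page. The ALOM then
# prompts for the login name and password itself.
alom_ssh() {
    ip=$(alom_ip "$1") || return 1
    ssh -oKexAlgorithms=diffie-hellman-group1-sha1 "$ip"
}
```

For example, after sourcing these functions, alom_ssh bender should behave like the ssh command shown above.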


ALOM: Manual host poweroff

From the ALOM console, run poweroff. You will be prompted for confirmation, after which the sc> prompt returns immediately. The host is not off at that point: wait for the "Host system has shut down" alert before continuing.

sc>poweroff
Are you sure you want to power off the system [y/n]?  y
SC Alert: SC Request to Power Off Host.
sc>
SC Alert: Host system has shut down.
sc>
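If you keep a transcript of the ALOM session, a check like the following can confirm that the shutdown alert has appeared before you go on to power the host back on. It is a minimal sketch: the function name host_has_shut_down and the log file name alom-session.log are hypothetical, and how you capture the transcript is up to you. The alert text is taken from the output above.

```shell
# Succeed only if the transcript on standard input contains the
# ALOM shutdown confirmation. -F matches the text literally,
# -q suppresses output so only the exit status is used.
host_has_shut_down() {
    grep -qF "SC Alert: Host system has shut down."
}
```

For example: host_has_shut_down < alom-session.log && echo "safe to power on"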
 


ALOM: Manual host poweron

Important
If you run poweron too early, you will get a message that the host is still shutting down.

From the ALOM console, run poweron.

sc>poweron
SC Alert: Host System has Reset
sc>
 


ALOM: connect to host console

From the ALOM console, run console -f. The -f option is needed in case there is a stale connection to the console, which sometimes happens when SSH is disconnected without an explicit logout. If there is one, you will be asked to confirm that it should be disconnected.

sc>console -f
SC Alert: Host System has Reset
Warning: User < > currently has write permission to this console and forcibly
removing them will terminate any current write actions and all work will be
lost.  Would you like to continue? [y/n] y
Enter #. to return to ALOM.

Host console: POST output

Review the POST output for reported hardware faults. Faults are unlikely, and the boot should pause if one is found.

Enter #. to return to ALOM.
Sun Fire T200, No Keyboard
Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.25.0, 16376 MB memory available, Serial #75611764.
Ethernet address 0:14:4f:81:be:74, Host ID: 8481be74.
<dozens of lines of POST output, takes several minutes>
 

Host console: SILO bootloader

Press Enter at the SILO prompt (boot:) to boot the default Gentoo Linux kernel.

boot: boot:  
Boot device: disk  File and args:
SILO Version 1.4.14_git20170829
boot: boot:
Allocated 64 Megs of memory at 0x40000000 for kernel
Uncompressing image...
Loaded kernel version 4.20.2

[    0.000028] PROMLIB: Sun IEEE Boot Prom 'OBP 4.25.0 2006/11/07 23:24'
[    0.000037] PROMLIB: Root node compatible: sun4v
[    0.000062] Linux version 4.20.2-gentoo (root@bender) (gcc version 8.2.0 (Gentoo 8.2.0-r6 p1.7)) #1 SMP Wed Jan 16 14:16:59 -00 2019
[    1.797025] printk: bootconsole [earlyprom0] enabled
[    2.037192] ARCH: SUN4V
...
This is bender.gentoo.osuosl.org (Linux sparc64 4.20.2-gentoo) 22:37:04

bender login:

Host console: exit from host console to ALOM prompt

Type # followed by . (the #. escape sequence) to disconnect from the console and return to the sc> prompt.

This is bender.gentoo.osuosl.org (Linux sparc64 4.20.2-gentoo) 22:37:04
bender login: sc>

ALOM: logout

Log out of the ALOM console explicitly. Disconnecting SSH without logging out can leave a stale console connection behind.

sc>logout
Connection to 10.0.0.176 closed.