Project:Infrastructure/Incident reports/2015-05-07 woodpecker

2015-05-07 Woodpecker Outage

Writeup by Robin H. Johnson (robbat2)

Timeline on 2015/05/07

  • 04:40:07 UTC - last log message
  • 04:41 UTC - first nagios alert to #gentoo-infra
  • 05:27 UTC - a developer tells robbat2 via IRC /query that there is a problem
  • 05:28 UTC - robbat2 captures the serial console trace below and begins to restore
  • 06:20 UTC - woodpecker is back for the first time
  • ~06:45 UTC - several reboots to bring a 64-bit kernel into service
  • 07:06 UTC

Background

The system install on woodpecker is very old as Gentoo infrastructure systems go, dating back to late 2005 or earlier. It originally ran on an HP ProLiant DL380 G4 with no proper 64-bit capability (the CPUs were capable, but the BIOS had unresolvable issues). It therefore ran a 32-bit HIGHMEM kernel, the only such system in infra.

As a result of the system's age, many of its legacy pieces were not under configuration management: woodpecker never got a cfengine deployment like other infra hosts. It did, however, get Puppet later.

In January 2015, the hardware started showing problems, and given the difficulty of moving all the developer content, as well as the fragile mail setup, the system was simply forklift-upgraded into a VM environment.

What went wrong in the first place

This is still an open question. The system was at its highest uptime since the migration: roughly 98 days had elapsed since then.

What went wrong with bringing it back

  • The initramfs present on the system contained a lvm.conf that filtered out the /dev/vd* devices, so LVM did not initialize at first.
  • /etc/inittab was empty
    • Files populated by cfengine were not present, since the host did not run cfengine
    • The script to build it was fired by puppet, leading to an empty file
  • /etc/fstab was out of date relative to the actual mounts
    • /usr had been merged to /
    • A stale user_xattr option on a filesystem that no longer supported it (it had been converted away from ext*) caused a mount failure.
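
The first item above, the initramfs lvm.conf filter, can be illustrated with a small emulation of LVM's first-match-wins filter rules. This is only a sketch: the actual filter on woodpecker is not recorded here, so both filter lists below are hypothetical, and Python's re module only approximates LVM's own regex engine.

```python
import re

def lvm_filter_accepts(device, filter_patterns):
    """Emulate LVM's first-match-wins device filter.

    Each pattern starts with 'a' (accept) or 'r' (reject), followed by a
    regex wrapped in a delimiter character, e.g. "a|^/dev/sd.*|".
    A device that matches no pattern at all is accepted.
    """
    for pattern in filter_patterns:
        action, delimiter = pattern[0], pattern[1]
        regex = pattern[2:pattern.rindex(delimiter)]
        if re.search(regex, device):
            return action == "a"
    return True

# HYPOTHETICAL filter of the kind a physical host might carry:
# accept SCSI and HP Smart Array disks, reject everything else.
old_filter = ["a|^/dev/sd.*|", "a|^/dev/cciss/.*|", "r|.*|"]
# HYPOTHETICAL fix: also accept virtio block devices (/dev/vd*).
new_filter = ["a|^/dev/vd.*|"] + old_filter

for dev in ("/dev/sda2", "/dev/vda2"):
    print(dev, lvm_filter_accepts(dev, old_filter), lvm_filter_accepts(dev, new_filter))
```

With a filter like the hypothetical old one, a virtio disk such as /dev/vda2 is rejected, so LVM never scans it for physical volumes inside the VM.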

What further actions were taken

  • A newer, 64-bit kernel was deployed on top of the existing 32-bit userland.
  • Puppet handling for inittab was worked around for the moment; a full fix is pending.
  • Puppet contents of fstab were fixed.
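
Drift between fstab and the live mount table, as described above, can be caught before the next reboot with a simple comparison. The following is a minimal sketch and not the check used by infra; the function names, device names, and the xfs filesystem type in the sample data are all hypothetical.

```python
def parse_mount_table(text):
    """Map mount point -> (device, fstype, option set) for fstab-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mount_point, fstype, options = fields[:4]
        table[mount_point] = (device, fstype, set(options.split(",")))
    return table

def fstab_drift(fstab_text, mounts_text, ignore_types=("swap",)):
    """List fstab entries that disagree with the live mount table."""
    fstab = parse_mount_table(fstab_text)
    live = parse_mount_table(mounts_text)
    problems = []
    for mount_point, (_device, fstype, _options) in sorted(fstab.items()):
        if fstype in ignore_types:
            continue
        if mount_point not in live:
            problems.append(f"{mount_point}: in fstab but not mounted")
        elif fstype != "auto" and live[mount_point][1] != fstype:
            problems.append(
                f"{mount_point}: fstab says {fstype}, mounted as {live[mount_point][1]}"
            )
    return problems

# HYPOTHETICAL sample data; on a real host read /etc/fstab and /proc/mounts.
sample_fstab = """
/dev/vg/root  /      ext3  noatime             0 1
/dev/vg/usr   /usr   ext3  noatime             0 2
/dev/vg/home  /home  ext3  noatime,user_xattr  0 2
"""
sample_mounts = """
/dev/mapper/vg-root / ext3 rw,noatime 0 0
/dev/mapper/vg-home /home xfs rw,noatime 0 0
"""

for problem in fstab_drift(sample_fstab, sample_mounts):
    print(problem)
```

The sample data mirrors the two failure modes listed earlier: a /usr entry that no longer corresponds to a separate mount, and a filesystem whose type changed while its old options stayed in fstab.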

Notes

This was the last line logged by woodpecker before the outage. There was nothing suspicious before it, just the usual high volume of mail, since this host handles all of the non-list mail for Gentoo.

May  7 04:40:07 woodpecker kernel: [8507373.720562] PAX: refcount overflow detected in: kswapd0:73, uid/euid: 0/0
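
The bracketed kernel timestamp in the line above is seconds since boot, so it can be converted to check the uptime figure quoted earlier. A small sketch, assuming the syslog time is UTC (as the timeline indicates) and taking the year from the timeline; kernel timestamps can drift slightly from wall-clock time, so the boot time is approximate.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the log line above; the year comes from the timeline.
logged_at = datetime(2015, 5, 7, 4, 40, 7, tzinfo=timezone.utc)
seconds_since_boot = 8507373.720562

uptime = timedelta(seconds=seconds_since_boot)
boot_time = logged_at - uptime

print(f"uptime: {uptime.days} days, {uptime.seconds // 3600} hours")
# -> uptime: 98 days, 11 hours
print(f"approximate boot time: {boot_time:%Y-%m-%d %H:%M} UTC")
# -> approximate boot time: 2015-01-28 17:30 UTC
```

This agrees with the ~98 days mentioned above and places the last boot in late January 2015, the month of the migration.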

The serial console is spewing traces after this:

[8510457.749253] CPU: 3 PID: 10780 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510457.753033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510457.753033] task: dad62610 ti: dad62988 task.ti: dad62988
[8510457.754585] Stack:
[8510457.757068] Call Trace:
[8510457.758091] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510457.768956] CPU: 1 PID: 10778 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510457.769028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510457.773027] task: dae026d0 ti: dae02a48 task.ti: dae02a48
[8510457.777029] Stack:
[8510457.777041] Call Trace:
[8510457.778065] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510461.829199] CPU: 1 PID: 10784 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.832974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.836958] task: dad62610 ti: dad62988 task.ti: dad62988
[8510461.838524] Stack:
[8510461.839389] Call Trace:
[8510461.841054] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510461.849175] CPU: 7 PID: 10783 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.849175] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.853304] CPU: 1 PID: 10789 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.853305] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.853307] task: f3857890 ti: f3857c08 task.ti: f3857c08
[8510461.853325] Stack:
[8510461.853336] Call Trace:
[8510461.853403] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510461.868966] task: f1afd690 ti: f1afda08 task.ti: f1afda08
[8510461.869021] Stack:
[8510461.870477] Call Trace:
[8510461.870928] Code:[8510461.873338] CPU: 5 PID: 10791 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.873339] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.873342] task: f3770b90 ti: f3770f08 task.ti: f3770f08
[8510461.873360] Stack:
[8510461.873381] Call Trace:
[8510461.873382] [8510461.873388] [8510461.873392] [8510461.873395] [8510461.873398] [8510461.873401] [8510461.873405] [8510461.873409] [8510461.873412] [8510461.873416] [8510461.873419] [8510461.873421] [8510461.873424] [8510461.873426] [8510461.873430] [8510461.873434] [8510461.873437] [8510461.873440] [8510461.873443] [8510461.873445] [8510461.873449] Code:[8510461.878684] CPU: 4 PID: 10794 Comm: sh Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.878684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.878684] task: f1f936d0 ti: f1f93a48 task.ti: f1f93a48
[8510461.878684] Stack:
[8510461.878684] Call Trace:
[8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] [8510461.878684] Code:[8510461.881332] CPU: 4 PID: 10790 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.881333] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.881335] task: f1aa7190 ti: f1aa7508 task.ti: f1aa7508
[8510461.881353] Stack:
[8510461.881374] Call Trace:
[8510461.881375] [8510461.881380] [8510461.881383] [8510461.881386] [8510461.881390] [8510461.881393] [8510461.881396] [8510461.881400] [8510461.881403] [8510461.881405] [8510461.881408] [8510461.881410] [8510461.881413] [8510461.881416] [8510461.881420] [8510461.881422] [8510461.881424] [8510461.881427] [8510461.881430] [8510461.881433] Code:
[8510461.929552] CPU: 1 PID: 10797 Comm: sudo Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.931556] CPU: 4 PID: 10796 Comm: sh Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.931556] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.931556] task: f399e990 ti: f399ed08 task.ti: f399ed08
[8510461.931556] Stack:
[8510461.931556] Call Trace:
[8510461.931556] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510461.944968] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.949034] task: f3771690 ti: f3771a08 task.ti: f3771a08
[8510461.949410] CPU: 7 PID: 10792 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.949411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.949413] task: f1f93150 ti: f1f934c8 task.ti: f1f934c8
[8510461.949431] Stack:
[8510461.949441] Call Trace:
[8510461.949501] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510461.964975] Stack:
[8510461.965843] Call Trace:
[8510461.966886] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510461.977168] CPU: 4 PID: 10787 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510461.977168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510461.980960] task: ee1b4750 ti: ee1b4ac8 task.ti: ee1b4ac8
[8510461.982513] Stack:
[8510461.983369] Call Trace:
[8510461.984955] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510464.657127] CPU: 1 PID: 10487 Comm: apache2 Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510464.659616] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510464.660912] task: c4e241d0 ti: c4e24548 task.ti: c4e24548
[8510464.662478] Stack:
[8510464.664924] Call Trace:
[8510464.665986] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510464.705046] CPU: 1 PID: 10487 Comm: apache2 Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510464.705046] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510464.708909] task: c4e241d0 ti: c4e24548 task.ti: c4e24548
[8510464.710465] Stack:
[8510464.712923] Call Trace:
[8510464.713974] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510464.733365] CPU: 1 PID: 10487 Comm: apache2 Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510464.736910] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510464.740908] task: c4e241d0 ti: c4e24548 task.ti: c4e24548
[8510464.742474] Stack:
[8510464.748921] Call Trace:
[8510464.748999] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510464.777119] CPU: 4 PID: 10698 Comm: apache2 Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510464.777119] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510464.780908] task: c0a8a2d0 ti: c0a8a648 task.ti: c0a8a648
[8510464.782463] Stack:
[8510464.783325] Call Trace:
[8510464.785033] Code: 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 8b 00 89 e5 c1 e8 0e 25 00 0c 00 00 f0 ff 80 c0 a9 b4 c1 71 09 f0 ff 88 c0 a9 b4 c1 cd 04 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 8b 10 83
[8510468.717848] CPU: 4 PID: 10799 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510468.720844] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510468.724841] task: f3771690 ti: f3771a08 task.ti: f3771a08
[8510468.726397] Stack:
[8510468.727256] Call Trace:
[8510468.728249] Code: 5d c3 8d 76 00 8d bc 27[8510468.731036] CPU: 1 PID: 10802 Comm: sh Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510468.731036] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510468.731036] task: c0e29210 ti: c0e29588 task.ti: c0e29588
[8510468.731036] Stack:
[8510468.731036] Call Trace:
[8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] [8510468.731036] Code:[8510468.732466] CPU: 3 PID: 10800 Comm: nrpe Tainted: G S      W     3.15.2-hardened-infra29 #1
[8510468.732467] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[8510468.732470] task: c0a8a2d0 ti: c0a8a648 task.ti: c0a8a648
[8510468.732485] Stack:
[8510468.732502] Call Trace:
[8510468.732503] [8510468.732508] [8510468.732511] [8510468.732514] [8510468.732517] [8510468.732520] [8510468.732523] [8510468.732526] [8510468.732529] [8510468.732533] [8510468.732535] [8510468.732538] [8510468.732541] [8510468.732545] [8510468.732548] Code: