Project:Infrastructure/Shopping list

This page is used to list things on infra's shopping list. The goal is to solicit feedback from the community on available options and pricing.

As of June 2022, the necessary purchases have been made and the hardware has been installed at OSUOSL.

Switch

Networking Executive Summary

Replace dead & loaner network hardware with new systems capable of supporting future needs.

Rationale

We currently have a temporary loaner switch at the Oregon State University Open Source Lab (OSUOSL), because the Cisco WS-C2970G-24T-E switch purchased in 2011 failed during 2020. OSUOSL wants their loaner back, so we need new switchgear. The old switch provided multiple VLANs, covering both OOB/IPMI and public/normal traffic.

Additionally, the hypervisor systems have high-bandwidth needs and presently use only cross-over 10Gbit networking, making it impossible to expand to more hypervisor systems or offer 10Gbit-or-better service to other hardware.

See the Discussion page for discussion of potential FS.com switches.


Requirements:

  • At least 12x 10GBASE-T ports (see the port tally sketch after this list)
  • Transceiver-based ports (SFP/SFP+/QSFP+/SFP28) not presently in use, but maybe for future expansion
  • Present 10GBASE-T hosts
    • oriole & ovenbird each have 2x 10GBASE-T
    • catbus has 4x 10GBASE-T
  • Future Hosts:
    • Ganeti replacements (2-3 hosts): at least 2x (10GBASE-T or SFP+ or SFP28 or QSFP28)
  • Remote management
  • VLAN support
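
As a rough check on the 12-port figure, here is a minimal tally sketch; the 2-3 future hosts and 2 ports per future host are taken from the requirement items above.

  present_hosts = {
      "oriole": 2,    # 2x 10GBASE-T
      "ovenbird": 2,  # 2x 10GBASE-T
      "catbus": 4,    # 4x 10GBASE-T
  }

  future_hosts_min, future_hosts_max = 2, 3  # Ganeti replacement hosts
  ports_per_future_host = 2                  # 10GBASE-T / SFP+ / SFP28 / QSFP28

  present = sum(present_hosts.values())
  print(f"present: {present} ports")  # 8
  print(f"with future hosts: {present + future_hosts_min * ports_per_future_host}"
        f"-{present + future_hosts_max * ports_per_future_host} ports")  # 12-14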

Nice to have:

  • Bonding/trunking
  • Redundant PSU: this would only protect against a PSU failure; we don't have redundant power feeds in the rack (both breakers come from the same circuit)
  • 25G/40G options

Proposal

  • 1x FS.com S5860-20SQ (high-speed systems) - $1600 USD
  • One of:
    • 1x FS.com S3910-24TF (OOB/IPMI/embedded/non-10G systems) - $369 USD
    • 1x FS.com S3910-24TS (OOB/IPMI/embedded/non-10G systems), adds 10G uplink - $769 USD
  • Various SFP+/SFP28/QSFP28 transceivers (DAC/AOC/regular)
  • Optical Cabling
    • Wild Estimate: $500 USD total
  • Estimated total: $4159-$4559 USD (breakdown sketched below)
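
The two ends of that estimate are internally consistent: subtracting the itemized switch and cabling prices leaves the same remainder for transceivers in both cases. A quick arithmetic sketch (the transceiver figure is implied by the total, not a quoted price):

  s5860_20sq = 1600       # 10G/25G/40G switch
  s3910_24tf = 369        # 1G switch, option A
  s3910_24ts = 769        # 1G switch, option B (10G uplink)
  optical_cabling = 500   # wild estimate above

  total_low, total_high = 4159, 4559

  # remainder left for transceivers under each 1G-switch option
  print(total_low - (s5860_20sq + s3910_24tf + optical_cabling))   # 1690
  print(total_high - (s5860_20sq + s3910_24ts + optical_cabling))  # 1690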

Transceiver & cable planning

10G-or-better switch: S5860-20SQ (module counts are tallied in the sketch after this outline)
  • new boxes:
    • switch: 2x SFP28 25G
    • host: 2x SFP28 25G
    • cable: 2x fiber or DAC
  • oriole, ovenbird (combined)
    • This replaces the old cross-over 10G setup w/ 1G uplink
    • host: N/A
    • switch: 4x SFP+ 10GBASE-T
    • cable: 4x 10GBASE-T CAT6A-or-better
  • catbus
    • host: N/A
    • switch: 1x SFP+ 10GBASE-T
    • cable: 1x CAT6A-or-better
  • link to 1G switch
    • S3910-24TS:
    • S5860-20SQ:
    • cable: DAC?
1G switch: S3910-24TS
  • guppy
  • muta
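
A rough shopping tally for the plan above, as a sketch: the per-new-box module counts and the 2-3 new-host range are assumptions carried over from the requirements, and the switch-to-switch DAC is still an open question.

  def module_tally(new_hosts):
      return {
          # 25G SFP28 modules: 2 switch-side + 2 host-side per new box
          # (fewer discrete modules if DACs are used instead of fiber + optics)
          "sfp28_25g_modules": 4 * new_hosts,
          "25g_fiber_or_dac": 2 * new_hosts,
          # 10GBASE-T SFP+ modules, switch side: 4 for oriole+ovenbird, 1 for catbus
          "sfp_plus_10gbase_t_modules": 4 + 1,
          "cat6a_cables": 4 + 1,
          # uplink between the S5860-20SQ and the S3910-24TS ("DAC?")
          "uplink_dac": 1,
      }

  for n in (2, 3):
      print(n, "new hosts:", module_tally(n))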

Computing

Computing Executive Summary

Gentoo has some aging hardware and low capacity at the OSUOSL. Infrastructure proposes generic computing boxes to run virtual machines hosting Gentoo services for North American access.

Rationale

Many Gentoo production services hosted at the OSUOSL are presently run on 2 hypervisor hosts: oriole & ovenbird, which were purchased in June 2016. The hypervisors have absorbed many of the older hosts, but have exceeded capacity for N-1 failover (i.e. in the event of a hardware failure, less-critical virtual machines would not be run, in order to save resources).

While the systems still have slow disk available, most SSD and RAM is utilized, and the CPUs allocated to some VMs are frequently maxed out, even with oversubscription.

There is still aging hardware, at risk of failure, that cannot be merged due to insufficient hypervisor capacity.

Specifically, these hosts are the oldest:

  • dipper.gentoo.org (Dell R415, masterdistfiles.g.o, 2011 or older, 32GB RAM, 8TB usable HDD, no SSD, frequently IO-bound)
  • jacamar.gentoo.org (Supermicro system, runs CI, 2013 or older, date uncertain, was donated to Gentoo, 64GB RAM, 2TB usable SSD, 2TB usable HDD)
  • vulture.gentoo.org (Dell SC1435, GSoC system, 2006 or older)

There is also demand from developers for more CI capacity, which is not well served by other systems. Gentoo did not get a renewal of AWS open source credits this fiscal year.

The systems run at high utilization, and are estimated to NOT be cost-effective to run on Cloud services.

Infrastructure team proposes buying 2 or 3 new systems, running as hypervisors, to:

  • absorb the above old hardware
  • migrate the most important Gentoo services from the older hypervisors to the new systems
  • consider minor RAM & SSD upgrades to the older hypervisor systems, if cost-effective
  • use the older hypervisor systems for redundancy and additional CI capacity
  • run the older hypervisors until failure, or until power/rack-space constraints require removal


Goal

We don't need the fastest, beefiest boxes. Primarily we need:

  • Fast storage (SSD/NVMe): almost all our operations are IO-bound.
  • Power optimized: we are constrained on power in this hosting location, so we don't want the fastest power-hungry Threadripper; we would rather take a low/mid-range part with better power curves.
  • Space optimized: 1-2U preferable.
  • We'd rather have 4-5 moderately sized units than 1-2 huge beefy units, esp. using Ganeti where the failure domain can be contained.

Hard requirements

  1. 10Gbit-or-faster ethernet ports
    1. soft preference for 10GBASE-T since there are no SFP+ systems in place so far, but SFP28 looks promising
  2. >5TB usable storage on each system (after any RAID [MD,LVM,ZFS])
    1. 15T of raw storage pre-RAID (see the capacity sketch after this list).
  3. 4 or more NVMe slots: U.2(2.5") / E1.S / E1.L / E3.*
    1. Robin suggests something like https://ark.intel.com/content/www/us/en/ark/products/186674/intel-ssd-d5p4326-series-15-36tb-2-5in-pcie-3-1-x4-3d2-qlc.html
  4. Dual PSU
  5. OS Boot disk: If the chassis has dedicated 2xM.2 slots, then yes, populate them w/ OS disks.
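
To illustrate the storage requirement, the sketch below checks a hypothetical 4x 3.84TB NVMe layout (chosen only because it lands near the ~15T raw figure) against the >5TB-usable bar under a few common RAID layouts.

  def usable_tb(drive_tb, count):
      """Usable capacity under common layouts (filesystem overhead ignored)."""
      return {
          "raid10 / mirrored pairs": drive_tb * count / 2,
          "raid5 / raidz1": drive_tb * (count - 1),
          "raid6 / raidz2": drive_tb * (count - 2),
      }

  drive_tb, count = 3.84, 4  # assumption: 4x 3.84TB NVMe ~= 15.36TB raw
  print(f"raw: {drive_tb * count:.2f} TB")
  for layout, tb in usable_tb(drive_tb, count).items():
      status = "meets" if tb > 5 else "misses"
      print(f"{layout}: {tb:.2f} TB -> {status} the >5TB requirement")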

Migrate to VM:

  • vulture.gentoo.org (GSoC box), a Dell SC1435, 4GB RAM, dual Opteron 2210, 2x ST3500630NS (500 GB 7200 RPM spinning rust).

Replacing:

  • jacamar.gentoo.org (CI box)
    • whitebox supermicro
    • 64GB ram
    • dual Opteron 6272s
    • 2x Samsung 860 1T (RAID1)
    • 4x WD 1T (RAID10)
    • 2x WD mixed (old disks from other hosts)
    • sucks power
    • writes 100GiB/day
  • dipper.gentoo.org (masterdistfiles.gentoo.org):
    • Dell PowerEdge R415
    • 2x 6-core Opterons
    • 32G RAM
    • 8T usable disk available, ~5T used
    • 4x Seagate ST33000651AS in HWRAID5
    • Writes 200GiB/day


  • oriole.gentoo.org & ovenbird.gentoo.org (hypervisors in Ganeti cluster)
    • whitebox supermicro (2U SuperServer 2028TP-DECTR)
    • 128GB ram (each)
    • 2x E5-2620 v4 (each)
    • Disk (each):
      • 2x Samsung 850 1T (RAID1)
      • 4x Seagate ST2000NX0273 2T (mixed RAID1 LVs over the disks)
      • ~5T worth of data present
    • Writes 1TiB/day (each); see the endurance sketch below
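
The per-host write figures above translate into rough lifetime-write totals for sizing SSD endurance on any replacement hardware. This is a sketch only: the 5-year horizon and the write-amplification factor are assumptions, not measurements.

  TIB = 1024  # GiB per TiB

  daily_writes_gib = {
      "jacamar": 100,       # 100 GiB/day
      "dipper": 200,        # 200 GiB/day
      "oriole": 1 * TIB,    # 1 TiB/day
      "ovenbird": 1 * TIB,  # 1 TiB/day
  }

  years = 5
  write_amplification = 2.0  # assumed RAID/filesystem overhead factor

  for host, gib_per_day in daily_writes_gib.items():
      lifetime_tib = gib_per_day * write_amplification * 365 * years / TIB
      # compare against the TBW/DWPD rating of any candidate NVMe drive
      print(f"{host}: ~{lifetime_tib:.0f} TiB written over {years} years")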

Recommendations (unaudited)