SSD
This article provides guidelines for basic maintenance, such as enabling discard/trim support, for SSDs (Solid State Drives) on Linux. It presumes the user has a basic understanding of partitioning and formatting disk drives.
Introduction
The term Solid State Drive is commonly used for flash-based block devices. Compared to conventional HDDs, flash-based technology offers much faster access times, lower latency, silent operation, lower power consumption (no moving parts), and more. However, flash-based technology also brings a few issues that require special attention from the system.
Dealing with empty blocks
Generally, traditional filesystems do not erase deleted data blocks but only flag them as deleted. Due to the nature of flash memory cells, a write operation can only be done to empty cells. Writing to physically non-empty cells that a filesystem has flagged as deleted therefore requires erasing them first, which makes the operation slower than writing to empty cells. This problem is further amplified by a hardware limitation: flash is erased in blocks that are larger than the pages it is written in. Furthermore, the data remains on the storage media even when it is flagged as deleted, which is a non-issue on conventional storage media such as HDDs. On an SSD, on the other hand, storing unused (deleted) data severely limits the availability of empty cells.
Modern kernels can tell the SSD which data blocks are deleted (no longer used). This mechanism is called discard. The names of its implementations differ: TRIM for ATA, UNMAP for SCSI, and Deallocate for NVMe. MMC and SD cards, which are not SSDs in the strict sense but share the same technology (non-volatile flash memory), distinguish between TRIM and ERASE. Filesystem support is required in order to use discard. The majority of modern filesystems (like ext4[1], XFS[2], Btrfs[3], or bcachefs) support discard, and it has also been implemented for older, established filesystems (e.g. FAT or NTFS). There are also filesystems developed primarily for flash-based devices, such as F2FS.
There are two basic approaches to issuing the discard command: using the discard mount option (-o discard) for continuous discard[4], or periodic calls of the fstrim utility[5]. Not all filesystems support both methods.
Slowing wear out
Each write operation performed on a NAND flash cell wears it out, which limits the SSD lifespan. Cell endurance varies with the technology used[6]. Read operations, on the other hand, do not cause cell wear.
A basic method for increasing SSD lifespan is to distribute writes uniformly across all blocks. This method is called wear leveling and is implemented in the SSD firmware.
From the system's point of view, it is appropriate to reduce the amount of writes in general.
Considerations
Discard (trim) support
The device's support for discard (sometimes referred to as trim) should be verified before performing any form of discarding on the drive.
This can be done with the lsblk utility from sys-apps/util-linux:
user $
lsblk --discard
NAME   DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda           0      512B       2G         0
├─sda1        0      512B       2G         0
├─sda2        0      512B       2G         0
└─sda3        0      512B       2G         0
sdb           0        0B       0B         0
└─sdb1        0        0B       0B         0
A device supporting discard has non-zero values in the DISC-GRAN (discard granularity) and DISC-MAX (discard max bytes) columns. In the example listing above, the /dev/sda device supports discard while /dev/sdb does not.
Performing discard on a device that does not support it is potentially unsafe.
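The check described above can also be scripted. The following is a minimal sketch rather than an official tool: the function name discard_capable is hypothetical, and the sample rows are copied from the example listing so that the snippet runs without touching real hardware.

```shell
#!/bin/sh
# Print the NAME of every row whose DISC-GRAN and DISC-MAX are both non-zero.
# Expects header-less rows of "NAME DISC-GRAN DISC-MAX" on standard input.
discard_capable() {
    awk '$2 !~ /^0B?$/ && $3 !~ /^0B?$/ { print $1 }'
}

# Sample rows taken from the example listing (whole disks only).
sample='sda 512B 2G
sdb 0B 0B'

printf '%s\n' "$sample" | discard_capable
```

On a real system the rows could be produced with lsblk --nodeps --noheadings --output NAME,DISC-GRAN,DISC-MAX and piped into the function.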
Initial setup
Partitioning
The sizes of SSD internal data structures (blocks and pages) vary across devices, and filesystems in turn operate on data structures of other sizes. For optimal performance, filesystem data structures should not cross the boundaries of the underlying SSD internal data structures, which minimizes the number of internal SSD operations required. This can be achieved by aligning the start of each partition; the common alignment is to 1 MiB.
Both the parted and fdisk partitioning utilities support partition alignment. For parted, there is the -a optimal option. Recent versions of fdisk should use optimal alignment by default[7].
The alignment of a given partition can be checked with parted:
root #
parted /dev/sda
(parted) align-check optimal 1
1 aligned
For further details about partitioning, see the dedicated Handbook chapter.
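The 1 MiB rule comes down to simple arithmetic on the partition's start offset. The sketch below illustrates it; the function name is_aligned is hypothetical, and the start sectors are example values (2048 is a common modern default, 63 is the legacy DOS offset).

```shell
#!/bin/sh
# Succeed when a partition's start offset is a multiple of the alignment.
# $1: start sector, $2: sector size in bytes, $3: alignment in bytes (default 1 MiB)
is_aligned() {
    start_sector=$1
    sector_size=$2
    alignment=${3:-1048576}
    [ $(( start_sector * sector_size % alignment )) -eq 0 ]
}

for start in 2048 63; do
    if is_aligned "$start" 512; then
        echo "$start aligned"
    else
        echo "$start not aligned"
    fi
done
```

On a running system the start sector of a partition can be read from sysfs, for example /sys/block/sda/sda1/start, where it is expressed in 512-byte units.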
blkdiscard
The blkdiscard utility from sys-apps/util-linux-2.23 (or later) discards all data blocks on a given device.
All data on the discarded device will be lost!
LVM
LVM aligns to MiB boundaries and passes discards to underlying devices by default. No additional configuration is required.
In order to discard all unused space in a Volume Group (VG) use the blkdiscard utility:
root #
lvcreate -l100%FREE -n trim yourvg
root #
blkdiscard /dev/yourvg/trim
root #
lvremove yourvg/trim
Alternatively, the issue_discards option in lvm.conf makes LVM discard an entire Logical Volume (LV) on lvremove, lvreduce, pvmove and other actions that free Physical Extents (PE) in a VG.
Enabling it will immediately render the system unable to undo any changes to the LV layout.
devices {
issue_discards = 1
}
dm-crypt/LUKS
For discards to pass through fully encrypted devices (dm-crypt/LUKS), they have to be opened with the --allow-discards option.
root #
cryptsetup luksOpen --allow-discards /dev/thing luks
When the root device resides on LUKS, enabling discards depends on the initramfs implementation. When using genkernel for creating the initramfs, pass the following kernel option:
GRUB_CMDLINE_LINUX_DEFAULT="root_trim=yes"
When using dracut for creating the initramfs, pass the following kernel option:
GRUB_CMDLINE_LINUX_DEFAULT="rd.luks.allow-discards"
To check whether discard is enabled on a LUKS device, see if the output of the following command contains the string allow_discards:
root #
dmsetup table /dev/mapper/crypt_dev --showkeys
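The string check can be scripted as well. This is a small sketch under stated assumptions: has_allow_discards is a hypothetical helper, and both table lines are made-up samples in the dm-crypt table layout (with the key shown as zeros), not output captured from a real system.

```shell
#!/bin/sh
# Succeed when a device-mapper table line lists allow_discards as an option.
has_allow_discards() {
    case " $1 " in
        *" allow_discards "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Made-up sample table lines; the second one lacks the optional parameter.
with='0 1953519616 crypt aes-xts-plain64 0000000000000000 0 8:3 4096 1 allow_discards'
without='0 1953519616 crypt aes-xts-plain64 0000000000000000 0 8:3 4096'

for line in "$with" "$without"; do
    if has_allow_discards "$line"; then
        echo "discard enabled"
    else
        echo "discard not enabled"
    fi
done
```

In practice the argument would be the output of dmsetup table for the mapped device, for example has_allow_discards "$(dmsetup table /dev/mapper/crypt_dev)".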
Formatting
As with partitions, performance can be improved if a filesystem is configured so that it aligns its data structures with the sizes of the device's internal structures, namely its erase block size.
This configuration becomes important for software RAID, where the erase block size really should be known[8]. Consider this information when making a purchase.
Configuring for erase block size
When the device's erase block size is known, it can be used when creating a filesystem.
For example, mkfs.ext4 uses 4KiB blocks on an average-sized partition[9]. The -E stride and -E stripe-width options make it possible to align the filesystem to the erase block size. Both options should be set to erase block size / block size.
For a drive with 512KiB erase block size, it makes 512KiB / 4KiB = 128:
root #
mkfs.ext4 -E stride=128,stripe-width=128 /dev/sda3
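The division above is easy to get wrong when units are mixed, so it can help to compute the option string. The following sketch assumes both sizes are given in KiB; the function name ext4_stride_opts is hypothetical, and it only prints the options without running mkfs.ext4.

```shell
#!/bin/sh
# Print the ext4 extended options for a given erase block size.
# $1: erase block size in KiB, $2: filesystem block size in KiB (default 4)
ext4_stride_opts() {
    erase_kib=$1
    block_kib=${2:-4}
    stride=$(( erase_kib / block_kib ))
    echo "stride=${stride},stripe-width=${stride}"
}

ext4_stride_opts 512     # 512 KiB erase block, 4 KiB filesystem blocks
ext4_stride_opts 8192    # 8192 KiB erase block, 4 KiB filesystem blocks
```

The printed string can then be passed to the -E option of mkfs.ext4.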
List of devices with known erase block sizes
- OCZ drives; stride and stripe-width are 128
Erase block size is 512KiB[10]
- Crucial M500 240GB; stride and stripe-width are 2048
Page size is 16KiB, there are 512 pages per block[11]. 16KiB * 512 = 8192KiB for erase block size. 8192KiB / 4KiB = 2048 for stride and stripe-width size.
- SanDisk z400s; stride and stripe-width are 4096
According to Dutch customer care service from SanDisk the erase block size = 16KiB.
Mounting
For the root filesystem, periodic use of the fstrim utility is usually recommended. Using the discard mount option results in continuous discard, which could potentially degrade older or poor-quality SSDs[5].
The following command can be run manually or set up as a periodic job that runs once a week[12]:
root #
fstrim -v /
Not every filesystem driver supports fstrim. Examples are ntfs3 and bcachefs, which only support the discard mount option. On a Btrfs system, running the fstrim command on any mounted subvolume will perform the discard command on the device.
For mount points on an SSD that see a low amount of disk writes, it should be safe to use the discard mount option in /etc/fstab. The mount option is also recommended when maintaining performance is required[13].
Given the considerations above, a discard-enabled /etc/fstab could look like this:
/dev/sda3 /mnt/archive ext4 defaults,relatime,discard 0 1
Once the /etc/fstab has been modified, remount all filesystems mentioned there via:
root #
mount -a
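To see which entries request continuous discard, the options field of each /etc/fstab line can be inspected. The sketch below is illustrative only: fstab_has_discard is a hypothetical helper, it matches the exact option name discard, and it is run here against the example entry rather than the real /etc/fstab.

```shell
#!/bin/sh
# Succeed when an fstab entry lists "discard" among its mount options.
fstab_has_discard() {
    # The fourth whitespace-separated field holds the comma-separated options.
    opts=$(printf '%s\n' "$1" | awk '{ print $4 }')
    case ",$opts," in
        *,discard,*) return 0 ;;
        *) return 1 ;;
    esac
}

entry='/dev/sda3 /mnt/archive ext4 defaults,relatime,discard 0 1'
if fstab_has_discard "$entry"; then
    echo "continuous discard is enabled"
else
    echo "continuous discard is not enabled"
fi
```

Options that merely contain the word, such as nodiscard, are deliberately not matched.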
Not every filesystem driver supports the discard mount option, although the majority do. As an example, ntfs-3g can only be trimmed with fstrim.
Additional configuration
Periodic fstrim jobs
There are multiple ways to set up a periodic block discarding process. As of 2018, the default recommended frequency is once a week[12].
cron
Run fstrim on all mounted devices that support discard on a weekly basis:
# Mins Hours Days Months Day of the week Command
15 13 * * 1 /sbin/fstrim --all
Similarly, it is possible to run fstrim only for a selected mount point:
# Mins Hours Days Months Day of the week Command
15 13 * * 1 /sbin/fstrim -v /
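When the job runs with -v, its output can be summarized to see how much was discarded. This sketch assumes that each output line contains the byte count in the form "(N bytes)"; the exact wording differs between util-linux versions, so the sample lines are illustrative and not captured from a real run. The function name total_trimmed_bytes is hypothetical.

```shell
#!/bin/sh
# Sum the byte counts from "fstrim --verbose" style lines read on standard input.
# Assumes each relevant line contains "(<number> bytes)"; other lines are ignored.
total_trimmed_bytes() {
    sed -n 's/.*(\([0-9][0-9]*\) bytes).*/\1/p' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Illustrative output, not captured from a real run.
sample='/boot: 0 B (0 bytes) trimmed
/: 1.5 GiB (1610612736 bytes) trimmed'

printf '%s\n' "$sample" | total_trimmed_bytes
```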
SSDcronTRIM
There is also a semi-automatic cron job available on GitHub called SSDcronTRIM which has the following features:
- Distribution independent script (developed on a Gentoo system).
- The script decides every time depending on the disk usage how often (monthly, weekly, daily, hourly) each partition has to be trimmed.
- Recognizes if it should install itself into /etc/cron.{monthly,weekly,daily,hourly}, /etc/cron.d or any other defined directory and if it should make an entry into crontab.
- Checks whether the kernel meets the requirements, whether the filesystem supports trimming, and whether the SSD supports it.
SSDcronTRIM-LUKS
There is also a semi-automatic cron job available on GitHub called SSDcronTRIM-LUKS with dm-crypt/LUKS support.
systemd timer
sys-apps/util-linux on systemd-enabled systems comes with a timer unit executing a weekly fstrim. Enable it with:
root #
systemctl enable fstrim.timer
Without cron, on system shutdown (with OpenRC)
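An /etc/local.d script may be used to trim on poweroff on Fridays:
#!/bin/sh
# From
# https://fitzcarraldoblog.wordpress.com/2018/01/13/running-a-shell-script-at-shutdown-only-not-at-reboot-a-comparison-between-openrc-and-systemd/
# Trim only when entering runlevel 0 (shutdown, not reboot) and only on Fridays.
# LC_ALL=C keeps the weekday abbreviation in English regardless of the locale.
if [ "$(who -r | awk '{print $2}')" = "0" ] && [ "$(LC_ALL=C date +%a)" = "Fri" ]; then
    echo "/etc/local.d/trim.stop: run SSD trim"
    fstrim --verbose /
    sleep 5
fi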
Reducing amount of writes
Flash-based SSDs have a limited write lifetime: each cell endures only a limited number of writes[6]. Thus, when using an SSD, administrators generally want to reduce the amount of writes.
Portage TMPDIR on tmpfs
When building packages via Portage it is possible to perform the operations in RAM by using a tmpfs or zram mount. This has the theoretical benefit of reducing writes to the SSD. See Portage TMPDIR on tmpfs (or zram) guide.
Temporary files on tmpfs
Remember that all data in tmpfs reside in volatile memory. So data on tmpfs will be lost after system reboot, shutdown or crash!
It is possible to mount desired mount points as tmpfs. Since tmpfs stores files in volatile memory all the I/O operations directed to the given mount points are not performed on the solid state disk. This reduces the amount of writes and also improves performance.
This is an example of both /tmp and /var/tmp being mounted as tmpfs:
# temporary mountpoints on tmpfs
tmpfs /tmp tmpfs size=16G,noatime 0 0
tmpfs /var/tmp tmpfs size=1G,noatime 0 0
Systemd-based systems mount /tmp as tmpfs by default. Therefore an explicit /etc/fstab entry is required only for changing the default mount options. Currently used options can be reviewed by:
user $
findmnt | grep '/tmp'
/var/tmp will typically also be used by Portage (under the /var/tmp/portage directory). If its size is too small, it will lead to build errors. Refer to Portage TMPDIR on tmpfs for appropriately sizing /var/tmp.
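When choosing a size= value it helps to know the baseline: a tmpfs mounted without an explicit size defaults to half of the physical RAM. The sketch below computes that figure from /proc/meminfo-style input; the function name default_tmpfs_mib is hypothetical, and the sample line stands for a machine with 16 GiB of RAM.

```shell
#!/bin/sh
# Print half of MemTotal in MiB, the size tmpfs uses when no size= is given.
# Reads /proc/meminfo-style text on standard input.
default_tmpfs_mib() {
    awk '$1 == "MemTotal:" { printf "%d\n", $2 / 1024 / 2 }'
}

# Sample line for a machine with 16 GiB of RAM (value in kB, as in /proc/meminfo).
printf 'MemTotal:       16777216 kB\n' | default_tmpfs_mib
```

On a real system the function can read the live values with default_tmpfs_mib < /proc/meminfo.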
XDG cache on tmpfs
On a Gentoo desktop, many programs that use the X Window System (Chromium, Firefox, Skype, etc.) write to their cache on disk every few seconds[14].
The cache directory location usually complies with the XDG Base Directory Specification[15], namely with the XDG_CACHE_HOME environment variable. The default cache location is ~/.cache, which usually resides on a physical drive and could be moved to tmpfs.
To relocate the cache directory, create a script that exports XDG_CACHE_HOME pointing to a directory under /run:
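# Only set the cache location when a login name is known.
if [ -n "${LOGNAME}" ]; then
    # id -u is used instead of ${UID}, which not every POSIX shell sets.
    export XDG_CACHE_HOME="/run/user/$(id -u)/cache"
fi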
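A slightly more defensive variant could prefer XDG_RUNTIME_DIR, which the XDG Base Directory Specification defines, and fall back to /run/user/<uid> only when it is unset. This is a sketch, not part of the original setup: runtime_cache_dir is a hypothetical helper, and it assumes the runtime directory is tmpfs-backed, as is typical when a login manager creates it.

```shell
#!/bin/sh
# Print a per-user cache directory below the runtime directory.
# $1: runtime directory (optional; defaults to XDG_RUNTIME_DIR, then /run/user/<uid>)
runtime_cache_dir() {
    runtime_dir=${1:-${XDG_RUNTIME_DIR:-/run/user/$(id -u)}}
    echo "${runtime_dir}/cache"
}

# Example with an explicit runtime directory for a user with UID 1000.
runtime_cache_dir /run/user/1000
```

In a profile script the result could be used as export XDG_CACHE_HOME="$(runtime_cache_dir)", followed by mkdir -p "$XDG_CACHE_HOME" in case applications do not create the directory themselves.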
Web browser profile(s) and cache on tmpfs
The web browser profile(s), cache, etc. can be relocated to tmpfs. The I/O associated with using the browser is then redirected from the SSD to volatile memory, which reduces wear on the physical drive and also improves browser speed and responsiveness.
It is possible to relocate the browser components mentioned above with the utility www-misc/profile-sync-daemon:
root #
emerge --ask www-misc/profile-sync-daemon
systemd
Close all the browsers, start and enable the daemon:
user $
systemctl --user enable --now psd
Now it is possible to view all symlinks by printing the status of the started daemon:
user $
psd p
OpenRC
Next add the users whose browser(s) profile(s) will get symlinked to a tmpfs or another mountpoint in the variable USERS:
USERS="user user2 root"
Finally, close all the browsers, start and enable the daemon:
root #
rc-update add psd default
root #
rc-service psd start
Now it is possible to view all symlinks by printing the status of the started daemon:
user $
psd p
See also
- HDD — describes the setup of an internal SATA or PATA (IDE) rotational hard disk drive.
- NVMe — flash memory chips connected to a system via the PCI-E bus.
External resources
- Aligning an SSD on Linux: drive internal structures explained.
- Aligning filesystems to an SSD’s erase block size: alignment explained by Ted Ts'o.
- Magic soup: ext4 with SSD, stripes and strides (ext4 alignment discussion)
- Arch Linux Profile-Sync-Daemon
References
- [1] Performance of TRIM command on ext4 filesystem, people.redhat.com. Retrieved on October 29, 2018
- [2] FITRIM/discard, XFS.org. Retrieved on October 29, 2018
- [3] FAQ - btrfs Wiki, btrfs.wiki.kernel.org. Retrieved on October 29, 2018
- [4] mount(8) - Linux manual page, man7.org. Retrieved on October 29, 2018
- [5] fstrim(8) - Linux manual page, man7.org. Retrieved on October 29, 2018
- [6] Hard Drive - Why Do Solid State Devices (SSD) Wear Out, Dell. Retrieved on October 29, 2018
- [7] fdisk(8) - Linux manual page, man7.org. Retrieved on October 31, 2018
- [8] RAID setup - Linux Raid Wiki, wiki.kernel.org. Retrieved on November 1, 2018
- [9] mke2fs(8) - Linux manual page, man7.org. Retrieved on November 1, 2018
- [10] Partition Alignment Spreadsheet, techpowerup.com. Retrieved on November 1, 2018
- [11] The Crucial/Micron M500 Review (960GB, 480GB, 240GB, 120GB)
- [12] fstrim.timer\sys-utils - util-linux/util-linux.git - The util-linux code repository, kernel.org. Retrieved on October 30, 2018
- [13] 2.4. Discard unused blocks, Red Hat. Retrieved on October 30, 2018
- [14] Firefox is eating your SSD - here is how to fix it, Loyolan Ventures. Retrieved on October 28, 2018
- [15] XDG Base Directory Specification, freedesktop.org. Retrieved on October 28, 2018