Posted on Tue 22 October 2024

SSD NAS

Recently I decided to build a NAS to store my RAW photos as well as video content, and to use as a backup target for my MacBook. I wanted something that I could use directly with low latency for interactive editing over 10 Gbit Ethernet, so spinning disks were out.

However, building an all SSD solution turned out to be surprisingly tricky - the old SATA standard is much too slow for modern SSDs, but getting more than a few PCIe NVMe ports on a consumer mainboard is quite difficult (I wanted to avoid server boards to minimize power consumption). On top of this, SATA SSDs tend to be more expensive as well!

In the end I settled on an Asus Prime X670E-PRO mainboard with 4 NVMe slots, combined with an Asus Hyper M.2 X16 card for a further 4 slots - giving me 8 NVMe slots in total, enough to make RAIDZ2 worthwhile. I used eight Crucial P3 Plus NVMe SSDs to fill the slots, chosen for being the cheapest high-quality NVMe SSDs at the time of my purchase:

X670E motherboard and 4x PCIe card with 8 SSDs installed

I used fio to confirm the maximum bandwidth to each device, using the example commands from the Google Cloud documentation:

sudo fio --name=write_bandwidth_test \
  --filename=/dev/nvme${i}n1 --filesize=4000G \
  --time_based --ramp_time=2s --runtime=1m \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --numjobs=4 --thread --offset_increment=500G \
  --bs=1M --iodepth=64 --rw=write \
  --iodepth_batch_submit=64  --iodepth_batch_complete_max=64

As expected, all SSDs reached around 4 GiB/s, except the one SSD in the PCIe 3.0 x4 slot of the motherboard, which topped out around 2.7 GiB/s.
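If you want to double-check which drive sits behind which link, the negotiated PCIe speed and width of each NVMe controller can be read from lspci - a quick sketch (output formatting varies between pciutils versions):

# List each NVMe controller together with its negotiated link status,
# e.g. "Speed 16GT/s, Width x4" for PCIe 4.0 x4 or "Speed 8GT/s" for PCIe 3.0
sudo lspci -vv | grep -E "Non-Volatile memory controller|LnkSta:"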

I also benchmarked writing to all eight SSDs simultaneously:

  --filename=/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1:/dev/nvme0n1

This hovered around 23.5 GiB/s, close to the roughly 30 GiB/s expected from summing the individual results (seven drives at ~4 GiB/s plus one at 2.7 GiB/s) - more than enough for my purpose.

ZFS - RAIDZ2

With the basics out of the way, it was time to test performance of reads and writes through ZFS. For simplicity I started with the default ZFS configuration suggested by TrueNAS SCALE.

To make my NAS more resilient and avoid common failure cases, my goal was to use ZFS with RAIDZ2. This stripes each record, together with two parity blocks, across all disks in the pool, so up to two disks can fail partially or completely without any data loss, at the cost of some storage capacity. In my case of eight SSDs with 3.64 TiB each (29.12 TiB raw), I ended up with 20.57 TiB usable capacity, or roughly 70% - a cheap price to pay for so much more robustness!
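I let TrueNAS SCALE create the pool for me, but for reference, a roughly equivalent layout could be created by hand with something like this (a sketch only - the pool name and raw device paths are assumptions, and TrueNAS references disks by stable IDs and applies additional defaults of its own):

# 8-wide RAIDZ2 pool across all NVMe drives (ashift=12 for 4K sectors)
sudo zpool create -o ashift=12 data raidz2 \
  /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 \
  /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1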

Initial performance exceeded my expectations - I'm sure this can be optimized further, but it's already more than fast enough for network shares:

TEST_DIR=/mnt/data/benchmark

Sequential Write

sudo fio --name=write_throughput --directory=$TEST_DIR --numjobs=16 \
--size=10G --time_based --runtime=1m --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write \
--group_reporting=1 --iodepth_batch_submit=64 \
--iodepth_batch_complete_max=64

IOPS=5491, BW=5495MiB/s (5762MB/s)

Write throughput seems entirely CPU bound: all cores are fully utilized during this benchmark. Re-testing with the default LZ4 compression disabled made no difference.
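For reference, compression can be inspected and toggled per dataset; a sketch (the dataset name here is just a placeholder):

# Check the current compression setting and re-test with it turned off
# (dataset name is a placeholder; TrueNAS defaults to lz4)
zfs get compression data/benchmark
sudo zfs set compression=off data/benchmark
# ... re-run the sequential write benchmark ...
sudo zfs set compression=lz4 data/benchmark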

Random Write

sudo fio --name=write_iops --directory=$TEST_DIR --size=10G \
--time_based --runtime=1m --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=256 --rw=randwrite --group_reporting=1  \
--iodepth_batch_submit=256  --iodepth_batch_complete_max=256

IOPS=44.1k, BW=172MiB/s (181MB/s)

In RAIDZ, IOPS are limited to roughly the IOPS of the single slowest drive, since each record is split into chunks that have to be written to all drives in the vdev.
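To put rough numbers on that: with the default 128 KiB recordsize and six data drives in an 8-wide RAIDZ2, each full record works out to roughly 128 KiB / 6 ≈ 21 KiB of data per drive, plus parity. The property can be checked per dataset (dataset name again a placeholder):

# Show the recordsize of the benchmark dataset (128K by default)
zfs get recordsize data/benchmark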

Sequential Read

sudo fio --name=read_throughput --directory=$TEST_DIR --numjobs=16 \
--size=10G --time_based --runtime=1m --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=read \
--group_reporting=1 \
--iodepth_batch_submit=64 --iodepth_batch_complete_max=64

IOPS=6635, BW=6653MiB/s (6976MB/s)

Random Read

sudo fio --name=read_iops --directory=$TEST_DIR --size=10G \
--time_based --runtime=1m --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=256 --rw=randread --group_reporting=1 \
--iodepth_batch_submit=256  --iodepth_batch_complete_max=256

IOPS=99.7k, BW=389MiB/s (408MB/s)

Part List

If you want to replicate my build, these are the parts I used:

Amazon links are included to disambiguate models; any affiliate proceeds will go to charity.

Tags: hardware, server, nas
