NOTE:
This project is now complete!
File Server
I've been fascinated by high-volume, fault-tolerant data storage systems for several years now, which led me to add a small hardware-based RAID 5 array to my desktop (followed by another, slightly larger RAID 5) and to plan out several dedicated storage servers that I'd like to build at some point in the next year. Obviously, the hardware and software will change between now and then, but it's probably a safe bet to assume those changes will be iterative in nature and the newer versions can be dropped into this outline without changing much. Of course, one of the biggest changes will be in the size of the hard drives. I've built this system around 8TB drives, which (hopefully) will be around $300 apiece when I build this system in another year. I'm leaning towards WD Red Pro drives, mostly because of the extra 2 years on the warranty over the consumer WD Red drives (5 years vs. 3). It's possible that 10TB WD Red Pro drives will be reasonably priced within a year, but regardless of size, I have $300/drive budgeted. Note that I already have 8 4TB WD Red drives, so I plan to purchase 4 more of those to make 12 and add them to the pool of 8-10 TB drives for a total of 24.
Predictions aside, my objectives for this build are maximum storage space in a 4U chassis with reasonable fault tolerance and performance. I discuss the fault tolerance aspect in depth a bit further down, as there are many trade-offs there, but for performance, I'll consider maxing out a gigabit line on reads sufficient. There are other builds I planned out which offered much more capacity in a 4U chassis (using a Backblaze-inspired configuration), but the increased capacity was outweighed by the price of the chassis and the performance tradeoffs from using custom-made port expanders and crazy HighPoint 48-port SATA cards.
Hardware
I'll start by listing the hardware I have planned and then discuss each part and how I plan to configure everything. I will note here that I plan on running the FreeBSD-based FreeNAS on this server.
| Part | Make/Model | Qty. | Price per | Total |
| --- | --- | --- | --- | --- |
| Chassis | SuperMicro SC846 (used) | 1 | $250 | $250 |
| CPU | Intel Xeon E5-1630 v3 | 1 | $375 | $375 |
| Motherboard | SuperMicro X10SRA | 1 | $270 | $270 |
| RAM | Crucial 64GB Quad-Channel Kit (ECC/Unbuffered/CL15) | 1 | $400 | $400 |
| HBA | IBM M1015 (in IT mode) | 1 | $120 | $120 |
| Data Drive | 8-10TB WD Red | 12 | $300 | $3600 |
| Data Drive | 4TB WD Red | 4 | $150 | $600 |
| Replacement PSU | Corsair HX1000i | 1 | $240 | $240 |
| Replacement Backplane | SuperMicro BPN-SAS2-846EL1 | 1 | $200 | $200 |
| Boot Device | 8GB USB flash drive | 1 | $15 | $15 |
| UPS | APC 1500VA (used) | 1 | $200 | $200 |
| Rack | 42U (used; craigslist) | 1 | $100 | $100 |
| Total | | | | $6370 |
Chassis: I decided on a SuperMicro chassis over the Norco 4224 for a few reasons:
- The overall build quality of the SuperMicro is far superior, which will make working on the server much easier,
- The backplane in the SuperMicro is also higher quality, which will result in higher drive pool reliability,
- The SuperMicro backplane takes aggregated SFF-8087 (mini-SAS) inputs rather than individual SATA inputs as on the Norco, and
- The SuperMicro supports redundant power supplies (although I might not make use of this).
I've found that the SC846 is very reasonably priced on eBay, but the stock PSU is apparently very loud. Factoring in replacement PSUs puts the total chassis cost at ~$650 for the SuperMicro, vs. $450 + $150 (=$600) for the Norco 4224 and a decent PSU. That extra $50 for the SuperMicro is well worth it for my purposes.
CPU: The requirements for the CPU are that it should support ECC RAM and have a fast single-core clock speed. I'll be serving up files via SMB/CIFS, which is fairly CPU intensive and largely single-threaded. I chose an LGA 2011-3 chip because those boards support absurd quantities of DDR4 RAM, allowing me to expand in the future if I start to run into issues. The E5-1630 v3 has 4 hyper-threaded cores which turbo up to 3.8GHz, meaning SMB/CIFS should be no problem. Note that Skylake-E is due in the second half of 2016 and will likely come with a new socket, so it's likely I'll end up on that instead of the Haswell-based E5-1630 v3 (maybe an E5-1630 v5?).
Motherboard: As with the CPU, the main requirement for the motherboard is that it should support ECC RAM. I chose SuperMicro because of the build quality; I want to make sure all the components are top-quality, server-grade parts for this build. The X10SRA has 4 PCIe 3.0 x16 slots running at x16/x8/x8/x8, which will be more than enough for any expansion cards I might need in the future. I might eventually look into 10GigE, but I'll be sticking with the onboard NIC for now.
RAM: ZFS uses a lot of RAM, and everything I've read strongly urges users to max out the RAM in their systems; thus, I have 64GB in this build. ECC RAM is also recommended because of the active “scrubbing” that ZFS does periodically on its volumes (I'll likely schedule a scrub every 2 weeks). These scrubs check the parity calculations on all volumes for errors. The process stores the data in RAM while it's doing these checks, so any errors in RAM can result in ZFS writing those errors out to the volume, which will obviously cause serious problems (including data loss). For this reason, ZFS builders are urged to get ECC RAM (and a CPU/motherboard combo that supports it).
HBA: The IBM M1015 is a cheap and reliable way to get a bunch of extra SAS/SATA ports. IBM shipped a ton of these cards with their servers over the years, and as a result, they can be found used for around $100 (very similar to the situation Dell created with their PERC cards). By default, these cards are configured to operate in RAID mode, which means they won't present the individual drives to the OS and you won't be able to access the drives' SMART data, among other major drawbacks. You can, however, flash the cards to “IT mode”, which makes them behave like basic SAS/SATA HBAs that present every individual drive to the OS. This is how ZFS likes its drives, and it makes error monitoring much easier. The M1015 has two SFF-8087 ports (also called "mini-SAS" or "IPASS" ports), each of which can directly support up to 4 SAS or SATA drives. Like most SAS HBAs, it can also be connected to SAS expanders, allowing more than 4 drives per mini-SAS port. This is most likely how I will have my system configured: a single mini-SAS cable connecting the M1015 to the SuperMicro backplane (discussed below), which has its own SAS expander, and from there to the 24 individual drives through the backplane circuitry. Essentially, I'll have the signals for all 24 drives traveling over a single cable; the mini-SAS cable has 24Gbps of bandwidth, which should be sufficient.
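As a quick back-of-envelope check on that last claim (assuming roughly 150 MB/s of sustained throughput per drive, which is my own ballpark figure rather than anything from a spec sheet):

```python
# Back-of-envelope bandwidth check for a single SAS2 x4 (mini-SAS) link.
# The ~150 MB/s per-drive figure is an assumption, not a measured number.

SAS2_LANE_GBPS = 6                             # SAS2 runs at 6 Gb/s per lane
LANES = 4                                      # an SFF-8087 cable carries 4 lanes
link_mb_s = SAS2_LANE_GBPS * LANES * 1000 / 8  # ~3000 MB/s, ignoring encoding overhead

drives = 24
per_drive_mb_s = 150                           # assumed sustained read speed per drive
aggregate_mb_s = drives * per_drive_mb_s       # 3600 MB/s if every drive streamed at once

gigabit_lan_mb_s = 125                         # the actual performance target

print(f"mini-SAS link:      {link_mb_s:.0f} MB/s")
print(f"24 drives, all-out: {aggregate_mb_s} MB/s")
print(f"gigabit LAN target: {gigabit_lan_mb_s} MB/s")
```

So a pathological all-drives-streaming workload could in theory outrun the single link, but for my stated goal of saturating a gigabit line, there's a huge amount of headroom.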
Replacement PSU: Apparently the PSUs that come in the SC846 chassis are absurdly loud. I don't plan to have this in a server room (or at least not initially), so I'll have to get something quieter. SuperMicro makes a quieter option, the PWS-920P-SQ, which can be had on eBay for around $200 each. Paying $400 for PSUs when the chassis itself is ~$250 is frustrating, so I think I'll just get a 1000W ATX PSU and figure out how to strap it into place.
Replacement Backplane: All the SC846s I've seen on eBay come with an older backplane (BPN-SAS-846EL1) that doesn't have SAS2 support. Without SAS2, apparently the maximum capacity of the array would be limited and/or it would only recognize drives up to 2 or 3TB; I'm not totally clear on all this. The solution is a SAS2 backplane (BPN-SAS2-846EL1), which can be had for ~$200. This backplane also comes with a built-in SAS expander chip, so I would only need a single M1015 (rather than the three I had originally planned), which helps offset the cost of the replacement board. Another possible option is the BPN-SAS-846A backplane, which appears to be a SAS breakout cable baked into a PCB. It doesn't have its own SAS expander chip, so it gets around the above SAS/SAS2 limitation. I would, however, need all three of the M1015s, so I'm not sure there would be any advantage in this option. This is an area I'll have to research some more.
Data Drives: As I mentioned above, I'm thinking (hoping) 8TB WD Red hard drives will be around $300 each when I actually get around to building this system. 12 8TB drives and 12 4TB drives in the RAID-Z3 layout described below would give 108TB of raw space. The other possibility is 10TB drives, which would bring that to about 126TB. I also considered 8TB WD Red Pros, mostly because of the extra 2 years on the warranty, but I'm not totally sure it would be worth the premium. 8TB HGST Deskstar NAS drives would also work, but they don't exist yet...
Boot Device: A simple USB flash drive works great as a FreeNAS boot device. You'll want to get a name-brand drive to avoid headaches, but even so, it shouldn't cost very much. USB 3.0 support on FreeNAS 9 requires at least a little tinkering, so I'll probably go with USB 2.0. Note that because I won't have a GPU in this build, I'll get the USB drive all set up in a VM before booting the server off it.
UPS: I'm going to want a 1500VA UPS when I build this system. I originally planned on having my desktop machine on the same UPS, but I realized it's way cheaper to buy multiple smaller UPSs than one giant one. With some driver configuration, I should be able to get the APC UPS control protocol working on FreeNAS, which will allow automated system shutdown and other warnings. These are all over eBay for around $200 with new batteries.
Rack: I'll also be getting a server rack to properly mount this system in, along with my desktop (I'm going to transfer its guts to a rackmount case when I build the file server), the UPS, and my network gear. I'll likely get a couple of simple shelves for the network gear and maybe a small monitor so the server can have a head when I need it to. A 42U rack will be way more than I need, but I'm also planning a water cooling setup for my main desktop with an external water box that could take up 10-14U. I might change my mind on this, but 42U is the current plan.
Software and RAID Configuration
Configuring a RAID setup in ZFS is very similar to configuring it via a hardware controller. I would encourage anyone wanting to build a FreeNAS system to familiarize themselves with the basics of redundancy via RAID before diving in, as it will make many of the more complicated concepts much easier to grasp. The most common RAID levels ZFS offers are striped, mirrored, RAID-Z1, RAID-Z2, and RAID-Z3. These are comparable to RAID 0, RAID 1, RAID 5, RAID 6, and a triple-parity extension of RAID 5/6, respectively. The RAID scheme is applied to a group of drives called a "virtual device" or "vdev", and vdevs are then grouped (via striping) into a "zpool", which is the logical disk you'll be presented with. Because data is striped across the vdevs in a zpool, it's very important to note that failure of a single vdev will result in failure of the whole zpool. This creates some interesting tradeoffs which I examine more closely on the R2-C2 page.
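To make the vdev/zpool failure semantics concrete, here's a toy sketch (plain Python, nothing ZFS-specific, and the function names are just mine) of the rule described above: a RAID-Z vdev survives as long as its failures don't exceed its parity count, and the zpool survives only if every vdev does.

```python
# Toy model of zpool failure semantics: the pool stripes across vdevs,
# so losing any single vdev loses the entire pool.

def vdev_survives(failed_drives: int, parity: int) -> bool:
    """A RAID-Z vdev tolerates at most `parity` failed drives."""
    return failed_drives <= parity

def pool_survives(vdevs):
    """`vdevs` is a list of (failed_drives, parity) tuples, one per vdev."""
    return all(vdev_survives(failed, parity) for failed, parity in vdevs)

# Two RAID-Z3 vdevs (parity = 3), as in the build below:
print(pool_survives([(3, 3), (0, 3)]))  # True:  3 failures in one vdev is survivable
print(pool_survives([(4, 3), (0, 3)]))  # False: a 4th failure in that vdev kills the pool
```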
ZFS likes the quantity of data drives in a vdev to be a power of two. This ensures that, when writing data to a vdev, it can be evenly divided between the 4KB sectors on all of your data drives. If you disregard this when configuring your zpool, you might experience degraded I/O performance and a slight reduction of total capacity of each vdev. Both of these penalties are due to the fact that a portion of the 4KB drive sectors are not fully filled.
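Here's a rough illustration of the padding issue, assuming 4KB-sector drives and ZFS's default 128KiB record size (both of those numbers are assumptions on my part, and real RAID-Z allocation is more involved than this):

```python
# How a 128 KiB record splits across the data drives of a vdev with 4 KiB
# sectors. An uneven split leaves partially filled sectors, which is where
# the capacity and I/O penalties mentioned above come from.

RECORD_KIB = 128   # assumed ZFS recordsize
SECTOR_KIB = 4     # 4K-sector drives (ashift=12)

for data_drives in (8, 9):   # 8 = power of two; 9 = data drives per vdev in my layout
    per_drive_kib = RECORD_KIB / data_drives
    sectors = per_drive_kib / SECTOR_KIB
    note = "even" if sectors.is_integer() else "uneven -> padded sectors"
    print(f"{data_drives} data drives: {sectors:.2f} sectors per drive ({note})")
```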
I should mention that, in RAID 5/6 and RAID-Z1/2/3, the RAID controller (be it hardware or software) does not actually dedicate some portion of the drives to user data and the remainder to parity data. This sort of configuration was employed in RAID 4, but having all the parity information for the whole array reside on a single disk created contention for access to that disk. The solution was to stagger the parity data across all the disks in the array, in a sort of barber-pole fashion. For the sake of making visualization and discussion easier, I will still refer to "data drives" and "parity drives", but this distinction is purely conceptual (and a lot quicker than saying "one drive's worth of parity data").
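If the barber-pole arrangement is hard to picture, this little cartoon prints which disk holds the parity block in each stripe of a simple RAID-5-style rotation (real RAID-Z stripes are variable-width, so treat this as conceptual only):

```python
# Cartoon of rotating ("barber pole") parity placement across a 5-disk array.
# D = data block, P = parity block; the parity column shifts every stripe.

DISKS = 5

for stripe in range(DISKS):
    parity_disk = stripe % DISKS
    row = ["P" if disk == parity_disk else "D" for disk in range(DISKS)]
    print(f"stripe {stripe}: " + " ".join(row))
```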
My 24-drive configuration will have 2 vdevs: one with 12*8TB drives in RAID-Z3 and one with 12*4TB drives in RAID-Z3. This results in 3 parity drives and 9 data drives in each vdev. Thus, 9 * (individual drive size) gives the unformatted disk space of each vdev: 72TB for the first vdev (with 8TB drives) and 36TB for the second vdev (with 4TB drives), for a total of 108TB. Actual usable disk space depends on many factors and is tough to predict, but I would guess around 100TB usable.
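Spelling the raw-capacity math out (vendor terabytes only; ZFS metadata, padding, and the TB-vs-TiB discrepancy will all eat into this, which is why I'm guessing around 100TB in practice):

```python
# Raw (unformatted) capacity of the planned pool: two 12-drive RAID-Z3 vdevs.

def raidz_raw_tb(drives: int, parity: int, drive_tb: int) -> int:
    """Raw capacity = (data drives) * (drive size), ignoring all ZFS overhead."""
    return (drives - parity) * drive_tb

vdev_8tb = raidz_raw_tb(12, 3, 8)   # 9 * 8 = 72 TB
vdev_4tb = raidz_raw_tb(12, 3, 4)   # 9 * 4 = 36 TB
print(vdev_8tb, vdev_4tb, vdev_8tb + vdev_4tb)   # 72 36 108
```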
I will run several services on the machine, most notably SSH, nginx, and SMB/CIFS. I had considered NFS over SMB/CIFS, but NFS seems like more of a pain than it's worth. SSH will be LAN-only for configuration changes that can't be performed via the web GUI, and nginx will host this website.