Over the coming month I will be architecting, building and testing a modular, high-performance, SSD-only storage solution.
I'll be documenting my progress and findings along the way and open-sourcing all the information as a public guide.
With recent price drops and durability improvements in solid state storage, there's never been a better time to ditch those old spinning magnets.
Modular server manufacturers such as SuperMicro have invested heavily in R&D, thanks to the ever-growing requirements of the cloud vendors that utilise their hardware.
Companies often settle for big-name, off-the-shelf storage products based on several, often misguided, assumptions:
At the end of the day, we don't trust vendors to design our servers - why would we trust them to design our storage?
A great quote on Wikipedia under enterprise storage:
You might think that the hardware inside a SAN is vastly superior to what can be found in your average server, but that is not the case. EMC (the market leader) and others have disclosed more than once that the goal has always been to use as much standard, commercial, off-the-shelf hardware as we can. So your SAN array is probably nothing more than a typical Xeon server built by Quanta with a shiny bezel. A decent professional 1 TB drive costs a few hundred dollars. Place that same drive inside a SAN appliance and suddenly the price per terabyte is multiplied by at least three, sometimes even 10! When it comes to pricing and vendor lock-in you can say that storage systems are still stuck in the mainframe era despite the use of cheap off-the-shelf hardware.
It's the same old story: if you've got lots of money and you don't care how you spend it, or about passing those savings on to your customers - sure, buy the ticket, take the ride - get a unit that comes with a flash logo, a 500-page brochure, licensing requirements and a greasy sales pitch.
Storage performance always seems to be our bottleneck at Infoxchange; we run several high-performance, high-concurrency applications with large databases and complex reporting.
We've grown (very) fast, and with that came spending too much on off-the-shelf storage solutions. We have a requirement to self-host most of our products securely, within our own control, on our own hardware, and we need the flexibility to meet current and emerging security requirements.
I have been working on various proof-of-concepts, which have led to our decision to proceed with our own modular storage system tailored to our requirements.
We're going to start with a two-node cluster. We want to keep rack usage to a minimum, so I'm going with a high-density 1RU build.
The servers themselves don't need to be particularly powerful, which will help keep costs down. Easily the most expensive components are the 1.2TB PCIe SSDs - but the performance and durability of these units can't be overlooked. We're also going to have a second performance tier built from high-end SATA SSDs in RAID10. Of course, if you wanted to reduce the price further, the PCIe SSDs could be left out or purchased at a later date.
NVMe is a relatively new technology that I'm very interested in making use of for these storage units.
NVM Express has been designed from the ground up, capitalizing on the low latency and parallelism of PCI Express SSDs, and mirroring the parallelism of contemporary CPUs, platforms and applications. By allowing parallelism levels offered by SSDs to be fully utilized by the host's hardware and software, NVM Express brings various performance improvements:

| | AHCI | NVMe |
| --- | --- | --- |
| Maximum queue depth | 1 command queue; 32 commands per queue | 65536 queues; 65536 commands per queue |
| Uncacheable register accesses (2000 cycles each) | 6 per non-queued command; 9 per queued command | 2 per command |
| MSI-X and interrupt steering | Single interrupt; no steering | 2048 MSI-X interrupts |
| Parallelism and multiple threads | Requires synchronization lock to issue a command | No locking |
| Efficiency for 4 KB commands | Command parameters require two serialized host DRAM fetches | Command parameters in one 64-byte fetch |
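The queue-depth difference alone is striking. A quick back-of-the-envelope sketch, using only the AHCI and NVMe limits quoted above:

```python
# Maximum outstanding commands, from the AHCI vs NVMe spec limits above.
ahci_queues, ahci_depth = 1, 32
nvme_queues, nvme_depth = 65536, 65536

ahci_max = ahci_queues * ahci_depth   # 32 outstanding commands
nvme_max = nvme_queues * nvme_depth   # 4,294,967,296 outstanding commands

print(f"AHCI: {ahci_max}, NVMe: {nvme_max} ({nvme_max // ahci_max}x)")
```

In practice real devices expose far fewer queues than the spec maximum, but the point stands: the protocol itself is no longer the ceiling.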
Intel published an NVM Express driver for Linux; it was merged into the Linux kernel mainline on 19 March 2012, with the release of kernel version 3.3.
A scalable block layer for high-performance SSD storage, developed primarily by Fusion-io engineers, was merged into the Linux kernel mainline in kernel version 3.13, released on 19 January 2014. This leverages the performance offered by SSDs and NVM Express, by allowing much higher I/O submission rates. With this new design of the Linux kernel block layer, internal queues are split into two levels (per-CPU and hardware-submission queues), thus removing bottlenecks and allowing much higher levels of I/O parallelisation.
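As a rough illustration of that two-level design, here's a toy model (not kernel code; the CPU and hardware-queue counts are hypothetical) of per-CPU submission queues mapping onto a smaller set of hardware queues:

```python
# Toy model of the blk-mq two-level queue idea: each CPU submits into
# its own software queue, which maps onto one of a smaller set of
# hardware submission queues - so CPUs mapped to different hardware
# queues never contend on a shared lock during submission.
from collections import defaultdict

NUM_CPUS = 8
NUM_HW_QUEUES = 2  # hypothetical device exposing 2 hardware queues

def hw_queue_for(cpu):
    """Static CPU -> hardware-queue mapping (a simplification)."""
    return cpu % NUM_HW_QUEUES

hw_queues = defaultdict(list)

def submit(cpu, request):
    # No global lock: only CPUs sharing a hardware queue interact.
    hw_queues[hw_queue_for(cpu)].append((cpu, request))

for cpu in range(NUM_CPUS):
    submit(cpu, f"read-block-{cpu}")

print({q: len(reqs) for q, reqs in hw_queues.items()})  # {0: 4, 1: 4}
```

Compare this with the old single-queue block layer, where every CPU had to take the same request-queue lock to submit I/O.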
Note the following: as of version 3.18 of the Linux kernel, released on 7 December 2014, the VirtIO block driver and the SCSI layer (which is used by Serial ATA drivers) have been modified to actually use this new interface; other drivers will be ported in the following releases.
Debian - our operating system of choice - currently has kernel 3.16 available (via the official backports mirrors); however, we do generate CI builds of the latest stable kernel for specific platforms - if you're interested in how we're doing that, I have some information here.
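If you want to check whether a given machine's kernel is new enough for the multi-queue block layer (3.13) or the VirtIO/SCSI ports (3.18), a minimal sketch:

```python
# Hedged helper: parse the running kernel release string and compare
# major.minor against a required version. The example release string
# below is the Debian backports kernel mentioned above.
import platform

def kernel_at_least(major, minor, release=None):
    release = release or platform.release()   # e.g. "3.16.0-4-amd64"
    parts = release.split("-")[0].split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)

# Debian's 3.16 backports kernel gets blk-mq, but not the 3.18 ports:
print(kernel_at_least(3, 13, "3.16.0-4-amd64"))  # True
print(kernel_at_least(3, 18, "3.16.0-4-amd64"))  # False
```

With no argument it checks the host it runs on, so it drops easily into a provisioning or CI check.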
That's where I'm up to for now; the hardware will hopefully arrive in two weeks, and I'll begin the setup and testing.
[6/2/2015 - Sam McLeod]