Sizing Large Ceph Clusters:

Okay, so the reason I come to you today is to talk about sizing big Ceph clusters. It's a conversation I end up having a lot with our sales team and prospective customers. You have a data requirement that's huge, multi-petabyte, maybe even in the tens of petabytes, and of course you don't have that much data now, but your business plan is going to see you expand into that pretty quickly, right? So you come to us and ask: do I really want to buy 20 petabytes worth of storage to start? No. So how much can we really get started with, and is that going to set me off on the wrong foot down the road?

Alright, so to avoid starting off on the wrong foot when building a huge cluster, there's one main thing I want to consider. Of course there are all kinds of things to consider, but you know the saying: how do you eat an elephant? One bite at a time, right? So the first big thing you want to look at is storage efficiency. What is storage efficiency? Storage efficiency is the ratio of usable storage divided by raw storage.

So what that factors in is how much of your capacity you're losing to redundancy to keep everything safe. This concept applies everywhere from Ceph clusters all the way down to RAID volumes; storage efficiency is always something you need to consider. But why it's a really, really big deal now is that the difference between 80% efficient and 70% efficient at a cluster of this scale could be a six-figure check.
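To make that concrete, here's a back-of-the-envelope sketch. The 20 PB usable target and the cost-per-terabyte figure are hypothetical numbers picked for illustration, not quotes:

```python
# Back-of-the-envelope sketch of why storage efficiency matters at scale.
# USABLE_PB and COST_PER_RAW_TB below are hypothetical, for illustration only.

USABLE_PB = 20.0          # usable capacity you actually need
COST_PER_RAW_TB = 15.0    # hypothetical hardware cost per raw TB (USD)

def raw_needed(usable_pb: float, efficiency: float) -> float:
    """Raw capacity (PB) required to deliver a usable target at a given efficiency."""
    return usable_pb / efficiency

for eff in (0.66, 0.80):
    raw_pb = raw_needed(USABLE_PB, eff)
    cost = raw_pb * 1000 * COST_PER_RAW_TB  # 1 PB = 1000 TB
    print(f"{eff:.0%} efficient: {raw_pb:.1f} PB raw, ~${cost:,.0f}")
```

At these assumed numbers, the gap between 66% and 80% efficiency is several petabytes of raw disk you'd have to buy just to hit the same usable target.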

Ok, so storage efficiency is important; what determines my storage efficiency? Well, it's a big question, but we're talking really, really big clusters, and we're talking Ceph, so really the only answer in this case is erasure coding. For those who don't know, erasure coding is very analogous to RAID, except it's done at the object level: an object comes in, it's cut into K data chunks, then M parity chunks are also generated, and they're dispersed across unique hosts in your cluster.

Okay, so with all the definitions out of the way, let's actually walk through an example so I can show you what I'm talking about. An erasure coding configuration is defined by its profile, K plus M: K is the number of data chunks, and M is the number of parity chunks. Two very common ones are 4 plus 2 and 8 plus 2, where there are 4 data chunks and 2 parity chunks, or 8 data chunks and 2 parity chunks. There are more, don't get me wrong, but these are two very common ones we'll see all the time, and they help illustrate my example here. The equation for storage efficiency when using erasure coding is K divided by K plus M. So the 4 plus 2 case is 4 divided by 6, or 66% efficient, and the 8 plus 2 case is 8 divided by 10, or 80% efficient.
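The formula above is simple enough to sketch in a few lines of Python:

```python
# Storage efficiency of an erasure coding profile: efficiency = K / (K + M),
# where K is the number of data chunks and M the number of parity chunks.

def ec_efficiency(k: int, m: int) -> float:
    """Storage efficiency of a K+M erasure coding profile."""
    return k / (k + m)

for k, m in [(4, 2), (8, 2)]:
    print(f"{k}+{m}: {ec_efficiency(k, m):.1%} efficient")
```

Plug in any other profile you're considering and you'll see the same pattern: for a fixed M, a bigger K buys you more efficiency.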

Alright, so an 8 plus 2 gives you 80% storage efficiency and a 4 plus 2 gives you 66% storage efficiency, so we can stop the video right here because we're done; just pick 8 plus 2 every time. But, unfortunately, it's not that easy. There is a little bit of a catch you have to remember here. I mentioned earlier that when erasure coding disperses these chunks, it sends each one to a unique host, and it does that to ensure safety: if you lose a host, you don't want to lose multiple chunks, because that defeats the whole purpose. Okay, but you might see where the catch is here. If I have to send 10 chunks to 10 unique hosts, I need to start with at least 10 servers, contrasted with the 4 plus 2, which only needs 6 servers.
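The trade-off above can be sketched the same way. These host counts are the bare minimum (one chunk per host); in practice you'd want extra hosts for recovery headroom:

```python
# Sketch of the trade-off: higher efficiency needs more hosts up front.
# Minimum host count assumes one chunk per host, the failure domain
# described above; it ignores spare hosts for recovery.

def min_hosts(k: int, m: int) -> int:
    """Minimum number of hosts: one unique host per chunk."""
    return k + m

for k, m in [(4, 2), (8, 2)]:
    print(f"{k}+{m}: needs >= {min_hosts(k, m)} hosts, "
          f"survives {m} host failures, "
          f"{k / (k + m):.1%} efficient")
```

Both profiles survive losing 2 hosts; the 8 plus 2 just asks for a bigger cluster on day one in exchange for its better efficiency.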

Ok, so to tie all that up with an analogy: sizing your really large cluster here is kind of like getting a really big loan, and your storage efficiency is a lot like your interest rate. With a big loan, if you don't put down a down payment or you've got really bad credit, they'll still give you the loan; your interest rate will just be through the roof, and you'll pay back so much more than what you were actually given. Compare that to a really big cluster: if you go bare minimum to start and set off on the wrong foot with poor storage efficiency, then over the long run of your cluster, while you might have saved money up front, you'll have paid for it almost twice by having to put that much more raw storage in just to get your usable amount.

Alright, so fun fact for today's video: we are not loan sharks here at 45 Drives. We want to see you get the best bang for your buck out of your cluster, and we want to fit you with the right storage efficiency, one that suits your huge cluster but gets you off the ground running without too much of a hassle. So if you've got a big project where you need to install some storage, give us a call; we'd love to help you.

Alright, so that was today's tech tip: sizing really large Ceph clusters. Really, my point is that we don't want you to overpay for storage; we want you to start off on the right foot for the budget you have and for the humongous amount of data you'll have stored over the next few years. So again, hope you enjoyed this one. Leave questions, comments, emails, however you want to reach out to us; we can't wait to hear from you.

Discover how 45Drives can work for you.

Contact us to discuss your storage needs and to find out why the Storinator is right for your business.

Contact 45Drives