In my new (post-Microsoft) career, I am looking forward to building cloud-based data solutions for customers around the world. To that end, I decided to take a look at the ‘state of cloud data pricing’, so that I could supply some kind of pricing estimates to potential customers.
You may be interested in what I found.
NoSQL Monthly Storage costs for 100 GB of data
There are two sides to the story. First, the positive. Unstructured, non-relational or NoSQL storage is remarkably cheap. I took a look across all major vendors, and tried to hold 100 GB of storage and a monthly billing period constant. I also assumed 10 GB up and 100 GB down for usage. For the vendors I looked at, monthly baseline pricing (some had additional charges for PUT, GET, etc.) ranged from $9 to $24. Below is a chart summarizing the results.
Of course, the only thing I am really measuring here is pure storage cost. I did NOT take into account any SLAs around availability (or lack thereof), ease of access via tools or APIs, or the actual performance of any of these services. Also, much to the probable consternation of the vendors, I ‘mixed’ cloud storage intended for personal use (such as DropBox or Amazon CloudStorage) with cloud storage intended for business use (such as Microsoft Azure Tables or Amazon SimpleDB). Also of note is Database.com (from SalesForce.com). Although I was able to set up a test account, pricing seems to be quotable only in terms of transactions and not in terms of storage space, so I couldn’t think of a way to include their offering in this comparison.
Relational Cloud Storage costs
Of course, full RDBMS-in-the-cloud services are a younger market. Even so, I was startled at the dramatic difference in pricing. I am fully aware of the advantages of RDBMS implementations (vs. NoSQL), having ‘evangelized’ SQL Azure for the past year when I was working as a Microsoft evangelist. Those advantages include familiar tooling, programming models, transactional support, query tuning support and built-in high availability, and they are not insignificant. However, the cost of 100 GB / month for cloud-based RDBMS systems vs. NoSQL solutions is also significant: around 50x GREATER than for the same amount of non-relational storage. Below is a comparison of the vendors who had pricing information that I could make sense of.
There were far fewer vendors in this field. Google has a beta offering, hosting mySQL, but hasn’t announced pricing that I could find on their site. You’ll note that I ‘rolled up’ the Amazon offerings for mySQL or Oracle to around $600 / month as a baseline. Actual inputs included many other options. I’ll include screen shots at the end of this blog, so that you can see all of the input parameters I used. In addition to the other offerings, Amazon also offers SQL Server on one of their EC2 instances. One particularly difficult aspect of comparing cloud-hosted RDBMS systems is that Amazon’s rates vary by storage, input/output, other factors AND by compute size (i.e. small, med, large, huge, etc.), whereas Microsoft prices by amount stored and in/out only. Still, Amazon’s offerings appear to be around 1/2 the price of Microsoft’s SQL Azure. Also of note is that the current size limit for SQL Azure databases is 50 GB, so the pricing here reflects purchasing two instances for a total of 100 GB of storage between them.
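As a quick illustration of that last point, here is a minimal sketch of how the instance count falls out of the per-database size limit. The function name `instances_needed` is my own, made up for this example; only the 100 GB target and the 50 GB limit come from the comparison above.

```python
import math


def instances_needed(total_gb: float, max_gb_per_database: float) -> int:
    """Number of databases needed to hold total_gb under a per-database cap."""
    # Round up: a partially filled database still has to be purchased.
    return math.ceil(total_gb / max_gb_per_database)


# 100 GB of data with a 50 GB cap per SQL Azure database.
print(instances_needed(100, 50))
```

The same rounding-up applies to any other capped offering, so a 120 GB data set under the same cap would need three databases rather than two.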
So, taking the high end of RDBMS (Microsoft SQL Azure) and dividing by the high end of NoSQL (Amazon S3) gives $1000/$24, or about 42X more. On the low end, we have mySQL on Amazon versus personal CloudStorage on Amazon, or $600/$9, about 67X more. Averaging the two, (42 + 67) / 2 is roughly 54, which is where the ‘around 50X greater’ figure comes from.
I will also say that I did these comparisons as fairly as I could. However, it was quite difficult to compare vendor-to-vendor, as the service offerings differ. To that end, as mentioned, I will include screen shots from the vendors’ own websites at the end of this blog post so that you can see exactly how I did these comparisons.
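For anyone who wants to re-run the arithmetic with their own quotes, the ratios above can be sketched in a few lines of Python. The variable names are mine; the dollar figures are the monthly baselines quoted in this post, and a simple average of the two ratios is assumed.

```python
# Monthly baseline prices (USD) for 100 GB, as quoted in this post.
rdbms_high = 1000  # Microsoft SQL Azure
rdbms_low = 600    # mySQL on Amazon
nosql_high = 24    # Amazon S3
nosql_low = 9      # personal CloudStorage on Amazon

# How many times more expensive relational storage is at each end.
high_ratio = rdbms_high / nosql_high
low_ratio = rdbms_low / nosql_low

# Simple average of the two ratios.
average_ratio = (high_ratio + low_ratio) / 2

print(f"high end: {high_ratio:.1f}x")
print(f"low end:  {low_ratio:.1f}x")
print(f"average:  {average_ratio:.1f}x")
```

Swapping in different baselines (for example, a different compute size on Amazon) will move the multiple, so treat the result as a rough guide rather than a precise figure.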
Conclusions
Clearly, moving data to the cloud has many elements of unpredictability. Based on my quick survey, it seems obvious that anyone building a cloud-data solution would want to consider non-relational storage for the price difference alone.
Of course, when data is involved there are many other factors. These include SLAs, actual uptime, performance, method of querying, ability to performance tune (i.e. index), security, backup/restore, etc. As I move back into production work, I’ll use this blog to document my journey into the cloud with data, and I’ll certainly be investigating these other aspects.
I am also wondering what your experience has been. Do you have production data in the cloud? How much has it really cost?