With no fewer than five (5) database options to choose from, and with so much riding on picking the right one (or the right pair), choosing a database on Google Cloud Platform (GCP) is one of the recurring tasks that cloud computing professionals have to undertake. You have to consider a lot of things, among them the type of data, size of data, latency, throughput, IOPS and scalability. If you have been an ardent reader of my articles, you might be wondering, “Why on earth is this guy suddenly an expert in cloud computing services?” Well, for a start, I am not an expert. I started my journey into cloud computing last month, thanks to the Andela Learning Community in partnership with Google and Pluralsight.
Now, why would someone who is barely a month old in cloud computing be writing about choosing the right database for a project? Well, let’s just say Pluralsight has dealt with me, with a whole lot of content. And if you are as new as I am at the time of writing, chances are you will learn more here, since I will be using language that new people, like me, understand. Also, I find it easier to learn while I write. So yes, I am writing about choosing the right GCP database because I want to learn.
Speaking of language that can be understood, I believe I need to define some terms.
1. Latency, Throughput, IOPS: In simple terms, latency is the time it takes for data to travel from one point on a network to another. Throughput is the amount of data that is transferred successfully over a given period of time. It used to be measured in bits per second (bps), then Kbps, then Mbps, and with 5G on the way we will be looking at Gbps. IOPS (pronounced eye-ops) is a unit of measurement for the maximum number of reads and writes (input and output operations) per second on storage devices like HDD, SSD and SAN.
Like I promised, I will write in language that new people like me understand. Let’s spell out the abbreviations we have up there.
HDD: Hard Disk Drive, SSD: Solid-State Drive, SAN: Storage Area Network. And before I forget, IOPS: Input/Output Operations Per Second.
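To see how these three terms relate, here is a minimal sketch in Python. The 16 KB block size, the 20 ms latency and the 50 MB/s link are numbers I made up for illustration, and the function names are mine, not anything from GCP.

```python
def throughput_mb_per_s(iops: int, block_size_kb: float) -> float:
    """Approximate storage throughput: operations per second times the size of each operation."""
    return iops * block_size_kb / 1024  # KB/s converted to MB/s


def transfer_time_s(size_mb: float, latency_ms: float, throughput_mb_s: float) -> float:
    """Rough time to move data: wait for the first byte (latency), then stream the rest (throughput)."""
    return latency_ms / 1000 + size_mb / throughput_mb_s


# Hypothetical disk doing 40,000 IOPS with an assumed 16 KB block size.
print(throughput_mb_per_s(40_000, 16))  # 625.0

# Hypothetical 100 MB transfer over a 50 MB/s link with 20 ms latency: about 2.02 seconds.
print(transfer_time_s(100, 20, 50))
```

Notice that for a big transfer the throughput dominates, while for lots of tiny requests the latency is what you feel.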
2. Database: A database is an intentional organization of data, such that the data can be accessed, managed and updated easily. There are different types of databases, but narrowing down to the database services offered by GCP, I like to group them as either structured or unstructured, or as relational or non-relational. OK, maybe I need to write about databases after this article.
So, away from definitions, what options does GCP offer when it comes to databases? As at the time of writing, GCP offers the following database services: Cloud SQL, Cloud Spanner, Cloud Bigtable, Cloud Firestore (formerly Cloud Datastore), Cloud Memorystore and Firebase Realtime Database. However, Cloud Memorystore and Firebase Realtime Database are beyond my learning path at Pluralsight. So, if you happen to have read to this point hoping to find useful information on those last two databases, my apologies. So, let’s start…
1. Cloud SQL
Cloud SQL (pronounced cloud ess-cue-el or cloud sequel) is a fully managed, non-serverless database service that makes the administration, management and maintenance of relational databases easy for users. It supports three database engines: MySQL, PostgreSQL and SQL Server, which makes it somewhat flexible for people like me who recently got acquainted with SQL. With Cloud SQL hosted on GCP, you do not need to worry about infrastructure; all you need to do is focus on your application.
With a storage capacity of 10TB, 416GB of RAM for each instance and IOPS of 40,000, Cloud SQL is your number one choice if you are looking at storing data for e-commerce applications, Customer Relationship Management (CRM) tools, WordPress sites, and basically any other application that uses MySQL, PostgreSQL or SQL Server.
Cloud SQL is highly scalable (vertically) with 99.95% guaranteed availability. The missing 0.05% might look inconsequential in mathematical terms, but in cloud computing it is the equivalent of about 263 minutes of downtime a year, roughly 4 hours, 23 minutes.
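If you want to check the downtime figure yourself, the arithmetic is short. This is just a sketch; the function name is mine and I am ignoring leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes per year a service may be down while still meeting its availability figure."""
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


minutes = downtime_minutes_per_year(99.95)
hours, rem = divmod(round(minutes), 60)
print(f"{minutes:.1f} minutes, about {hours} h {rem} min")  # 262.8 minutes, about 4 h 23 min
```

You can plug in any other availability figure to see how much each extra "nine" buys you.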
So, when should you consider using Cloud SQL?
If your project requires full relational database capability, the storage you need will not exceed 10TB, the concurrent connections to the database will not exceed 4,000, and your organization is cool with choosing and resizing instances itself (remember, Cloud SQL is not serverless), then Cloud SQL is for you. To me, the most important things to consider here are the maximum storage capacity and the concurrent connections. This is because Cloud SQL scales vertically, so the limits of a single instance are your limits. If you believe otherwise, I am willing to learn from you.
2. Cloud Spanner
Talking about scalability, Cloud Spanner is a very interesting database system. According to its documentation, it ‘…is the first scalable, enterprise-grade, globally-distributed and strongly consistent database.’ It somewhat combines the relational structure and consistency of database systems that traditionally scale vertically with the horizontal scalability that is associated with non-relational database systems. The concept of horizontal and vertical scalability is quite basic. When an instance is said to be vertically scalable, it means that resources can be added to or removed from it to increase or decrease its capacity. Horizontal scalability, on the other hand, means that more instances can be created when there is need for them.
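The difference between the two kinds of scaling is easier to see in a toy model. The `Cluster` class below is purely hypothetical (it is not a GCP API); each number in the list stands for the capacity of one instance in GB.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: each entry in `nodes` is the capacity of one instance, in GB."""
    nodes: list[int] = field(default_factory=list)

    def scale_vertically(self, index: int, extra_gb: int) -> None:
        # Same number of instances; one of them gets bigger (or smaller if extra_gb is negative).
        self.nodes[index] += extra_gb

    def scale_horizontally(self, capacity_gb: int) -> None:
        # Capacity grows by adding another instance next to the existing ones.
        self.nodes.append(capacity_gb)

    @property
    def total_gb(self) -> int:
        return sum(self.nodes)


cluster = Cluster(nodes=[100])
cluster.scale_vertically(0, 50)   # the single instance grows from 100 GB to 150 GB
cluster.scale_horizontally(100)   # a second 100 GB instance joins
print(cluster.nodes, cluster.total_gb)  # [150, 100] 250
```

Vertical scaling stops when the biggest available machine is full; horizontal scaling keeps going as long as you can add machines.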
Having established this, it should not be surprising that Cloud Spanner has a capacity running into petabytes of storage. In terms of reliability and availability, Cloud Spanner has its data stored across multiple zones (n zones), irrespective of the regions. It is the database behind many of the services that are offered by Google. Talk about Search, Gmail, YouTube, and lots more.
When should you choose Cloud Spanner?
If your project is expected to use a large amount of data that will eventually exceed the 10TB provided by Cloud SQL, you need transactional consistency, and you want the database broken into shards in order to achieve good throughput with good global accessibility, then Cloud Spanner is for you. Put another way, if you are considering a project that will be storing billions of records per day, this database system is for you.
3. Cloud Firestore/Datastore
So far, we have considered two databases for structured data, both of which are mostly server-based. Cloud Firestore is an unstructured (NoSQL) document database that runs on a serverless platform. It makes storing, syncing and querying data easy for mobile, web and IoT applications, and it does this on a global scale. It was formerly called (and in places is still called) Cloud Datastore, but with its recent integration with the security features of Firebase, it became known as Cloud Firestore. I still think Datastore is a cool name though.
Cloud Firestore is specifically designed to ease app development through live synchronization and offline support. You do not need to set up a server to access your data. It replicates data across multiple regions with strong consistency and 99.95% availability. Its capacity is in terabytes and it is mostly used for storing NoSQL data for mobile and web applications, collaborative multi-user applications, retail product catalogs, gaming leaderboards, social user profiles and many more.
4. Cloud Bigtable
Cloud Bigtable belongs to the NoSQL class. It is what you consider when your project involves large-scale data accessed by a single key and you want low latency as well as good data processing throughput. It is built for big data and its capacity is in petabytes. Cloud Bigtable’s architecture is somewhat complex, as the processing, which is done through front-end servers, is separated from the storage. Because of how large the data can be, tables in Bigtable are sharded into tablets with the aim of balancing the workload across the entire database. If I am to note a few differences between Firestore and Bigtable, apart from the storage capacity, I will say that Firestore is good at scaling down and Bigtable is good at scaling up. I also want to say that Bigtable is compatible with the HBase API, but really, let’s not go into defining what HBase is. Generally, Cloud Bigtable is used by Google in its analytics, search, Maps and Google Earth services.
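The idea of sharding a table into tablets can be sketched in a few lines. In this toy version each tablet holds a contiguous range of row keys, and the split points are values I invented; the real service decides where to split and moves tablets around on its own.

```python
from bisect import bisect_right

# Hypothetical split points. They divide the key space into four tablets:
# tablet 0: keys before "g", tablet 1: "g" up to "n", tablet 2: "n" up to "t", tablet 3: "t" onward.
SPLIT_KEYS = ["g", "n", "t"]


def tablet_for(row_key: str) -> int:
    """Return the index of the tablet whose key range contains row_key."""
    return bisect_right(SPLIT_KEYS, row_key)


for key in ["apple", "grape", "peach", "zebra"]:
    print(key, "-> tablet", tablet_for(key))
# apple -> tablet 0
# grape -> tablet 1
# peach -> tablet 2
# zebra -> tablet 3
```

Because a lookup only needs the single row key, finding the right tablet is cheap, which is part of why single-keyed access stays fast even when the table is huge.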
Do I need to tell you when to choose Bigtable again? No? Thanks. So, to summarize:
Cloud SQL: structured data, less than 10TB, fewer than 4,000 concurrent connections, vertical scalability.
Cloud Spanner: structured data, greater than 10TB, global horizontal scalability.
Cloud Firestore: serverless, NoSQL, less than 1TB, real-time analysis, scales down to zero.
Cloud Bigtable: NoSQL, big data analytics, petabytes upon petabytes, HBase integration, scales up.
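For fun, the summary can be written as a tiny rule-of-thumb function. This is only a sketch of the rules in this article, with the limits as they stood at the time of writing; the function and parameter names are my own, and a real decision should weigh much more than these four inputs.

```python
def suggest_database(
    structured: bool,
    size_tb: float,
    concurrent_connections: int = 0,
    needs_global_scale: bool = False,
) -> str:
    """Map the summary rules to a suggestion. Thresholds come from this article, not from an official API."""
    if structured:
        fits_cloud_sql = size_tb < 10 and concurrent_connections < 4000
        if fits_cloud_sql and not needs_global_scale:
            return "Cloud SQL"
        return "Cloud Spanner"
    # Unstructured (NoSQL) data: small goes to Firestore, big goes to Bigtable.
    if size_tb < 1:
        return "Cloud Firestore"
    return "Cloud Bigtable"


print(suggest_database(structured=True, size_tb=2, concurrent_connections=500))  # Cloud SQL
print(suggest_database(structured=True, size_tb=50))                             # Cloud Spanner
print(suggest_database(structured=False, size_tb=0.2))                           # Cloud Firestore
print(suggest_database(structured=False, size_tb=500))                           # Cloud Bigtable
```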
I hope you learned like I did. In case you want more details, you can always visit the documentation page for GCP Datastore.