Tuesday, August 12, 2014

CumuLogic DBaaS Platform Under the Hood – Part 1: Overview

When speaking with customers and prospects, we tend to talk about how simple the CumuLogic DBaaS software is to install and use (which it is), but that simplicity hides a robust framework designed from the ground up to deliver "as-a-Service" functionality. On occasion, we have even been asked questions like "Why can't I just do this with some scripts?" It's a fair question, but it highlights a very real point of confusion: the difference between automated provisioning and an "as-a-Service" platform.

Certainly, provisioning a database node or cluster can be done with a script or two, but what about ongoing operations and management of the deployed environment? "As-a-Service" is much more than configuration and scripting. It's about scaling operations of the "X" in "XaaS" through full automation, self-service and hands-off operations for the platform management team. It is with this understanding that CumuLogic built our DBaaS platform.

To help shed some light on these topics, we will be sharing a series of blog posts that discuss the architecture and internals of the CumuLogic DBaaS Platform. As this is the first post of the series, we'll review the high-level architecture of the software and the functions each part of that architecture performs.

Staying Simple (for now)

[Figure: CumuLogic Architecture – The Simple View]

Before we get into any details, let's set the context for the deployment architecture of CumuLogic's software. At a high level, the platform is made up of a controller, one or more target "clouds" (or pools of infrastructure on which databases will be deployed), and the deployed nodes that are running the provisioned databases.

The controller itself provides the user interface, the RESTful API endpoint, core orchestration services, the infrastructure abstraction layer, the tenancy model (including RBAC), configuration metadata storage, task management and more. We'll be digging into each of these elements later, but what's important to understand is that the controller's simple "exposed surface" hides a powerful framework for service orchestration and management.
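
To make that a bit more concrete, here is a minimal sketch of what driving the controller through its RESTful API could look like from a client's point of view. The endpoint path, payload fields and token header are illustrative assumptions for this post, not the platform's documented API:

    # Minimal sketch of provisioning a database via the controller's RESTful API.
    # The endpoint path, payload fields and auth header are illustrative
    # assumptions, not the documented CumuLogic API.
    import requests

    CONTROLLER = "https://cumulogic-controller.example.com"

    payload = {
        "name": "orders-db",
        "engine": "mysql",               # database engine to deploy
        "engineVersion": "5.6",
        "flavor": "m1.medium",           # VM size requested from the target cloud
        "storageGB": 100,
        "cloud": "private-openstack-1",  # which registered target cloud to use
    }

    resp = requests.post(CONTROLLER + "/api/databases",
                         json=payload,
                         headers={"X-Auth-Token": "<api-token>"},
                         timeout=30)
    resp.raise_for_status()
    print("Provisioning request accepted:", resp.json())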

The target cloud layer is made up of one or more target IaaS clouds (or pools of infrastructure) from which the controller obtains the virtual infrastructure required for each user request for a new database instance. Since CumuLogic's platform is designed specifically for private, hybrid and public cloud deployment scenarios, we have created integrations with leading private cloud IaaS software, hundreds of public clouds (those that expose APIs our controller can consume), and pools of "pre-provisioned" systems (bare metal servers, virtual machines, or both).
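
To give a feel for what that infrastructure abstraction layer does, here is a deliberately simplified sketch of the kind of provider interface such a layer might define. The class and method names are our own illustration for this post, not CumuLogic's internal code:

    # Simplified sketch of a cloud-provider abstraction layer; the interface
    # below is an assumption for explanation only, not CumuLogic internals.
    from abc import ABC, abstractmethod

    class CloudProvider(ABC):
        """Operations the controller needs from any target cloud or pool."""

        @abstractmethod
        def provision_vm(self, flavor, image, network):
            """Create a VM and return a handle the controller can track."""

        @abstractmethod
        def attach_volume(self, vm_id, size_gb):
            """Attach block storage for the database's data directory."""

        @abstractmethod
        def destroy_vm(self, vm_id):
            """Tear the node down when the database service is deleted."""

    class PreProvisionedPool(CloudProvider):
        """A 'cloud' backed by a pool of existing bare metal or virtual machines."""

        def __init__(self, hosts):
            self.free_hosts = list(hosts)

        def provision_vm(self, flavor, image, network):
            return self.free_hosts.pop()   # no real provisioning; hand out the next free host

        def attach_volume(self, vm_id, size_gb):
            pass                           # assume storage already exists on these hosts

        def destroy_vm(self, vm_id):
            self.free_hosts.append(vm_id)  # return the host to the pool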

The database instances are, typically, virtual machines provided by a target cloud. These are the unit of isolation that CumuLogic's platform uses for each node of a deployed database service. If a user requests a single-node MySQL database, a single VM (sized as requested by the user) is provisioned. Once that VM is online, the controller injects a CumuLogic Agent, the appropriate application payload (in our example, MySQL itself), and the required configuration files. The CumuLogic Agent is the software on each node that allows the controller to maintain command and control over the system, and it also serves as the point of contact for node telemetry readings.

If we extend our example from a single-node MySQL database to a more complex topology (e.g. a sharded MongoDB cluster), the same process described above occurs for each node of the topology being deployed. Once the nodes are online, the work isn't done: each node in the deployment may require one or more additional steps to complete the overall environment's configuration. That might mean establishing replication between the nodes, or joining each node into a cluster or replica set.
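
Put together, the per-node bootstrap and the follow-up topology step look roughly like the sketch below. Every function name here is an illustrative stand-in rather than an actual CumuLogic interface:

    # Rough, self-contained sketch of the provisioning flow described above.
    # All of these names are illustrative stand-ins, not CumuLogic code.

    def inject_agent(node):
        print("copy the CumuLogic Agent to", node)

    def inject_payload(node, engine):
        print("install the", engine, "payload on", node)

    def push_config(node, role):
        print("write the", role, "configuration on", node)

    def finalize_topology(engine, nodes):
        # e.g. wire up replication, or join the nodes into a cluster / replica set
        print("run", engine, "cluster join / replication steps across", nodes)

    def provision_database(engine, topology_roles):
        nodes = []
        for role in topology_roles:        # e.g. ["primary"], or shard/config/router roles
            node = "vm-for-" + role        # stands in for a VM handed back by the target cloud
            inject_agent(node)
            inject_payload(node, engine)
            push_config(node, role)
            nodes.append(node)
        if len(nodes) > 1:                 # multi-node topologies need the extra wiring
            finalize_topology(engine, nodes)
        return nodes

    provision_database("mongodb", ["router", "config", "shard-1", "shard-2"])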

Overview of the Internals

Below is a diagram of the internal components of the software, including components from the controller, target cloud and deployed nodes:

[Figure: CumuLogic Architecture – Internals]

As you can see, there are quite a few moving parts in the overall picture, all working in concert to provide the DBaaS experience to users. Let's take each component and briefly describe it:

Load Balancer – Up-front load balancer services that will balance UI and API traffic between multiple controller nodes.

API Server – HTTP(S)-based API services, including both the native CumuLogic API and our AWS RDS API support for relational databases.

Console – The web-based user interface for both platform administrators and users of the DBaaS platform.

Control Plane – The core controller / orchestration engine for the system.

Governance – RBAC governance functions, including identity management, role management, target cloud credential storage, and LDAP/AD integration.

Repository – Source location for all "service payloads". This is where any database engine versions, CumuLogic agent binaries, and default configuration files are pulled from when provisioning a new database instance.

Database Cluster – A MySQL-compatible database that the CumuLogic controller relies on to store its metadata about deployed and configured services. It also stores monitoring data from the deployed nodes.

Object Storage – An optional object storage system (S3 or Swift API compatible) that can be the target for storing backups.

Block Storage – Block storage (typically a remote NFS share mounted on each controller) used either for backup archive staging (prior to a push to the object storage system) or for actual retention of the backup archives.

Task Queue – A queue used by the controller to pass instructions off to the distributed agents deployed on each node (an illustrative example message follows this list).

Job Scheduler – As you'd expect, this is where all scheduled jobs within the platform are recorded and initiated at the appropriate time.

Health Monitor – The controller's process that periodically checks the status and availability of each deployed node across all database service instances. Health events that require a recovery action are triggered by this monitoring process when appropriate (e.g. if a node in a cluster goes offline, a new node is brought into the cluster).

Usage Tracker – The process responsible for ongoing collection of performance and capacity metrics from each deployed node.

Private/Public Cloud Instances – Represented as gray rectangles, the underlying VMs for the deployed nodes (typically backed by volumes provided via the IaaS platform's block storage system). These VMs are all sourced from the same master template (typically a CentOS-based image). Once deployed, they will host the required application payload (database engine), the CumuLogic Agent and Sensors, as well as potentially third party agents.

3rd Party APM Services – Optionally, the CumuLogic platform can be configured to coordinate with third-party monitoring and management platforms. Out of the box, CumuLogic supports New Relic, AppDynamics and MMS.
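
To make the task queue's role more concrete, the instruction the controller hands to an agent can be pictured roughly like the message below. The field names and structure are purely illustrative; the actual wire format is internal to the platform:

    # Purely illustrative shape of a task the controller might enqueue for an
    # agent; the real message format is internal to the CumuLogic platform.
    import json
    import uuid
    from datetime import datetime, timezone

    task = {
        "taskId": str(uuid.uuid4()),
        "target": "node-3f2a",             # which deployed node's agent should act
        "action": "backup.snapshot",       # what the agent is being asked to do
        "params": {
            "engine": "mysql",
            "lockForConsistency": True,
            "compress": "gzip",
        },
        "issuedAt": datetime.now(timezone.utc).isoformat(),
    }

    print(json.dumps(task, indent=2))      # serialized form assumed for illustration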

An Example: Backup Services

So far, our examples have focused on provisioning functions. To understand how the pieces of the system work together to deliver ongoing management functions, let's review how the architectural components support the platform's automated database backup feature.

The CumuLogic DBaaS platform allows users to opt in to an automated backup routine for their deployed database instances. Users select the time of day the backup should run, as well as the number of days the controller should retain previous backups. Just as importantly, this backup job is topology-aware, meaning it understands the runtime topology of a given database service and only backs up what's required to recreate the database environment. Not every node in a cluster requires its own backup if they are all storing the same data.
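
For illustration, opting a database instance into that schedule might look something like the request below. The endpoint path and field names are assumptions made for the sake of example:

    # Hypothetical request enabling automated backups for a database instance.
    # The endpoint path and field names are illustrative assumptions.
    import requests

    resp = requests.put(
        "https://cumulogic-controller.example.com/api/databases/orders-db/backup-policy",
        json={
            "enabled": True,
            "startTime": "02:00",      # time of day the daily backup should run
            "retentionDays": 7,        # how long the controller keeps older backups
        },
        headers={"X-Auth-Token": "<api-token>"},
        timeout=30,
    )
    resp.raise_for_status()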

Yes, users can also initiate backups on-demand, regardless of whether they have opted into the scheduled job. The process is similar to what we describe below, with the main difference being the initiator of the activity (explicit API call versus scheduled job).

When a database instance (single or multi-node) is configured for automated backups, the controller places a daily task for that database instance into the job scheduling system. When the time comes for the job to be executed, the job scheduler inserts a "backup" task into the controller's task queue. A single-node database instance requires three steps to achieve the required result: execute a snapshot, store the snapshot, expunge expired snapshots. The same steps occur for a multi-node database topology, but the process takes into account (1) the topology itself when selecting which nodes will be included and (2) potentially the coordination of snapshots across multiple nodes of a topology type that requires it (e.g. MongoDB shard clusters need to have snapshots created for the query routers, config servers and a node in each shard).
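
The topology-aware selection in step (1) is the interesting part, so here is a small, self-contained illustration of the idea. The topology shapes and role names are assumptions for the example, not the platform's internal model:

    # Illustrative stand-in for topology-aware node selection: only nodes that
    # hold data needed to recreate the environment get snapshotted.
    # Topology shapes and role names here are assumptions for the example.

    def nodes_to_snapshot(engine, topology):
        if topology["type"] == "single":
            return topology["nodes"]                         # the one and only node
        if engine == "mongodb" and topology["type"] == "sharded":
            routers = [n for n in topology["nodes"] if n["role"] == "router"]
            configs = [n for n in topology["nodes"] if n["role"] == "config"]
            one_per_shard = {}
            for n in topology["nodes"]:
                if n["role"] == "shard":
                    one_per_shard.setdefault(n["shard"], n)  # a single member per shard
            return routers + configs + list(one_per_shard.values())
        return topology["nodes"]                             # fall back to snapshotting everything

    cluster = {"type": "sharded", "nodes": [
        {"role": "router", "name": "r1"},
        {"role": "config", "name": "c1"},
        {"role": "shard", "shard": "s1", "name": "s1-a"},
        {"role": "shard", "shard": "s1", "name": "s1-b"},
        {"role": "shard", "shard": "s2", "name": "s2-a"},
    ]}
    print([n["name"] for n in nodes_to_snapshot("mongodb", cluster)])  # ['r1', 'c1', 's1-a', 's2-a']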

The execution of the snapshot is a task for the CumuLogic Agent residing on the deployed nodes. The system places a specific task for the expected node(s) on the task queue, instructing that node (or those nodes) to perform a local snapshot. The agent understands how to coordinate with the specific application payload (database engine) it's responsible for running: it ensures that the snapshot is consistent by performing any required locking, executes a snapshot of the node's data volume using LVM, and finally frees the database to continue processing transactions. Once the snapshot has been created on that node, it is compressed and pushed back to the controller.
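
As a rough illustration of that agent-side sequence for MySQL on an LVM-backed data volume, the sketch below quiesces the database, snapshots the volume, and packages the result. The volume names, sizes, mount point and the pymysql dependency are all assumptions, and a real agent would add far more error handling:

    # Sketch of a consistent MySQL snapshot on an LVM-backed data volume.
    # Volume group/LV names, sizes, paths and the pymysql dependency are assumptions.
    import subprocess
    import pymysql

    conn = pymysql.connect(host="localhost", user="root", password="<password>")
    try:
        with conn.cursor() as cur:
            cur.execute("FLUSH TABLES WITH READ LOCK")      # quiesce writes for a consistent snapshot
            subprocess.check_call([                         # take the LVM snapshot while the lock is held
                "lvcreate", "--snapshot", "--size", "5G",
                "--name", "mysql_snap", "/dev/vg_data/mysql_data",
            ])
            cur.execute("UNLOCK TABLES")                    # let the database resume immediately
    finally:
        conn.close()

    # Mount the snapshot, archive and compress it, then clean up. The resulting
    # archive is what gets pushed back to the controller.
    subprocess.check_call(["mkdir", "-p", "/mnt/mysql_snap"])
    subprocess.check_call(["mount", "/dev/vg_data/mysql_snap", "/mnt/mysql_snap"])
    subprocess.check_call(["tar", "-czf", "/tmp/mysql_backup.tar.gz", "-C", "/mnt/mysql_snap", "."])
    subprocess.check_call(["umount", "/mnt/mysql_snap"])
    subprocess.check_call(["lvremove", "-f", "/dev/vg_data/mysql_snap"])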

Next up, the controller's job is to accept the snapshot archive file(s) and store them in the appropriate storage target. CumuLogic's platform supports storing snapshots either within a specific filesystem mount point (typically an NFS share mounted on the controller nodes) or within an object storage environment. The object storage can be any service or software that supports the AWS S3 APIs or the OpenStack Swift APIs.

Once the backup has been safely stored by the controller, the last step is to check for, and expunge, any past backup copies that are older than the specified retention period for that database instance.
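
For the storage and retention steps against an S3-compatible object store, the controller-side logic could look something like the sketch below. The endpoint, bucket and key layout are assumptions for illustration, and boto3 is just one of many S3-capable clients:

    # Sketch of pushing a backup archive to an S3-compatible object store and
    # pruning copies past the retention period. Endpoint, bucket and key layout
    # are illustrative assumptions.
    from datetime import datetime, timedelta, timezone
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://objects.example.com",   # any S3-compatible endpoint
        aws_access_key_id="<access-key>",
        aws_secret_access_key="<secret-key>",
    )

    BUCKET = "cumulogic-backups"
    PREFIX = "orders-db/"              # one prefix per database instance (assumed layout)
    RETENTION_DAYS = 7

    # Store the archive produced by the agent.
    key = PREFIX + datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S") + ".tar.gz"
    s3.upload_file("/tmp/mysql_backup.tar.gz", BUCKET, key)

    # Expunge anything older than the retention period.
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX).get("Contents", []):
        if obj["LastModified"] < cutoff:
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])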

As you can see, automated backups are a multi-step process that relies on a number of architectural elements to ensure they are performed efficiently and effectively across a wide array of potential database engine types, database deployment topologies and infrastructure environments.

Summary

True "Database-as-a-Service" requires much more than provisioning and coordination of nodes. It requires ongoing management of the deployed nodes, through monitoring, self healing, backup services, software version updates (coming soon to CumuLogic) and support for many self-service operations to modify the deployed environment. It also requires a strong foundation of core orchestration services, multi-tenancy, infrastructure abstraction, job scheduling, and event-based automation triggers. These are the features that allow a platform to truly serve the users with database services, not just with database deployment.

We hope this post is a useful starting point in building a shared understanding of the technical details of the CumuLogic DBaaS platform. In future posts for this series, we'll dig deeper into specific functions provided by the platform.

If you are interested in learning more about the CumuLogic DBaaS platform, please get in touch or download the software. Thanks for reading!

