The vast features, capabilities, and limitless scale of modern cloud platforms enable technology professionals to solve business challenges using technology faster than ever. It’s also easier than ever to leave an unwanted trail of artifacts and services in our wake that can wind up costing tens of thousands of dollars or more in unnecessary cloud spend per year.
Let’s face it, cloud optimization isn’t cool, flashy, or fun. It can be tedious, complex, and downright boring. So why should you pay attention to it? Because saving boatloads of cash on currently wasted cloud spend IS cool and fun. It’s also good business.
Three types of cloud optimization can be easily automated across the three leading cloud providers (AWS, Azure, and GCP) and yield big results: resource scheduling, data protection, and account hygiene.
Resource Scheduling Automation
There's a fundamental difference between the cost model of running resources in the data center versus the utilization of cloud resources. In the data center, once we rack, stack, connect, and power up our gear, the operational runtime costs of running a wide range of virtual machines are the same month to month. There are no real cost increases for each net new workload if additional infrastructure isn't required. These are the well-known benefits of virtualization, thin-provisioning, and even over-provisioning individual virtual machines while maintaining appropriate underlying physical resources to support actual demand of these virtualized workloads.
In the cloud, we’re paying by the hour. This is a simple, critical distinction. Most organizations deploy their workloads to the cloud with no real thought around availability planning, or even distinguishing their application stacks by environment: for example, Non-Production Development, Non-Production Testing, Pre-Production Staging, and Production. In most organizations, only one of these environments, Production, needs to be running all the time (and even that is debatable, as some applications are not utilized 24/7/365).
The simple answer here is to require application owners and product teams to implement power scheduling tags that can then be used to automate resource scheduling, effectively power cycling resources to meet the actual needs of the user base. Most of us turn the lights off when we leave the building. Why? Because electricity is a metered utility service and we pay by the kilowatt hour. So is the cloud. Why, then, do we leave resources in the cloud powered up and incurring hourly charges outside of normal business hours, when they’re not in use?
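As a rough illustration of how a scheduling tag could drive automation, here is a minimal Python sketch. The tag format ("mon-fri 07:00-19:00") and the function names are assumptions made for this example, not a convention defined by any cloud provider. It only decides whether a resource should be running at a given moment.

```python
from datetime import datetime, time

# Weekday abbreviations in the order used by datetime.weekday() (Monday = 0).
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def parse_schedule(tag_value: str):
    """Parse a hypothetical tag value such as 'mon-fri 07:00-19:00'.

    Returns (set of weekday indexes, start time, end time).
    Day ranges that wrap around the week (e.g. 'sat-mon') are not handled.
    """
    days_part, hours_part = tag_value.lower().split()
    first_day, last_day = days_part.split("-")
    start_text, end_text = hours_part.split("-")
    weekdays = set(range(DAYS.index(first_day), DAYS.index(last_day) + 1))
    return weekdays, time.fromisoformat(start_text), time.fromisoformat(end_text)


def should_be_running(tag_value: str, now: datetime) -> bool:
    """True if 'now' falls inside the tagged window (end time is exclusive)."""
    weekdays, start, end = parse_schedule(tag_value)
    return now.weekday() in weekdays and start <= now.time() < end
```

A scheduler job could call `should_be_running` for each tagged resource every few minutes and start or stop the resource whenever its actual state disagrees with the result.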
Let's run the numbers:
For example, in the AWS Virginia region (us-east-1), a t3.medium EC2 instance running Windows costs $0.06 per hour. Running that EC2 instance 24 hours a day for 1 week would cost you a total of $10.08.
Committing to a reserved instance for 1 full year would cost you $374.00 upfront. Your weekly EC2 costs would be cut to $7.22, for a savings of 29% over on-demand pricing.
However, if your EC2 instance only needs to be running from 7 am to 7 pm, Monday through Friday, then you can save money by using automation to schedule your EC2 instance to run only during these hours. By scheduling the EC2 instance's off-time, the weekly costs are reduced to just $3.60.
That's a cost savings of 64%, with no upfront commitment needed. Therefore, resource scheduling automation is often one of the highest return-on-investment activities cloud teams can focus on.
It is completely realistic to expect a 60-80% cost savings by automating resource scheduling of Non-Production workloads currently running 24/7/365 in the cloud.
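The arithmetic above is easy to reproduce for your own instance types. The short Python sketch below uses only the figures quoted in this article ($0.06 per hour, $374.00 for a one-year reservation); the function names are illustrative, and current pricing should be checked before relying on any of these numbers.

```python
HOURLY_RATE = 0.06          # on-demand rate quoted above, in dollars per hour
RESERVED_UPFRONT = 374.00   # one-year reserved cost quoted above
HOURS_ALWAYS_ON = 24 * 7    # running around the clock
HOURS_SCHEDULED = 12 * 5    # 7 am to 7 pm, Monday through Friday


def weekly_cost(hours: int, rate: float = HOURLY_RATE) -> float:
    """Weekly cost in dollars for a given number of running hours."""
    return round(hours * rate, 2)


def savings_pct(baseline: float, reduced: float) -> int:
    """Savings relative to the baseline, as a whole-number percentage."""
    return round((baseline - reduced) / baseline * 100)


always_on = weekly_cost(HOURS_ALWAYS_ON)
scheduled = weekly_cost(HOURS_SCHEDULED)
reserved = round(RESERVED_UPFRONT / 52, 2)  # assumes 52 weeks per year

print(f"On-demand 24/7: ${always_on:.2f} per week")
print(f"Reserved:       ${reserved:.2f} per week, {savings_pct(always_on, reserved)}% saved")
print(f"Scheduled:      ${scheduled:.2f} per week, {savings_pct(always_on, scheduled)}% saved")
```

Dividing the reservation evenly over 52 weeks lands within a few cents of the weekly reserved figure quoted above, and the scheduled case reproduces the $3.60 and 64% numbers exactly.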
Data Protection Automation
Data protection is a top priority for IT teams, or at least it should be. An organization’s data is often its most valuable work product. Cloud providers offer excellent ways to implement robust data protection and data lifecycle management capabilities, but implementing these management strategies can be complicated and tedious, and can become unmanageable. Luckily, these tasks are very easy to automate, and basic account hygiene activities can be tied to them so that basic data protection requirements are met without overpaying for unnecessary or out-of-control backup storage. Without that hygiene, organizations can be unpleasantly surprised when data protection policies quietly add cloud services and storage in the background.
Here are a few basic data protection automation tasks that can be used to create a comprehensive and effective data protection strategy while mitigating the risk of out-of-control spending:
- Automate virtual machine or virtual hard disk drive snapshots. It’s important to understand each cloud provider’s data consistency model if you’re going to use cloud-native snapshots as a primary means of backing up your instances or virtual machines. Many guest operating systems are not aware they’re even running in the cloud, so when these snapshots are occurring at the hypervisor level within the cloud providers’ infrastructure, it’s often not possible to quiesce disk read-write activities, and you aren’t guaranteed consistency within the snapshot image. This can be particularly problematic when working with a boot volume or a volume that contains sensitive relational databases.
An easy technique for guaranteeing the consistency of these volume snapshot images is to simply stop the running cloud virtual machine instance before snapshotting the underlying volumes or creating a virtual machine image. Automation can be leveraged to stop the virtual machine, perform a snapshot, and then restart the virtual machine and any required services running on it.
- Automation can then be used to manage the creation of these snapshot images and the data lifecycle of the virtual machine snapshot data. Starting with scheduling, automation can look for certain tags applied to running resources that indicate what needs to be captured by snapshots and when. In this way, the automation effectively maintains itself: product teams or developers apply the required tags to their running resources, and the automation processes discover new resources as they come online and perform the desired backup tasks.
When these snapshots and images are complete, resource tags can be added to them for purposes of categorization, cost allocation, and archiving. Snapshots can be retained on an appropriate schedule and copied to another region for extra protection against disaster. As time goes on, it’s very important to have an effective data retention strategy that includes not just retaining the right snapshots and images, but also deleting the unwanted or unnecessary versions. This is a key area of cost optimization to combat hidden costs that can continue to creep up over time in many organizations’ cloud service accounts.
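The two ideas above, stopping a machine around its snapshot and pruning snapshots that have outlived their retention period, can be sketched in a few lines of Python. This is a provider-neutral illustration rather than a real SDK integration: the `Snapshot` class, the `retention-days` tag, and the callables passed to `snapshot_while_stopped` are all hypothetical stand-ins for whatever your cloud SDK provides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class Snapshot:
    snapshot_id: str
    created: datetime
    tags: Dict[str, str] = field(default_factory=dict)


def snapshot_while_stopped(stop: Callable[[], None],
                           snapshot: Callable[[], Snapshot],
                           start: Callable[[], None]) -> Snapshot:
    """Stop the VM, snapshot its volumes, and always restart it afterwards."""
    stop()
    try:
        return snapshot()
    finally:
        start()  # runs even if the snapshot call raised an error


def expired_snapshots(snapshots: List[Snapshot], now: datetime,
                      default_days: int = 30) -> List[Snapshot]:
    """Return snapshots older than their 'retention-days' tag (hypothetical tag).

    Snapshots without the tag fall back to default_days.
    """
    expired = []
    for snap in snapshots:
        keep_for = timedelta(days=int(snap.tags.get("retention-days", default_days)))
        if now - snap.created > keep_for:
            expired.append(snap)
    return expired
```

The `try`/`finally` is the important design choice here: a failed snapshot should never leave a server powered off. A real implementation would also wait for the machine to reach a fully stopped state before taking the snapshot.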
Task Automation and Account Hygiene
With the multitude of services available from the leading cloud service providers, it’s difficult to keep up with what’s even possible on the cloud, and even more difficult to ensure that each service is carefully provisioned, managed, and cared for regularly.
Task automation can significantly lower operational requirements to ensure cloud account hygiene operations are performed on a regular basis.
Unused or unregistered services that have a cost associated with them, however small at first, can creep up slowly until they represent a significant line item on monthly bills, which makes deleting them a natural task to automate. A good example of this is unused or unattached AWS Elastic Block Store (EBS) volumes. EBS volumes are the disk drives of Elastic Compute Cloud (EC2) virtual servers. When an EC2 instance (or virtual machine) is terminated, the EBS volumes attached to it are not necessarily deleted: by default the root volume is removed, but additional data volumes are kept unless they are flagged for deletion on termination. This is a safety mechanism put in place by AWS to prevent inadvertent data loss. Cloud systems administrators must terminate the instance, then in a separate step delete the remaining volumes (or at least confirm up front that they want the volumes deleted along with the instance). In many cases, this leaves EBS volumes effectively orphaned: unused but still incurring cloud consumption costs month after month.
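Finding orphaned volumes is a simple filter once you have a volume inventory. The sketch below works on plain dictionaries modeled on the output of boto3's `describe_volumes` call (the `VolumeId`, `State`, `Size`, and `Attachments` keys); the function names are illustrative, and the price per GB-month is a placeholder the caller must supply, not a quoted AWS rate.

```python
from typing import Dict, List


def find_unattached_volumes(volumes: List[Dict]) -> List[Dict]:
    """Return volumes that are not attached to any instance.

    Assumes each dict mirrors an entry of describe_volumes()["Volumes"],
    where an unattached volume reports State == "available".
    """
    return [
        vol for vol in volumes
        if vol["State"] == "available" and not vol.get("Attachments")
    ]


def estimated_monthly_cost(volumes: List[Dict], price_per_gb_month: float) -> float:
    """Rough monthly storage cost; price_per_gb_month is supplied by the caller."""
    total_gb = sum(vol["Size"] for vol in volumes)
    return round(total_gb * price_per_gb_month, 2)


# With boto3 installed and credentials configured, the inventory could come from:
#   import boto3
#   volumes = boto3.client("ec2").describe_volumes(
#       Filters=[{"Name": "status", "Values": ["available"]}]
#   )["Volumes"]
```

Reporting the estimated cost alongside the volume IDs, and tagging volumes for review before deleting anything, keeps the automation from undoing the data-loss protection described above.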
Another great example of task automation and account hygiene is applying consistent AWS Simple Storage Service (S3) bucket configuration, such as setting up versioning correctly each time a new S3 bucket is created, or regularly purging deleted S3 objects (which, in a versioned bucket, are technically still there as earlier versions, costing money).
Achieving these quick wins and other significant cost-saving measures isn’t all that difficult. So why isn’t this happening already? These steps may seem obvious to cloud systems administrators, and it is easy to assume that everyone in the organization will put these processes in place. But in fact, these steps are tedious and boring, so they don’t get a lot of attention. They’re not creating something shiny and new. They’re not helping to bring a new product or service to market faster. They’re not talked about in development meetings. They’re simply overlooked.
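For S3, this kind of cleanup is usually expressed as a lifecycle configuration rather than a script that deletes objects. The Python sketch below builds one such configuration as a plain dictionary, in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule ID, the helper name, and the 30-day and 7-day defaults are assumptions chosen for illustration.

```python
from typing import Dict


def build_lifecycle_config(noncurrent_days: int = 30,
                           abort_upload_days: int = 7) -> Dict:
    """Build a lifecycle configuration that trims leftovers in a versioned bucket."""
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",  # illustrative rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix applies to every object
                # Permanently remove old versions once they are N days noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Clean up delete markers that no longer hide any versions.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
                # Discard multipart uploads that were started but never finished.
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_upload_days
                },
            }
        ]
    }


# With boto3 installed and credentials configured, this could be applied with:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",  # placeholder bucket name
#       LifecycleConfiguration=build_lifecycle_config(),
#   )
```

Applying the same configuration automatically whenever a bucket is created means the retention window is a deliberate choice instead of an accident.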
Over time, the lost opportunity cost in real dollars spent on the wrong or improperly configured services accumulates and takes away the budget for meaningful initiatives. It also erodes confidence in IT among business leaders.
Nothing halts a cloud initiative faster than a shocked CFO facing out-of-control cloud spending and an IT leadership team that doesn’t know why.
The key to success in controlling cloud spend is simple: automate these cost-control measures as early in your cloud adoption journey as possible. Demand proper tagging due diligence from cloud teams and implement resource scheduling, backups, and account hygiene task automation based on these resource tags. Leverage automation, not people, to perform these tedious tasks and allow people to focus on the fun stuff.
For help with your Cloud Migration, Optimization, or Cost Control initiatives, contact your GreenPages Account Manager or firstname.lastname@example.org.