Which is the best option if EC2 instances utilization is guaranteed to be consistent for a long period of time to reduce cost?
One of the most common problems AWS customers face is the dreaded “Bad AWS Billing Surprise”. It’s the beginning of the month, you’ve just received your latest AWS bill and found out that you owe AWS a completely unexpected amount of money (typically a large sum, hundreds or thousands of dollars above what you originally expected, depending on your budget). There are also situations where you regularly spend far more than you should over an extended period of time, which can be equally as expensive. Show
If you’re in either of these situations, a critical question is:
There’s no easy answer to this problem. That’s why in this article I’ll walk you through a proven process I’ve put together in order to reduce AWS cost… Some important principles of AWS cost reductionBefore you start terminating EC2 instances, spending money on Reserved purchases or making any changes to your AWS infrastructure, keep in mind these important principles:
With these principles in mind, we can continue with the actual steps… Step 1 - Get All Relevant DataThe first thing you need to do is get all cost data and relevant system metrics in front of you, so you can find cost-related inefficiencies. Below are some important tools that can help you find relevant data:
I also recommend installing MiserBot, a Slack and email bot I developed, which gives you daily updates on your AWS cost and a detailed view of where your money is going. Now that you know where to find important cost data, it’s time to look for specific information… Where’s the money going?The first step is to understand the areas that result in the most AWS cost, and that you also identify their percentage relative to your total AWS bill. There are three important dimensions to analyze: A. Top AWS Usage Types by Cost The first meaningful step is to understand the AWS usage types that incur in the most money spent. Some common examples:
Using the tools described in this article, you can run the following Athena query: Once you have the top usage types, you’ll have a good starting point to uncover areas that are impacting your AWS bill. For example:
B. Top AWS Resources by Cost This is a list of the top AWS resources, such as specific EC2 instances, RDS DB instances, S3 buckets, CloudFront distributions, etc. This will give you a detailed view of the most expensive AWS components in your application. Once you have this list, focus on the top 5-10 and also on those that are a considerable expense for your budget. Using Athena, you can run the following query: SELECT lineitem_productCode, lineitem_resourceId, sum(cast(lineitem_unblendedcost AS double)) AS sum_unblendedcost FROM billing.hourly WHERE period='Keep in mind, this is not information you can find in AWS Cost Explorer, that’s why I recommend using Athena to analyze Cost and Usage reports. C. Find usage types for the top AWS resources Once you find the top AWS resources by cost, it’s time to drill down on each component and see what’s costing your applications the most money. For example:
This is my recommended Athena query to find the usage types for a particular AWS resource: SELECT DISTINCT lineitem_usagetype, sum(cast(lineitem_usageamount AS double)) AS sum_usageamount, sum(cast(lineitem_unblendedcost AS double)) AS sum_unblendedcost FROM billing.hourly WHERE lineitem_resourceId = 'Once you know exactly which resources consume the most amount of money and their usage types, then you are in a position to take some corrective actions. Similarly to finding the top AWS resources by cost, this information is not available in AWS Cost Explorer, only in Cost and Usage Reports. Step 2 - Identify Anomalies and Corrective ActionsOnce you have relevant usage data from the first step, here are two questions that need to be asked per each top cost item:
The important part in this step is to identify potential corrective actions, which will then be evaluated, prioritized and potentially executed in the following steps. Below are some common costly items: EC2 ComputeIn my experience, EC2 is almost always one of the top areas in AWS bills. If EC2 BoxUsage (On Demand EC2) is among your top AWS cost items, I recommend looking into the following areas:
A quick way to do this is to go to the CloudWatch Metrics console and select CPUUtilization for all instances in your account. You’ll see a graph including all EC2 instances in your account and how high (or low) CPU they consume. Any EC2 instances with <5% CPU Utilization over a long period of time (i.e. 2 weeks, 1 month) are clearly not used efficiently, and should be candidates for down-sizing or termination. If EC2 HeavyUsage (Reserved EC2) is a top usage type, then look into Reserved Instance Utilization Reports in the AWS Billing console. If reports show that Reserved discounts are not applied effectively in your AWS account, then one option might be to convert applicable On Demand instances to the EC2 instance type covered by already purchased Reserved instances. If this isn’t possible, a last resort option might be to sell unused reservations in the Amazon EC2 Reserved Instance Marketplace. You might not get the same amount you originally paid, but if you’re sure you don’t need those reserved EC2 instances, at least you’ll get a portion of the original amount and reduce your AWS bill. RDS ComputeRelational databases are still very popular in many applications; therefore a service like RDS is commonly seen among the top AWS usage types. Even though users pay for data storage and I/O in RDS, compute is in most cases the top cost item. InstanceUsage is the usage type for On Demand RDS instances, while HeavyUsage is for Reserved RDS instances. If RDS is a top usage item, then you can apply similar strategies as in EC2: find idle and over-provisioned RDS DB instances and find under-utilized reservations. From the CloudWatch Metrics console, view the DatabaseConnections and CPUUtilization metrics for all databases in your account over a 2-4 week period. From my experience, it’s not uncommon to see databases with 0 connections, which are obvious candidates for termination. Regarding under-utilized reserved RDS instances, one key difference compared to EC2 is that you can’t sell RDS reservations, as there’s no marketplace to do so. You can find more information on RDS Reserved in this article. Data TransferThere are two types of Data Transfer that can cost you a lot of money: 1) DataTransfer-Regional and 2) DataTransfer-Out. Regional Data Transfer happens within the same AWS region, but across Availability Zones. It’s a common usage type in applications that replicate or access large amounts of data stored across multiple Availability Zones. One way to reduce it is to deploy EC2 instances in a single AZ, which has the downside of reducing resiliency in case of AZ failure. In this case it’s very important to evaluate your application’s availability requirements and whether you’re spending a considerable amount of money in this usage type. If Regional Data Transfer is a top usage type, then it comes down to determining how much money you’re willing to spend for multi-AZ redundancy. Data Transfer Out is data transferred out to the internet and it’s typically incurred by services such as CloudFront, S3 and EC2. If you have an application that serves heavy media content or that has high traffic, then you’re likely to spend some money in this usage type. One way to reduce it is to enable compression in CloudFront as well as web servers. I’ve seen cases where unnecessary JSON data can be eliminated from responses and reduce this cost significantly. As well, simply optimizing image or video size can have a big impact in this area. S3 StorageS3 TimedStorage tells you how much money you’re spending on S3 storage. S3 offers different storage classes, such as Standard, Standard Infrequent Access, One Zone Infrequent Access, Intelligent Tiering, Glacier and Glacier Deep Archive. This usage type is measured in GB/month and choosing the right storage class depends on access patterns for each object as well as availability requirements. Standard is the most expensive storage class and it’s common to find opportunities to reduce its cost by choosing the right storage class or enabling features such as Object Lifecycle Management OthersGiven the large number of AWS services, there are a lot of usage patterns that can result in unnecessarily high cost, depending on the nature of your applications. In this article I’ve mentioned only some of the most common ones, but every application is different. That’s why one of the first steps in identifying cost reduction opportunities is identifying the top usage types as well as those AWS resources that consume them the most. Once you have that data, the next step is to see what can be changed either in your AWS architecture, AWS configuration or application code, in order to reduce relevant usage types. Step 3 - Confirm the right balance of Performance, Availability and Operational Excellence.Remember, cost is only one variable. At the end of the day you have to run reliable applications that will deliver a great customer experience and help your business grow. This means applications have to meet a number of requirements for Performance, Availability, Cost, Security and Operations. That’s why cost optimizations shouldn’t impact other areas of your AWS architecture to the point that your applications will no longer meet your business requirements. Also, Security shouldn’t be on the table - only in very exceptional situations where an optional or unnecessary security feature makes up a high percentage of your AWS bill. Here are some important steps to take, based on the top AWS usage types and resources by cost that you’ve already identified: a) Identify affected applications, components and deployment stages by costLet’s say you’re paying considerable money for some expensive EC2 instances. The first step would be to identify the application(s) and the underlying components that are using those instances (i.e. web server, batch jobs, database, etc.). It’s also important to identify whether they belong to production or pre-production environments (i.e. development, QA, staging, etc.) b) Identify main business transactions supported by identified applications and componentsOnce you identify the most expensive applications and components, it’s time to see how critical they are to your business and if the allocated AWS cost makes sense. The first step is to identify the affected business transactions. For example, once you’ve identified an expensive component, it’s time to see what user interactions it supports. For example: user login, users browsing through your content, logged in users managing their profiles, report generation, etc. c) Identify Performance requirements for identified transactions and stages (dev, QA, production, etc.)Once you identify the main affected business transactions by AWS cost, it’s time to think about Performance requirements. These requirements can be summarized in two areas:
For example, generating an overnight report might not be as critical in terms of performance compared to users arriving on your landing page. User login might be a very high volume and critical part of your application, or it might be something that doesn’t happen often. Answers to these questions will drive decisions such as EC2/RDS instance types and size (i.e. required vCPUs, RAM), disk allocation (i.e. IOPS), number of EC2 instances, etc. d) Identify Availability requirements for identified transactions and stages
The answer to those questions will drive decisions for compute redundancy (i.e. multi-AZ RDS and EC2 deployments, number of EC2 instances), data backups (i.e. EBS/RDS snapshot creation and retention), data redundancy (i.e. RDS multi-AZ or global deployments), and automated recovery processes, among others. These are all areas that drive AWS cost. Step 4 - Apply Corrective ActionsNow that you know where you’re spending money and what your applications need, it’s time to make some cost-cutting decisions. Here are some common areas to focus on: Right-size and Optimize InfrastructureIf CloudWatch metrics show a degree of under-utilization for some of the top usage types by cost, then right-sizing should be an option to consider. Let’s say you have a c5.2xlarge EC2 instance with constant 10% CPU Utilization - it might be good to consider provisioning a smaller EC2 instance. The recommended steps would be the following:
Optimize ApplicationsSometimes AWS cost issues are caused by applications that consume infrastructure resources in a sub-optimal way. Some common examples are heavy SQL statements or code components that result in a heavy CPU, memory or disk utilization. Here are some recommendations that you can apply to high cost AWS resources:
Delete, Terminate or StopA quick way to reduce cost is to identify idle resources, such as EC2/RDS instances, and terminate or stop them. As described earlier, CloudWatch Metrics are an essential tool in this step. Find those RDS instances with 0 connections for a certain period of time, or EC2 instances with < 1% CPU Utilization or a DynamoDB table with no read/write operations, etc. AWS Trusted Advisor has a section where it identifies potentially idle resources, which can help with this task as well. Purchase DiscountsOnce you’ve identified and applied optimizations at both the application and infrastructure level, it’s a good time to consider purchasing discounts. There are two main categories in this area:
It’s very important that this step takes place after optimizations have been applied, otherwise you could end up spending money on long term commitments that could’ve been avoided by applying optimizations first. Negotiate DiscountsIf you incur in a certain amount of usage, there are areas where you can negotiate a better rate. For example, CloudFront offers a discounted rate if you commit to at least 10TB of Data Transfer per month. If you have some other service with a very high usage, reach out to an AWS Product Manager or Sales Manager for that service and ask for special pricing on certain usage types. Keep in mind that in most cases this only applies to usage types worth thousands of dollars. Step 5 - IterateReducing AWS cost is a continuous, iterative process. Sometimes it’s possible to get significant savings by applying relatively simple updates, but there are often cost areas that need to be monitored and refined constantly. That’s why it’s essential to approach cost savings with a long-term mindset and be aware that in many cases it’ll take at least 3-4 billing cycles to achieve an optimal AWS cost situation. After that, it’s very important to monitor cost on a regular basis and avoid any future “Bad AWS Billing Surprises”. It typically looks like this:
To SummarizeHere are the 5 steps described in this article: Do you need help lowering your AWS cost?Don’t overspend on AWS. If you need help lowering your AWS bill, I can save you thousands of dollars (I’ve done it time and again). Click on the button below to schedule a free consultation or use the contact form. Which EC2 option is best for long term workloads?Which of the following EC2 options is best for long-term workloads with predictable usage patterns? Reserved instances are the most economical option for long-term workloads with predictable usage patterns.
Which EC2 purchase option is the most efficient and costSpot Instances are a cost-effective choice if you can be flexible about when your applications run and if they can be interrupted. Dedicated Hosts or Dedicated Instances can help you address compliance requirements and reduce costs by using your existing server-bound software licenses.
Which of the following is the most costSpot Instances - Spot Instances are the most cost-effective choice if you are flexible about when your applications run and if your applications can be interrupted.
What is the best Amazon EC2 instance type to use?M3 instances are recommended if you are seeking general-purpose instances with demanding CPU requirements. M1 instances are the original family of general-purpose instances and provide the lowest cost options for running your applications.
|