AWS DevOps Guru

Vishal Raj
Sep 25, 2022

Introduction to DevOps Guru

At the AWS re:Invent event in December 2020, AWS announced the release of a revolutionary new product, DevOps Guru (press release here). As per the official documentation:

“AWS DevOps Guru is a fully managed operations service that uses machine learning to make it easier for developers to improve application availability by automatically detecting operational issues and recommending specific actions for remediation.”

In today’s world, applications are growing more complex and distributed. As more services are added, the DevOps team has to pay attention on multiple fronts: logging, monitoring, alarm setup, notifications and more. These tasks become tedious and often repetitive. When an incident occurs or alarms go off, it can be overwhelming to work out what went wrong, when it happened, what the root cause is and what the probable fix is. This process can often take a long time, resulting in a longer MTTR (mean time to recovery) and a bad user experience. This is where DevOps Guru steps in, to make life easier for developers as well as the DevOps team.

What exactly is DevOps Guru?

DevOps Guru is a fully managed operations service that enables developers and DevOps teams to improve application availability and infrastructure performance. It is designed to provide proactive as well as reactive insights for the anomalies it detects, along with root cause analysis and the most probable fixes. DevOps Guru draws on AWS's years of experience in operating numerous applications on AWS infrastructure.

How does DevOps Guru work?

Let’s take a high-level view of how DevOps Guru functions. It can be broken down into the following three stages.

  1. First, DevOps Guru must be told which resources to monitor. You do this by specifying an entire account, one or more CloudFormation stacks, or a set of tags that covers the relevant resources.
  2. Once the boundaries have been set, DevOps Guru starts analyzing the resources (the application and its underlying infrastructure) using data from CloudTrail, CloudWatch and more. It can take anywhere from a few hours to a day before it starts producing useful insights.
  3. When configured to do so, DevOps Guru sends notifications via SNS for the anomalies it detects.
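
The first and third stages can be sketched with boto3, the AWS SDK for Python. Treat this as a minimal sketch rather than a definitive setup: the stack name and topic ARN are placeholders, the helper function names are my own, and the request shapes for `update_resource_collection` and `add_notification_channel` reflect my understanding of the boto3 `devops-guru` client, so check the current SDK reference before relying on them.

```python
def stack_coverage_request(stack_names):
    """Build the request that adds CloudFormation stacks to the coverage boundary."""
    return {
        "Action": "ADD",
        "ResourceCollection": {"CloudFormation": {"StackNames": list(stack_names)}},
    }


def sns_channel_request(topic_arn):
    """Build the request that registers an SNS topic for insight notifications."""
    return {"Config": {"Sns": {"TopicArn": topic_arn}}}


def enable_devops_guru(stack_names, topic_arn, region="us-east-1"):
    """Send both requests to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builders above work without the SDK

    client = boto3.client("devops-guru", region_name=region)
    client.update_resource_collection(**stack_coverage_request(stack_names))
    client.add_notification_channel(**sns_channel_request(topic_arn))


if __name__ == "__main__":
    # Placeholder values (assumptions); nothing is sent to AWS here.
    print(stack_coverage_request(["my-serverless-app"]))
    print(sns_channel_request("arn:aws:sns:us-east-1:123456789012:devops-guru-alerts"))
```

Keeping the request builders separate from the AWS call makes the payloads easy to inspect or unit test before anything touches a real account.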

Once DevOps Guru knows which resources to monitor, it starts analyzing their metrics and logs from the last two weeks to learn the usage patterns, so that it can tell whether a change is a genuine anomaly or expected behavior. Since DevOps Guru monitors resources continuously, there is no need to set or change any thresholds manually. Instead, as the application's behavior changes over time, it automatically adjusts to the new patterns. DevOps Guru uses machine learning to evaluate this data and create useful insights.
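
To see why a learned baseline removes the need for hand-tuned thresholds, here is a toy illustration. It is not DevOps Guru's actual model; it simply flags a data point when it sits more than a few standard deviations away from the mean of the preceding window. The function name, window size and threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, z_threshold=3.0):
    """Return the indexes whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    spike = [100, 102, 98, 101, 99, 100, 250, 101]   # sudden latency spike
    drift = [100, 105, 110, 115, 120, 125, 130]      # steady, gradual growth
    print(find_anomalies(spike))
    print(find_anomalies(drift))
```

Because the baseline is recomputed from recent history, the steady growth in the second series is treated as the new normal, while the sudden spike in the first series stands out.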

Let’s take a simple example: a CloudFormation template that defines three main components, namely an API Gateway, a Lambda function and a DynamoDB table to store data. Let’s say that someone accidentally updates the DynamoDB table and reduces its read capacity. At the same time, the app sees a surge in HTTP traffic. Because the table is running at reduced capacity, it starts throttling requests, which eventually leads to timeouts on database reads, and API Gateway starts returning HTTP 500 errors to users. If this issue had to be diagnosed manually, it could take a long time to find the root cause. If the system were being monitored by DevOps Guru, however, it would detect that the DynamoDB configuration was changed right before the errors started to show up. It can then correlate the events and suggest an appropriate action, which leads to a shorter MTTR.
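
The correlation idea in this example can be illustrated with a few lines of Python. This is a simplified sketch, not how DevOps Guru is implemented: it assumes we already have a list of timestamped change events (the descriptions below are made up for illustration) and the time at which errors began, and it returns the changes that happened shortly before the errors. The function name and the 30-minute lookback are assumptions.

```python
from datetime import datetime, timedelta


def likely_causes(change_events, error_start, lookback=timedelta(minutes=30)):
    """Return changes made within `lookback` before the errors began, newest first."""
    candidates = [
        (when, what)
        for when, what in change_events
        if error_start - lookback <= when <= error_start
    ]
    return sorted(candidates, reverse=True)


if __name__ == "__main__":
    # Hypothetical change history for the example stack.
    changes = [
        (datetime(2022, 9, 25, 9, 0), "Lambda function code deployed"),
        (datetime(2022, 9, 25, 11, 50), "DynamoDB table read capacity reduced"),
    ]
    errors_began = datetime(2022, 9, 25, 12, 0)  # first HTTP 500 responses
    for when, what in likely_causes(changes, errors_began):
        print(when.isoformat(), what)
```

Only the DynamoDB change falls inside the lookback window, so it is the one surfaced as a candidate root cause.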

Integrating DevOps Guru with AWS services and third-party tools

DevOps Guru natively integrates with various AWS services such as CloudWatch, X-Ray, CloudTrail, CloudFormation, Config and many more. It also integrates with EventBridge, which lets users set up routing rules that determine where notifications are sent. In addition, it integrates with third-party incident management tools from Atlassian and PagerDuty. Both tools can ingest SNS notifications from DevOps Guru, which can then be managed from the tools' own dashboards.
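
As an illustration of an EventBridge routing rule, the sketch below builds an event pattern that selects only newly opened, high-severity insights. The `source`, `detail-type` and `insightSeverity` values are what I understand DevOps Guru to emit; verify them against the current DevOps Guru documentation. The `matches` helper is a deliberately simplified local stand-in for EventBridge's matching (exact values only) and is not part of any AWS library.

```python
import json

HIGH_SEVERITY_INSIGHT_PATTERN = {
    "source": ["aws.devops-guru"],
    "detail-type": ["DevOps Guru New Insight Open"],
    "detail": {"insightSeverity": ["high"]},
}


def matches(pattern, event):
    """Simplified local check: exact-value matching only, no EventBridge operators."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


if __name__ == "__main__":
    # The JSON string is what an EventBridge rule expects as its event pattern.
    print(json.dumps(HIGH_SEVERITY_INSIGHT_PATTERN, indent=2))

    sample_event = {  # hand-written sample, not a captured event
        "source": "aws.devops-guru",
        "detail-type": "DevOps Guru New Insight Open",
        "detail": {"insightSeverity": "high"},
    }
    print(matches(HIGH_SEVERITY_INSIGHT_PATTERN, sample_event))
```

A rule with this pattern could then target an SNS topic, a Lambda function or another destination, so that only the most urgent insights page the on-call engineer.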

DevOps Guru free tier availability

As with most AWS services, DevOps Guru has a free tier. It covers 3 months of usage, with 7,200 resource hours of monitoring per group, per month. It also includes 10,000 API calls to DevOps Guru.
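
To put the resource-hour figure in perspective, the short calculation below works out how many resources fit into the monthly quota. It assumes a 30-day month and takes the 7,200-hour figure from the paragraph above; the function name is my own.

```python
FREE_RESOURCE_HOURS_PER_MONTH = 7200  # figure quoted above
DAYS_PER_MONTH = 30                   # assumption: a 30-day month


def resources_covered(hours_active_per_day, free_hours=FREE_RESOURCE_HOURS_PER_MONTH):
    """How many identical resources fit in the free quota at a given daily runtime."""
    hours_per_resource = hours_active_per_day * DAYS_PER_MONTH
    return free_hours // hours_per_resource


if __name__ == "__main__":
    print(resources_covered(24))  # resources that run around the clock
    print(resources_covered(6))   # resources that run 6 hours a day
```

With these assumptions the quota covers 10 always-on resources (7,200 / 720 hours each), or 40 resources that run for 6 hours a day.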

DevOps Guru cost estimates

DevOps Guru provides a cost estimator in its dashboard so that users can plan their budget and see how much the service would cost. Once the free tier is exhausted, you pay for using DevOps Guru. Charges are based on the number of hours for which resources are actively monitored. For example, if an S3 bucket is set up for monitoring, it is charged for as long as it is monitored. If, on the other hand, an EC2 instance is set up for monitoring but runs for only a few hours each day, the cost is incurred only for the time the instance is up and running.
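
The difference between the two cases is easy to see with a rough calculation. The hourly rate below is a made-up placeholder, not an actual AWS price (real rates vary by resource type, so consult the pricing page or the built-in estimator); the function name and the 30-day month are likewise assumptions.

```python
def monthly_cost(active_hours_per_day, hourly_rate, days=30, free_hours=0):
    """Estimate the monthly charge for one resource from its active hours."""
    billable_hours = max(active_hours_per_day * days - free_hours, 0)
    return billable_hours * hourly_rate


if __name__ == "__main__":
    PLACEHOLDER_RATE = 0.003  # hypothetical price per resource hour, NOT a real AWS rate

    s3_bucket = monthly_cost(24, PLACEHOLDER_RATE)    # monitored around the clock
    ec2_instance = monthly_cost(4, PLACEHOLDER_RATE)  # runs only 4 hours a day
    print(f"S3 bucket:    {s3_bucket:.2f}")
    print(f"EC2 instance: {ec2_instance:.2f}")
```

At the same rate, the instance that runs 4 hours a day accrues one sixth of the resource hours of the always-monitored bucket, and so one sixth of the cost.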

Bibliography:
AWS on Google podcasts
AWS Official Documentation
YouTube — Episode 1 / Episode 2 / Episode 3

NOTE: Images and examples used are from the sources mentioned in the bibliography.
