Alarm system from application logs by CloudWatch Metrics [Step 2]

Kagawa
4 min read · Apr 9, 2021

In the last article, we went over how to write logs from a C# application to CloudWatch without installing an agent. The goal of this article is to build an alarm system on top of those application logs so that we can proactively fix issues before users notice them.

Logs -> Metrics -> Alarms

Since we now have structured logs in CloudWatch, we can use them to create metrics that detect anything out of the ordinary. We can then set a condition on a metric so that we receive an alert and can proactively look into the issue.

Logs

First of all, let’s look at the logs we wrote in the last article. The “Generated random value” log entries have Number and Name variables, which we will use to filter the logs.

Create Metric

To create a metric, go to your log group, open “Metric filters”, and enter a filter pattern. Since we have structured logs, we can use the $.VARIABLE syntax to select only the logs we need for the metric.

Here is an example: I want a metric that counts the logs whose “Number” exceeds 60. For JSON logs, a filter pattern such as { $.Number > 60 } expresses that condition.

Then, fill in the metric name and namespace. I’m setting the metric value to 1 because I want to count the logs that match the filter pattern.
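If you would rather script this step than click through the console, the same metric filter could be sketched with boto3 roughly as follows. The filter name, metric name, and namespace are placeholders I made up for illustration, not values from this article's setup, and the pattern assumes the log events are JSON with a numeric Number field.

```python
def build_metric_filter_request(log_group_name: str) -> dict:
    """Build the keyword arguments for CloudWatch Logs put_metric_filter."""
    return {
        "logGroupName": log_group_name,
        "filterName": "number-over-60",          # hypothetical name
        "filterPattern": "{ $.Number > 60 }",    # JSON pattern on the Number field
        "metricTransformations": [
            {
                "metricName": "NumberOver60Count",  # hypothetical name
                "metricNamespace": "MyApp/Logs",    # hypothetical namespace
                "metricValue": "1",                 # publish 1 per matching log event
            }
        ],
    }


def create_metric_filter(log_group_name: str) -> None:
    """Send the request to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so building the request needs no AWS SDK

    logs = boto3.client("logs")
    logs.put_metric_filter(**build_metric_filter_request(log_group_name))
```

Keeping the request in its own function makes it easy to review or unit test the settings before anything is sent to AWS.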

Now that the metric filter is set up for our log group, the metric shows a count once we start logging to CloudWatch. One thing to note is that the default statistic is Average. Because every matching log publishes the value 1, the average is always 1, so it looks like we only have one event; when we change the statistic to Sum, we see the correct count.
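To see why Average hides the count, here is a tiny illustration with made-up numbers: seven matching log events in one period, each publishing the metric value 1.

```python
# Each matching log event publishes the metric value 1.
# Seven events in one period is a made-up number for illustration.
datapoints = [1, 1, 1, 1, 1, 1, 1]

average = sum(datapoints) / len(datapoints)  # what the Average statistic reports
total = sum(datapoints)                      # what the Sum statistic reports

print(average)  # 1.0
print(total)    # 7
```

No matter how many events match, the average of a list of ones stays at 1, while the sum grows with each event.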

Create Alarm

Once the metric is set up, we can create an alarm to notify us. We define a condition, and when it is met, SNS sends an alarm message. It is also possible to use Lambda to send the notification to Slack or any other chat tool, but in this example we are just going to use email.

The condition we are setting here is that a value above 60 is logged more than 10 times within 5 minutes. When that happens, we automatically receive an alarm about a possible issue and can look into it before users notice.

If we want to be notified when the alarm clears, we can treat missing data as good (not breaching), so that the alarm can return to OK when no matching logs arrive.
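The alarm settings described above could be expressed with boto3 along these lines. This is a sketch under assumptions: the alarm name, metric name, and namespace are hypothetical, and the SNS topic ARN is something you would supply from your own account. Adding the topic to OKActions is my own assumption about how to get a message when the alarm clears.

```python
def build_alarm_request(sns_topic_arn: str) -> dict:
    """Build the keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": "number-over-60-too-often",   # hypothetical name
        "Namespace": "MyApp/Logs",                 # hypothetical namespace
        "MetricName": "NumberOver60Count",         # hypothetical metric name
        "Statistic": "Sum",                        # count matches, not Average
        "Period": 300,                             # 5 minutes, in seconds
        "EvaluationPeriods": 1,
        "Threshold": 10.0,
        "ComparisonOperator": "GreaterThanThreshold",  # more than 10 times
        "TreatMissingData": "notBreaching",        # missing data counts as good
        "AlarmActions": [sns_topic_arn],           # notify when the alarm fires
        "OKActions": [sns_topic_arn],              # assumption: notify on recovery too
    }


def create_alarm(sns_topic_arn: str) -> None:
    """Send the request to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so building the request needs no AWS SDK

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_alarm_request(sns_topic_arn))
```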

That’s it, we did it! Now we receive a notification when something out of the ordinary happens and can proactively work on possible issues.

Conclusion

Now that we have the baseline in place, we can keep improving our alarm system. We can use Lambda to call a webhook for our chat tool so that we can react to alarms faster. If we want to automate the response to an alarm, we can set up Lambda to call an API that fixes the issue.
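As a rough idea of what the webhook step might look like, here is a minimal Lambda handler sketch that is subscribed to the SNS topic and forwards the alarm to a chat tool. The webhook URL is a placeholder, and the {"text": ...} payload shape is an assumption that matches Slack-style incoming webhooks; other chat tools may expect a different body.

```python
import json
import urllib.request

# Placeholder URL; replace with the webhook of your own chat tool.
WEBHOOK_URL = "https://example.com/hypothetical-webhook"


def format_alarm_message(event: dict) -> str:
    """Turn the SNS-wrapped CloudWatch alarm notification into one line of text."""
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    return (
        f'{alarm["AlarmName"]} is now {alarm["NewStateValue"]}: '
        f'{alarm["NewStateReason"]}'
    )


def lambda_handler(event, context):
    """Post the formatted alarm message to the webhook."""
    body = json.dumps({"text": format_alarm_message(event)}).encode("utf-8")
    request = urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return {"status": response.status}
```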

With this system in place, I feel confident that my app is working fine, and it has saved me a lot of trouble: actions can be taken much faster, and it takes less time and effort to find the logs that point to the cause and the fix. I hope you feel the same way.
