CloudWatch – a service for EC2 instance health checks/monitoring, troubleshooting, metrics and analysis

Health checks/monitoring, troubleshooting, metrics and analysis of EC2 instances, together with timely alerts so problems get fixed quickly, are among the most important responsibilities of a cloud architect or SysOps admin. They are what keep your cloud architecture highly available, auto-scaling and fault tolerant. Let’s see how we can achieve this.

So let’s first understand what CloudWatch is. It is AWS’s health monitoring service for AWS resources and the applications running on them. It can monitor the following:
– Compute and networking resources like EC2 instances, Auto Scaling groups, load balancers and Route 53,
– Storage and content delivery resources like EBS volumes, Storage Gateway and CloudFront,
– Database services like relational RDS instances and non-relational services like DynamoDB,
– Analytics services like Elastic MapReduce (EMR) and Redshift,
– In-memory cache services like ElastiCache, to name a few.

For EC2 instances, CloudWatch can monitor the following metrics by default:
 – CPU Utilization
 – Disk Reads
 – Network In and Network Out
 – Status checks
But it can’t collect some other metrics, such as Memory Utilization, because those are only visible from inside the instance’s operating system. For them we have to publish custom metrics, which we will see later in this post.
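If you prefer the command line to the console, the metrics above can also be queried with the AWS CLI. The sketch below assumes a configured AWS CLI and GNU `date`; the short names (`cpu`, `disk-reads`, ...) and both function names are hypothetical helpers, not part of any AWS tool. It maps the names used in this post to the metric names CloudWatch publishes in the `AWS/EC2` namespace. Note that `DiskReadOps` only covers instance store volumes; EBS volumes report their own metrics under `AWS/EBS`.

```shell
#!/usr/bin/env bash
# Sketch: map this post's metric names to AWS/EC2 metric names and fetch
# recent datapoints. Assumes a configured AWS CLI and GNU date.

# Translate a short (hypothetical) name into the CloudWatch metric name.
ec2_metric_name() {
  case "$1" in
    cpu)          echo "CPUUtilization" ;;
    disk-reads)   echo "DiskReadOps" ;;       # instance store volumes only
    network-in)   echo "NetworkIn" ;;
    network-out)  echo "NetworkOut" ;;
    status-check) echo "StatusCheckFailed" ;;
    *) echo "not a default EC2 metric: $1" >&2; return 1 ;;
  esac
}

# Print 5-minute averages of one metric over the last hour.
fetch_ec2_metric() {
  local instance_id="$1" metric
  metric="$(ec2_metric_name "$2")" || return 1
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name "$metric" \
    --dimensions Name=InstanceId,Value="$instance_id" \
    --start-time "$(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 \
    --statistics Average
}
```

For example, `fetch_ec2_metric i-0123456789abcdef0 cpu` (with your own instance ID) prints the last hour of CPU averages. Asking for `memory` fails on purpose, since memory is not a default metric.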

With basic (default) monitoring these metrics are reported every 5 minutes, whereas detailed monitoring reports them every 1 minute.
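Detailed monitoring is switched on per instance and is billed separately from basic monitoring. A minimal sketch, assuming a configured AWS CLI; `monitoring_period` and `set_monitoring_mode` are hypothetical helper names wrapping the real `aws ec2 monitor-instances` and `aws ec2 unmonitor-instances` commands.

```shell
#!/usr/bin/env bash
# Sketch: toggle basic (5-minute) vs detailed (1-minute) monitoring.

# Reporting interval in seconds for each monitoring mode.
monitoring_period() {
  case "$1" in
    basic)    echo 300 ;;
    detailed) echo 60 ;;
    *) echo "mode must be basic or detailed" >&2; return 1 ;;
  esac
}

# Enable or disable detailed monitoring for one instance.
set_monitoring_mode() {
  local instance_id="$1" mode="$2"
  case "$mode" in
    detailed) aws ec2 monitor-instances --instance-ids "$instance_id" ;;
    basic)    aws ec2 unmonitor-instances --instance-ids "$instance_id" ;;
    *) echo "mode must be basic or detailed" >&2; return 1 ;;
  esac
}
```

In practice the difference is 12 datapoints per hour with basic monitoring versus 60 with detailed monitoring, which matters when you want alarms to react within a couple of minutes.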

The status checks listed above are of two types:
 – System Status Checks – checks related to the host on which the instance is virtualized, e.g. loss of network or power, or software or hardware issues on the host machine. The usual options are stopping and starting (or terminating) the instance, or contacting AWS.
 – Instance Status Checks – checks related to the VM (virtual machine) itself, e.g. memory leaks, a corrupted or incompatible file system, or a misconfigured network. The usual options are rebooting or terminating the instance, or troubleshooting your own application for bugs.
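Both checks can be read for a single instance with `aws ec2 describe-instance-status`. The sketch below assumes a configured AWS CLI and a running instance (stopped instances are not listed by default). `suggest_action` and `check_instance_status` are hypothetical helpers that simply encode the remediation advice given above.

```shell
#!/usr/bin/env bash
# Sketch: read the system and instance status checks and suggest a next step.

# Map (system status, instance status) to the advice from this post.
suggest_action() {
  local system="$1" instance="$2"
  if [ "$system" = "impaired" ]; then
    echo "host problem: stop and start the instance, or contact AWS"
  elif [ "$instance" = "impaired" ]; then
    echo "instance problem: reboot, then troubleshoot the OS and application"
  else
    echo "no action: system=$system instance=$instance"
  fi
}

check_instance_status() {
  local instance_id="$1" out
  out="$(aws ec2 describe-instance-status \
    --instance-ids "$instance_id" \
    --query 'InstanceStatuses[0].[SystemStatus.Status,InstanceStatus.Status]' \
    --output text)" || return 1
  # Unquoted on purpose: splits the tab-separated "system instance" pair.
  suggest_action $out
}
```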

In the AWS console, go to the CloudWatch service:
 – Click “Create dashboard”
 – Add a widget to the dashboard for the metrics listed above
 – Save the dashboard (see snapshot below).
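The same dashboard can be created from the command line with `aws cloudwatch put-dashboard`, which takes the dashboard definition as JSON. A minimal sketch, assuming a configured AWS CLI; the dashboard name, region and instance ID are yours to supply, and `dashboard_body` / `create_dashboard` are hypothetical helper names. It adds a single CPU Utilization widget; further widgets go into the same `widgets` array.

```shell
#!/usr/bin/env bash
# Sketch: create a one-widget CloudWatch dashboard from the CLI.

# Print a dashboard body with one CPU widget for one instance.
dashboard_body() {
  local instance_id="$1" region="$2"
  cat <<EOF
{
  "widgets": [
    {
      "type": "metric",
      "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "EC2 CPU Utilization",
        "region": "$region",
        "stat": "Average",
        "period": 300,
        "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "$instance_id"]]
      }
    }
  ]
}
EOF
}

# Create (or overwrite) the dashboard with the body above.
create_dashboard() {
  local name="$1" instance_id="$2" region="$3"
  aws cloudwatch put-dashboard \
    --dashboard-name "$name" \
    --dashboard-body "$(dashboard_body "$instance_id" "$region")"
}
```

Keeping the body in a function makes it easy to review the JSON before sending it, e.g. `dashboard_body i-0123456789abcdef0 us-east-1`.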

Now, what if we want to monitor a custom metric (Memory Utilization) that CloudWatch does not monitor by default? Then we have to use custom scripts. Let’s see how it is done.

- Install the required packages:
 sudo yum install perl-Switch perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https
- Download the CloudWatch Custom Monitoring Scripts:
 curl -O
- Unzip the downloaded scripts, then change into the extracted directory:
 cd aws-scripts-mon
- Execute the script (on success you will get a "Successfully reported metrics to CloudWatch. Reference Id: 84bf63d3-2841-11e7-a20f-7786b8297dbd" message):
 ./ --mem-util --mem-used --mem-avail
- Add a crontab entry to report the metrics at 5-minute intervals:
 */5 * * * * ~/aws-scripts-mon/ --mem-util --disk-space-util --disk-path=/ --from-cron
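To make it clearer what the memory options report, here is a rough sketch of the same idea. It is not the scripts’ own code. It assumes a Linux `/proc/meminfo`, a configured AWS CLI, and, like the scripts’ default behaviour, counts buffers and page cache as available memory. The `System/Linux` namespace mirrors the one the scripts publish to; treat it as an assumption. `mem_util_percent` and `report_mem_util` are hypothetical names, and the instance ID is passed in by hand rather than read from instance metadata.

```shell
#!/usr/bin/env bash
# Sketch: compute memory utilization and publish it as a custom metric.

# Print memory utilization (%) from a meminfo-format file, treating
# buffers and page cache as available memory.
mem_util_percent() {
  awk '
    /^MemTotal:/ {total = $2}
    /^MemFree:/  {free = $2}
    /^Buffers:/  {buffers = $2}
    /^Cached:/   {cached = $2}
    END {
      used = total - free - buffers - cached
      printf "%.2f\n", 100 * used / total
    }' "${1:-/proc/meminfo}"
}

# Push the current value to CloudWatch as a custom metric.
report_mem_util() {
  local instance_id="$1"
  aws cloudwatch put-metric-data \
    --namespace System/Linux \
    --metric-name MemoryUtilization \
    --unit Percent \
    --value "$(mem_util_percent)" \
    --dimensions InstanceId="$instance_id"
}
```

Like the scripts, this only sends one datapoint per run, which is why a cron entry is needed to keep the metric flowing.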

Once these scripts run successfully, the custom memory utilization metrics also become available in CloudWatch, and you can add them to your dashboard as a widget. See below.