How to monitor EC2, CloudWatch, EBS, RDS, ELB, ElasticCache using metrics

AWS is the front runners when it comes to have highly available, fault tolerant, secure and high scaling service which can integrate with almost everything in cloud as well as your own data center.

For monitoring your VPC(Virtual Private Cloud), AWS has these two very renowned services:
– CloudWatch and
– CloudTrail

While ‘CloudTrail’ is primarily used to monitor the API calls made to other services or Applications, ‘CloudWatch’ is used for monitoring and logging events periodically (default every 5 minutes, detailed every minute).

It can be used to monitor:
– Compute resources like EB2s, ELBs, Route53, Auto Scaling Groups,
– Storage & CDN resources like  S3, CloudFront, EBS Volumes,
– Database and Analytics services like DynamoDB, Elastic cache, RDS, Elastic MapReduce, Redshift
– SNS, SQS etc.
Let’s cover them one by one on how CloudWatch monitors different services:

A- EC2(Elastic Compute Cloud)

CloudWatch can monitor the following metrics:

# CPU 

– CPUCreditUsage (number of CPU credits consumed by the instance. One CPU credit equals one vCPU running at 100% utilization for one minute)

– CPUCreditBalance (number of CPU credits available for the instance to burst beyond its base CPU utilization, expire every 24 hrs)

– CPUUtilization (percentage of allocated EC2 compute units)

# Network

– NetworkIn (number of bytes received on all network interfaces by the instance)

– NetworkOut (number of bytes sent out on all network interfaces by the instance)

– NetworkPacketsIn (number of packets received on all network interfaces by the instance)

– NetworkPacketsOut (number of packets sent out on all network interfaces by the instance)

# Disk

– DiskReadOps (Completed read operations from all instance volumes)

– DiskWriteOps (Completed write operations to all instance store volumes )

– DiskReadBytes (Bytes read from all instance store volumes )

– DiskWriteBytes (Bytes written to all instance store volumes)

# Status Check

– StatusCheckFailed (Reports whether the instance has passed both the instance status check and the system status check in the last minute)

– StatusCheckFailed_Instance (whether the instance has passed the instance status check in the last minute.)

– StatusCheckFailed_System (whether the instance has passed the system status check in the last minute.)

– You can create alarms based on these above metrics to watch the health of the host and the instances.

Rest of the article to be continued…



CloudWatch – a service to do EC2 Instance Health Check/Monitoring , Troubleshooting, Metrics and Analysis

The Health Check/Monitoring , Troubleshooting, Metrics and Analysis of the EC2 instances and getting timely alerts to fix the problems to keep your cloud architecture highly available, auto-scaling and fault tolerant are one of the important roles and responsibilities of a cloud architect or SysOps admin. Let’s check how we can achieve this.

So lets first try to understand what CloudWatch is – Its is a AWS’s health monitoring service to monitor the AWS resources and the applications. It can monitor the following:
– Compute resources like Auto scaling groups, Load balancers, Route 53,
– Storage resources like EBS volumes, storage gateways, Cloud Front,
– Database services like relational RDS instances, non-relational services                   like DynamoDB,
– Analytics services like Elastic Map Reduce, Red Shift,
– In-memory cache services like Elastic Cache to name a few.

The CloudWatch can monitor the following metric:
 – CPU Utilization
 – Disk Reads
 – Network In and Outs
 – Status checks
But it can’t check a few other metrics like Memory Utilization for that we have to add custom metrics, which we will see later in this post.

The default monitoring checks these metrics every 5 minutes whereas the detailed monitoring is every 1 minute.

The status checks listed above can be of two type:
 – System Status Checks – checks related to the host on which the instance                  is virtualized. E.g Loss of network or power,  software or hardware issues               on the host machine. Normally restarting/terminating the instance or                           contacting AWS are the options available.
 – Instance Status Checks – checks related to the VM(Virtual machine) itself.               E.g. memory leaks, corrupted file system, incompatible file system,                                  mis-configured network. Normally restarting/terminating the instance or               checking/trouble shooting your own application for bugs are the options.

On the AWS console go to the CloudWatch service :
 – Click “Create dashboard”
 – Add a widget to dashboard based on the metrics listed above
 – Save the dashboard.(See snapshot below)

Now what if we want to monitor a custom metrics(Memory Utilization) which is not monitored by default by CloudWatch. Well then we have to use some custom scripts for it. Lets see how it is done.

- Install the required packages:
 sudo yum install perl-Switch perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https
- Download the CloudWatch Custom Monitoring Scripts:
 curl -O
- Unzip the scripts:
 cd aws-scripts-mon
- Execute the script(You will get a "Successfully reported metrics to CloudWatch. Reference Id: 84bf63d3-2841-11e7-a20f-7786b8297dbd
" message on success):
 ./ --mem-util --mem-used --mem-avail
- Add a crontab job for 5 minutes intervals:
 */5 * * * * ~/aws-scripts-mon/ --mem-util --disk-space-util --disk-path=/ --from-cron

Once you have run these scripts successfully, the custom metrics for memory utilization will also be available and you can add it as a widget. See below.