Log collection in AWS land

2017-09-30 elasticsearch cloud

In AWS, it is really easy to spin up new machines and scale. The more machines you have, the more important it is to centralize logs. In this article, I will describe what I discovered while trying to collect logs from applications deployed on Beanstalk and send them to Elasticsearch.

Disclaimer: I am an AWS newbie.

The family picture

Big picture
Beanstalk

contains a web server and runs an application (Java, JS, Go…); both produce logs in a local /var/log/something directory. There can be multiple instances of the same application for scalability, or different applications in the same environment.

Cloudwatch

is used to monitor EC2, Beanstalk… instances; it is the place where logs and metrics are gathered. From there you can trigger alerts, schedule tasks…

S3

is a file storage which can be used to archive logs for the long term, so they survive instance termination. However these logs are not easily searchable because they are compressed files.

Elasticsearch

can be used to index logs and make them searchable. A Kibana UI is provided to make searching and dashboard building even easier.

Lambda

a provided Lambda function is used to bridge logs from Cloudwatch to Elasticsearch.

To ship logs into Cloudwatch, an AWSLogs agent is provided. To archive logs into S3, a script is cron-ed along with logrotate.

This article will skip the security (IAM) settings which are required to allow these components to communicate.

Beanstalk

There are several components producing logs in a Beanstalk instance:

  1. Beanstalk deployment

  2. Proxy server: either Apache or Nginx produce Access logs and Error logs

  3. Web server: Tomcat, NodeJS…​

  4. Application with its own logging framework

Each component produces logs in its own format. Beanstalk knows how to automatically collect logs for the first three components: deploy logs, access logs, web server logs… Since it knows where these logs are located, it takes care of rotating them, archiving them on S3, and purging log files.

The Beanstalk console allows you to download the last 100 lines of each file for a given instance. It is useful to understand why a deployment fails, but it's not meant for digging into the logs of a running application cluster.

To tell Beanstalk to take care of your application-specific log files, just add some configuration files to indicate where they are located:

/opt/elasticbeanstalk/tasks/taillogs.d/my-app.conf
/var/log/my-app/my-app.log
/opt/elasticbeanstalk/tasks/bundlelogs.d/my-app.conf
/var/log/my-app/my-app.*.log

The first file makes the content of my-app.log (the current log file) appear in the Beanstalk console. The second file makes all rotated log files part of the archived log bundle.
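Since instances come and go, it's better not to create these files by hand on each machine. One way (a sketch, the file name below is arbitrary) is to let Beanstalk create them at deployment time with an .ebextensions configuration:

.ebextensions/my-app-logs.config
files:
  "/opt/elasticbeanstalk/tasks/taillogs.d/my-app.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      /var/log/my-app/my-app.log
  "/opt/elasticbeanstalk/tasks/bundlelogs.d/my-app.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      /var/log/my-app/my-app.*.log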

Cloudwatch Logs

First of all, log streaming from Beanstalk to Cloudwatch must be enabled in a Beanstalk configuration file:

.ebextensions/cloudwatch.config
option_settings:
  aws:elasticbeanstalk:cloudwatch:logs:
    StreamLogs: true
    DeleteOnTerminate: false
    RetentionInDays: 30

This sets up the Cloudwatch log agent. Provided you’re using an image based on Amazon Linux, you can also install it on any EC2 instance with yum:

yum update -y
yum install -y awslogs
service awslogs start

Then the Cloudwatch log agent must be configured to watch your custom log files.

/etc/awslogs/config/my-app.conf
[/var/log/my-app/my-app.log]
log_group_name=/aws/elasticbeanstalk/my-app-dev/var/log/my-app/my-app.log
log_stream_name={instance_id}
file=/var/log/my-app/my-app*.log

Log files produced by the components (Apache, Tomcat…) managed by Beanstalk are already configured; the above config file is only for application-specific log files. This agent supports multiline log entries (stack traces, call traces…) provided the continuation lines start with whitespace (space, tab). It's not as powerful as Filebeat or Logstash.
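When the continuation lines don't start with whitespace, the agent can still be told how to recognize the start of an event. A minimal sketch of two extra keys to append to the section above, assuming each entry starts with a timestamp such as 2017-09-30 12:00:00 (adjust datetime_format to your real log layout):

datetime_format=%Y-%m-%d %H:%M:%S
multi_line_start_pattern={datetime_format}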

At this point, you’ll be able to see logs aggregated from multiple Beanstalk instances in the Cloudwatch console.

Cloudwatch log viewer

It’s better than Beanstalk console to monitor a running platform. Yet, log search is still limited because logs are not structured (split into fields) and the full text search is simplistic.

Using Cloudwatch, it is also possible:

  • to raise alerts when a specific pattern is found in logs

  • to extract metrics from logs (for example HTTP request time, or the number of 404 errors in access logs) and draw charts (see the sketch below)

More information can be found here.
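As an illustration of the second point, a metric filter can also be created from the CLI; this is a sketch, where the log group, filter name, namespace and pattern are placeholders to adapt:

aws logs put-metric-filter \
  --log-group-name /aws/elasticbeanstalk/my-app-dev/var/log/my-app/my-app.log \
  --filter-name my-app-errors \
  --filter-pattern "ERROR" \
  --metric-transformations metricName=MyAppErrors,metricNamespace=MyApp,metricValue=1

The resulting MyAppErrors metric can then be charted or used as the source of a Cloudwatch alarm.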

Elasticsearch

To send logs to Elasticsearch and get a better log search experience, subscribe a filter to each Cloudwatch log group. AWS provides a Lambda function which reads the filtered log events and sends them to Elasticsearch. The subscription filter can also be used to split text logs into fields:

Cloudwatch log filter
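The console has a wizard to set this up, but the same wiring can be done from the CLI. A sketch, assuming the provided Lambda has already been created (log group name, filter name and Lambda ARN are placeholders, and the function must allow invocation by Cloudwatch Logs):

aws logs put-subscription-filter \
  --log-group-name /aws/elasticbeanstalk/my-app-dev/var/log/my-app/my-app.log \
  --filter-name to-elasticsearch \
  --filter-pattern "" \
  --destination-arn arn:aws:lambda:eu-west-1:123456789012:function:LogsToElasticsearch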

This tool can split space-delimited logs, like access logs, into fields. But it's hard to "grok" more complicated logs with such a basic tool. It supports JSON formatted logs, so a good solution for application logs is to configure your favorite logging framework to produce JSON.
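For instance, for a Java application using Logback, the logstash-logback-encoder library can write each log event as a one-line JSON document. A minimal sketch, assuming that dependency is on the classpath and keeping the /var/log/my-app/my-app.log path used above:

logback.xml
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>/var/log/my-app/my-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>/var/log/my-app/my-app.%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>7</maxHistory>
    </rollingPolicy>
    <!-- one JSON document per line, including a @timestamp field -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="FILE"/>
  </root>
</configuration>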

This article is worth reading.

At this point, we can open Kibana and configure an index pattern named cwl-*. The Cloudwatch log filter mimics Logstash and uses a field named @timestamp for the timestamp.

Conclusion

AWS provides all the building blocks to centralize logs and monitor your whole infrastructure. It's not hard to collect logs and send them to Elasticsearch. But it's also far less powerful than the complete Elastic stack.