Testing Logstash configuration

2016-09-07 logstash

You wrote a piece of Logstash configuration that parses some logs. You tested several corner cases to make sure the output in Elasticsearch was correct. How do you protect this clever configuration file against regressions?

Unit testing to the rescue, of course!

Simple example

For the sake of simplicity, we will take an obvious example: access logs. The input looks like this:

172.17.0.1 - - [05/Sep/2016:20:06:17 +0000] "GET /images/logos/hubpress.png HTTP/1.1" 200 5432 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36" "-"

The output, once in Elasticsearch, should look like

{ "@version":"1",
  "@timestamp":"2016-09-05T20:06:17.000Z",
  "type":"nginx",
  "host":"nginx-server", "path":"/var/log/nginx/access.log",
  "clientip":"172.17.0.1", "ident":"-", "auth":"-",
  "verb":"GET","request":"/images/logos/hubpress.png","httpversion":"1.1",
  "response":200, "bytes":5432, "referrer":"\"http://localhost/\"",
  "agent": "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36\""
}

The configuration could look like

input {
    file {
        path => "/var/log/nginx/access*.log"
        type => "nginx"
    }
}
filter {
    if [type] == "nginx" {
        grok {
            match => [ "message" , "%{COMBINEDAPACHELOG}"]
        }
        date {
            match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
        }
        mutate {
            convert => ["response", "integer"]
            convert => ["bytes", "integer"]
        }
    }
}
output {
    elasticsearch {
      hosts => [ "es-server"]
      index => "logstash-%{+YYYY.MM.dd}"
      document_type => "%{type}"
    }
}
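
To see what each of the three filters contributes, the filter section can be approximated in plain Ruby. This is only a sketch under assumptions: the ACCESS_LOG regex is a simplified, hypothetical stand-in for %{COMBINEDAPACHELOG} (the real grok pattern accepts more variations), and parse_access_line is a made-up helper name, not something Logstash provides.

```ruby
require "time"

# Simplified, hypothetical stand-in for %{COMBINEDAPACHELOG}.
ACCESS_LOG = %r{
  \A(?<clientip>\S+)\s(?<ident>\S+)\s(?<auth>\S+)\s
  \[(?<timestamp>[^\]]+)\]\s
  "(?<verb>\S+)\s(?<request>\S+)\sHTTP/(?<httpversion>[\d.]+)"\s
  (?<response>\d{3})\s(?<bytes>\d+|-)\s
  (?<referrer>"[^"]*")\s(?<agent>"[^"]*")
}x

# Hypothetical helper mirroring the grok, date and mutate steps.
def parse_access_line(line)
  match = ACCESS_LOG.match(line)
  # grok step: a line that does not match is only tagged
  return { "tags" => ["_grokparsefailure"] } if match.nil?

  event = match.names.zip(match.captures).to_h
  # date step: same layout as "dd/MMM/YYYY:HH:mm:ss Z", normalized to UTC
  parsed = Time.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
  event["@timestamp"] = parsed.utc.iso8601(3)
  # mutate step: convert the numeric fields to integers
  event["response"] = event["response"].to_i
  event["bytes"] = event["bytes"].to_i
  event
end

# Shortened version of the sample line (the user agent is truncated).
line = '172.17.0.1 - - [05/Sep/2016:20:06:17 +0000] ' \
       '"GET /images/logos/hubpress.png HTTP/1.1" 200 5432 ' \
       '"http://localhost/" "Mozilla/5.0" "-"'
event = parse_access_line(line)
puts event["@timestamp"]                          # prints 2016-09-05T20:06:17.000Z
p event.values_at("verb", "response", "bytes")    # prints ["GET", 200, 5432]
```

Every step in this sketch is a place where a real pipeline can silently go wrong (a pattern that no longer matches, a date layout that changed, a field left as a string), which is why the filter section is the part worth testing.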

Split the file

In the above config file, the interesting part, the one containing the logic, is the filter section. To test it, the first thing to do is to split this big file into smaller pieces:

  • 01_logstash_input_nginx.conf contains the nginx file input

  • 02_logstash_filter_nginx.conf contains the nginx filter section

  • 03_logstash_output.conf contains the elasticsearch output

In production, you can load multiple config files as if they were a single one:

logstash agent -f "/etc/logstash.d/*.conf"

At test time, loading only 02_logstash_filter_nginx.conf lets you test the Nginx log parsing in isolation.
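
The numeric prefixes matter because files matched by a glob are read in alphabetical order and treated as one configuration. The Ruby sketch below only imitates that behaviour to make the ordering visible; merged_config is a hypothetical helper, and the file contents are empty placeholders rather than the real sections.

```ruby
require "tmpdir"

# Hypothetical helper: concatenate every file matching the glob,
# in sorted (lexical) order, into one configuration string.
def merged_config(glob)
  Dir.glob(glob).sort.map { |path| File.read(path) }.join("\n")
end

Dir.mktmpdir do |dir|
  # Written out of order on purpose: the prefixes decide the final order.
  File.write(File.join(dir, "03_logstash_output.conf"), "output { }")
  File.write(File.join(dir, "01_logstash_input_nginx.conf"), "input { }")
  File.write(File.join(dir, "02_logstash_filter_nginx.conf"), "filter { }")

  puts merged_config(File.join(dir, "*.conf"))
  # prints "input { }", "filter { }" and "output { }", one per line
end
```

The same helper, pointed at a narrower glob such as "*_filter_*.conf", would return only the filter sections, which is the idea behind testing them on their own.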

Write the unit test

Now let’s test the 02_logstash_filter_nginx.conf file alone and write a simple Ruby test case. As you may know, Logstash is written in JRuby.

02_logstash_filter_nginx_spec.rb
# encoding: utf-8
require "logstash/devutils/rspec/spec_helper"

# Load the configuration file
@@configuration = String.new
@@configuration << File.read("conf/02_logstash_filter_nginx.conf")

describe "Nginx filter" do

  config(@@configuration) # (1)

  # Inject input event/message into the pipeline
  message = "172.17.0.1 - - [05/Sep/2016:20:06:17 +0000] \"GET /images/logos/hubpress.png HTTP/1.1\" 200 5432 \"http://localhost/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36\" \"-\""
  sample("message" => message, "type" => "nginx") do # (2)
    # Check the output event/message properties
    insist { subject.get("type") } == "nginx" # (3)
    insist { subject.get("@timestamp").to_iso8601 } == "2016-09-05T20:06:17.000Z"
    insist { subject.get("verb") } == "GET"
    insist { subject.get("request") } == "/images/logos/hubpress.png"
    insist { subject.get("response") } == 200
    insist { subject.get("bytes") } == 5432
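    # Keep the comparison outside the block so the assertion is actually evaluated;
    # Array(...) covers events that have no "tags" field at all.
    reject { Array(subject.get("tags")) }.include?("_grokparsefailure")
    reject { Array(subject.get("tags")) }.include?("_dateparsefailure")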
  end
end
(1) Load the configuration file
(2) Inject the input event/message into the pipeline
(3) Check the output event/message properties

This test uses RSpec, the Ruby testing framework (hence the describe method), which here runs on JRuby. The config and sample functions come from the Logstash DevUtils library. The insist and reject functions are part of the Insist assertion library for Ruby.
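
The insist { value } == expected style can look surprising at first. The sketch below is a simplified stand-in, not the Insist gem's actual code: LazyExpectation, expect_that and expect_not are hypothetical names, and it assumes the block is only evaluated once a comparison is called on the returned object.

```ruby
# Simplified stand-in for a block-based assertion helper.
# Not the Insist gem's code: all names here are hypothetical.
class LazyExpectation
  class Failure < StandardError; end

  def initialize(negate = false, &block)
    @block = block
    @negate = negate
  end

  # The block is evaluated only when a comparison is requested.
  def ==(expected)
    actual = @block.call
    check(actual == expected, "expected #{expected.inspect}, got #{actual.inspect}")
  end

  def include?(item)
    actual = @block.call
    check(actual.include?(item), "checked #{item.inspect} against #{actual.inspect}")
  end

  private

  # Raise unless the (possibly negated) condition holds.
  def check(truthy, message)
    truthy = !truthy if @negate
    raise Failure, message unless truthy
    true
  end
end

def expect_that(&block)
  LazyExpectation.new(false, &block)
end

def expect_not(&block)
  LazyExpectation.new(true, &block)
end

expect_that { 100 + 100 } == 200                    # passes
expect_not { ["nginx"] }.include?("_grokparsefailure")  # passes
```

With this design, a comparison written inside the block is never checked, because nothing triggers the evaluation; the comparison has to be called on the object the helper returns.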

Run the unit tests

First, we need to install the additional development libraries mentioned above.

$ logstash-2.4.0/bin/logstash-plugin install --development
Installing logstash-devutils, logstash-input-generator, logstash-codec-json, logstash-output-null, logstash-filter-mutate, flores, rspec, stud, pry, rspec-wait, childprocess, ftw, logstash-output-elasticsearch, rspec-sequencing, gmetric, gelf, timecop, jdbc-derby, docker-api, logstash-codec-plain, simplecov, coveralls, longshoreman, rumbster, logstash-filter-kv, logstash-filter-ruby, sinatra, webrick, poseidon, logstash-output-lumberjack, webmock, logstash-codec-line, logstash-filter-grok
Installation successful

Now we can run the test. Logstash comes with an rspec command to run these spec files.

$ logstash-2.4.0/bin/rspec 02_logstash_filter_nginx_spec.rb
Using Accessor#strict_set for specs
Run options: exclude {:redis=>true, :socket=>true, :performance=>true, :couchdb=>true, :elasticsearch=>true, :elasticsearch_secure=>true, :export_cypher=>true, :integration=>true, :windows=>true}
.

Finished in 0.115 seconds (files took 0.784 seconds to load)
1 example, 0 failures

Randomized with seed 4384

The rspec command can also run multiple tests at once.

$ logstash-2.4.0/bin/rspec spec -P '**/*_spec.rb'

To catch hidden dependencies between tests, they are run in random order: this is called randomized testing.
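
A random order is only useful if a failing run can be replayed, which is what the seed printed at the end of the run is for. The sketch below illustrates the principle with a seeded shuffle; it is not RSpec's actual ordering algorithm, and the example names and the ordered_with_seed helper are hypothetical.

```ruby
# Hypothetical example names, for illustration only.
examples = [
  "parses the timestamp",
  "converts bytes to an integer",
  "extracts the verb",
  "tags unparseable lines"
]

# Hypothetical helper: shuffle deterministically from a given seed.
def ordered_with_seed(items, seed)
  items.shuffle(random: Random.new(seed))
end

first_run = ordered_with_seed(examples, 4384)
second_run = ordered_with_seed(examples, 4384)

puts first_run == second_run          # prints true: same seed, same order
puts first_run.sort == examples.sort  # prints true: nothing lost or duplicated
```

Plain RSpec accepts a --seed option for the same purpose, so an order that exposed a failure can be run again.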

Give me the code!

All the code shown in this article is available on GitHub.