Testing Logstash configuration
 
        2016-09-07 logstash
You wrote a piece of Logstash configuration which can parse some logs. You tested several corner cases to ensure the output in Elasticsearch was alright. How do you protect this clever configuration file against regressions?
Unit testing to the rescue, of course!
Simple example
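For the sake of simplicity, we will take an obvious example: Nginx access logs. The input looks like:
172.17.0.1 - - [05/Sep/2016:20:06:17 +0000] "GET /images/logos/hubpress.png HTTP/1.1" 200 5432 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36" "-"
The output, once in Elasticsearch, should look like: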
{ "@version":"1",
  "@timestamp":"2016-09-05T20:06:17.000Z",
  "type":"nginx",
  "host":"nginx-server", "path":"/var/log/nginx/access.log",
  "clientip":"172.17.0.1", "ident":"-", "auth":"-",
  "verb":"GET","request":"/images/logos/hubpress.png","httpversion":"1.1",
  "response":200, "bytes":5432, "referrer":"\"http://localhost/\"",
  "agent": "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36\""
}
The configuration could look like:
input {
    file {
        path => "/var/log/nginx/access*.log"
        type => "nginx"
    }
}
filter {
    if [type] == "nginx" {
        grok {
            match => [ "message" , "%{COMBINEDAPACHELOG}"]
        }
        date {
            match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
        }
        mutate {
            convert => ["response", "integer"]
            convert => ["bytes", "integer"]
        }
    }
}
output {
    elasticsearch {
      hosts => [ "es-server"]
      index => "logstash-%{+YYYY.MM.dd}"
      document_type => "%{type}"
    }
}
Split the file
In the config file above, the interesting part, the one containing the logic, is the filter section. To test it, the first thing to do is split this big file into small pieces:
- 01_logstash_input_nginx.conf contains the nginx file input
- 02_logstash_filter_nginx.conf contains the nginx filter section
- 03_logstash_output.conf contains the elasticsearch output
In production, you can load multiple config files as if they were a single one:
logstash agent -f "/etc/logstash.d/*.conf"
At test time, by picking a single configuration file 02_logstash_filter_nginx.conf, the Nginx log parsing can be tested in isolation.
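The numeric prefixes matter because Logstash's documentation describes reading the files matched by a glob in lexicographical order and concatenating them. The following is a minimal plain-Ruby sketch of that idea, not Logstash's own loader; combined_config is a hypothetical helper, and the temporary files only stand in for the three pieces above.

```ruby
require "tmpdir"

# Hypothetical helper (not part of Logstash): concatenate every file that
# matches the glob, in lexicographical order, into one configuration string.
def combined_config(glob)
  Dir.glob(glob).sort.map { |path| File.read(path) }.join("\n")
end

# Throwaway files standing in for the input, filter and output pieces.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "02_filter.conf"), "filter { }")
  File.write(File.join(dir, "01_input.conf"), "input { }")
  File.write(File.join(dir, "03_output.conf"), "output { }")
  # Prints the input, filter and output sections in that order.
  puts combined_config(File.join(dir, "*.conf"))
end
```

The same helper could be used in a spec to load several filter files at once, assuming they are meant to run in file-name order.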
Write the unit test
Now let’s test the 02_logstash_filter_nginx.conf file alone and write a simple Ruby test case.
As you may know, Logstash is written in JRuby.
# encoding: utf-8
require "logstash/devutils/rspec/spec_helper"
# Load the configuration file
configuration = File.read("conf/02_logstash_filter_nginx.conf")
describe "Nginx filter" do
  config(configuration) # (1)
  # Inject input event/message into the pipeline
  message = "172.17.0.1 - - [05/Sep/2016:20:06:17 +0000] \"GET /images/logos/hubpress.png HTTP/1.1\" 200 5432 \"http://localhost/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36\" \"-\""
  sample("message" => message, "type" => "nginx") do # (2)
    # Check the output event/message properties
    insist { subject.get("type") } == "nginx" # (3)
    insist { subject.get("@timestamp").to_iso8601 } == "2016-09-05T20:06:17.000Z"
    insist { subject.get("verb") } == "GET"
    insist { subject.get("request") } == "/images/logos/hubpress.png"
    insist { subject.get("response") } == 200
    insist { subject.get("bytes") } == 5432
    # A successful parse adds no failure tag (_grokparsefailure, _dateparsefailure),
    # so the event should carry no "tags" field at all
    insist { subject.get("tags") }.nil?
  end
end
(1) Load the configuration file
(2) Inject the input event/message into the pipeline
(3) Check the output event/message properties
This test uses RSpec, a Ruby testing framework (it provides the describe method).
The config and sample functions are located in the Logstash DevUtils library.
The insist and reject functions are part of the Ruby Insist assertion library.
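To make the expected values in the assertions concrete, here is a rough plain-Ruby approximation of what the grok, date and mutate filters do to the sample line. It is only a sketch under assumptions: the regular expression is a simplified stand-in for %{COMBINEDAPACHELOG}, not the real pattern, and parse_access_line is a hypothetical name that does not exist in Logstash.

```ruby
require "time"

# Simplified stand-in for %{COMBINEDAPACHELOG} (assumption: the real grok
# pattern is more permissive and also captures referrer and agent).
ACCESS_LINE = %r{\A(?<clientip>\S+) (?<ident>\S+) (?<auth>\S+) \[(?<timestamp>[^\]]+)\] "(?<verb>\S+) (?<request>\S+) HTTP/(?<httpversion>[\d.]+)" (?<response>\d+) (?<bytes>\d+)}

# Hypothetical helper: returns a Hash of fields, or nil when the line does not match.
def parse_access_line(line)
  match = ACCESS_LINE.match(line)
  return nil unless match

  # "grok" step: named captures become event fields
  event = match.names.zip(match.captures).to_h
  # "mutate" step: convert the numeric fields to integers
  event["response"] = event["response"].to_i
  event["bytes"] = event["bytes"].to_i
  # "date" step: parse the log timestamp and normalise it to UTC
  parsed = Time.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
  event["@timestamp"] = parsed.utc.iso8601(3)
  event
end

line = '172.17.0.1 - - [05/Sep/2016:20:06:17 +0000] "GET /images/logos/hubpress.png HTTP/1.1" 200 5432 "http://localhost/" "-"'
event = parse_access_line(line)
puts event["@timestamp"] # 2016-09-05T20:06:17.000Z
puts event["response"]   # 200
```

The real test stays preferable because it exercises the actual filter plugins; the sketch only shows where values such as 200, 5432 and the ISO 8601 timestamp come from.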
Run the unit tests
First we will need to download and install additional development libraries like those mentioned above.
$ logstash-2.4.0/bin/logstash-plugin install --development
Installing logstash-devutils, logstash-input-generator, logstash-codec-json, logstash-output-null, logstash-filter-mutate, flores, rspec, stud, pry, rspec-wait, childprocess, ftw, logstash-output-elasticsearch, rspec-sequencing, gmetric, gelf, timecop, jdbc-derby, docker-api, logstash-codec-plain, simplecov, coveralls, longshoreman, rumbster, logstash-filter-kv, logstash-filter-ruby, sinatra, webrick, poseidon, logstash-output-lumberjack, webmock, logstash-codec-line, logstash-filter-grok
Installation successful
Now we can run the test. Logstash comes with an rspec command to run these spec files.
$ logstash-2.4.0/bin/rspec 02_logstash_nginx_filter_spec.rb
Using Accessor#strict_set for specs
Run options: exclude {:redis=>true, :socket=>true, :performance=>true, :couchdb=>true, :elasticsearch=>true, :elasticsearch_secure=>true, :export_cypher=>true, :integration=>true, :windows=>true}
.
Finished in 0.115 seconds (files took 0.784 seconds to load)
1 example, 0 failures
Randomized with seed 4384
The rspec command can also run multiple tests at once.
$ logstash-2.4.0/bin/rspec spec -P '**/*_spec.rb'
To keep tests from depending on one another, they are run in random order. This is called randomized testing.
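The seed printed at the end of a run (4384 in the output above) is what makes a random order reproducible: RSpec has a --seed option for replaying a given order, although whether the Logstash wrapper forwards it unchanged is an assumption here. The sketch below illustrates the principle with Ruby's own seeded shuffle; it is not RSpec's ordering algorithm, and the spec names are made up.

```ruby
# Hypothetical spec names, for illustration only.
specs = ["nginx filter", "syslog filter", "json filter", "date filter"]

# Shuffle with a seeded generator: the same seed always yields the same order.
def ordered(specs, seed)
  specs.shuffle(random: Random.new(seed))
end

first_run  = ordered(specs, 4384)
second_run = ordered(specs, 4384)

puts first_run.inspect
puts first_run == second_run # true: the order can be replayed from the seed
```

When a test only fails in a particular order, rerunning with the reported seed is usually the quickest way to reproduce the failure.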
Give me the code!
All the code shown in this article is available on GitHub.