Structured logging with SLF4J and Logback

2017-12-01 java

I don’t know who first coined the term "structured logging"; James Turnbull's 2015 blog post on the subject is a good place to start.

Python and .NET developers have libraries dedicated to structured logging: structlog and Serilog. In this article, I will describe how to do structured logging in Java with the usual logging libraries, SLF4J and Logback.

Structured logging with SLF4J

All Java developers know how to log a message:

Logger demoLogger = LoggerFactory.getLogger("logodyssey.DemoLogger");
demoLogger.info("Hello world!");

Properly configured, it produces a log line like:

21:10:29.178 [Thread-1] INFO  logodyssey.DemoLogger - Hello world!
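
The configuration behind it could be a classic pattern encoder; something like this (a sketch, the exact pattern and appender are my assumptions, the article's own configuration is not shown):

    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</pattern>
        </encoder>
    </appender>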

Notice how this "Hello world!" message is qualified with several fields: a timestamp, a thread Id, a level and a logger/category.

This is what the term "structured logging" means: a log is more than a message string. The message is associated with contextual information that tells us more about what was going on when the log was written.

How can we enrich this contextual information provided by default, and add the user Id for example? That is the purpose of the MDC (Mapped Diagnostic Context):

MDC.put("userId", "gquintana");
demoLogger.info("Hello world!");
MDC.remove("userId");

The MDC is a map-like object filled in the Java code, and used by the underlying logging library to output custom data. With the adequate configuration, we can get the user Id in the log:

21:10:29.178 [Thread-1] gquintana INFO  logodyssey.DemoLogger - Hello world!
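
That "adequate configuration" boils down to referencing the MDC key in the output pattern with the %X converter, for example (the surrounding pattern is an assumption):

    <pattern>%d{HH:mm:ss.SSS} [%thread] %X{userId} %-5level %logger - %msg%n</pattern>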

The MDC can store any information about the user (user Id, session Id, token Id), about the current request (request Id, transaction Id), about long running threads (batch instance Id, broker client Id). Later on, this information will be part of the log.

Having this kind of information makes it possible to group logs by user, by request, or by process. Remember that logs may be scattered across different servers and different time periods. These additional fields allow correlating logs belonging to the same scenario and finding answers to questions like "what was user X doing when they hit this nasty error?"

Let’s get back to the example: we saw that the MDC stores extra information about logs. The MDC is usually based on a thread-local variable, which has two drawbacks:

  1. It must be properly cleaned after being used, or you may experience information leaks if the thread is reused. Think about thread pools in web servers like Tomcat.

  2. The information may not be properly transferred from one thread to another. Think about asynchronous calls (the sketch just below shows one way to copy the context by hand).
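
For the second drawback, SLF4J lets you copy the context from one thread to another by hand. A minimal sketch, assuming an ExecutorService named executor:

// Capture the MDC of the submitting thread...
Map<String, String> context = MDC.getCopyOfContextMap();
executor.submit(() -> {
    // ...and restore it in the worker thread (the copy may be null if the MDC was empty)
    if (context != null) {
        MDC.setContextMap(context);
    }
    try {
        demoLogger.info("Hello from another thread!");
    } finally {
        // First drawback again: clean up before the thread is reused
        MDC.clear();
    }
});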

As a result, calling MDC.remove (or MDC.clear), as in the examples above, is required to clean the MDC after usage. In order not to forget this housekeeping, we can use a try-with-resources construct:

try (MDC.MDCCloseable mdc = MDC.putCloseable("userId", "gquintana")) {
    demoLogger.info("Hello world!");
}

It’s better, but still verbose. Fortunately, this kind of code won’t make its way into your business code, because it is usually hidden in an interceptor such as a Servlet filter, a Spring aspect or a JAX-RS interceptor. In Logback, there is an MDCInsertingServletFilter class which can serve as an example.
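
For illustration, such a filter could look like this; a minimal sketch, assuming the user id comes from the authenticated principal (the class name and that choice are hypothetical):

import java.io.IOException;
import java.security.Principal;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

public class MdcUserFilter implements Filter {
    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Hypothetical: take the user id from the authenticated principal
        Principal principal = ((HttpServletRequest) request).getUserPrincipal();
        if (principal != null) {
            MDC.put("userId", principal.getName());
        }
        try {
            chain.doFilter(request, response);
        } finally {
            // Always clean up: the thread goes back to the server's pool
            MDC.remove("userId");
        }
    }

    @Override
    public void destroy() { }
}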

JSON logging with Logback

At this point, a log is more than a simple string, it is qualified with useful information: timestamp, level, thread, user Id… How can we write this data structure to disk or send it over the wire to a log collection tool? We have to serialize it. For a human being, a simple text format as shown above is readable enough. However, for a machine, this is just a word soup without any structure. In short, to send structured logs to a log collection tool and benefit from this structure (search by user, by thread…), we must use a structured format, JSON for example.

Compared to the Syslog format, another popular log format, the JSON format:

  • Can properly handle multi-line logs such as stack traces, or messages containing line separators (intentional or not)

  • Is a versatile format and can have custom fields like user Id, transaction Id

  • Is more verbose, so compression (GZip or the like) may be required to reduce the size

Most popular log collection tools, like Filebeat, Graylog and Fluentd, already use some kind of compressed JSON format under the hood. You should too.

Generating JSON logs with Logback is very easy. I’ll show how to use two Logback extensions, the Logstash Logback encoder and the Logback Contrib library.

The first one uses a Logback extension point known as an encoder, which you can plug into any appender:

    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
        <file>log/log-odyssey.log</file>
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <customFields>{"application":"log-odyssey"}</customFields>
        </encoder>
    </appender>

It will produce the expected result:

{
  "@timestamp": "2017-11-25T21:10:29.178+01:00",
  "@version": 1,
  "message": "Hello world!",
  "logger_name": "logodyssey.DemoLogger",
  "thread_name": "Thread-1",
  "level": "INFO",
  "level_value": 20000,
  "HOSTNAME": "my-laptop",
  "userId": "gquintana",
  "application": "log-odyssey"
}

The Maven coordinates for this library are net.logstash.logback:logstash-logback-encoder:4.11.
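
As a pom.xml entry:

    <dependency>
        <groupId>net.logstash.logback</groupId>
        <artifactId>logstash-logback-encoder</artifactId>
        <version>4.11</version>
    </dependency>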

The second one uses a different extension point, called a layout. In the end, it looks very similar to the first one, though a bit more verbose:

    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
        <file>log/log-odyssey.log</file>
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="ch.qos.logback.contrib.json.classic.JsonLayout">
                <jsonFormatter class="ch.qos.logback.contrib.jackson.JacksonJsonFormatter"/>
                <appendLineSeparator>true</appendLineSeparator>
            </layout>
        </encoder>
    </appender>

The result is quite similar as well, even though the fields are named differently:

{
    "timestamp":"1511814391083",
    "level":"INFO",
    "thread":"Thread-1",
    "mdc": {
        "userId":"gquintana"
    },
    "logger":"logodyssey.DemoLogger",
    "message":"Hello world!",
    "context":"default"
}

In order to be on par with the first example, it is possible to subclass JsonLayout and add custom fields:

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.contrib.json.classic.JsonLayout;

public class CustomJsonLayout extends JsonLayout {
    @Override
    protected void addCustomDataToJsonMap(Map<String, Object> map, ILoggingEvent event) {
        map.put("application", "log-odyssey");
        try {
            map.put("host", InetAddress.getLocalHost().getHostName());
        } catch (UnknownHostException e) {
            // Ignore: the "host" field is simply omitted when the hostname can't be resolved
        }
    }
}
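
To use it, reference this subclass in place of the stock JsonLayout in the configuration above (the logodyssey package is hypothetical):

    <layout class="logodyssey.CustomJsonLayout">
        <jsonFormatter class="ch.qos.logback.contrib.jackson.JacksonJsonFormatter"/>
        <appendLineSeparator>true</appendLineSeparator>
    </layout>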

Several Maven dependencies are required for this library to work: ch.qos.logback.contrib:logback-json-classic:0.1.5, ch.qos.logback.contrib:logback-jackson:0.1.5 and com.fasterxml.jackson.core:jackson-databind.
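
As pom.xml entries (the jackson-databind version is left unpinned, as in the coordinates above):

    <dependency>
        <groupId>ch.qos.logback.contrib</groupId>
        <artifactId>logback-json-classic</artifactId>
        <version>0.1.5</version>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback.contrib</groupId>
        <artifactId>logback-jackson</artifactId>
        <version>0.1.5</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <!-- version not given in the article; choose one suitable for your project -->
    </dependency>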

In the end, these libraries are similar: both use the Jackson library to generate JSON. Contrary to the JSON examples above, which have been prettified to be human readable, producing one JSON document per line is better: it is more compact, and each end of line marks the end of a log, so there are no multi-line logs. This format is known as NDJSON or JSON Lines. Logstash and Filebeat can easily read this kind of JSON file.

Conclusion

A log is more than a textual message, it can be enriched with information at different levels:

  • Line of code: message, timestamp, level, thread Id, logger…

  • User or transaction: user Id, session Id…​

  • Deployment unit: application Id, container Id, host Id, environment Id (production, staging)…​

Once qualified with this contextual information, the log message becomes a structured piece of information and must be processed as such. Producing logs in JSON format keeps that structure and eases storing these logs in Elasticsearch. More on that later, if time permits.