Logstash? Grok?

If you ended up here from a search engine’s results, you probably already know what Logstash and Grok are and can jump directly to the next section. For everyone else: Logstash is data processing software that ingests data from multiple sources, transforms it and sends it to other tools. It is often used together with Elasticsearch, a distributed search and analytics engine, and Kibana, a frontend to Elasticsearch that helps visualize data, to form what is called an “ELK” stack.

One way to transform data received by Logstash is to use the Grok filter plugin. In short, Grok uses regular expression patterns to parse arbitrary text and give it structure. A Grok pattern has the form %{SYNTAX:SEMANTIC}, where SYNTAX is the name of the pattern and SEMANTIC is the identifier you give to the matched text in your specific context. A number of SYNTAX patterns are built into Logstash (see the logstash-patterns-core repository). As an example, here is the definition of the SYNTAX pattern that matches an hour, taken straight from that repository:

HOUR (?:2[0123]|[01]?[0-9])
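A SYNTAX pattern is just a regular expression, so you can check what HOUR accepts with any regex engine that understands non-capturing groups. The sketch below uses Python's re module purely for illustration. Grok itself relies on the Oniguruma regex syntax, which differs from Python's in places, so treat this as a quick sanity check rather than a faithful reproduction of Grok. The helper name is_hour is hypothetical.

```python
import re

# HOUR definition copied verbatim from logstash-patterns-core (see above).
HOUR = r"(?:2[0123]|[01]?[0-9])"


def is_hour(text: str) -> bool:
    """Return True if the whole string is accepted by the HOUR pattern."""
    return re.fullmatch(HOUR, text) is not None


# Try a few candidates and show which ones the pattern accepts.
for candidate in ["0", "07", "19", "23", "24", "7pm"]:
    print(f"{candidate!r:>6} -> {is_hour(candidate)}")
```

With this definition, the values 0 to 23 are accepted, with or without a leading zero, while 24 and 7pm are rejected.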

The HOUR pattern can be combined with other patterns to create new ones, such as TIME:

TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
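To make the composition concrete, here is a minimal Python sketch of how references can be expanded. Each %{SYNTAX} is replaced, recursively, by the regular expression it names, and %{SYNTAX:SEMANTIC} additionally wraps the result in a named group called SEMANTIC. This is an approximation under stated assumptions, not Grok's actual implementation:

- The MINUTE and SECOND definitions follow logstash-patterns-core as I understand them, so check the repository for the current versions.
- Python spells named groups (?P<name>...) where Oniguruma uses (?<name>...).
- References without a SEMANTIC simply become non-capturing groups here.

The names DEFINITIONS, REFERENCE and expand are hypothetical.

```python
import re

# Hypothetical pattern table for this sketch. HOUR is the definition quoted
# above; MINUTE and SECOND follow logstash-patterns-core as of this writing
# (verify against the repository); TIME is copied verbatim from above.
DEFINITIONS = {
    "HOUR": r"(?:2[0123]|[01]?[0-9])",
    "MINUTE": r"(?:[0-5][0-9])",
    "SECOND": r"(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)",
    "TIME": r"(?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])",
}

# Matches %{SYNTAX} as well as %{SYNTAX:SEMANTIC}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern: str, definitions: dict) -> str:
    """Recursively replace grok references with the regexes they name.

    %{SYNTAX} becomes a non-capturing group and %{SYNTAX:SEMANTIC} becomes
    a Python named group, (?P<SEMANTIC>...), holding the matched text.
    """
    def substitute(match):
        syntax, semantic = match.group(1), match.group(2)
        body = expand(definitions[syntax], definitions)
        if semantic:
            return f"(?P<{semantic}>{body})"
        return f"(?:{body})"

    return REFERENCE.sub(substitute, pattern)


regex = re.compile(expand("at %{TIME:time}", DEFINITIONS))
match = regex.search("backup finished at 21:07:45 UTC")
print(match.group("time") if match else "no match")  # prints 21:07:45
```

The expanded regular expression is plain text, so printing expand("%{TIME}", DEFINITIONS) is also a handy way to see exactly what a composed pattern turns into.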

In any real-world scenario, you will most likely need to build custom patterns to deal with your data.
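Custom patterns are written in the same format as the built-in ones. Each line holds one definition: the pattern name, then whitespace, then a regular expression that may itself reference other patterns. Below is a short Python sketch of reading such a file into a dictionary, roughly the way Grok loads the files found in a patterns directory. The pattern names APPCODE and APPTIME and the function load_patterns are hypothetical. Skipping lines that start with # mirrors the comments found in the logstash-patterns-core files.

```python
# Contents of a hypothetical custom patterns file, as it might appear in the
# directory given to the grok filter's patterns_dir option. APPCODE and
# APPTIME are made-up names; APPTIME reuses the built-in HOUR and MINUTE.
PATTERN_FILE = """\
# Lines starting with '#' are comments.
APPCODE [A-Z]{3}-[0-9]{4}
APPTIME %{HOUR}:%{MINUTE}
"""


def load_patterns(text: str) -> dict:
    """Parse 'NAME regex' lines into a dict, skipping blank and comment lines."""
    definitions = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # The first whitespace-separated token is the name, the rest the regex.
        parts = stripped.split(None, 1)
        if len(parts) == 2:
            name, regex = parts
            definitions[name] = regex
    return definitions


print(load_patterns(PATTERN_FILE))
# {'APPCODE': '[A-Z]{3}-[0-9]{4}', 'APPTIME': '%{HOUR}:%{MINUTE}'}
```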

Now that all of that has been cleared up, let’s get to the reason for this post: I could not find any resource about how to test custom patterns effortlessly. The go-to answer is usually to use an online web application. As of this writing, the official Logstash documentation suggests using either Grok Debugger or Grok Constructor if you need help building patterns to match your logs. I don’t know about you, but using untrusted online resources to test grok patterns on potentially sensitive data is a no-go for me. So what other option is there? Well, a standalone Logstash instance, configured so that testing your patterns against excerpts from your data is easy!

Setting up an environment to test Logstash Grok patterns

What I’m describing here has been tested with Logstash 5.2.2, the latest Logstash version available at the time of writing. If necessary, adapt the commands and configuration to the version of Logstash you are using.

Setting up the environment takes only a few easy steps:

  1. Get Logstash, which can be downloaded from this page. In my case, I went for the gzip compressed tarball:

     $ cd /tmp && fetch https://artifacts.elastic.co/downloads/logstash/logstash-5.2.2.tar.gz
    

    I’m using DragonFly BSD, so I naturally use fetch(1), but use whichever method you prefer to get Logstash.

  2. Extract the tarball.

     $ tar -xzf logstash-5.2.2.tar.gz
     $ cd logstash-5.2.2
    
  3. Create a configuration file, named logstash.conf in this example. A typical Logstash configuration file has three sections: input, filter and output. Here the input is a file, since the point is to test the Grok patterns against a subset of your data and a file is an easy way to provide it. The output section writes to standard output, but only for Grok parsing failures, and uses codec => rubydebug to keep the output easy to read. In the early stages of building a pattern, you may want to keep only one line in your test file and print out the matched result. In that case, comment out or remove the if condition in the output section so that every event is printed to standard output, not only the parsing failures. Finally, the filter section is where the Grok match pattern is defined. Here is a full example, which you can adapt to your needs:

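     input {
         file {
             type => "custom"
             path => ["/tmp/logstash-5.2.2/sample_data.txt"]
             # Read the sample file from the start rather than only tailing
             # new lines, and keep no record of the read position.
             start_position => "beginning"
             sincedb_path => "/dev/null"
         }
     }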
     filter {
         if [type] == "custom" {
             grok {
                 patterns_dir => "/tmp/logstash-5.2.2/patterns"
                 match => {
                     "message" => "%{CUSTOMPATTERN1:foo} %{CUSTOMPATTERN2:bar}"
                 }
             }
         }
     }
     output {
         if "_grokparsefailure" in [tags] {
             stdout {
                 codec => rubydebug
             }
         }
     }
    

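    In this example, the input data is read from /tmp/logstash-5.2.2/sample_data.txt and given the type custom. The if [type] == "custom" condition in the filter section uses that type to apply the grok filter only to these events. The custom Grok patterns, CUSTOMPATTERN1 and CUSTOMPATTERN2 here, are defined in separate files within the /tmp/logstash-5.2.2/patterns directory.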

  4. Run Logstash with useful command-line switches.

     $ ./bin/logstash --path.config logstash.conf --config.reload.automatic
    

    Every time the configuration file is modified, for example to adjust the match pattern, Logstash reloads it automatically, which is pretty handy. When a line from the sample data file does not match, an event like the following is printed to standard output:

     {
               "path" => "/tmp/logstash-5.2.2/sample_data.txt",
         "@timestamp" => 2017-03-12T20:24:13.565Z,
           "@version" => "1",
               "host" => "myhost.local",
            "message" => "sample non matching pattern",
               "type" => "custom",
               "tags" => [
             [0] "_grokparsefailure"
         ]
     }
    

That’s it! Happy pattern building :)