Logstash pipelines remote configuration and self-indexing

Logstash is a powerful data collection engine that integrates into the Elastic Stack (Elasticsearch - Logstash - Kibana). The goal of this article is to show you how to deploy a fully managed Logstash cluster, including access to its logs and remotely configurable pipelines.

Previously, the Elastic Stack had two shortcomings when it came to offering a fully managed service: Kibana did not support Elasticsearch high availability and Logstash pipelines could not be configured remotely.

These two problems have been resolved since ELK 6: Kibana now supports multiple Elasticsearch hosts (and can autodiscover new ones), and Logstash pipelines can be modified directly in Kibana using Centralized Pipeline Management.

Configure Centralized Pipeline Management

Centralized Pipeline Management has been available since ELK 6.0 and requires a valid Elasticsearch Gold or Platinum license. If you do not have one, don’t worry! You won’t be able to modify the pipelines in Kibana, but the rest of this blog post will still be relevant.

How pipeline management works is simple: pipeline configurations are stored in Elasticsearch under the .logstash index. A user with write access to this index can configure pipelines through a GUI in Kibana (under Settings -> Logstash -> Pipeline Management).

Pipeline management UI
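
Besides the GUI, you can also query this index directly to see what is stored, which is handy for troubleshooting. A minimal sketch, assuming the hosts, user and CA certificate used in the logstash.yml shown below:

curl --cacert cacert.pem -u logstash_admin_user \
  'https://host1.mydomain:9200/.logstash/_search?pretty'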

On the Logstash instances, you declare which pipelines are to be managed remotely. Logstash will then regularly check for changes and apply them. Local configuration for these pipelines no longer applies once remote management is set.

Here is a snippet of what the logstash.yml looks like:

...
xpack:
  management:
    enabled: true
    elasticsearch:
      hosts:
        - 'host1.mydomain:9200'
        - 'host2.mydomain:9200'
        - 'host3.mydomain:9200'
      username: logstash_admin_user
      password: MySecretPassword
      ssl:
        certificate_authority: /usr/share/logstash/config/certs/cacert.pem
    pipeline:
      id:
        - logstash_logs
        - client_pipeline1
        - client_pipeline2

If you deliver this to a client in a secured environment, you need to establish a few things with them:

  • How many pipelines are needed, since adding a pipeline is an administrator operation and requires restarting the Logstash instances.
  • If the pipelines listen for incoming connections (for Filebeat content, for example), you will need to open the corresponding ports for the client.
  • If you are using SSL with your own certificate authority, do not forget to make the CA certificate readable by the Logstash process.

Tips and tricks:

  • Create the pipelines in Kibana before starting your Logstash instances! If a pipeline does not exist, Logstash will not boot up correctly. An empty default pipeline will do the trick.
  • logstash_admin_user needs the following rights (see the sketch after this list):
    • CRUD on the .logstash* index (available in the default logstash_admin role)
    • Cluster privileges: manage_index_templates, monitor, manage_ilm
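
If you manage security yourself, the role and user can be created through the Elasticsearch security API. A minimal sketch, reusing the hosts and CA certificate from above; the custom role name is hypothetical, and the built-in logstash_admin role covers the .logstash* index:

curl --cacert cacert.pem -u elastic -X PUT \
  'https://host1.mydomain:9200/_security/role/logstash_management_cluster?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "cluster": [ "manage_index_templates", "monitor", "manage_ilm" ]
}'

curl --cacert cacert.pem -u elastic -X PUT \
  'https://host1.mydomain:9200/_security/user/logstash_admin_user?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "password": "MySecretPassword",
  "roles": [ "logstash_admin", "logstash_management_cluster" ]
}'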

Provide access to Logstash logs on Kibana

The modifications to the pipeline definitions are applied on your Logstash instances, great! But the user complains that his pipelines are not working and wants access to the Logstash logs to see why. As the administrator, you don’t want the user to access the logs directly on the machine. Why not use the Elasticsearch cluster to index the Logstash logs? Let’s see how to do it!

Make Logstash write to files (on Docker)

Logstash uses log4j2 to manage its logging. The default log4j2.properties is available on the Logstash GitHub repo. Be aware that a much simplified log4j2.properties is shipped inside the official Docker image. This simplified version will only write standard logs to stdout.

What we want to achieve here is:

  • Write logs to a file
  • Roll over this file based on date and/or size
  • Make this file available on the host machine (when using Docker)

The default log4j2.properties will manage the rollover, but will also write to stdout.

Disable logging to stdout

I would not recommend logging to the console when you could use files instead. Especially on Docker, it will probably end up filling a filesystem you are not in control of.

To disable logging to the console, comment out these two lines in your log4j2.properties (if using the default one). You will be able to dynamically re-enable it using the Logstash API.

#rootLogger.appenderRef.console.ref = ${sys:ls.log.format}_console
#logger.slowlog.appenderRef.console_slowlog.ref = ${sys:ls.log.format}_console_slowlog
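
As a side note, the Logstash API (listening on port 9600 by default) also lets you change logger levels at runtime, which is convenient while debugging. A minimal sketch, assuming the API is reachable locally:

curl -X PUT 'http://localhost:9600/_node/logging?pretty' \
  -H 'Content-Type: application/json' \
  -d '{ "logger.logstash.outputs.elasticsearch": "DEBUG" }'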

Set the log path and the log format

I recommend using the plain format rather than the JSON one: it is much easier to read and not that hard to parse. Place these properties in your logstash.yml:

path.logs: /usr/share/logstash/logs
log.format: plain

(Docker only) Create a Logstash user on the host machine

Since we want our Logstash containers to be able to write to our host machine, we must create a user with write permissions on our log directory. The user needs to have the same UID both in the container and on the host machine. By default, the “logstash” user in the container has UID 1000, which is often already taken by another user on the host, which is not something you want.

# On the host machine
groupadd -g 1010 logstash
useradd -u 1010 -c "Logstash User" -d /var/lib/logstash -g logstash logstash
mkdir -p /var/log/elk/mycluster/logstash
chown -R logstash:logstash /var/log/elk/mycluster/logstash

We just created a “logstash” group with a specified GID (1010), a “logstash” user with a specified UID (1010) and placed it in that group. We also created the directory the Logstash container will write to.
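
You can double-check the result with id; with the values above, the output should look like the comment below:

id logstash
# uid=1010(logstash) gid=1010(logstash) groups=1010(logstash)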

(Docker only) Change the User and UID in the Logstash container

We need to modify our Docker image a little bit to reflect our newly created “logstash” user. This Dockerfile works with Logstash version 7.3.

FROM logstash:7.3.1
USER root
# Align the "logstash" user and group with the host UID/GID (1010), give it
# ownership of the Logstash home directory and update the hardcoded UID 1000
# references in the entrypoint script.
RUN groupmod -g 1010 logstash && usermod -u 1010 -g 1010 logstash && chown -R logstash /usr/share/logstash && \
    sed -i -e 's/--userspec=1000/--userspec=1010/g' \
      -e 's/UID 1000/UID 1010/' \
      -e 's/chown -R 1000/chown -R 1010/' /usr/local/bin/docker-entrypoint && chown logstash /usr/local/bin/docker-entrypoint
USER logstash

To build the image: docker build --rm -f Dockerfile -t logstash-set-user:7.3.1 .

We will then need to use the image logstash-set-user:7.3.1 instead of the default one.

(Docker only) Manage your mount points in your docker-compose file

When using a compose file to deploy your Logstash, you can declare:

...
services:
  logstash:
    user: logstash
    volumes:
      - '/etc/elasticsearch/mycluster/logstash/conf/logstash.yml:/usr/share/logstash/config/logstash.yml'
      - '/etc/elasticsearch/mycluster/logstash/conf/log4j2.properties:/usr/share/logstash/config/log4j2.properties'
      - '/var/log/elk/mycluster/logstash:/usr/share/logstash/logs'
    image: 'logstash-set-user:7.3.1'
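
From the directory containing this compose file, starting the service and checking that the logs land in the host directory is then a matter of:

docker-compose up -d logstash
ls -l /var/log/elk/mycluster/logstash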

Add a pipeline that gathers logs and sends them to Elasticsearch

The pipeline definition that I use for this purpose is the following:

input {
    file {
        path => "/usr/share/logstash/logs/*.log"
        sincedb_path => "/usr/share/logstash/logs/sincedb"
    }
}

filter {
    grok {
        match => { "message" => "\[%{TIMESTAMP_ISO8601:tstamp}\]\[%{LOGLEVEL:loglevel} \]\[%{JAVACLASS:class}.*\] %{GREEDYDATA:logmessage}" }
    }
    date {
        match => ["tstamp", "ISO8601"]
    }
    mutate {
        remove_field => [ "tstamp" ]
    }
}

output {
    elasticsearch {
        hosts => ["host1.mydomain:9200", "host2.mydomain:9200", "host3.mydomain:9200"]
        index => "logstash_logs-%{+YYYY.MM.dd}"
        user => "logstash_logs_user"
        password => "MySecretPassword"
        ssl => true
        cacert => "/usr/share/logstash/config/certs/cacert.pem"
    }
}

It is very simple: it watches for files in my log directory. When it finds one, it parses the content to extract the timestamp, the log level, the class and the message. It also replaces the event timestamp with the one found in the logs (if not set, Logstash would use the time at which it read the log line). These logs are then sent to the Elasticsearch cluster in the logstash_logs-(+ date) index.
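
Once a few log lines have been indexed, you can check the result from the cluster itself, for example to look for errors. A minimal sketch, reusing the hosts and CA certificate from the output section (any user with read access to the index will do):

curl --cacert /usr/share/logstash/config/certs/cacert.pem -u logstash_logs_user \
  'https://host1.mydomain:9200/logstash_logs-*/_search?q=loglevel:ERROR&pretty'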

A few notes:

  • You can also use Filebeat for this purpose. I did not because it would require adding even more containers. The user is also able to disable the pipeline (just comment out the input section).
  • My grok filter is not perfect; it does not handle multi-line logging, but it fits my needs.
  • If you do not have an X-Pack license, you can still use this pipeline, except you won’t be able to configure it in Kibana.
  • A field named “host” is added by Logstash automatically. You can use this field to filter if you have multiple Logstash instances.

Results on Kibana

We can now use Kibana to see the logs of our Logstash instances.

We can use the “Discover” tab:

Logs using the Discover tab

Or the very convenient “Logs” tab. It’s like running tail -f on the file!

Logs using the Logs tab
