How to Ingest Nginx Access Logs to Elasticsearch using Filebeat and Logstash

In this post we will set up a pipeline that uses Filebeat to ship our Nginx web server's access logs to Logstash, which filters the data against a defined grok pattern, enriches it with MaxMind's GeoIP data, and then pushes it to Elasticsearch.

Our Environment:

Note that all commands are run as root. In this environment, our resources will look like the following:

filebeat/nginx -> 10.21.5.120
logstash       -> 10.21.5.5
elasticsearch  -> 10.21.5.190

Elasticsearch:

If you don't have Elasticsearch running yet, a post on the deployment of Elasticsearch can be found here

Prepare the Logstash Environment:

Get the repositories:

$ apt update && apt upgrade -y
$ apt install wget apt-transport-https gzip -y
$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
$ echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-5.x.list
$ apt update 

Get the dependencies and install Logstash:

$ apt install openjdk-8-jdk -y
$ echo JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64" >> /etc/environment
$ source /etc/environment
$ apt install logstash -y

Backup Logstash Config:

$ mkdir -p /opt/backups/logstash
$ mv /etc/logstash/logstash.yml /opt/backups/logstash/logstash.yml.BAK

Get the Latest GeoIP Databases:

$ cd /etc/logstash/
$ wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
$ gunzip GeoLite2-City.mmdb.gz

Setup Logstash Main Config:

$ cat > /etc/logstash/logstash.yml << EOF
path.data: /var/lib/logstash
path.config: /etc/logstash/conf.d
path.logs: /var/log/logstash
EOF

Configure Logstash Application Config:

$ cat > /etc/logstash/conf.d/logstash-nginx-es.conf << EOF
input {
    beats {
        host => "0.0.0.0"
        port => 5400
    }
}

filter {
 grok {
   match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"]
   overwrite => [ "message" ]
 }
 mutate {
   convert => ["response", "integer"]
   convert => ["bytes", "integer"]
   convert => ["responsetime", "float"]
 }
 geoip {
   source => "clientip"
   target => "geoip"
   add_tag => [ "nginx-geoip" ]
 }
 date {
   match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
   remove_field => [ "timestamp" ]
 }
 useragent {
   source => "agent"
 }
}

output {
 elasticsearch {
   hosts => ["10.21.5.190:9200"]
   index => "weblogs-%{+YYYY.MM.dd}"
   document_type => "nginx_logs"
 }
 stdout { codec => rubydebug }
}
EOF
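To make the filter section more concrete, here is a rough Python sketch of what the grok, mutate, and date filters do to each log line. This is an illustration only, not Logstash's actual implementation, and the regex is a simplified stand-in for the real COMBINEDAPACHELOG pattern:

```python
import re
from datetime import datetime

# Simplified stand-in for the COMBINEDAPACHELOG grok pattern
PATTERN = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>\S+)" '
    r'(?P<response>\d+) (?P<bytes>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_access_line(line):
    m = PATTERN.match(line)
    if not m:
        return None
    event = m.groupdict()
    # mutate { convert => ... } equivalents
    event["response"] = int(event["response"])
    event["bytes"] = int(event["bytes"])
    # date { match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ] } equivalent
    event["@timestamp"] = datetime.strptime(
        event.pop("timestamp"), "%d/%b/%Y:%H:%M:%S %z"
    )
    return event

line = ('165.1.2.3 - - [06/Jun/2017:21:53:35 +0000] "GET / HTTP/1.1" '
        '200 396 "-" "Mozilla/5.0"')
event = parse_access_line(line)
print(event["clientip"], event["verb"], event["response"], event["bytes"])
```

Note how `response` and `bytes` become numeric types, which is what lets Elasticsearch aggregate on them later, and how the original `timestamp` field is consumed into `@timestamp`, matching the `remove_field` in the date filter.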

Enable Logstash on Boot and Start Logstash:

$ systemctl enable logstash
$ systemctl restart logstash

Prepare Filebeat:

Filebeat is a lightweight log shipper, which will reside on the same instance as the Nginx Web Server(s):

$ apt update && apt upgrade -y
$ apt install wget apt-transport-https -y

Setup Nginx Web Server:

$ apt install nginx -y
$ systemctl enable nginx
$ systemctl restart nginx

Get the repositories:

$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
$ echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-5.x.list

Update the Package Index:

Note that Filebeat is written in Go, so unlike Logstash it does not require Java:

$ apt update

Install Filebeat:

$ apt install filebeat -y

Backup Filebeat configuration:

$ mkdir -p /opt/backups/filebeat
$ mv /etc/filebeat/filebeat.yml /opt/backups/filebeat/filebeat.yml.BAK

Create the Filebeat configuration, and specify the Logstash outputs:

$ cat > /etc/filebeat/filebeat.yml << EOF
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/nginx/*.log
  exclude_files: ['\.gz$']

output.logstash:
  hosts: ["10.21.5.5:5400"]
EOF
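The exclude_files setting is a list of regular expressions matched against the file path, so rotated and compressed logs are skipped. A quick Python sketch of that matching behavior (the file names here are hypothetical, for demonstration only):

```python
import re

# Same pattern as exclude_files: ['\.gz$'] in the Filebeat config
exclude = re.compile(r'\.gz$')

files = [
    "/var/log/nginx/access.log",
    "/var/log/nginx/error.log",
    "/var/log/nginx/access.log.2.gz",  # rotated + compressed, should be skipped
]

# Only paths NOT matching the exclude pattern get shipped
shipped = [f for f in files if not exclude.search(f)]
print(shipped)
```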

Enable Filebeat on Boot and Start Filebeat:

$ systemctl enable filebeat
$ systemctl restart filebeat

Testing:

With Nginx, Logstash, Filebeat, and Elasticsearch running, we can test our deployment by accessing our Nginx web server. We left the defaults as-is, so we expect the default page to respond, which is fine.

Before accessing your web server, tail your logs:

$ tail -f /var/log/nginx/access.log /var/log/filebeat/filebeat

Now access your Web Server:

==> /var/log/nginx/access.log <==
165.1.2.3 - - [06/Jun/2017:21:53:35 +0000] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

==> /var/log/filebeat/filebeat <==
2017-06-06T21:53:43Z INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.read_bytes=6 libbeat.logstash.publish.write_bytes=464 libbeat.logstash.published_and_acked_events=2 libbeat.publisher.published_events=2 publish.events=2 registrar.states.update=2 registrar.writes=1
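The published_and_acked_events counter in that metrics line is the quickest confirmation that Logstash actually accepted the events. Since the line is just a space-separated list of key=value counters, it is easy to pick apart — a small Python sketch:

```python
# The counter portion of the Filebeat metrics line above
line = ("libbeat.logstash.call_count.PublishEvents=1 "
        "libbeat.logstash.published_and_acked_events=2 "
        "libbeat.publisher.published_events=2")

# Split on whitespace, then split each counter on "="
metrics = dict(pair.split("=") for pair in line.split())

# Non-zero means Logstash acknowledged the shipped events
acked = int(metrics["libbeat.logstash.published_and_acked_events"])
print(acked)  # → 2
```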

Having a look at Elasticsearch:

$ curl http://10.21.5.190:9200/_cat/indices?v
health status index                               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   weblogs-2017.06.06                  KwfrPYnsRmiQ8EvrpHJ1-g   5   1          6            0    286.6kb        130.4kb
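The weblogs-%{+YYYY.MM.dd} pattern in the Logstash output resolves against each event's @timestamp, so we get one index per day — which is why the index above is named after the date of our test request. The equivalent date formatting in Python:

```python
from datetime import datetime, timezone

# @timestamp of the sample event shipped earlier
ts = datetime(2017, 6, 6, 21, 53, 35, tzinfo=timezone.utc)

# Joda-style %{+YYYY.MM.dd} maps to strftime's %Y.%m.%d
index = ts.strftime("weblogs-%Y.%m.%d")
print(index)  # → weblogs-2017.06.06
```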

Search our index to retrieve details about our document:

$ curl -XGET http://10.21.5.190:9200/weblogs-2017.06.06/_search?pretty
{
  "took" : 68,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 1.0,
    "hits" : [
    {
      "_index" : "weblogs-2017.06.06",
      "_type" : "nginx_logs",
      "_id" : "AVx_ZiLN-RV1_0gc9l6o",
      "_score" : 1.0,
      "_source" : {
        "request" : "/",
        "agent" : "\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\"",
        "minor" : "0",
        "auth" : "-",
        "ident" : "-",
        "source" : "/var/log/nginx/access.log",
        "type" : "log",
        "patch" : "3029",
        "major" : "58",
        "clientip" : "165.1.2.3",
        "@version" : "1",
        "beat" : {
          "hostname" : "nginx-web-01",
          "name" : "nginx-web-01",
          "version" : "5.4.1"
        },
        "host" : "nginx-web-01",
        "geoip" : {
          "timezone" : "Africa/Johannesburg",
          "ip" : "165.1.2.3",
          "latitude" : -33.935462,
          "continent_code" : "AF",
          "city_name" : "Cape Town",
          "country_code2" : "ZA",
          "country_name" : "South Africa",
          "country_code3" : "ZA",
          "region_name" : "Province of the Western Cape",
          "location" : [
            18.377256,
            -33.935462
          ],
          "postal_code" : "7945",
          "longitude" : 18.377256,
          "region_code" : "WC"
        },
        "offset" : 196,
        "os" : "Windows 10",
        "input_type" : "log",
        "verb" : "GET",
        "message" : "165.1.2.3 - - [06/Jun/2017:21:53:35 +0000] \"GET / HTTP/1.1\" 200 396 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\"",
        "tags" : [
          "beats_input_codec_plain_applied",
          "nginx-geoip"
        ],
        "referrer" : "\"-\"",
        "@timestamp" : "2017-06-06T21:53:35.000Z",
        "response" : 200,
        "bytes" : 396,
        "name" : "Chrome",
        "os_name" : "Windows 10",
        "httpversion" : "1.1",
        "device" : "Other"
      }
    }
    ]
  }
}

And when we take the coordinates and place them into Google, we can see that I am chilling at the beach! :)

Google Maps:
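One gotcha when pasting those coordinates: geoip.location is stored GeoJSON-style as [longitude, latitude], while Google Maps expects latitude,longitude. A small sketch of the swap (the URL format here is an assumption for illustration):

```python
# geoip.location from the document above, in [longitude, latitude] order
location = [18.377256, -33.935462]
lon, lat = location

# Google Maps wants the reverse order: latitude first
maps_url = f"https://www.google.com/maps?q={lat},{lon}"
print(maps_url)
```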

Elastic most definitely have their game on when it comes to awesome software! You can visualize this data further by adding Kibana to your stack.