How to Ingest Nginx Access Logs to Elasticsearch using Filebeat and Logstash
In this post we will set up a pipeline that uses Filebeat to ship our Nginx web server's access logs to Logstash, which filters the data against a defined grok pattern, enriches it with MaxMind's GeoIP, and then pushes it to Elasticsearch.
Our Environment:
Note that all commands are run as root. In this environment, our resources will look like the following:
filebeat/nginx -> 10.21.5.120
logstash -> 10.21.5.5
elasticsearch -> 10.21.5.190
Elasticsearch:
If you don't have Elasticsearch running yet, a post on deploying Elasticsearch can be found here.
Prepare the Logstash Environment:
Get the repositories:
$ apt update && apt upgrade -y
$ apt install wget apt-transport-https gzip -y
$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
$ echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-5.x.list
$ apt update
Get the dependencies and install Logstash:
$ apt install openjdk-8-jdk -y
$ echo JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64" >> /etc/environment
$ source /etc/environment
$ apt install logstash -y
Backup Logstash Config:
$ mkdir -p /opt/backups/logstash
$ mv /etc/logstash/logstash.yml /opt/backups/logstash/logstash.yml.BAK
Get the Latest GeoIP Databases:
$ cd /etc/logstash/
$ wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
$ gunzip GeoLite2-City.mmdb.gz
Setup Logstash Main Config:
$ cat > /etc/logstash/logstash.yml << EOF
path.data: /var/lib/logstash
path.config: /etc/logstash/conf.d
path.logs: /var/log/logstash
EOF
Configure Logstash Application Config:
$ cat > /etc/logstash/conf.d/logstash-nginx-es.conf << EOF
input {
  beats {
    host => "0.0.0.0"
    port => 5400
  }
}

filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" ]
    overwrite => [ "message" ]
  }
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    # lowercase yyyy is the Joda year; uppercase YYYY is week-year and
    # can misparse dates around year boundaries
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
  useragent {
    source => "agent"
  }
}

output {
  elasticsearch {
    hosts => ["10.21.5.190:9200"]
    index => "weblogs-%{+YYYY.MM.dd}"
    document_type => "nginx_logs"
  }
  stdout { codec => rubydebug }
}
EOF
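Before starting the service, it's worth validating the pipeline syntax. Assuming the default 5.x package layout (the binary lives under /usr/share/logstash), something like the following should report a valid configuration and exit:

```shell
# Validate the pipeline config and exit (does not start the pipeline):
/usr/share/logstash/bin/logstash --path.settings /etc/logstash \
  -f /etc/logstash/conf.d/logstash-nginx-es.conf --config.test_and_exit
```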
Enable Logstash on Boot and Start Logstash:
$ systemctl enable logstash
$ systemctl restart logstash
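If Logstash came up correctly, the beats input should now be listening on port 5400. A quick way to confirm, assuming ss from iproute2 is available:

```shell
# Confirm Logstash's beats input is bound to TCP 5400:
ss -ltn | grep 5400
```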
Prepare Filebeat:
Filebeat is a lightweight log shipper, which will reside on the same instance as the Nginx Web Server(s):
$ apt update && apt upgrade -y
$ apt install wget apt-transport-https -y
Setup Nginx Web Server:
$ apt install nginx -y
$ systemctl enable nginx
$ systemctl restart nginx
Get the repositories:
$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list
Setup the Dependencies:
$ apt update
$ apt install openjdk-8-jdk -y
$ echo JAVA_HOME=$(find /usr/lib/jvm/ -name "*openjdk*" -type d) >> /etc/environment
$ source /etc/environment
Install Filebeat:
$ apt install filebeat -y
Backup Filebeat configuration:
$ mkdir -p /opt/backups/filebeat
$ mv /etc/filebeat/filebeat.yml /opt/backups/filebeat/filebeat.yml.BAK
Create the Filebeat configuration, and specify the Logstash outputs:
$ cat > /etc/filebeat/filebeat.yml << EOF
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/nginx/*.log
  exclude_files: ['\.gz$']

output.logstash:
  hosts: ["10.21.5.5:5400"]
EOF
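Filebeat 5.x can check its own configuration before starting; the -configtest flag (replaced by `filebeat test config` in later releases) should report the config as OK:

```shell
# Verify the Filebeat configuration parses (Filebeat 5.x flag):
filebeat -configtest -c /etc/filebeat/filebeat.yml
```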
Enable Filebeat on Boot and Start Filebeat:
$ systemctl enable filebeat
$ systemctl restart filebeat
Testing:
While Nginx, Logstash, Filebeat and Elasticsearch are running, we can test our deployment by accessing our Nginx Web Server. We left the defaults as-is, so we expect the default page to respond, which is fine.
But before accessing your web server, tail your logs:
$ tail -f /var/log/nginx/access.log /var/log/filebeat/filebeat
Now access your Web Server:
==> /var/log/nginx/access.log <==
165.1.2.3 - - [06/Jun/2017:21:53:35 +0000] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
==> /var/log/filebeat/filebeat <==
2017-06-06T21:53:43Z INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.read_bytes=6 libbeat.logstash.publish.write_bytes=464 libbeat.logstash.published_and_acked_events=2 libbeat.publisher.published_events=2 publish.events=2 registrar.states.update=2 registrar.writes=1
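To get a feel for the fields the COMBINEDAPACHELOG grok pattern extracts, here is a rough shell approximation that pulls the client IP, verb, response code and byte count out of a line like the one above. This is illustrative only; grok does the real parsing on the Logstash side:

```shell
# A combined-format access log line, as Filebeat ships it to Logstash:
line='165.1.2.3 - - [06/Jun/2017:21:53:35 +0000] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0"'

# Roughly the clientip, verb, response and bytes fields grok extracts:
echo "$line" | awk '{gsub(/"/, "", $6); print $1, $6, $9, $10}'
# -> 165.1.2.3 GET 200 396
```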
Having a look at Elasticsearch:
$ curl http://10.21.5.190:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open weblogs-2017.06.06 KwfrPYnsRmiQ8EvrpHJ1-g 5 1 6 0 286.6kb 130.4kb
Search our index to retrieve details about our document:
$ curl -XGET http://10.21.5.190:9200/weblogs-2017.06.06/_search?pretty
{
"took" : 68,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 1.0,
"hits" : [
{
"_index" : "weblogs-2017.06.06",
"_type" : "nginx_logs",
"_id" : "AVx_ZiLN-RV1_0gc9l6o",
"_score" : 1.0,
"_source" : {
"request" : "/",
"agent" : "\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\"",
"minor" : "0",
"auth" : "-",
"ident" : "-",
"source" : "/var/log/nginx/access.log",
"type" : "log",
"patch" : "3029",
"major" : "58",
"clientip" : "165.1.2.3",
"@version" : "1",
"beat" : {
"hostname" : "nginx-web-01",
"name" : "nginx-web-01",
"version" : "5.4.1"
},
"host" : "nginx-web-01",
"geoip" : {
"timezone" : "Africa/Johannesburg",
"ip" : "165.1.2.3",
"latitude" : -33.935462,
"continent_code" : "AF",
"city_name" : "Cape Town",
"country_code2" : "ZA",
"country_name" : "South Africa",
"country_code3" : "ZA",
"region_name" : "Province of the Western Cape",
"location" : [
18.377256,
-33.935462
],
"postal_code" : "7945",
"longitude" : 18.377256,
"region_code" : "WC"
},
"offset" : 196,
"os" : "Windows 10",
"input_type" : "log",
"verb" : "GET",
"message" : "165.1.2.3 - - [06/Jun/2017:21:53:35 +0000] \"GET / HTTP/1.1\" 200 396 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\"",
"tags" : [
"beats_input_codec_plain_applied",
"nginx-geoip"
],
"referrer" : "\"-\"",
"@timestamp" : "2017-06-06T21:53:35.000Z",
"response" : 200,
"bytes" : 396,
"name" : "Chrome",
"os_name" : "Windows 10",
"httpversion" : "1.1",
"device" : "Other"
}
]
}
}
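Since the geoip filter tagged each document with the visitor's country, we can also search on those fields. For example, a term query on geoip.country_code2 (Elasticsearch 5.x query DSL; the .keyword subfield assumes the default dynamic mapping was applied to the index):

```shell
# Find documents from South African client IPs (field added by the geoip filter):
curl -XGET 'http://10.21.5.190:9200/weblogs-*/_search?pretty' \
  -H 'Content-Type: application/json' \
  -d '{"query": {"term": {"geoip.country_code2.keyword": "ZA"}}}'
```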
And when we take the coordinates and place them into Google Maps, we can see that I am chilling at the beach! :)
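Note that geoip.location is stored as [longitude, latitude], while Google Maps expects latitude,longitude. A quick way to build a link from the values in the document above:

```shell
# geoip.location is [lon, lat]; Google Maps wants "lat,lon" in the q parameter:
lon=18.377256; lat=-33.935462
echo "https://www.google.com/maps?q=${lat},${lon}"
# -> https://www.google.com/maps?q=-33.935462,18.377256
```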
Elastic most definitely have their game on when it comes to awesome software! You can visualize this data further by adding Kibana to your stack.