BigData - Sysadmins

April 10, 2019

Setup Kibana Dashboards for Nginx log Analysis

In this tutorial we will set up a basic Kibana dashboard for a web server that is running a blog on Nginx. What do we want to achieve? We will set up common visualizations to give us an idea of how our

Kibana Elasticsearch BigData Analytics Filebeat

July 22, 2017

Amazon EMR Performance Comparison dealing with Hadoops SmallFiles Problem

Today I would like to dive into job performance with Hadoop, running on Elastic MapReduce (EMR), the managed Hadoop framework of Amazon Web Services. Hadoop does not deal well with lots of small files, and I

BigData Hadoop EMR AWS S3DistCp Performance

March 6, 2017

Removing the Hive Metastore Password from hive-site.xml on EMR

With Hive's Metastore config, we have an entry that hosts your password to authenticate against your metastore database. This password is saved in clear-text, which looks like this: <property> <name>javax.jdo.option.ConnectionPassword&

BigData Hive Security

February 16, 2017

AWS: Create EMR Cluster with Java SDK Examples

Today I am providing some basic examples of creating an EMR cluster and adding steps to the cluster with the AWS Java SDK. This tutorial will show how to create an EMR cluster in eu-west-1 with 1x m3.xlarge master node and

AWS BigData Hadoop EMR Java

December 21, 2016

Generating Sensible Transaction Data with Python

The other day, I was facing a scenario where I had to set up bucketing with Hive and needed some sample data, but at the same time I thought it would have been nice to have some random data,

BigData Python

July 15, 2016

Spark: PySpark Examples

Example 1: Top 3 Occurrences: In this tutorial we will generate 400,000 lines of data that consist of Name,Country,JobTitle. Then we have a scenario where we would like to find out the top 3 occurrences from our

BigData Spark PySpark

July 15, 2016

Setup PIG on Hadoop YARN Cluster

This is part 4 of our Big Data Cluster Setup. In our previous post [http://sysadmins.co.za/setup-spark-cluster-on-hadoop-yarn-2-7-0/] I went through the steps of setting up Spark on your Hadoop cluster. In this tutorial, we will set up Apache

BigData Pig

May 29, 2016

Setup Spark Cluster on Hadoop YARN

This is part 3 of our Big Data Cluster Setup. In our previous post [http://sysadmins.co.za/setup-hadoop-2-7-multinode-cluster-on-ubuntu/] I went through the steps of getting your Hadoop cluster up and running. In this tutorial, we will set up Apache

BigData Spark
