SysAdmins | Linux Tutorials

BigData

A collection of 10 posts

BigData

Amazon EMR Performance Comparison dealing with Hadoop's SmallFiles Problem

Today I would like to dive into job performance with Hadoop running on Elastic MapReduce (EMR), the managed Hadoop framework of Amazon Web Services. Hadoop does not deal well

Ruan Bekker
BigData

Removing the Hive Metastore Password from hive-site.xml on EMR

Hive's metastore configuration has an entry that holds the password used to authenticate against your metastore database. This password is saved in clear text, which looks like this: <property> <name

Ruan Bekker
AWS

AWS: Create EMR Cluster with Java SDK Examples

Today I am providing some basic examples of creating an EMR cluster and adding steps to the cluster with the AWS Java SDK. This tutorial will show how to create an EMR cluster in eu-west-1

Ruan Bekker
BigData

Generating Sensible Transaction Data with Python

The other day, I was facing a scenario where I had to set up bucketing with Hive and needed some sample data, but at the same time I thought it would've been nice

Ruan Bekker
BigData

Spark: PySpark Examples

Example 1: Top 3 Occurrences. In this tutorial we will generate 400,000 lines of data consisting of Name, Country, JobTitle. Then we have a scenario where we would like to find

Ruan Bekker
BigData

Setup PIG on Hadoop YARN Cluster

This is part 4 of our Big Data Cluster Setup. In our previous post I went through the steps of setting up Spark on your Hadoop cluster. In this tutorial, we will

Ruan Bekker
BigData

Setup Spark Cluster on Hadoop YARN

This is part 3 of our Big Data Cluster Setup. In our previous post I went through the steps of getting your Hadoop cluster up and running. In this tutorial, we will

Ruan Bekker
BigData

Setup Hive on Hadoop YARN Cluster

This is part 2 of our Big Data Cluster Setup. In our previous post I went through the steps of getting your Hadoop cluster up and running. In this tutorial, we will

Ruan Bekker
AWS

AWS: Import CSV Data from S3 to DynamoDB

When running an AWS EMR cluster, you can use Hive to import CSV data located on S3 into DynamoDB. Our sample data has the following structure: "id", "movie name

Ruan Bekker
BigData

Setup Hadoop 2.7 MultiNode Cluster on Ubuntu

We will set up a 4-node Hadoop cluster using Hadoop 2.7.1 and Ubuntu 14.04. Our cluster will consist of: Ubuntu 14.04, Hadoop 2.7.1, HDFS, 1 Master Node

Ruan Bekker
SysAdmins | Linux Tutorials © 2018