BigData | Amazon EMR Performance Comparison Dealing with Hadoop's Small Files Problem | Today I would like to take a dive into job performance with Hadoop running on Amazon Elastic MapReduce (EMR), the managed Hadoop framework of Amazon Web Services. Hadoop does not deal well…
BigData | Removing the Hive Metastore Password from hive-site.xml on EMR | In Hive's metastore configuration, there is an entry that holds the password used to authenticate against your metastore database. This password is stored in clear text and looks like this: <property> <name>…
AWS | AWS: Create EMR Cluster with Java SDK Examples | Today I am providing some basic examples of creating an EMR cluster and adding steps to it with the AWS Java SDK. This tutorial will show how to create an EMR cluster in eu-west-1…
BigData | Generating Sensible Transaction Data with Python | The other day, I was facing a scenario where I had to set up bucketing with Hive and needed some sample data, but at the same time I thought it would've been nice…
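For readers who just need a quick sample file for Hive bucketing experiments, here is a minimal sketch of generating transaction rows with Python's standard library. It is not the code from the post: the schema (`transaction_id`, `customer_id`, `amount`, `timestamp`), the value ranges, the start date and the helper names are all hypothetical. A bounded `customer_id` range is chosen so that the column could serve as a bucketing key in a Hive `CLUSTERED BY (...) INTO N BUCKETS` table.

```python
import csv
import io
import random
from datetime import datetime, timedelta


def generate_transactions(n, seed=42):
    """Return n fake transaction rows as dicts (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so the same sample can be regenerated
    start = datetime(2017, 1, 1)  # arbitrary start date, an assumption
    rows = []
    for i in range(1, n + 1):
        offset = timedelta(seconds=rng.randint(0, 365 * 24 * 3600))
        rows.append({
            "transaction_id": i,
            "customer_id": rng.randint(1, 1000),  # candidate bucketing column
            "amount": round(rng.uniform(1, 500), 2),
            "timestamp": (start + offset).isoformat(),
        })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(generate_transactions(5)))
```

The resulting text can be written to a file and uploaded to HDFS or S3 before loading it into a Hive table.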
BigData | Spark: PySpark Examples | Example 1: Top 3 Occurrences. In this tutorial we will generate 400,000 lines of data consisting of Name, Country, JobTitle. Then we have a scenario where we would like to find…
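To illustrate the "Top 3 Occurrences" idea without a Spark cluster, here is a minimal plain-Python sketch using `collections.Counter` on lines shaped like `Name,Country,JobTitle`. It is a stand-in, not the PySpark code from the post: the helper `top_n`, the sample rows and the choice of which column to count are hypothetical, since the teaser does not say what is being counted.

```python
from collections import Counter

# Position of each field in a "Name,Country,JobTitle" line.
COLUMNS = {"Name": 0, "Country": 1, "JobTitle": 2}


def top_n(lines, column, n=3):
    """Count the values of one column and return the n most common (value, count) pairs."""
    index = COLUMNS[column]
    counts = Counter(
        line.strip().split(",")[index]
        for line in lines
        if line.strip()  # skip blank lines
    )
    return counts.most_common(n)


sample = [
    "Alice,Germany,Engineer",
    "Bob,France,Engineer",
    "Carol,Germany,Manager",
    "Dave,Spain,Engineer",
    "Eve,Germany,Analyst",
    "Frank,France,Manager",
]

print(top_n(sample, "Country"))  # [('Germany', 3), ('France', 2), ('Spain', 1)]
```

In PySpark the same count-then-rank shape is usually expressed on an RDD with `map` to `(value, 1)` pairs, `reduceByKey` to sum them, and `takeOrdered` with a descending key to pick the top three.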
BigData | Set Up Pig on a Hadoop YARN Cluster | This is part 4 of our Big Data Cluster Setup. In our previous post, I went through the steps of setting up Spark on a Hadoop cluster. In this tutorial, we will…
BigData | Set Up a Spark Cluster on Hadoop YARN | This is part 3 of our Big Data Cluster Setup. In our previous post, I went through the steps of getting a Hadoop cluster up and running. In this tutorial, we will…
BigData | Set Up Hive on a Hadoop YARN Cluster | This is part 2 of our Big Data Cluster Setup. In our previous post, I went through the steps of getting a Hadoop cluster up and running. In this tutorial, we will…
AWS | AWS: Import CSV Data from S3 to DynamoDB | When running an AWS EMR cluster, you can use Hive to import CSV data located on S3 into DynamoDB. Our sample data has the following structure: "id", "movie name"…
BigData | Set Up a Hadoop 2.7 Multi-Node Cluster on Ubuntu | We will set up a 4-node Hadoop cluster using Hadoop 2.7.1 and Ubuntu 14.04. Our cluster will consist of: Ubuntu 14.04, Hadoop 2.7.1, HDFS, 1 master node…