July 15, 2016

Setup PIG on Hadoop YARN Cluster

BigData Pig

pig

This is part 4 of our Big Data Cluster Setup.

From our Previous Post I was going through the steps on setting up Spark on your Hadoop Cluster.

In this tutorial, we will setup Apache Pig, on top of the Hadoop Ecosystem.

Our cluster will consist of:

Ubuntu 14.04
Hadoop 2.7.1
1 Master Node
3 Slave Nodes

After we have setup Pig we will also run a Pig example.

This Big Data Series will cover:

Setup Hadoop and HDFS
Setup Hive
Setup Spark
Setup Pig
Example PySpark Application
~~Example Scala Application~~ (Coming Soon)

~~Setup Hive and Pig~~ (Coming Soon)
~~Setup Presto~~ (Coming Soon)
~~Setup Impala~~ (Coming Soon)

Lets get started with our setup:

Setup Pig:

$ wget http://apache.claz.org/pig/pig-0.15.0/pig-0.15.0.tar.gz
$ wget http://apache.claz.org/pig/pig-0.15.0/pig-0.15.0-src.tar.gz
$ mkdir /usr/local/pig
$ sudo chown hadoop:hadoop /usr/local/pig -R
$ tar -xf pig-0.15.0.tar.gz -C /usr/local/pig/
$ tar -xf pig-0.15.0-src.tar.gz -C /usr/local/pig/
``` <p>

**Setup Environment Variables:**

```language-bash
vi ~/.profile
``` <p>

```language-bash
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
``` <p>

**Pig Properties:**

View supported porperties for the config /usr/local/pig/conf/pig.properties bu executing:

```language-bash
$ pig -h properties
``` <p>

**Verify Installation:**

```language-bash
pig –version
``` <p>

**Basic Pig Script:**

Let's test our installation by running a very basic pig script.

Generate the data:

```language-bash
cat > file.txt << EOF
1,ruan,bekker,za
2,peter,james,ned
3,james,smith,usa
EOF
``` <p>

Push the data to HDFS:

```language-bash
hdfs dfs -mkdir -p /pig/input
hdfs dfs -copyFromLocal file.txt /pig/input/
``` <p>

Create the Pig script:

```language-bash
cat > script.pig << EOF
dsetLOAD = LOAD '/pig/input/' USING PigStorage(',') AS (id:int, name:chararray, surname:chararray, location:chararray);
dsetGENERATE = FOREACH dsetLOAD GENERATE id, name, surname, location;
DUMP dsetGENERATE;
EOF
``` <p>

==Note==: You can run the Pig script in 3 modes:

`Local:`
```language-bash
$ pig -x local file.pig
``` <p>
`HDFS:`

```language-bash
$ pig file.pig
``` <p>

`TEZ:`

```language-bash
$ pig -x tez file.pig
``` <p>

**Executing the Script:**

We will run our script in HDFS mode since our data resides in HDFS:

```language-bash
$ pig script.pig
``` <p>

Setup PIG on Hadoop YARN Cluster

Comments

Read Next

Setup Kibana Dashboards for Nginx log Analysis

Amazon EMR Performance Comparison dealing with Hadoops SmallFiles Problem

Comments

Subscribe to Sysadmins

Read Next

Tags