
Unable to Launch EMR Clusters with Cronjobs

The Issue:

I was trying to schedule the launch of EMR clusters using cron jobs, and noticed that the launch fails when run from cron but succeeds when run from the command line.

In my case, the reason was that the PATH variable that cron uses did not include the directory where the aws binary is located. I resolved this by setting the PATH variable in my EMR script.

Identifying the Issue:

I scheduled a cron job to dump its environment variables to a file, so that I could inspect what cron jobs are aware of. The cron job looked like this:

* * * * * env > /home/ruan/envs.txt

and the output showed:


From the command line, I located the path of the aws binary:

$ which aws

I determined that this location is not included in the PATH that the cron job uses.
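
The comparison above can also be scripted. The following is a minimal sketch, assuming the env dump from the cron job exists at /home/ruan/envs.txt; `path_has_cmd` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# path_has_cmd is a hypothetical helper: return 0 if an executable named $2
# exists in any directory of the colon-separated PATH string $1, else 1.
path_has_cmd() {
    search_path=$1
    cmd=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -x "$dir/$cmd" ] && [ ! -d "$dir/$cmd" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Read the PATH that cron recorded in the env dump, if the dump exists.
envs_file=/home/ruan/envs.txt
if [ -f "$envs_file" ]; then
    cron_path=$(grep '^PATH=' "$envs_file" | cut -d= -f2-)
    if path_has_cmd "$cron_path" aws; then
        echo "aws is reachable from cron's PATH"
    else
        echo "aws is NOT reachable from cron's PATH: $cron_path"
    fi
fi
```

If the second message is printed, the cron job cannot find aws by name, which matches the failure described here.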

Resolving the Issue:

I resolved this issue by setting PATH in my script. First, I got the current value of my PATH variable:

$ echo $PATH

Then I set the PATH variable in my EMR cluster creation script:

$ cat /home/ruan/scripts/create-emr.sh


aws emr create-cluster \
  --release-label emr-5.3.0 \
  --instance-type m3.xlarge \
  --instance-count 2 \
  --applications Name=Hive \
  --configurations file:///home/ruan/conf/hive-config.json \
  --use-default-roles \
  --ec2-attributes '{"KeyName":"mykey"}' \
  --log-uri 's3://bucketname/logs/'
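
As a rough illustration of setting PATH inside such a script, the top of the file might look like the sketch below. The directory /usr/local/bin is an assumption (use whatever `which aws` reports on your machine), and `prepend_path` is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: aws is installed in /usr/local/bin. Replace this with the
# directory reported by `which aws` on your machine.
AWS_BIN_DIR=/usr/local/bin

# prepend_path is a hypothetical helper: it puts a directory at the front
# of PATH unless PATH already contains it.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_path "$AWS_BIN_DIR"

# The aws emr create-cluster command would follow here.
```

Prepending rather than overwriting keeps whatever PATH cron already provides, so other commands used by the script still resolve.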

I then added the script to my crontab:

50 11 * * * /home/ruan/scripts/create-emr.sh 2> /home/ruan/emr.err 1> /home/ruan/emr.log

And then I was able to verify the output:

$ cat /home/ruan/emr.log
    "ClusterId": "j-12345ABCDEF"

There are different ways to approach this, but this is the one that worked for me.
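
When debugging a problem like this, it can help to reproduce cron's sparse environment from an interactive shell instead of waiting for the next scheduled run. The sketch below assumes cron's PATH is /usr/bin:/bin, which is a common default but may differ on your system, so compare it with your own env dump. `run_like_cron` is a hypothetical helper name.

```shell
#!/bin/sh
# run_like_cron is a hypothetical helper that runs a command string with
# an almost empty environment, similar to what cron provides.
# Assumption: cron's PATH is /usr/bin:/bin; check your own env dump.
run_like_cron() {
    env -i HOME="${HOME:-/}" SHELL=/bin/sh PATH=/usr/bin:/bin \
        /bin/sh -c "$1"
}

# Prints the path of aws if it is visible from the reduced PATH,
# otherwise prints a hint.
run_like_cron 'command -v aws || echo "aws not found in PATH=$PATH"'
```

The same helper could be pointed at the script itself, for example `run_like_cron /home/ruan/scripts/create-emr.sh`, to check the fix before scheduling it. Note that this would actually launch a cluster.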