The Issue:
I was trying to schedule the launch of EMR clusters with cron jobs, and noticed that the launch fails when run from cron but succeeds when run from the CLI.
In my case, the reason was that the PATH variable that cron uses did not include the directory where the aws binary is located. I resolved this by setting the PATH variable in my EMR script.
Identifying the Issue:
I scheduled a cron job to dump its environment variables to a file, so that I could inspect what the cron job is aware of. The cron job looked like this:
* * * * * env > /home/ruan/envs.txt
and the output showed:
LANGUAGE=en_US:
HOME=/home/ruan
LOGNAME=ruan
PATH=/usr/bin:/bin
LANG=en_US.UTF-8
SHELL=/bin/sh
PWD=/home/ruan
From the command line, I located the path of the aws binary:
$ which aws
/usr/local/bin/aws
This showed that the directory containing the aws binary (/usr/local/bin) is not in the PATH that the cron job uses.
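The same check can be reproduced without waiting for cron to fire. The following is a minimal sketch, not part of my original setup: `resolves_under_path` is a hypothetical helper name, and it assumes `/bin/sh` exists and that cron's PATH is the `/usr/bin:/bin` value shown in the dump above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command resolves under a given PATH.
# Usage: resolves_under_path <path-value> <command>
resolves_under_path() {
    # env -i starts from an empty environment, so only the PATH passed in is visible.
    # /bin/sh is called by absolute path, so it does not depend on that PATH.
    env -i PATH="$1" /bin/sh -c 'command -v "$1" >/dev/null 2>&1' sh "$2"
}

# Check aws against the PATH that the cron dump reported.
if resolves_under_path "/usr/bin:/bin" aws; then
    echo "aws is visible with cron's PATH"
else
    echo "aws is NOT visible with cron's PATH"
fi
```

If this reports that aws is not visible while `which aws` works in an interactive shell, the difference is in PATH rather than in the aws command itself.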
Resolving the Issue:
I resolved this issue by setting PATH in my script. First, I got the current value of my PATH variable:
$ echo $PATH
/usr/local/bin:/usr/bin:/bin
Then I set the PATH variable in my EMR cluster creation script:
$ cat /home/ruan/scripts/create-emr.sh
#!/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin
aws emr create-cluster --release-label emr-5.3.0 --instance-type m3.xlarge --instance-count 2 --applications Name=Hive --configurations file:///home/ruan/conf/hive-config.json --use-default-roles --ec2-attributes '{"KeyName":"mykey"}' --log-uri 's3://bucketname/logs/'
I then added the script to my crontab:
50 11 * * * /home/ruan/scripts/create-emr.sh 2> /home/ruan/emr.err 1> /home/ruan/emr.log
And then I was able to verify the output:
$ cat /home/ruan/emr.log
{
"ClusterId": "j-12345ABCDEF"
}
There are different ways to approach this, but this way helped me.
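One such variation, sketched here as an assumption rather than something I ran in the setup above, is to prepend /usr/local/bin to whatever PATH cron provides instead of replacing it, and to fail early with a readable message if a command is still missing. `require_cmd` is a hypothetical helper name.

```shell
#!/bin/bash
# Prepend the directory that holds aws instead of overwriting PATH outright.
PATH="/usr/local/bin:$PATH"
export PATH

# Hypothetical guard: return non-zero with a readable message on stderr
# when a required command cannot be found in PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "required command not found: $1 (PATH=$PATH)" >&2
        return 1
    fi
}

# In create-emr.sh this would go before the aws call, for example:
#   require_cmd aws || exit 1
#   aws emr create-cluster ...
```

With the cron entry above, the message from `require_cmd` would end up in /home/ruan/emr.err, which makes a missing binary easier to spot than an empty log.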