AWS: Import CSV Data from S3 to DynamoDB
When running a AWS EMR Cluster, you can import CSV data that is located on S3 to DynamoDB, using Hive.
Our sample data has the following structure "id"
, "movie name"
, "director name"
, "release status"
:
# movies.csv
01,movie1,james may,anne rose,released
02,movie2,andy smith,floyd rose,released
03,movie3,george good,ben james,released
04,movie4,mike james,anne goode,released
This CSV file, sits on S3, for example:
"s3://bucketname/path/to/csvdata/"
We will log onto the master node of our EMR Cluster, logon to Hive. Then from there we will create 2 external tables, a dynamodb_tbl
and a s3_tbl
:
# dynamodb-tbl
hive> CREATE EXTERNAL TABLE ddb_tbl_movies (
id STRING,
movie STRING,
actor STRING,
director STRING,
status STRING )
STORED BY
'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "moviedb",
"dynamodb.column.mapping" =
"id:id,movie:movie,actor:actor,director:director,status:status"
);
# s3-tbl
hive> CREATE EXTERNAL TABLE s3_table_movies (
id STRING,
movie STRING,
actor STRING,
director STRING,
status STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION "s3://bucketname/path/to/csvdata/"
TBLPROPERTIES ("serialization.null.format"="");
From here, we can see the mapped data from S3:
hive> select * from s3_table_movies;
OK
01 movie1 james may anne rose released
02 movie2 andy smith floyd rose released
03 movie3 george good ben james released
04 movie4 mike james anne goode released
Now, we can see the DynamoDB table is still empty:
hive> select * from ddb_tbl_movies;
OK
Now we will insert data from S3 to DynamoDB
hive> INSERT INTO TABLE ddb_tbl_movies select * from s3_table_movies;
Launching Job 1 out of 1
...
MapReduce Total cumulative CPU time: 6 seconds 900 msec
Total MapReduce CPU Time Spent: 6 seconds 900 msec
OK
Time taken: 40.732 seconds
Verify the data:
hive> select * from ddb_tbl_movies;
OK
04 movie4 mike james anne goode released
02 movie2 andy smith floyd rose released
01 movie1 james may anne rose released
03 movie3 george good ben james released
Now your data has been imported to DynamoDB