Step-by-Step Guide to the DynamoDB Client Interface with Boto3 and Python
In this demonstration I will be using the client interface of Boto3 with Python to work with DynamoDB.
The main idea is to have a step-by-step guide that shows you how to write, read and query data in DynamoDB.
For this demonstration we have a DynamoDB table that hosts our data about game scores. The entries in our table consist of the event name, gamerid, location, score, timestamp and so on.
Then we will query on gamers, scores, ages and times.
We will go through these actions:
- Creating a DynamoDB Table
- Writing / Reading / Deleting / Updating Data
- Querying Data
- Querying Non Key Attributes with a Global Secondary Index
- Deleting the Table
Interacting with DynamoDB
First we need to instantiate the client. As you can see, I have a profile configured with the name dev and I will be using the region eu-west-1:
>>> import boto3
>>> client = boto3.Session(region_name='eu-west-1', profile_name='dev').client('dynamodb')
After we have instantiated the client, let's go ahead and create our DynamoDB table with event as the HashKey and timestamp as the RangeKey.
For our demonstration, each item will consist of the event the gamer competed in and the finishing time, while the rest of the attributes hold the game and gamer information.
Creating the DynamoDB Table
Create the DynamoDB table with 1 write and 1 read capacity unit (for demonstration):
>>> response = client.create_table(
AttributeDefinitions=[{
'AttributeName': 'event',
'AttributeType': 'S'
},
{
'AttributeName': 'timestamp',
'AttributeType': 'S'
}],
TableName='gamescores',
KeySchema=[{
'AttributeName': 'event',
'KeyType': 'HASH'
},
{
'AttributeName': 'timestamp',
'KeyType': 'RANGE'
}],
ProvisionedThroughput={
'ReadCapacityUnits': 1,
'WriteCapacityUnits': 1
}
)
While the table is busy creating, go ahead and tag your resource:
>>> response = client.tag_resource(
ResourceArn=response['TableDescription']['TableArn'],
Tags=[
{'Key': 'Name', 'Value': 'gamescores'},
{'Key': 'Environment', 'Value': 'dev'}
]
)
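Writes can be rejected until the table reaches the ACTIVE state, so it may help to wait before moving on. Below is a minimal sketch using boto3's table_exists waiter. The helper name wait_for_table and its default delay and attempt values are my own choices for illustration, not part of boto3.

```python
def wait_for_table(client, table_name, delay=5, max_attempts=24):
    """Block until the table is ACTIVE, then return its status."""
    # The 'table_exists' waiter polls DescribeTable until the table is ACTIVE
    waiter = client.get_waiter('table_exists')
    waiter.wait(
        TableName=table_name,
        WaiterConfig={'Delay': delay, 'MaxAttempts': max_attempts},
    )
    description = client.describe_table(TableName=table_name)
    return description['Table']['TableStatus']

# Usage, with the client created earlier:
# wait_for_table(client, 'gamescores')
```

The function takes the client as a parameter so it works with any session or profile.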
Basic Operations:
Once the resource has been tagged, let's write our first item to our DynamoDB table. In this example the data consists of the event, time, score, game and details about the gamer:
>>> response = client.put_item(
TableName='gamescores',
Item={
'event': {'S': 'gaming_nationals_zaf'},
'timestamp': {'S': '2019-02-08T14:53'},
'score': {'N': '11885'},
'name': {'S': 'will'},
'gamerid': {'S': 'wilson9335'},
'game': {'S': 'counter strike'},
'age': {'N': '27'},
'rank': {'S': 'professional'},
'location': {'S': 'sweden'}
}
)
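Keep in mind that put_item silently replaces an existing item that has the same key. If that is not what you want, a condition expression can guard the write. The sketch below only builds the keyword arguments; the helper name build_conditional_put is hypothetical.

```python
def build_conditional_put(table_name, item):
    """Build put_item arguments that refuse to overwrite an existing item."""
    return {
        'TableName': table_name,
        'Item': item,
        # Only succeeds when no item with this primary key exists yet
        'ConditionExpression': 'attribute_not_exists(#E)',
        'ExpressionAttributeNames': {'#E': 'event'},
    }

# Usage, with the client created earlier:
# try:
#     client.put_item(**build_conditional_put('gamescores', item))
# except client.exceptions.ConditionalCheckFailedException:
#     print('an item with this key already exists')
```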
After the item has been written to the DynamoDB table, execute a get_item on the table, making sure to supply both the hash key and the sort key:
>>> response = client.get_item(
Key={
'event': {'S': 'gaming_nationals_zaf'},
'timestamp': {'S': '2019-02-08T14:53'}
},
TableName='gamescores'
)
>>> response['Item']
{u'name': {u'S': u'will'}, u'gamerid': {u'S': u'wilson9335'}, u'timestamp': {u'S': u'2019-02-08T14:53'}, u'age': {u'N': u'27'}, u'rank': {u'S': u'professional'}, u'score': {u'N': u'11885'}, u'location': {u'S': u'sweden'}, u'event': {u'S': u'gaming_nationals_zaf'}}
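The client interface returns every attribute wrapped in a type descriptor such as S or N. If you prefer plain Python values, a small helper can unwrap them. This is a minimal sketch that only covers the two types used in this guide; the name unwrap is my own. boto3 also ships a TypeDeserializer class in boto3.dynamodb.types that handles all DynamoDB types.

```python
from decimal import Decimal

def unwrap(item):
    """Turn {'age': {'N': '27'}} style attributes into plain Python values."""
    plain = {}
    for name, typed in item.items():
        # Each attribute is a one-entry dict: {type_code: value}
        (type_code, value), = typed.items()
        if type_code == 'S':
            plain[name] = value
        elif type_code == 'N':
            # Numbers arrive as strings; Decimal avoids float rounding
            plain[name] = Decimal(value)
        else:
            raise ValueError('unsupported type: %s' % type_code)
    return plain

# Usage: unwrap(response['Item'])
```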
Let's say one of the gamers would like to update their gamerid:
>>> response = client.update_item(
TableName='gamescores',
Key={
'event': {'S': 'gaming_nationals_zaf'},
'timestamp': {'S': '2019-02-08T14:53'}
},
AttributeUpdates={
'gamerid': {'Value': {'S': 'willx9335'}}
}
)
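The AWS documentation marks AttributeUpdates as a legacy parameter and recommends UpdateExpression instead. Here is a sketch of the same update written that way. The helper name build_gamerid_update is hypothetical, and ReturnValues is an optional extra that returns the changed attribute.

```python
def build_gamerid_update(table_name, event, timestamp, new_gamerid):
    """Build update_item arguments that set a new gamerid on one item."""
    return {
        'TableName': table_name,
        'Key': {'event': {'S': event}, 'timestamp': {'S': timestamp}},
        # SET overwrites the attribute, or creates it when missing
        'UpdateExpression': 'SET #G = :new_id',
        'ExpressionAttributeNames': {'#G': 'gamerid'},
        'ExpressionAttributeValues': {':new_id': {'S': new_gamerid}},
        'ReturnValues': 'UPDATED_NEW',
    }

# Usage, with the client created earlier:
# kwargs = build_gamerid_update('gamescores', 'gaming_nationals_zaf',
#                               '2019-02-08T14:53', 'willx9335')
# response = client.update_item(**kwargs)
```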
To verify that the item has been updated, let's execute a get_item on our table:
>>> response = client.get_item(
Key={
'event': {'S': 'gaming_nationals_zaf'},
'timestamp': {'S': '2019-02-08T14:53'}
},
TableName='gamescores'
)
>>> response['Item']
{u'name': {u'S': u'will'}, u'gamerid': {u'S': u'willx9335'}, u'timestamp': {u'S': u'2019-02-08T14:53'}, u'age': {u'N': u'27'}, u'rank': {u'S': u'professional'}, u'score': {u'N': u'11885'}, u'location': {u'S': u'sweden'}, u'event': {u'S': u'gaming_nationals_zaf'}}
>>> response['Item']['gamerid']
{u'S': u'willx9335'}
To delete the item from our dynamodb table:
>>> response = client.delete_item(
Key={
'event': {'S': 'gaming_nationals_zaf'},
'timestamp': {'S': '2019-02-08T14:53'}
},
TableName='gamescores'
)
A scan operation is expensive, as it reads every item that resides in the table (returning up to 1 MB of data per call). Since we just deleted the only item in our table, a scan should now come back empty:
>>> response = client.scan(TableName='gamescores')
>>> response
{u'Count': 0, u'Items': [], u'ScannedCount': 0, ..}
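On a larger table a single scan call stops after 1 MB and returns a LastEvaluatedKey, which you pass back as ExclusiveStartKey to fetch the next page. Below is a minimal sketch of that loop; the helper name scan_all is my own. boto3 also offers client.get_paginator('scan'), which does the same bookkeeping for you.

```python
def scan_all(client, table_name):
    """Collect the items from every page of a scan."""
    items = []
    kwargs = {'TableName': table_name}
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get('Items', []))
        last_key = page.get('LastEvaluatedKey')
        if not last_key:
            # No LastEvaluatedKey means this was the final page
            return items
        kwargs['ExclusiveStartKey'] = last_key

# Usage, with the client created earlier:
# all_items = scan_all(client, 'gamescores')
```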
Querying:
Go ahead and generate some dummy data for your table (details are at the bottom, or grab the script from my GitHub: https://gist.github.com/ruanbekker/c9c5d92b921b16ec5cabdf7625812f12). In my case I had 5 events in which 16 contestants participated. We would like to query our data based on the event and a timestamp earlier than a specified time.
In this example we will get all the data for one event as the time specified was the disqualification time. In this case everyone completed their games before this time:
>>> response = client.query(
TableName='gamescores',
KeyConditionExpression="#S = :event_name AND #T < :time_stamp",
ExpressionAttributeNames={
"#S": "event", "#T": "timestamp"
},
ExpressionAttributeValues={
":event_name": {"S": "gaming_nationals_round_01"},
":time_stamp": {"S": "2019-02-08T23:59:59.999999Z"}
}
)
>>> response['Count']
16
As you can see, all 16 entries were returned. Let's reduce the time and see who completed their games for the event gaming_nationals_round_01 before 2019-02-08 15:56:08:
>>> response = client.query(
TableName='gamescores',
KeyConditionExpression="#S = :event_name AND #T < :time_stamp",
ExpressionAttributeNames={
"#S": "event", "#T": "timestamp"
},
ExpressionAttributeValues={
":event_name": {"S": "gaming_nationals_round_01"},
":time_stamp": {"S": "2019-02-08T15:56:08.284785Z"}
}
)
>>> response['Count']
5
>>> response['Items']
[{u'name': {u'S': u'rick'}, u'gamerid': {u'S': u'rickmax0901'}, u'timestamp': {u'S': u'2019-02-08T15:56:08.006737Z'}, u'age': {u'N': u'20'}, u'rank': {u'S': u'professional'}, u'score': {u'N': u'17962'}, u'location': {u'S': u'sweden'}, u'event': {u'S': u'gaming_nationals_round_01'}}, {u'name': {u'S': u'adrian'}, u'gamerid': {u'S': u'adriano5519'}, u'timestamp': {u'S': u'2019-02-08T15:56:08.087836Z'}, u'age': {u'N': u'22'}, u'rank': {u'S': ...
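The "less than" comparison above works even though timestamp is stored as a string, because ISO 8601 timestamps written in one fixed format sort the same way alphabetically as they do chronologically. A small sketch to convince yourself, using timestamps that appear in this guide (the variable names are my own):

```python
from datetime import datetime

FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'
stamps = [
    '2019-02-08T15:56:08.087836Z',
    '2019-02-08T23:59:59.999999Z',
    '2019-02-08T15:56:08.006737Z',
]

# Sorting as plain strings gives the same order as sorting as datetimes
by_string = sorted(stamps)
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, FORMAT))
same_order = by_string == by_time

# The same string comparison DynamoDB applies to the sort key
cutoff = '2019-02-08T15:56:08.284785Z'
before_cutoff = [s for s in stamps if s < cutoff]
```

This only holds when every timestamp uses the same format and timezone, so it pays to pick one format and stick to it.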
Let's use a FilterExpression on top of our query. DynamoDB first evaluates the key condition and then applies the filter to the matching items before returning them. Note that the filter does not reduce the amount of data read: you still consume read capacity for every item that matched the key condition.
In this case we are querying on the event and a timestamp earlier than a specific time, then applying a filter that keeps only the gamers older than 20:
>>> response = client.query(
TableName='gamescores',
KeyConditionExpression="#S = :event_name AND #T < :time_stamp",
FilterExpression="#A > :age_value",
ExpressionAttributeNames={
"#S": "event",
"#T": "timestamp",
"#A": "age"
},
ExpressionAttributeValues={
":event_name": {"S": "gaming_nationals_round_01"},
":time_stamp": {"S": "2019-02-08T15:56:08.284785Z"},
":age_value": {"N": "20"}
}
)
When looking at the response, we can see that we have 4 results:
>>> response['Count']
4
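The query response also carries a ScannedCount next to Count, which shows how many items were read before the filter was applied. The sketch below summarises the two. The helper name filter_summary is hypothetical, and the sample dictionary is a hand-built stand-in based on the counts seen in this guide (5 items matched the key condition, 4 passed the filter), not a captured response.

```python
def filter_summary(response):
    """Compare the items read by a query with the items that passed the filter."""
    matched = response['Count']
    read = response['ScannedCount']
    return {'matched': matched, 'read': read, 'discarded': read - matched}

# Hand-built stand-in for a query response (assumption, see the text above)
sample = {'Count': 4, 'ScannedCount': 5, 'Items': []}
summary = filter_summary(sample)

# Usage with a real response: filter_summary(response)
```

A large gap between the two numbers is a hint that the filter attribute may deserve its own index.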
Querying with a Global Secondary Index
Let's say we would like to query based on the gamerid and score. As our table has a hashkey on event and a rangekey on timestamp, we can only query on those attributes and not on the ones we want.
Sure, we could do a scan operation and then filter our data, but that becomes an expensive call as our data grows bigger.
Creating a Global Secondary Index (GSI) can help us with that, as a GSI lets us query on attributes that are not part of the table's primary key.
Let's go ahead and create a GSI:
>>> response = client.update_table(
TableName='gamescores',
AttributeDefinitions=[
{'AttributeName': 'event', 'AttributeType': 'S'},
{'AttributeName': 'timestamp', 'AttributeType': 'S'},
{'AttributeName': 'gamerid', 'AttributeType': 'S'},
{'AttributeName': 'score', 'AttributeType': 'N'}
],
GlobalSecondaryIndexUpdates=[{
'Create': {
'IndexName': 'gamerid_score',
'KeySchema': [{
'AttributeName': 'gamerid',
'KeyType': 'HASH'
},
{
'AttributeName': 'score',
'KeyType': 'RANGE'
}
],
'Projection': {
'ProjectionType': 'ALL'
},
'ProvisionedThroughput': {
'ReadCapacityUnits': 1,
'WriteCapacityUnits': 1
}}
}]
)
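The index is built in the background, and querying it is only reliable once its IndexStatus is ACTIVE. Below is a minimal polling sketch based on describe_table. The helper names index_status and wait_for_index and the default delay and attempt values are my own choices.

```python
import time

def index_status(client, table_name, index_name):
    """Return the IndexStatus of a GSI, or None when the index is not listed."""
    table = client.describe_table(TableName=table_name)['Table']
    for index in table.get('GlobalSecondaryIndexes', []):
        if index['IndexName'] == index_name:
            return index['IndexStatus']
    return None

def wait_for_index(client, table_name, index_name,
                   delay=10, max_attempts=60, sleep=time.sleep):
    """Poll until the index is ACTIVE; return False when attempts run out."""
    for _ in range(max_attempts):
        if index_status(client, table_name, index_name) == 'ACTIVE':
            return True
        sleep(delay)
    return False

# Usage, with the client created earlier:
# wait_for_index(client, 'gamescores', 'gamerid_score')
```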
Let's query our table for all the events where rickmax0901 scored more than 17500:
>>> response = client.query(
TableName='gamescores',
IndexName='gamerid_score',
KeyConditionExpression="#G = :gamer_id AND #S > :score_value",
ExpressionAttributeNames={"#G": "gamerid", "#S": "score"},
ExpressionAttributeValues={
":gamer_id": {"S": "rickmax0901"},
":score_value": {"N": "17500"}
}
)
Let's loop through our response to see the events and scores:
>>> for x in response['Items']:
        x['event']['S'], x['score']['N']
(u'gaming_nationals_round_01', u'17962')
(u'gaming_nationals_round_05', u'18174')
Let's delete our dynamodb table:
>>> response = client.delete_table(TableName='gamescores')
Below is the script to generate the data:
The script is available on my GitHub, or you can follow the example below to see how I generate the data.
Building up a dictionary for all the information about our users:
>>> userlists = {}
>>> userlists['john'] = {'id':'johnsnow9801', 'firstname': 'john', 'age': '23', 'location': 'south africa', 'rank': 'professional'}
>>> userlists['max'] = {'id':'maxmilia', 'firstname': 'max', 'age': '24', 'location': 'new zealand', 'rank': 'professional'}
>>> userlists['samantha'] = {'id':'sambubbles8343', 'firstname': 'samantha', 'age': '21', 'location': 'australia', 'rank': 'professional'}
>>> userlists['aubrey'] = {'id':'aubreyxeleven4712', 'firstname': 'aubrey', 'age': '24', 'location': 'america', 'rank': 'professional'}
>>> userlists['mikhayla'] = {'id':'mikkie1419', 'firstname': 'mikhayla', 'age': '21', 'location': 'mexico', 'rank': 'professional'}
>>> userlists['steve'] = {'id':'stevie1119', 'firstname': 'steve', 'age': '25', 'location': 'ireland', 'rank': 'professional'}
>>> userlists['rick'] = {'id':'rickmax0901', 'firstname': 'rick', 'age': '20', 'location': 'sweden', 'rank': 'professional'}
>>> userlists['michael'] = {'id':'mikeshank2849', 'firstname': 'michael', 'age': '26', 'location': 'america', 'rank': 'professional'}
>>> userlists['paul'] = {'id':'paulgru2039', 'firstname': 'paul', 'age': '26', 'location': 'sweden', 'rank': 'professional'}
>>> userlists['nathalie'] = {'id':'natscotia2309', 'firstname': 'nathalie', 'age': '21', 'location': 'america', 'rank': 'professional'}
>>> userlists['scott'] = {'id':'scottie2379', 'firstname': 'scott', 'age': '23', 'location': 'new zealand', 'rank': 'professional'}
>>> userlists['will'] = {'id':'wilson9335', 'firstname': 'will', 'age': '27', 'location': 'sweden', 'rank': 'professional'}
>>> userlists['adrian'] = {'id':'adriano5519', 'firstname': 'adrian', 'age': '22', 'location': 'ireland', 'rank': 'professional'}
>>> userlists['julian'] = {'id':'jules8756', 'firstname': 'julian', 'age': '27', 'location': 'mexico', 'rank': 'professional'}
>>> userlists['rico'] = {'id':'ricololo4981', 'firstname': 'rico', 'age': '20', 'location': 'sweden', 'rank': 'professional'}
>>> userlists['kate'] = {'id':'kitkatkate0189', 'firstname': 'kate', 'age': '24', 'location': 'south africa', 'rank': 'professional'}
Next, populate all the users into a list:
>>> users = list(userlists.keys())
>>> users
['will', 'aubrey', 'max', 'adrian', 'michael', 'steve', 'rico', 'scott', 'rick', 'nathalie', 'samantha', 'paul', 'john', 'mikhayla', 'kate', 'julian']
Next, populate the event list, which is a list of dictionaries holding the event name and game name per event:
>>> events = [
{
'name': 'gaming_nationals_round_01',
'game': 'counter_strike'
},
{
'name': 'gaming_nationals_round_02',
'game': 'fifa'
},
{
'name': 'gaming_nationals_round_03',
'game': 'rocket_league'
},
{
'name': 'gaming_nationals_round_04',
'game': 'world_of_warcraft'
},
{
'name': 'gaming_nationals_round_05',
'game': 'pubg'
},
{
'name': 'gaming_nationals_round_06',
'game': 'league_of_legends'
},
{
'name': 'gaming_nationals_round_07',
'game': 'dota'
}
]
Next, our function to generate the item. The function accepts a name and eventname parameter which we will retrieve from the previous assignments:
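>>> import datetime
>>> import random
>>> def generate(name, eventname):
        # Build one gamescore item in DynamoDB's typed attribute format
        item = {
            'event': {'S': eventname['name']},
            # Microsecond precision keeps the event + timestamp key unique per gamer
            'timestamp': {'S': datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S.%fZ")},
            'gamerid': {'S': name['id']},
            'name': {'S': name['firstname']},
            'age': {'N': str(name['age'])},
            'location': {'S': name['location']},
            'game': {'S': eventname['game']},
            # Random score between 10000 and 19999, sent as a string for type N
            'score': {'N': str(random.randint(10000, 19999))},
            'rank': {'S': name['rank']}}
        return item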
Next is our main application code, which loops through the data that we defined earlier. Each event and name is passed as a parameter to the generate function, which produces a gamescore item that is then written to DynamoDB:
>>> import boto3
>>> import time
>>> client = boto3.Session(region_name='eu-west-1', profile_name='dev').client('dynamodb')
>>> for eventname in events:
        for user in users:
            item = generate(userlists[user], eventname)
            client.put_item(TableName='gamescores', Item=item)
        # Pause before moving on to the next event
        time.sleep(300)
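Writing one item per put_item call is fine for a demo. For larger data sets, batch_write_item accepts up to 25 put requests per call and reports anything it could not write under UnprocessedItems. Below is a rough sketch of that pattern. The helper names chunks and batch_put, the retry count and the backoff timing are my own assumptions, and with only 1 write capacity unit on the table you should expect throttling.

```python
import time

def chunks(items, size=25):
    """Yield slices of at most `size` items (25 is the BatchWriteItem limit)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def batch_put(client, table_name, items, max_retries=5, sleep=time.sleep):
    """Write items in batches, retrying whatever DynamoDB leaves unprocessed."""
    for chunk in chunks(items):
        pending = {table_name: [{'PutRequest': {'Item': item}} for item in chunk]}
        for attempt in range(max_retries):
            response = client.batch_write_item(RequestItems=pending)
            pending = response.get('UnprocessedItems') or {}
            if not pending:
                break
            # Back off a little longer before each retry
            sleep(0.1 * 2 ** attempt)
        else:
            raise RuntimeError('items still unprocessed after retries')

# Usage, with the data defined above:
# items = [generate(userlists[u], e) for e in events for u in users]
# batch_put(client, 'gamescores', items)
```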
Hope this was useful.