While testing DynamoDB for a specific use case, I noticed that a GetItem call would at times show about 150 ms in RequestLatency on the Max statistic. That made me want to understand the behavior I was observing.
I will go through my steps, pointing out where latency can be reduced.
DynamoDB Performance Testing Overview
Tests:
- Create 2 Tables with 10 WCU / 10 RCU, one encrypted, one non-encrypted
- Seed both tables with 10 items, 18KB per item
- Do 4 tests:
  - Encrypted: Consistent Reads
  - Encrypted: Eventually Consistent Reads
  - Non-Encrypted: Consistent Reads
  - Non-Encrypted: Eventually Consistent Reads
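The table setup listed above could be scripted with boto3. The following is a minimal sketch, not the setup used for this post: the helper names `build_table_params` and `create_tables` and the encrypted table's name are hypothetical, and it assumes `session_id` (a string) is the table's only key attribute. Note that `SSESpecification` with `Enabled: True` selects an AWS managed KMS key, while `Enabled: False` on current DynamoDB means an AWS owned key rather than no encryption at rest.

```python
def build_table_params(table_name, encrypted):
    """Build keyword arguments for create_table with 10 RCU / 10 WCU."""
    return {
        'TableName': table_name,
        # assumption: session_id (string) is the partition key and only key attribute
        'AttributeDefinitions': [{'AttributeName': 'session_id', 'AttributeType': 'S'}],
        'KeySchema': [{'AttributeName': 'session_id', 'KeyType': 'HASH'}],
        'ProvisionedThroughput': {'ReadCapacityUnits': 10, 'WriteCapacityUnits': 10},
        # True: AWS managed KMS key; False: AWS owned key
        'SSESpecification': {'Enabled': encrypted},
    }


def create_tables():
    """Create both test tables and wait until each one exists (needs AWS credentials)."""
    from boto3 import Session as boto3_session

    session = boto3_session(region_name='eu-west-1', profile_name='perf')
    dynamodb = session.client('dynamodb')
    # hypothetical table names; adjust to match the scripts that use them
    tables = [('ddb-perf-testing-encrypted', True), ('ddb-perf-testing', False)]
    for name, encrypted in tables:
        dynamodb.create_table(**build_table_params(name, encrypted))
        dynamodb.get_waiter('table_exists').wait(TableName=name)
```

Calling `create_tables()` once before seeding would provision both tables; the parameter builder is kept separate so the settings can be inspected without touching AWS.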
Seed the Table(s):
Seed the Table with 10 items, 18KB per item:
from boto3 import Session as boto3_session
from time import sleep, strftime
from random import sample
# session ids that will be written to the table, one item each
session_ids = [
    '77c81e29-c86a-411e-a5b3-9a8fb3b2595f',
    'b9a2b8ee-17ab-423c-8dbc-91020cd66097',
    'cbe01734-c506-4998-8727-45f1aa0de7e3',
    'e789f69b-420b-4e6d-9095-cd4482820454',
    'c808a4e6-311e-48d2-b3fd-e9b0602a16ac',
    '2ddf0416-6206-4c95-b6e5-d88b5325a7b1',
    'e8157439-95f4-49a9-91e3-d1afc60a812f',
    'f032115b-b04f-423c-9dfe-e004445b771b',
    'dd6904c5-b65b-4da4-b0b2-f9e1c5895086',
    '075e59be-9114-447b-8187-a0acf1b2f127'
]
# source text that is shuffled into each item's payload; empty here,
# fill with about 18KB of characters to match the item size used in the tests
generated_string = ''
# instantiating dynamodb client
session = boto3_session(region_name='eu-west-1', profile_name='perf')
dynamodb = session.client('dynamodb')
timestamp = strftime("%Y-%m-%dT-%H:%M")
# the with-block closes the results file even if a put_item call fails
with open('dynamodb-put-results_{}.txt'.format(timestamp), 'a') as results:
    # number the items from 1 so item_num matches the call number
    for count, sid in enumerate(session_ids, start=1):
        # shuffle the source text so every item gets a different payload
        gen_data = ''.join(sample(generated_string, len(generated_string)))
        sleep(1)
        response = dynamodb.put_item(
            TableName='ddb-perf-testing',
            Item={
                'session_id': {'S': sid},
                'data': {'S': gen_data},
                'item_num': {'S': str(count)}
            }
        )
        results.write('Call Number: {call_num} \n'.format(call_num=count))
        results.write('Call ResponseMetadata: {metadata} \n\n'.format(metadata=response['ResponseMetadata']))
Read from the Table(s):
- Read 18KB per second for 3 Hours:
from boto3 import Session as boto3_session
from time import sleep, strftime
from random import choice
# delay between each iteration
iteration_delay = 1
# iterations number - 3 hours
iterations = 10800
# session ids that will be fetched in a random.choice order
session_ids = [
    '77c81e29-c86a-411e-a5b3-9a8fb3b2595f',
    'b9a2b8ee-17ab-423c-8dbc-91020cd66097',
    'cbe01734-c506-4998-8727-45f1aa0de7e3',
    'e789f69b-420b-4e6d-9095-cd4482820454',
    'c808a4e6-311e-48d2-b3fd-e9b0602a16ac',
    '2ddf0416-6206-4c95-b6e5-d88b5325a7b1',
    'e8157439-95f4-49a9-91e3-d1afc60a812f',
    'f032115b-b04f-423c-9dfe-e004445b771b',
    'dd6904c5-b65b-4da4-b0b2-f9e1c5895086',
    '075e59be-9114-447b-8187-a0acf1b2f127'
]
# instantiating dynamodb client
session = boto3_session(region_name='eu-west-1', profile_name='perf')
dynamodb = session.client('dynamodb')
# table under test
dynamodb_table = 'ddb-perf-testing'
timestamp = strftime("%Y-%m-%dT-%H:%M")
# the with-block closes the results file even if a get_item call fails
with open('dynamodb-results_{}.txt'.format(timestamp), 'a') as results:
    for iteration in range(iterations):
        count = iteration + 1
        print(count)
        sleep(iteration_delay)
        response = dynamodb.get_item(
            TableName=dynamodb_table,
            # fetch a randomly chosen item on every iteration
            Key={'session_id': {'S': choice(session_ids)}},
            # set to True for the consistent read tests
            ConsistentRead=False
        )
        results.write('Call Number: {cur_iter}/{max_iter} \n'.format(cur_iter=count, max_iter=iterations))
        results.write('Call Item Response => Key: {attr_id}, Key Number:{attr_num} \n'.format(attr_id=response['Item']['session_id']['S'], attr_num=response['Item']['item_num']['S']))
        results.write('Call ResponseMetadata: {metadata} \n\n'.format(metadata=response['ResponseMetadata']))
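The read loop records only the response metadata. To look at the average, p90 and Max side by side, one option is to time each call on the client and summarize the timings afterwards. This is a sketch under assumptions: `percentile` and `summarize` are hypothetical helpers, nearest-rank is only one of several percentile definitions, the sample values are made up, and client-side timings include network time, so they will not match the server-side RequestLatency metric.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError('no samples')
    ordered = sorted(values)
    # multiply before dividing to avoid float rounding pushing the rank up by one
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return average, p90 and max of a list of latencies in milliseconds."""
    return {
        'avg': sum(latencies_ms) / len(latencies_ms),
        'p90': percentile(latencies_ms, 90),
        'max': max(latencies_ms),
    }


# made-up sample: nine fast calls and one slow outlier
sample_ms = [4.0, 5.0, 6.0, 5.0, 4.0, 7.0, 5.0, 6.0, 4.0, 150.0]
print(summarize(sample_ms))
```

In the read loop, the timings could be collected by wrapping each `get_item` call with `time.perf_counter()` and appending the difference, multiplied by 1000, to a list. The made-up sample shows how a single slow call dominates the Max while leaving the p90 in single digits.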
Results
Notes from AWS Support:
Reasons for High Latencies:
- RequestLatency is a server-side metric
- Long requests could relate to metadata lookups
- When executing a relatively low number of requests, metadata lookups are frequent, which may cause a spike in latency
- Consistent reads can have a higher average latency than eventually consistent reads
- Requests in general can encounter higher than normal latency at times, due to a network issue, storage node issue or metadata issue
- The p90 should still be in single-digit milliseconds
- With encryption, DynamoDB has to interact with the KMS API as well (mechanisms are in place to deal with the KMS integration, though, so it still offers a p90 under 10 ms)
- DAX: strongly consistent reads are passed on to DynamoDB and not handled by the cache
- 1 RCU can read up to 8 KB per second in an eventually consistent manner
- A consistent read costs double an eventually consistent read
- Not 100% of DynamoDB requests will be under 10 ms
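The capacity notes above can be turned into a quick calculation for the 18KB items used in these tests. This is a sketch that assumes the standard DynamoDB rule that a read is rounded up to the next 4 KB, with an eventually consistent read costing half of a strongly consistent one; the function name `read_capacity_units` is hypothetical.

```python
import math


def read_capacity_units(item_size_kb, consistent):
    """RCUs consumed by one read of an item of the given size in KB."""
    # reads are billed in 4 KB blocks, rounded up
    blocks = math.ceil(item_size_kb / 4)
    # an eventually consistent read costs half of a strongly consistent one
    return blocks if consistent else blocks / 2


print(read_capacity_units(18, consistent=True))   # 5
print(read_capacity_units(18, consistent=False))  # 2.5
```

At one 18KB read per second, both read modes stay below the 10 RCU provisioned on the test tables.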