Parallel Processing on AWS Lambda with Python using Multiprocessing
If you are trying to use multiprocessing.Queue or multiprocessing.Pool on AWS Lambda, you are probably getting the exception:
[Errno 38] Function not implemented: OSError
sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 38] Function not implemented
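This happens because the Lambda execution environment does not support shared memory for processes (it has no /dev/shm). Both multiprocessing.Queue and multiprocessing.Pool create a semaphore when they are constructed, which is the SemLock call in the traceback, so neither can be used.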
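To check whether a given environment has this limitation, one option is to try creating a lock and catch the failure. The sketch below is only an illustration: the helper name semaphores_available is made up for this post, and it assumes that multiprocessing.Lock() fails the same way as Queue and Pool do, since it goes through the same SemLock call.

```python
import multiprocessing


def semaphores_available():
    """Return True if this environment can create multiprocessing semaphores.

    Hypothetical helper: multiprocessing.Lock() goes through the same SemLock
    constructor that appears in the traceback above, so it is assumed to fail
    wherever Queue and Pool fail.
    """
    try:
        multiprocessing.Lock()
    except (OSError, ImportError):
        # OSError: the "[Errno 38] Function not implemented" case.
        # ImportError: platforms built without a working semaphore implementation.
        return False
    return True
```

Calling semaphores_available() once at the top of a handler lets you pick a Pipe-based code path when it returns False.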
As a workaround, Lambda does support multiprocessing.Pipe, which can be used instead of Queue to pass data between processes.
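As a rough sketch of the Pipe approach, each child process can send its result back through its own one-way pipe. The names REGION_ENDPOINTS, lookup_endpoint and collect_endpoints are made up for illustration. The "fork" start method is requested explicitly as an assumption, so the sketch behaves the same regardless of the interpreter's default.

```python
import multiprocessing

# Assumption: Lambda runs on Linux, where the "fork" start method is available.
# Requesting it explicitly keeps the sketch independent of the default.
_ctx = multiprocessing.get_context("fork")

# Hypothetical lookup table, used only for this illustration.
REGION_ENDPOINTS = {
    "eu-west-1": "dynamodb.eu-west-1.amazonaws.com",
    "us-east-1": "dynamodb.us-east-1.amazonaws.com",
}


def lookup_endpoint(region, conn):
    """Child process: send one (region, endpoint) pair, then close the pipe."""
    conn.send((region, REGION_ENDPOINTS[region]))
    conn.close()


def collect_endpoints(regions):
    """Start one process per region and read each result from its own Pipe."""
    workers = []
    for region in regions:
        # duplex=False gives a receive-only end and a send-only end.
        parent_conn, child_conn = _ctx.Pipe(duplex=False)
        proc = _ctx.Process(target=lookup_endpoint, args=(region, child_conn))
        proc.start()
        child_conn.close()  # the parent only reads from this pipe
        workers.append((proc, parent_conn))

    results = {}
    for proc, parent_conn in workers:
        # Read before joining so a child never blocks on a full pipe.
        region, endpoint = parent_conn.recv()
        results[region] = endpoint
        proc.join()
    return results
```

Inside a handler, collect_endpoints(["us-east-1", "eu-west-1"]) would return a dictionary mapping each region to its endpoint, without creating a Queue or a Pool.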
Parallel Processing on Lambda Example
Below is a very basic example of how to run tasks in parallel on AWS Lambda with Python, using one multiprocessing.Process per task:
import os
import time
import multiprocessing

# DynamoDB endpoint for each region
region_maps = {
    "eu-west-1": {
        "dynamodb": "dynamodb.eu-west-1.amazonaws.com"
    },
    "us-east-1": {
        "dynamodb": "dynamodb.us-east-1.amazonaws.com"
    },
    "us-east-2": {
        "dynamodb": "dynamodb.us-east-2.amazonaws.com"
    }
}

def multiprocessing_func(region):
    # Simulate one second of work, then report which process handled the region
    time.sleep(1)
    endpoint = region_maps[region]['dynamodb']
    print('pid: {} - endpoint for {} is {}'.format(os.getpid(), region, endpoint))

def lambda_handler(event, context):
    starttime = time.time()
    processes = []
    regions = ['us-east-1', 'us-east-2', 'eu-west-1']

    # Start one process per region
    for region in regions:
        p = multiprocessing.Process(target=multiprocessing_func, args=(region,))
        processes.append(p)
        p.start()

    # Wait for every process to finish
    for process in processes:
        process.join()

    output = 'That took {} seconds'.format(time.time() - starttime)
    print(output)
    return output
The output when the function is invoked (the three one-second tasks run in parallel, so the total is about one second):
pid: 30913 - endpoint for us-east-1 is dynamodb.us-east-1.amazonaws.com
pid: 30914 - endpoint for us-east-2 is dynamodb.us-east-2.amazonaws.com
pid: 30915 - endpoint for eu-west-1 is dynamodb.eu-west-1.amazonaws.com
That took 1.014902114868164 seconds