9 reasons you should consider using Step Functions for microservices orchestration.
1. Ease of Integration
Step Functions makes it really easy to wire up lambdas or other microservices. The biggest advantage is that you are relieved of most of the plumbing work. With a simple declarative JSON document you wire up all of the microservices, instead of having to manually configure SQS, SNS and IAM role changes for each of the lambdas to work with these services.
Here is what you would normally need to wire the microservices up. You would create and manage each of the resources below by hand, which requires a good amount of time.
- Lambdas
- SNS
- SQS
- Kinesis
- Kafka
- IAM permissions
With Step Functions all you need is one clean JSON document. You do still have to create your lambdas, but the declarative JSON makes it simple to wire them all together. The messaging infrastructure is taken care of for you, so there is no need to worry about SQS, SNS and so on.
{
  "Comment": "A Hello World example",
  "StartAt": "Pass",
  "States": {
    "Pass": {
      "Comment": "Comments...",
      "Type": "Pass",
      "Next": "Hello World example?"
    },
    "Hello World example?": {
      "Comment": "Comments...",
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.IsHelloWorldExample",
          "BooleanEquals": true,
          "Next": "Yes"
        },
        {
          "Variable": "$.IsHelloWorldExample",
          "BooleanEquals": false,
          "Next": "No"
        }
      ],
      "Default": "Yes"
    },
    "Yes": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "arn:aws:lambda:ap-southeast-2:123456789:function:MyLambdaFunction",
        "Payload": {
          "Input.$": "$"
        }
      },
      "Next": "Wait 3 sec"
    },
    "No": {
      "Type": "Fail",
      "Cause": "Not Hello World"
    },
    "Wait 3 sec": {
      "Comment": "A Wait state delays the state machine.",
      "Type": "Wait",
      "Seconds": 3,
      "Next": "Parallel State"
    },
    "Parallel State": {
      "Comment": "A Parallel state can be used for parallel flows.",
      "Type": "Parallel",
      "Next": "Hello World",
      "Branches": [
        {
          "StartAt": "Hello",
          "States": {
            "Hello": {
              "Type": "Pass",
              "End": true
            }
          }
        },
        {
          "StartAt": "World",
          "States": {
            "World": {
              "Type": "Pass",
              "End": true
            }
          }
        }
      ]
    },
    "Hello World": {
      "Type": "Pass",
      "End": true
    }
  }
}
The above JSON gives you a nice-looking flow chart that is very easy to understand.
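Because the whole workflow lives in one JSON document, a mistyped state name in a "Next" or "Default" field is an easy mistake to make. The sketch below is one way you might check a definition locally before deploying it. It is not an official validator, the function name find_dangling_targets is made up for illustration, and it only looks at the fields used in the example above plus "Catch".

```python
import json


def find_dangling_targets(definition):
    """Return (state, target) pairs whose target is not a defined state."""
    states = definition["States"]
    dangling = []
    if definition["StartAt"] not in states:
        dangling.append(("StartAt", definition["StartAt"]))
    for name, state in states.items():
        targets = []
        if "Next" in state:
            targets.append(state["Next"])
        if "Default" in state:
            targets.append(state["Default"])
        # Choice rules and Catch clauses each carry their own "Next".
        targets += [c["Next"] for c in state.get("Choices", []) if "Next" in c]
        targets += [c["Next"] for c in state.get("Catch", []) if "Next" in c]
        for target in targets:
            if target not in states:
                dangling.append((name, target))
        # Parallel branches are nested state machines, so recurse into them.
        for branch in state.get("Branches", []):
            dangling += find_dangling_targets(branch)
    return dangling


definition = json.loads("""
{
  "StartAt": "Pass",
  "States": {
    "Pass": {"Type": "Pass", "Next": "Hello Wrold"},
    "Hello World": {"Type": "Pass", "End": true}
  }
}
""")
print(find_dangling_targets(definition))
```

An empty list means every target that the check knows about points at a state that exists.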
2. Less Code to Develop
Since you do not have to do the plumbing work yourself, the amount of IaC (such as Terraform or CloudFormation) that you have to write to put your workflow together is significantly reduced. This has a big positive impact on your project timelines.
As stated above, just one JSON document wires all your lambdas together.
3. Ease of coding
Passing the event object across microservices is greatly simplified.
For example, your lambdas can simply pass on the event object with data added, removed or manipulated.
# lambda 1 code
def my_handler(event, context):
    event['message'] = "pass this to next"
    return event
You can pass objects along without writing any serialization code yourself, as long as everything you pass is JSON-serializable.
# lambda 2 code
def my_handler(event, context):
    print(event['message'])
    return event
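To try the hand-off between the two lambdas without deploying anything, you could imitate it locally. The sketch below is only an approximation of what Step Functions does: the handler names and the run_chain helper are made up for illustration, and the JSON round trip between handlers is an assumption used to catch non-serializable return values early.

```python
import json


def first_handler(event, context):
    # Same idea as lambda 1: add a field and pass the event on.
    event["message"] = "pass this to next"
    return event


def second_handler(event, context):
    # Same idea as lambda 2: read what the previous step added.
    event["seen"] = event["message"].upper()
    return event


def run_chain(handlers, event):
    """Call each handler in order, round-tripping through JSON in between."""
    for handler in handlers:
        # json.dumps raises TypeError if a handler returns something
        # that is not JSON-serializable (a set, a datetime, ...).
        event = json.loads(json.dumps(handler(event, None)))
    return event


result = run_chain([first_handler, second_handler], {"first_name": "test"})
print(result)
```

If a handler returns something like a set or a datetime, the round trip fails straight away, which is cheaper to find on your laptop than in a deployed workflow.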
If your messages are likely to exceed 256 KB, you can always use the message for metadata only (such as the S3 location) and keep the objects themselves in S3.
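The S3 approach could look roughly like the sketch below. The function name to_state_output and the shape of the returned dictionary are made up for illustration, and the upload argument is a hypothetical callable that you supply. In a real lambda it would typically wrap boto3's put_object.

```python
import json

MAX_PAYLOAD_BYTES = 256 * 1024  # the message size limit quoted above


def to_state_output(payload, upload, bucket, key):
    """Return the payload inline if it fits, otherwise a pointer to S3.

    `upload(bucket, key, body)` is a hypothetical callable; in a real lambda
    it could call boto3's s3.put_object(Bucket=..., Key=..., Body=...).
    """
    body = json.dumps(payload).encode("utf-8")
    if len(body) <= MAX_PAYLOAD_BYTES:
        return {"inline": payload}
    upload(bucket, key, body)
    # Only the location travels through the workflow, not the object itself.
    return {"s3": {"bucket": bucket, "key": key, "size": len(body)}}


# Local usage with an in-memory stand-in for S3.
fake_s3 = {}


def fake_upload(bucket, key, body):
    fake_s3[(bucket, key)] = body


small = to_state_output({"first_name": "test"}, fake_upload, "my-bucket", "12345.json")
large = to_state_output({"blob": "x" * (300 * 1024)}, fake_upload, "my-bucket", "12345.json")
print(small)
print(large["s3"]["bucket"], large["s3"]["key"])
```

The next lambda then checks whether it received the data inline or a pointer, and fetches the object from S3 in the second case.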
4. Ease of debugging
Debugging is greatly simplified. You can view the inputs and outputs of each step in a flow. Each flow can also be given a custom name. For example, if you were processing file 12345, you could use a naming convention such as 12345-<guid>, where the guid keeps the name unique in case you have to re-process 12345, while the prefix lets you do a wildcard search for that flow in the console.
The UI also shows the full flow with green/red colouring to help you identify problems. As you can see below, I have prefixed the execution name with a serial number. The execution can be started by some automation, such as an S3 event or a lambda, with a naming convention of your choice.
import boto3

client = boto3.client('stepfunctions')
response = client.start_execution(
    stateMachineArn='aws:states:.......',
    name='12345-a057ac36-ad21-644a-eb89-a3bac32c77c9',
    input="{\"first_name\" : \"test\"}"
)
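The execution name itself can be generated rather than hard-coded. Below is a minimal sketch of the 12345-<guid> convention. The helper name make_execution_name is made up for illustration, and the length check reflects the 80 character limit that Step Functions places on execution names.

```python
import uuid

MAX_NAME_LENGTH = 80  # Step Functions execution names are limited to 80 characters


def make_execution_name(file_id):
    """Build a '<file_id>-<guid>' name that stays unique across re-processing."""
    name = f"{file_id}-{uuid.uuid4()}"
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"execution name is {len(name)} characters, limit is {MAX_NAME_LENGTH}")
    return name


print(make_execution_name("12345"))
```

The result can be passed as the name argument of start_execution, so re-processing file 12345 never collides with an earlier execution.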
5. Error Handling
Error handling is also really easy with Step Functions. More details can be found in the Step Functions documentation, but the main thing I like is seeing, at a high level, the exception stack trace printed next to the step that failed. It really speeds up fixing issues.
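To give a flavour of what the documentation covers, a Task state can carry "Retry" and "Catch" fields. The sketch below builds such a state as a Python dictionary and prints it as JSON. The retry numbers are arbitrary, "Handle Failure" is a hypothetical state that you would have to define yourself, and the retry_delays helper is only an illustration of how the wait grows between attempts (it ignores options such as jitter).

```python
import json

task_with_error_handling = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
        "FunctionName": "arn:aws:lambda:ap-southeast-2:123456789:function:MyLambdaFunction",
        "Payload": {"Input.$": "$"},
    },
    # Retry the task a few times with exponential backoff.
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }
    ],
    # If it still fails, keep the error details and move to a fallback state.
    "Catch": [
        {
            "ErrorEquals": ["States.ALL"],
            "ResultPath": "$.error",
            "Next": "Handle Failure",  # hypothetical state name
        }
    ],
    "Next": "Wait 3 sec",
}


def retry_delays(interval_seconds, backoff_rate, max_attempts):
    """Seconds waited before each retry attempt under exponential backoff."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


print(json.dumps(task_with_error_handling, indent=2))
print(retry_delays(2, 2.0, 3))  # [2.0, 4.0, 8.0]
```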
6. Traceability of workflows
As stated above, each workflow is easy to trace for audit purposes. Instead of following logs across several microservices you get it all in one place which is very easy to track.
7. Speed
Standard Step Functions workflows can process up to 2,000 executions and 4,000 state transitions per second. There is also an Express edition, which supports an execution rate of up to 100,000 per second and nearly unlimited state transitions. For most microservices architectures this is more than adequate.
8. Flexibility
Step Functions offers a very cool way to harness the polymorphic behaviour of lambdas. I have written another blog post that explains it in detail, as it is too involved to explain here. Please follow this link to know more.
9. Encryption
Messages flowing through Step Functions states are encrypted by default, so no additional work is required.
https://docs.aws.amazon.com/step-functions/latest/dg/security-encryption.html
Limits
Standard workflows
- 2,000 per second execution rate
- 4,000 per second state transition rate
Express workflows
- 100,000 per second execution rate
- Nearly unlimited state transition rate
Message size limits
256 KB, increased from 32 KB in September 2020.
Step Function Instances History
Step Functions keeps execution history for a maximum of 90 days.
Step Function Transitions per Instance
For very long-running instances, such as those with conditional loops, you need to consider whether you will hit the limit of 25,000 entries in the history of a single Step Functions execution (each state transition adds entries to that history). It is very unlikely you will hit this limit, but there is a workaround if you do.
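For a rough feel of whether a looping workflow could get near the 25,000 limit mentioned above, a back-of-the-envelope calculation is enough. The sketch below assumes you have already measured how many history entries one loop iteration records, for example by counting them in the console for a short test run. The function name and the numbers in the usage line are made up for illustration.

```python
HISTORY_LIMIT = 25_000  # per-execution limit quoted above


def max_iterations(entries_per_iteration, fixed_entries=0, limit=HISTORY_LIMIT):
    """Estimate how many loop iterations fit within the per-execution limit.

    entries_per_iteration: history entries recorded by one pass of the loop
    fixed_entries: entries recorded outside the loop (start, setup, end)
    """
    if entries_per_iteration <= 0:
        raise ValueError("entries_per_iteration must be positive")
    return (limit - fixed_entries) // entries_per_iteration


# Example: 7 entries per iteration plus 10 outside the loop.
print(max_iterations(7, fixed_entries=10))  # 3570
```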