AWS Compute Blog

Building cost-effective AWS Step Functions workflows

Builders create AWS Step Functions workflows to orchestrate multiple services into business-critical applications with minimal code. Customers are looking for best practices and guidelines to build cost-effective workflows with Step Functions.

This blog post explains the difference between Standard and Express Workflows. It shows the cost of running the same workload as Express or Standard Workflows. Then it covers how to migrate from Standard to Express, how to combine workflow types to optimize for cost, and how to modularize and nest one workflow inside another.

Step Functions Express Workflows

Express Workflows orchestrate AWS services at a higher throughput of up to 100,000 state transitions per second. It also provides a lower cost of $1.00 per million invocations versus $25 per million for Standard Workflows.

Express Workflows can run for a maximum duration of 5 minutes and do not support the .waitForTaskToken or .sync integration pattern. Most Step Functions workflows that do not use these integrations patterns and complete within the 5-minute duration limit see both cost and throughput optimizations by converting the workflow type from Standard to Express.

Consider the following example, a naïve implementation of an ecommerce workflow:

When started, it emits a message onto an Amazon SQS queue. An AWS Lambda function processes and approves this asynchronously (not shown). Once processed, the Lambda function persists the state to an Amazon DynamoDB table. The workflow polls the table to check when the action is completed. It then moves on to process the payment, where it repeats the pattern. Finally, the workflow runs a series of update tasks in sequence before completing.

I run this workflow 1,000 times as a Standard workflow. I then convert this to an Express Workflow and run another 1,000 times. I create an Amazon CloudWatch dashboard to display the average execution times. The Express Workflow runs on average 0.5 seconds faster than the Standard Workflow and also shows improvements in cost:

Workflow Execution times

Running the Standard Workflow 1,000 times costs approximately $0.42. This excludes the 4,000 state transitions included in the AWS Free Tier every month, and the additional services that are being used. In contrast to this, running the Express Workflow 1000 times costs $0.01. How is this calculated?

Standard Workflow cost calculation formula:

Standard Workflows are charged based on the number of state transitions required to run a workload. Step Functions count a state transition each time a step of your workflow runs. You are charged for the total number of state transitions across all your state machines, including retries. The cost is $0.025 per 1,000 state transitions.

A happy path through the workflow comprises 17 transitions (including start and finish).

Total cost = (number of transitions per execution x number of executions) x $0.000025
Total cost = (17 X 1000) X 0.000025 = $0.42*

*Excluding the 4,000 state transitions included in the AWS Free Tier every month.

Express Workflow cost calculation formula:

Express Workflows are charged based on the number of requests and its duration. Duration is calculated from the time that your workflow begins running until it completes or otherwise finishes, rounded up to the nearest 100 ms, and the amount of memory used in running your workflow, billed in 64-MB chunks.

Total cost = (Execution cost + Duration cost) x Number of Requests
Duration cost = (Avg billed duration ms / 100) * price per 100 ms
Execution cost = $0.000001 per request

Total cost = ($0.000001 + $0.0000117746) x 1000 = $0.01
Duration cost = (11300 MS /100) * $ 0.0000001042 = $0.0000117746
Execution cost = $0.000001 per request

This cost changes depending on the number of GB-hours and memory sizes used. The memory usage for this State machine is less than 64 MB.
See the Step Functions pricing page for full more information.

Converting a Standard Workflow to an Express Workflow

Given the cost benefits shown in the previous section, converting existing Standard Workflows to Express Workflows is often a good idea. However, some considerations should be made before doing this. The workflow must finish in less than 5 minutes and not use .WaitForTaskToken or .sync integration patterns. Express Workflows send logging history to CloudWatch Logs at an additional cost.

An additional consideration is idempotency, and exactly once versus at least once execution requirements. If a workload requires a guaranteed once execution model, then a Standard Workflow is preferred. Here, tasks and states are never run more than once unless you have specified retry behavior in Amazon States Language (ASL). This makes them suited to orchestrating non-idempotent actions, such as starting an Amazon EMR cluster or processing payments. Express Workflows use an at-least-once model, where there is a possibility that an execution might be run more than once. This makes them ideal for orchestrating idempotent actions. Idempotence refers to an operation that produces the same result (for a given input) irrespective of how many times it is applied.

To convert a Standard Workflow to an Express Workflow directly from within the Step Functions console:

  1. Go to the Step Functions workflow you want to convert, and choose Actions, Copy to new.

  2. Choose Design your workflow visually.
  3. Choose Express then choose Next.
  4. The next two steps allow you to make changes to your workflow design. Choose Next twice.
  5. Name the workflow, assign permissions, logging and tracing configurations, then choose Create state machine.

If converting a Standard Workflow defined in a templating language such as AWS CDK or AWS SAM, you must change both the Type value and the Resource name. The following example shows how to do this in AWS SAM:

StateMachinetoDDBStandard:
    Type: AWS::Serverless::StateMachine
    Properties:
      Type: STANDARD

Becomes:

StateMachinetoDDBExpress:
    Type: AWS::Serverless::StateMachine
    Properties:
      Type: EXPRESS

This does not overwrite the existing workflow, but creates a new workflow with a new name and type.

Better together

Some workloads may require a combination of both long-running and high-event-rate workflows. By using Step Functions workflows, you can build larger, more complex workflows out of smaller, simpler workflows.

For example, the initial step in the previous workflow may require a pause for human interaction that takes more than 5 minutes, followed by running a series of idempotent actions. These types of workloads can be ideal for using both Standard and Express workflow types together. This can be achieved by nesting a “child” Express Workflow within a “parent” Standard Workflow. The previous workflow example has been refactored as a parent-child nested workflow.

Deploy this nested workflow solution from the Serverless Workflows Collection.

Nesting workflows

Parent Standard Workflow

Child Express Workflow

Nested workflow metrics

This new blended workflow has a number of advantages. First the polling pattern is replaced by .WaitForTaskToken. This pauses the workflow until a response is received indicating success or failure. In this case, the response is sent by a Lambda function (not shown). This pause can last for up to 1 year, and the wait time is not billable.

This not only simplifies the workflow but also reduces the number of state transitions. Next, the idempotent steps are moved into an Express Workflow, this reduces the number of state transitions from the Standard Workflow, and benefits from the high throughput provided by Express Workflows. The child workflow is invoked by using the StartExecution StepFunctions API call from the parent workflow.

This new workflow combination runs 1,000 times, costing a total cost of 20 cents. There is no additional charge for starting a nested workflow: It is treated as another state transition. The nested workflow itself is billed the same way as all Step Functions workflows.

Here’s how the cost is calculated:

Parent Standard Workflow:

Total cost = (number of transitions per execution x number of executions) x $0.000025
Total cost =(8*1000) *0.000025 = $0.20

Child Express Workflow:

Total cost = (Execution cost + Duration cost) x No Requests
Duration cost = (Avg billed duration ms / 100) * price per 100ms
Execution cost = $0.000001 per request

Total cost = ($0.000001 + $0.0000013546) x 1000 = $0.0002
Duration cost = (1300 ms /100) * $ 0.0000001042 = $0.0000013546
Execution cost = $0.000001 per request

Total cost for nested workflow = (cost of Parent Standard Workflow) + (cost of Child Express Workflow)
Total cost for nested workflow = 0.20 cents  / 1000 executions.

Conclusion

This blog post explains the difference between Standard and Express Workflows. It describes the exactly once and at-least-one execution models and how this relates to idempotency. It compares the cost of running the same workload as an Express and Standard Workflow, showing how to migrate from one to the other and the considerations to make before doing so.

Finally, it explains how to combine workflow types to optimize for cost. Nesting state machines between types enables teams to work on individual workflows, turning them into modular reusable building blocks.

Visit the Serverless Workflows Collection to browse the many deployable workflows to help build your serverless applications.