The biggest problem with EventBridge Scheduler and how to fix it

UPDATE 02/08/2023: EventBridge Scheduler now supports automated deletion upon completion, so the problem and solution discussed in this post are no longer relevant. Please see the announcement here.

 

The launch of EventBridge Scheduler was one of the highlights of re:Invent 2022 for me. Finally, we have a scalable service that lets us schedule ad-hoc, one-off tasks in a serverless way!

If you have followed my work for a while, you might have read “Serverless Architectures on AWS, 2nd Edition”. In the book, I spent an entire chapter showing you five ways to implement a similar service and discussed the different considerations for such a service:

  • Precision: how close to the scheduled time is the task executed?
  • Scalability (number of open tasks): can the service support millions of tasks that are scheduled but not yet executed?
  • Scalability (hotspots): can the service execute millions of tasks at the same time?
  • Cost

The chapter teaches you about architectural design and how to think about (and manipulate) trade-offs by walking you through five different implementations. While the lessons from this chapter are still relevant, the implementation ideas are largely superseded by EventBridge Scheduler. Unless you require millisecond-level precision, there is no good reason to build a custom solution anymore.

Having said that, EventBridge Scheduler still has a big problem.

At the time of writing, one-off schedules are not automatically deleted after they have been executed.

This is a problem because:

  1. It pollutes the control plane with lots of expired schedules that will never be executed again. It makes iterating through and finding relevant schedules more difficult.
  2. More importantly, there is an initial limit of 1,000,000 schedules per region per account. See the official quotas page for EventBridge Scheduler.

Even the official quotas page says “We recommend deleting your one-time schedules after they’ve completed…”. It’s a shame there is no support for automatic deletion at this point.

To me, this is the biggest problem with using EventBridge Scheduler for executing one-off tasks right now. It is exactly what I described as the “Scalability (number of open tasks)” criterion above.

The fix

Luckily, this is a problem that we can solve with relative ease.

I saw a blog post from Pubudu Jayawardana on how you can solve this problem using Step Functions.

It’s a clever idea and I like it. But a simpler and cheaper solution would be to use Lambda Destinations instead.

When EventBridge Scheduler invokes the target Lambda function, it does so via an asynchronous invocation. This means we can use Lambda Destinations (which do not support synchronous invocations) to trigger the cleanup step and delete the schedule.

You can see an example of this in this demo repo.
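As a rough sketch of what attaching the destination involves, the snippet below uses Lambda's PutFunctionEventInvokeConfig API through the v2 SDK. It is not taken from the demo repo, which may well configure this differently. The function name and ARN arguments are placeholders, and the helper names `buildDestinationConfig` and `attachCleanupDestination` are made up for illustration.

```javascript
// Sketch only: builds the request for Lambda's PutFunctionEventInvokeConfig API.
// Both arguments are hypothetical placeholders, not values from the demo repo.
function buildDestinationConfig(targetFunctionName, cleanupFunctionArn) {
  return {
    FunctionName: targetFunctionName,
    DestinationConfig: {
      // invoked only after an asynchronous invocation of the target succeeds
      OnSuccess: { Destination: cleanupFunctionArn }
    }
  }
}

// `lambdaClient` is assumed to be an aws-sdk v2 Lambda client,
// e.g. new (require('aws-sdk/clients/lambda'))()
async function attachCleanupDestination(lambdaClient, targetFunctionName, cleanupFunctionArn) {
  const params = buildDestinationConfig(targetFunctionName, cleanupFunctionArn)
  return lambdaClient.putFunctionEventInvokeConfig(params).promise()
}

module.exports = { buildDestinationConfig, attachCleanupDestination }
```

Keeping the parameter building separate from the SDK call makes it easy to check the configuration without touching AWS.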

For this to work, the onSuccess function needs to know the name of the schedule. For a schedule in the default group, it’s the only piece of information you need to delete it, as you can see from the code snippet below.

const Scheduler = require('aws-sdk/clients/scheduler')
const SchedulerClient = new Scheduler()

module.exports.handler = async (event) => {
  const name = event.requestPayload.name

  await SchedulerClient.deleteSchedule({
    Name: name
  }).promise()
}

Luckily, we just need to make sure the target Lambda function (for the schedule) receives the name of the schedule as part of its invocation event. The onSuccess function then receives that event as requestPayload when it’s invoked by the Lambda service, as the trace collected in Lumigo shows.
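For reference, this is roughly the shape of the record that Lambda sends to an on-success destination after an asynchronous invocation. The field names follow the documented destination record format, but every value below is invented for illustration. The `getScheduleName` helper is hypothetical; it simply fails loudly if the target function was invoked without a name.

```javascript
// Illustrative destination record; all values are invented.
const exampleRecord = {
  version: '1.0',
  timestamp: '2023-01-15T10:30:00.123Z',
  requestContext: {
    requestId: '00000000-0000-0000-0000-000000000000',
    functionArn: 'arn:aws:lambda:us-east-1:123456789012:function:execute:$LATEST',
    condition: 'Success',
    approximateInvokeCount: 1
  },
  // the original invocation event, i.e. the schedule's Target.Input
  requestPayload: { name: 'example-schedule-name' },
  responseContext: { statusCode: 200, executedVersion: '$LATEST' },
  // whatever the target function returned
  responsePayload: null
}

// Hypothetical helper: pulls the schedule name out of a destination record.
function getScheduleName(record) {
  const name = record && record.requestPayload && record.requestPayload.name
  if (typeof name !== 'string' || name.length === 0) {
    throw new Error('destination record has no requestPayload.name')
  }
  return name
}

console.log(getScheduleName(exampleRecord)) // prints "example-schedule-name"
```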

You can see how these fit together in my demo repo. In the repo, this is the API Gateway function that creates the schedule:

const Scheduler = require('aws-sdk/clients/scheduler')
const SchedulerClient = new Scheduler()
const uuid = require('uuid')

const { EXECUTE_ARN, ROLE_ARN } = process.env

/**
 * 
 * @param {import('aws-lambda').APIGatewayEvent} event 
 * @returns {Promise<import('aws-lambda').APIGatewayProxyResult>}
 */
module.exports.handler = async (event) => {
  const name = uuid.v4()
  const resp = await SchedulerClient.createSchedule({
    Name: name,
    ScheduleExpression: `at(${event.body})`,
    FlexibleTimeWindow: {
      Mode: 'OFF'
    },
    Target: {
      Arn: EXECUTE_ARN,
      RoleArn: ROLE_ARN,
      Input: JSON.stringify({
        name
      })
    }
  }).promise()

  return {
    statusCode: 200,
    body: resp.ScheduleArn
  }
}

Note the name of the schedule is passed along in Target.Input. This input then becomes the invocation event for the target Lambda function.

module.exports.handler = async (event) => {
  // the name of the schedule
  // is captured in event.name
}

From there, the Lambda service passes the name on to the onSuccess function, which uses it to delete the schedule from EventBridge Scheduler.
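One detail the onSuccess function shown earlier leaves open is what happens if the cleanup runs twice, or if the schedule has already been deleted by hand. Below is a minimal sketch of a more forgiving variant, assuming the v2 SDK reports a missing schedule as an error whose `code` is `ResourceNotFoundException`. The `createCleanupHandler` factory is hypothetical; it takes the client as an argument only so the logic can be exercised with a fake.

```javascript
// Hypothetical factory: `schedulerClient` is assumed to be an aws-sdk v2
// Scheduler client (or any object with a compatible deleteSchedule method).
function createCleanupHandler(schedulerClient) {
  return async (event) => {
    const name = event.requestPayload.name
    try {
      await schedulerClient.deleteSchedule({ Name: name }).promise()
      return { name, deleted: true }
    } catch (err) {
      // Assumption: a schedule that is already gone is reported with this code.
      if (err.code === 'ResourceNotFoundException') {
        return { name, deleted: false }
      }
      throw err // anything else should still fail the invocation
    }
  }
}

// In a real deployment this would be wired up roughly as:
//   const Scheduler = require('aws-sdk/clients/scheduler')
//   module.exports.handler = createCleanupHandler(new Scheduler())
module.exports = { createCleanupHandler }
```

Treating "not found" as success keeps a repeated cleanup from surfacing as a failed invocation, while throttling or permission errors still propagate.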

Wrap up

I hope you have found this article useful and that it helps you make better use of EventBridge Scheduler. It’s one of the most exciting services that AWS has launched in recent years. If you are using it already or are thinking about using it, then please let me know via Twitter or LinkedIn.

I also want to thank Pubudu for sharing his idea of using Step Functions. It gave me the inspiration to write up my thoughts and share them with you.

If you want to learn more about building serverless architectures, then check out my upcoming workshop, where I will be covering topics such as testing, security, observability and much more.

Hope to see you there.