Monitoring tools for serverless environments and AWS Lambda

Serverless computing platforms like AWS Lambda represent a new computing paradigm. Over the past decade, we’ve been trying to abstract the application layer from the infrastructure layer. VMs started this by virtualizing hardware servers, and Docker containers extended this by packaging just the application code separately from the host system.

The next step in this process is to completely remove the host from the equation, and simply focus on the application code—which is what serverless computing is all about. But as with most things new, serverless forces you to change how you used to do things, including the way you monitor functions.

Below, I explain how monitoring changes in a serverless environment. I focus on Lambda in particular, although most of these observations apply to any type of modern serverless platform.

Screenshot of serverless environment

How monitoring is different with serverless

In traditional client-server apps, you typically monitor server performance, network latency, and similar infrastructure metrics. With serverless platforms like Lambda, however, these metrics are largely out of your hands. This is because the vendor manages the underlying servers and plumbing, leaving you to focus exclusively on your application code.

This means you don’t need to worry about how much compute power your servers have available to execute your code. Lambda automatically scales the available compute capacity to ensure your code is executed (there’s a caveat, but more on that later). You don’t have to worry about load balancing across multiple servers, or optimizing network latency. AWS takes care of this, too.

Serverless metrics to monitor

Yet while all this is hidden from you, and handled by AWS, there are other metrics you should monitor.

The most important element within your control is your application code. With Lambda, you upload your application code as a function, and AWS handles the execution of this code. An error anywhere in your code can cause the function to fail or behave unexpectedly.
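To make this concrete, here is a minimal sketch of a Python handler that logs any unhandled exception before letting the invocation fail. The `process` function and the `order_id` field are hypothetical stand-ins for your own business logic, not part of any AWS API.

```python
import json
import logging

# In Lambda, output from the standard logging module is captured in the function's logs.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def process(event):
    """Hypothetical business logic: requires an 'order_id' key in the event."""
    return {"order_id": event["order_id"], "status": "processed"}


def lambda_handler(event, context):
    try:
        result = process(event)
    except Exception:
        # logger.exception records the message together with the stack trace.
        logger.exception("Unhandled error for event: %s", json.dumps(event, default=str))
        # Re-raise so the invocation is still reported as failed.
        raise
    return {"statusCode": 200, "body": json.dumps(result)}
```

Re-raising matters here: if the handler swallowed the exception and returned normally, the invocation would look successful and the failure would be easy to miss.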

Although Lambda handles provisioning of resources to execute your functions, it limits the memory and the number of concurrent executions it can allocate to functions. At the time of writing, the maximum allocated memory is 1536MB, and the initial burst of concurrent executions varies by region: some regions allow 500, while others like US West (Oregon) allow 3,000.

If a function exceeds its memory limit, Lambda terminates the invocation and reports an error. If it exceeds the concurrency limit, additional invocations are throttled. So if, for example, you experience latency with any of your functions, you should check their memory usage and concurrency. This is different from monitoring traditional server performance, and takes some getting used to. You may need to remove some functions, or make some functions smaller and simpler. However, this is still much easier than having to manage the infrastructure yourself and keep scaling resources according to demand.
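One way to keep an eye on memory headroom is to read the REPORT line that Lambda writes to the function's log at the end of each invocation. The sketch below parses that line and flags invocations that come close to the configured memory. The helper names and the 90% threshold are my own assumptions, so adjust them to taste.

```python
import re

# Matches the memory fields of a Lambda REPORT log line.
REPORT_PATTERN = re.compile(r"Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB")


def memory_usage_ratio(report_line):
    """Return used/configured memory as a fraction, or None if the line does not match."""
    match = REPORT_PATTERN.search(report_line)
    if match is None:
        return None
    configured_mb, used_mb = int(match.group(1)), int(match.group(2))
    return used_mb / configured_mb


def is_near_memory_limit(report_line, threshold=0.9):
    """True when the invocation used at least `threshold` of its configured memory."""
    ratio = memory_usage_ratio(report_line)
    return ratio is not None and ratio >= threshold
```

A function that regularly crosses the threshold is a candidate for more memory or for being split into smaller functions.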

Another common cause of errors in Lambda is access and permissions. If your Lambda function is supposed to access data stored in another AWS service, but its role doesn’t have the necessary permissions set in AWS IAM, the call to that service will fail with an access-denied error.
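As an illustration, the sketch below builds the kind of IAM policy document you would attach to a function's execution role so it can read objects from a single S3 bucket. The bucket name is hypothetical, and a real function may need additional actions beyond `s3:GetObject`.

```python
import json


def read_only_s3_policy(bucket_name):
    """Build an IAM policy document allowing object reads from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Object-level actions apply to keys inside the bucket, hence the /*.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "example-reports-bucket" is a placeholder, not a real bucket.
print(json.dumps(read_only_s3_policy("example-reports-bucket"), indent=2))
```

Granting only the specific actions and resources a function needs keeps permission errors easier to reason about when they do occur.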

Amazon’s built-in monitoring for Lambda

AWS primarily uses CloudWatch to monitor Lambda performance. CloudWatch tracks metrics like the number of invocations, execution duration, and errors during execution. By default, these metrics are recorded in one-minute intervals. If you want to go beyond these, you can set up custom metrics in CloudWatch using the AWS CLI or API. Custom metrics are more powerful, as they can be recorded in intervals as low as one second. However, these high-resolution metrics come with a fee, unlike the default metrics, which are free of charge.
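Here is a rough sketch of publishing a custom metric through the CloudWatch API with boto3. The namespace `MyApp/Lambda` and the metric name `DownstreamLatency` are made up for this example. The request is built separately from the call so you can inspect it without AWS credentials.

```python
import datetime


def build_metric_request(function_name, latency_ms, high_resolution=False):
    """Build keyword arguments for CloudWatch's put_metric_data call."""
    return {
        "Namespace": "MyApp/Lambda",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "DownstreamLatency",  # hypothetical metric name
                "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                "Timestamp": datetime.datetime.now(datetime.timezone.utc),
                "Value": float(latency_ms),
                "Unit": "Milliseconds",
                # 1 = high-resolution (one-second) metric, 60 = standard resolution.
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


def publish_metric(function_name, latency_ms, high_resolution=False):
    """Send the metric to CloudWatch; needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above has no AWS dependency

    client = boto3.client("cloudwatch")
    client.put_metric_data(
        **build_metric_request(function_name, latency_ms, high_resolution)
    )
```

Since high-resolution metrics are billed, it makes sense to reserve `high_resolution=True` for the few measurements where one-second granularity actually matters.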

CloudWatch also records errors to logs. The errors in CloudWatch logs are another crucial source of insight when troubleshooting serverless application problems.

Additionally, AWS provides tracing of application performance on Lambda using X-Ray. This service tracks the progress of a request in an application across various AWS services. So, if a Lambda function is triggered by API Gateway, and during execution accesses data from an S3 bucket, X-Ray can trace the progress of this request and display it visually. This helps with root cause analysis.
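To tie your own log lines back to a trace, you can read the tracing header that Lambda exposes to the function through the `_X_AMZN_TRACE_ID` environment variable. The sketch below parses that header with the standard library only. The helper names are mine, and this does not replace the X-Ray SDK, which is what actually records trace segments.

```python
import os


def parse_trace_header(header):
    """Split a tracing header like 'Root=...;Parent=...;Sampled=1' into a dict."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip empty or malformed parts
            fields[key.strip()] = value.strip()
    return fields


def current_trace_id():
    """Return the root trace ID for the current invocation, or None if absent."""
    header = os.environ.get("_X_AMZN_TRACE_ID", "")
    return parse_trace_header(header).get("Root")
```

Including the value of `current_trace_id()` in error logs makes it easier to jump from a log entry to the matching trace.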

Once you’ve identified the root cause and you’d like to dig deeper, you can look into CloudTrail logs to get additional details on errors. CloudTrail records the API calls made to Lambda, including details such as the source IP address, the time of the call, and more.
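As a sketch of how you might pull those details out programmatically, the code below fetches recent Lambda-related events with boto3's CloudTrail `lookup_events` call and summarizes the ones that carry an error code. The function names and the shape of the summary are my own choices.

```python
import json


def summarize_failed_calls(events):
    """Extract name, time, source IP, and error code from failed API calls."""
    failures = []
    for event in events:
        # Each event carries the full record as a JSON string.
        detail = json.loads(event["CloudTrailEvent"])
        if "errorCode" in detail:
            failures.append(
                {
                    "event_name": detail.get("eventName"),
                    "event_time": detail.get("eventTime"),
                    "source_ip": detail.get("sourceIPAddress"),
                    "error_code": detail["errorCode"],
                }
            )
    return failures


def fetch_lambda_events(max_results=50):
    """Fetch recent events for the Lambda service; needs boto3 and credentials."""
    import boto3  # imported here so the summarizer works without AWS access

    client = boto3.client("cloudtrail")
    response = client.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventSource", "AttributeValue": "lambda.amazonaws.com"}
        ],
        MaxResults=max_results,
    )
    return response["Events"]
```

Calling `summarize_failed_calls(fetch_lambda_events())` would then give a compact list of failed calls to review.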

Error monitoring for Lambda

While AWS has provided many options to monitor Lambda performance, you still need to monitor your application logic for errors. Logical errors can cause your service to give failure responses to clients in the best case, or fail silently in the worst case. If you use a tool like Rollbar for end-to-end monitoring of your system, you can send your application errors and uncaught exceptions to Rollbar for holistic analysis. You can also leverage Rollbar’s unique monitoring features, like de-duping of alerts, viewing of detailed stack traces, and gauging the impact of errors with user tracking, so you can fix errors sooner.
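One way to wire this up is a small decorator that forwards uncaught exceptions to whatever error reporter you use. This is a generic sketch rather than Rollbar's own integration: `report_errors` is a name I made up, and the reporter is any callable that accepts the tuple returned by `sys.exc_info()`. With the Rollbar Python SDK, I would expect `rollbar.report_exc_info` to fit that role after calling `rollbar.init` with your access token, but check the SDK documentation for its recommended Lambda setup.

```python
import functools
import sys


def report_errors(reporter):
    """Wrap a Lambda handler so uncaught exceptions are passed to `reporter`."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            try:
                return handler(event, context)
            except Exception:
                # Hand the (type, value, traceback) tuple to the reporter...
                reporter(sys.exc_info())
                # ...then re-raise so the invocation still counts as failed.
                raise

        return wrapper

    return decorator
```

Keeping the reporter as a parameter also makes the handler easy to test locally, since you can pass in a plain list's `append` method and inspect what was reported.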

Conclusion

As you work with Lambda, there’s a lot to adapt to. Monitoring Lambda is different from monitoring traditional applications. There isn’t much lower-level infrastructure to monitor, but there’s a new way to monitor application performance at the upper layers of the stack. You need to leverage AWS’s built-in monitoring services and features like CloudWatch, X-Ray, and custom metrics. You can also make your monitoring more effective by adding error monitoring tools like Rollbar, which give you a deeper view into logical errors.

If you haven’t already, sign up for a 14-day free trial of Rollbar and let us help you take control of impactful production errors. 🙂