Serverless in 2023

Symphonia
Mike Roberts
Mar 15, 2023

I was recently a guest on the ThoughtWorks Tech Podcast to give my thoughts about the state of Serverless today. We covered a lot, but the main conversation points, for me, were:

  • What does Serverless actually mean today?
  • The Enterprise-ification of Serverless
  • Lambda is great, but teams don’t know how to build with it
  • Will companies embrace the Serverless-enabled event-driven renaissance?

In this article I go through these four areas in detail.

The (over) expansion of Serverless

Back in 2016 I described Serverless as the combination of two related areas of cloud technology:

  • Functions-as-a-Service / FaaS services, like AWS Lambda, provide the ability to run custom backend code without having to think about server hosts or processes.
  • Backend-as-a-Service / BaaS services are provider-managed services that can be incorporated into an application’s architecture, and that share the same “scale to zero”, “no server operations”, and “implicit High Availability” traits as FaaS. AWS S3 is one of the oldest examples of BaaS, but there are now a huge number of such services from many different vendors.
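
To make the FaaS half concrete, here is a minimal sketch of an AWS Lambda handler in Python (the event shape is hypothetical) - the platform invokes this function once per event, and there are no server hosts or processes for the author to manage:

```python
import json

def handler(event, context):
    """Minimal Lambda handler: the platform calls this once per event."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```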

These two areas of Serverless still mostly do a decent job of covering the subject today, with both having largely matured. However, Serverless in 2023 also means more than this, both for better and for worse.

On one hand an exciting area of growth has been “Serverless hosts for web deployment” - providers like Vercel, Netlify, AWS Amplify, and others. These services are something of a combination of FaaS - able to run custom server-side code - and BaaS - providing application-level services managed by a third party. But they’re also more than this: they’re higher-level application platforms, yet still with all the Serverless benefits we’re now used to.

Similarly, “Serverless backend compute” today means more than just “functions”, with legitimate offerings from multiple vendors to host web API services and containers. App Runner is AWS’ contribution here, but Google Cloud has been leading the way in this area for several years. It’s arguable that low-code / no-code platforms, including orchestrators like AWS Step Functions, are also in this blurry area.

On the other hand, unfortunately, some marketing teams have realized that “Serverless” is a money-making buzzword and are playing a little fast-and-loose with the definition. What is most disappointing is that AWS themselves have made a habit of this. Notorious examples are Neptune Serverless (a graph database), OpenSearch Serverless (an Elasticsearch competitor), and Aurora Serverless (a relational database).

These, and several other cloud services, are now being mis-branded as Serverless, even though what their differentiating features really amount to are “very good managed-auto-scaling”. The problem is that these services don’t scale to zero and still require a lot of operational hand-holding.

I personally don’t feel that this is a “minor quibble” - services that scale their costs to zero can be freely brought up and torn down many times across an organization. Services that don’t scale costs to zero often require very different operational management - for example, sharing instances across developers - and such differences often drive architecture.

These “not really Serverless” offerings are great in their own way, and useful to many companies, but slapping the “Serverless” moniker on them doesn’t help customers and just makes life more confusing for the industry, especially for people new to this area of cloud.

In other words Serverless is now subject to “semantic diffusion” - what it means really does depend on who you talk to.

The Enterprise-ification of Serverless

I mentioned earlier that FaaS and BaaS have largely matured - certainly when we look at the services that have been around since 2015 or before.

What I mean by “matured” is that most of the surprising functionality gaps that can trip teams up as their applications grow have either gone away, or are so glaringly obvious that they scream “this is not the right solution for you!”. For example back in 2016 AWS Lambda’s deployment tooling and configuration were big gaps but both of these problems now have decent (if not perfect) solutions.

AWS and other providers have “met customers where they are”, and we now see a proliferation of features, knobs, and dials on their services - Lambda being one of the clearest cases of this. Some of these solve somewhat universal problems, e.g. Reserved Concurrency is a useful technique to avoid overloading a relational database. However, others are specific to particular use cases, and often seem to be focused on “big enterprise” usage. A good example is a recent Lambda feature named Runtime Management Controls. This feature is all about letting customers decide when the underlying operating system version and runtime point-release version of their Lambda functions can be upgraded. For most people using Lambda this is not useful - we’ve done fine without it for years and trust AWS to patch security problems, etc., as soon as they can. But larger organizations that need much stricter control around this kind of thing - for whatever reason - now have the option.
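
As an illustration of the “universal problem” end of that spectrum, here is a minimal AWS SAM fragment - the function name and limit are hypothetical - that uses Reserved Concurrency to cap a function so a traffic burst can’t exhaust a downstream relational database’s connection pool:

```yaml
Resources:
  OrdersFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.9
      # Never run more than 10 copies of this function at once,
      # protecting the database it connects to from overload.
      ReservedConcurrentExecutions: 10
```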

So, meeting customers where they are, more capabilities, this is all good, right? My response is “yes, but…”

One big problem with all of this is that it makes building and operating Serverless applications ever more complicated. The larger the surface area of techniques that an engineer can use, the more decisions they need to make.

AWS is not the only problem here - most Serverless services that have been around a while and are actively used are suffering from “feature bloat” (I’m looking at you, Auth0).

The key to being a product visionary is not just adding new features, but also providing an experience that guides users to the best solution for them. Unfortunately this second aspect is lacking in many parts of the technology industry, Serverless included.

Or to put it another way, serverless product leaders - especially at AWS - could do well with revisiting the Alan Kay quote:

“Simple things should be simple, complex things should be possible.”

A second problem with feature proliferation is coherence. As these services grow, and multiple product managers are involved over time, it’s often the case that less frequently used features within a service are incompatible with other features in the same service. An example for Lambda today is that the new SnapStart feature (which I’ve written about previously) is incompatible with several other important parts of Lambda, such as X-Ray.

Lambda is great, but teams don’t know how to build with it

Part of the maturing of AWS Lambda is that it is a solid general-purpose cloud compute platform. Sure, there are certain types of use case where I don’t think it’s a good fit (like low-latency trading and big-data processing) but on the whole it’s my recommended “first option” for hosting server-side code on AWS.

There are still a few concerns - mostly around how well AWS is going to maintain Lambda and related tools over the years - but I feel better about AWS than certain other cloud providers that have a habit of killing things.

The big problem that remains with Lambda is not what it can do, but how software development teams use it. This isn’t just about design - it runs the gamut of areas from architecture to code organization, and from team management to deployment and delivery.

AWS does a good job these days of providing an interesting catalog of feature design patterns - especially relating to the vast number of options for integrating Lambda with other services (one of AWS’ strengths). However there is a glaring lack of strong guidance on how to build larger Lambda-based applications.

Here’s an example - say you are building a moderately-sized service that has 40 API entry points, 10 event-driven entry points, and 10 scheduled tasks. How many Lambda Functions should this service have? There is no strong, consistent, guidance for this question.

The fallback position from AWS is that every Lambda Function should perform one task, and one task only. This reduces startup latency and allows “least privilege” security boundaries to be enforced. The problem is that the moderately-sized service above would now have 60 individual Lambda Functions, and I can tell you from experience that that is a horribly large number of Functions to deal with in one service. Deployment will be slow, which will make developer experience frustrating, and operations will be a constant headache. The only frequently voiced alternative is to put everything into one Lambda Function - a so-called “MonoLambda” or “Lambdalith” - but this has its own significant downsides.

My recommendation is typically to have something between the two ends of this spectrum, which I wrote about last year. But this opinion is not widely treated as “usually the best idea”. The strange thing to me is that this is a problem I see with every client of mine that is building something non-trivial on Lambda.
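
As a sketch of that middle ground - with entirely hypothetical routes - one Lambda Function can own a small group of related API endpoints and dispatch internally, rather than one Function per endpoint or one Function for everything:

```python
def list_orders(event):
    return {"statusCode": 200, "body": "[]"}

def create_order(event):
    return {"statusCode": 201, "body": "{}"}

# One Function per logical subdomain: API Gateway proxies all
# /orders routes to this single Function, which dispatches internally.
ROUTES = {
    ("GET", "/orders"): list_orders,
    ("POST", "/orders"): create_order,
}

def handler(event, context):
    route = ROUTES.get((event["httpMethod"], event["resource"]))
    if route is None:
        return {"statusCode": 404, "body": "not found"}
    return route(event)
```

With this shape, the 60-entry-point service above might become a handful of Functions, one per domain, rather than 60 or 1.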

The “number of Lambda Functions” problem is just one question that teams immediately hit when they are looking at designing and running Lambda applications, and not just Lambda nano-services. Others include “how does CI/CD change for Lambda?”, “does using Lambda mean I can’t use Microservices / Monolith / Monorepo designs anymore?”, etc. The frustrating aspect for me is that I think that all of these questions now have good answers, the knowledge just isn’t broadly out there.

I’m thinking about trying to help solve this problem. Stay tuned for future developments…

Will companies embrace the Serverless-enabled event-driven renaissance?

Over the last couple of years AWS has been embracing asynchronous event-driven architecture, and services, as part of their Serverless platform. Lambda itself is primarily an event-driven service, but there are many other services in the AWS catalog that buy into this idea - from EventBridge to Step Functions to the huge number of messaging services within the AWS cloud.

Event-driven architecture offers many benefits: reduced coupling, adaptive scaling, re-use, even reduced costs. I think event-driven architecture is often a great solution.

But there’s a problem - we’ve been here before as an industry, and event-driven hasn’t stuck. Back in the early 2000s messaging-based systems had all the hype, frequently driven by vendor tools with large marketing budgets. My friend and former colleague Gregor Hohpe, along with Bobby Woolf, wrote a fantastic book in 2003 - Enterprise Integration Patterns - helping folk learn what was core to messaging systems beyond the vendor fluff, and how to build solid messaging-oriented applications. It’s a book that twenty years on is still full of wisdom that applies today. But most developers have never heard of it, nor of most of the ideas within it.

A key reason, I think, why event-driven architecture has never truly taken off is that it’s harder than the alternative of synchronous architecture, and developers aren’t taught asynchronous design as a fundamental technique. The first taste that developers have of distributed systems is often where one system calls another via a synchronous API, and it really doesn’t feel that different to one function calling another within the same process. There are some operational constraints to deal with (latency, error handling, security) but the fundamental shape of a system doesn’t change in the head of an engineer. A calls B calls C, which returns to B, which returns to A. Whether A, B, and C are in one process or three is not hugely important to functionality. But messaging-based, asynchronous systems are different.

When I look at what’s happening with AWS Serverless event-driven services my concern is not primarily about the capabilities of the platform - it’s about whether people will actually use it. After all, AWS Lambda can be used in a traditional synchronous way - an app calls API Gateway, which calls a Lambda function, which calls a database, and then back up the chain the responses go. A calls B calls C. Most of the larger projects I’ve seen that use Lambda architect their solutions primarily using API Gateway and synchronous interaction.
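
To make the contrast concrete, here is a sketch of the asynchronous alternative - the bus name, event names, and fields are all hypothetical. Instead of calling the downstream service and waiting for its reply, the function publishes a domain event to EventBridge and returns immediately, leaving any interested consumers to subscribe via rules:

```python
import json

def order_placed_event(order_id, bus_name="orders-bus"):
    """Build an EventBridge entry for a (hypothetical) OrderPlaced event."""
    return {
        "EventBusName": bus_name,
        "Source": "com.example.orders",
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": order_id}),
    }

def handler(event, context):
    # Publish and return immediately: no waiting on B and C as in
    # the synchronous A-calls-B-calls-C shape.
    import boto3  # deferred import keeps the builder above unit-testable
    boto3.client("events").put_events(
        Entries=[order_placed_event(event["orderId"])]
    )
    return {"statusCode": 202}
```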

Which raises the question - is Serverless event-driven destined to be an architectural niche? Or something bigger? And if it is a niche would the larger Serverless community actually be better off focusing on traditional synchronous architecture?

Personally I’m optimistic enough that I’m investing my time in providing a good story around building asynchronous Serverless designs. But I also understand that there are enough other problems that teams have with Serverless right now that adding async to the mix is often too much to swallow.

Wrapping up

Overall I’m still happy with the trajectory of Serverless eight years after I first used it. More companies are embracing it, more engineers are learning it, and the capabilities of Serverless services grow by the month.

My primary concern with Serverless today is not technical - it’s educational. The large vendors - AWS especially - could significantly aid people’s understanding of how to build with Serverless by addressing some of the concerns I’ve listed in this article.

For a larger discussion of this topic listen to the podcast I took part in on this subject. And, of course, feel free to email me at mike@symphonia.io, or you can find me on Mastodon.