View this email in your browser

So much has been written about Lambda cold starts. It’s easily one of the most talked-about and yet, misunderstood topics when it comes to Lambda. Depending on who you talk to, you will likely get different advice on how best to reduce cold starts.

So in this post, I will share with you everything I have learned about cold starts in the last few years and back it up with some data.

But first -

What are Lambda cold starts?

Lambda automatically scales the number of workers (think containers) that are running your code based on traffic. A “cold start” is the 1st request that a new Lambda worker handles. This request takes longer to process because the Lambda service needs to:

  1. find a space in its EC2 fleet to allocate the worker
  2. initialize the worker
  3. initialize your function module

before it can pass the request to your handler function. If you want to learn more about this process, then check out this talk from re:invent 2019 (around the 5:46 mark).

You can see these steps in an X-Ray trace for a cold start, although step 1 and step 2 are not explicitly captured in the trace. We could, however, infer their duration from the available information. For example, in the X-Ray trace below, we can infer that step 1 and step 2 took a total of ~80ms, while step 3 took 116ms.

How long step 1 and step 2 takes are mostly outside of your control. It’s also an area where AWS has optimized aggressively. Over the years, this part of the cold start has improved significantly across all language runtimes. 12 months ago, this used to average around 150–200ms.

As such, module initialization (step 3) usually accounts for the bulk of cold start durations in the wild. When optimizing cold start durations, this is where you should focus on. As we’ll see later in the post, several factors can affect this initialization time and therefore the total roundtrip time for processing a request during a cold start.

When should you care about cold starts?

For many, cold starts are a non-issue because their primary workload is data processing, so spikes in latency don’t negatively impact the user experience.

Or maybe their traffic pattern is so uniform and stable that there are seldom spikes that cause a flurry of cold starts.

However, user behaviours are difficult to predict and traffic patterns can change over time. Also, even if the overall traffic confirms to a bell-curve it doesn’t mean that there are no unpredictable spikes at the individual function’s level where cold starts occur.

Which is why you should let data tell you whether or not cold starts is a problem for you, and where (as in, which functions). In the Lumigo dashboard, you can see at a glance the functions with the most cold starts. When you see functions with a high percentage of cold starts, such as the graphql-api-prod-listSports function below (with 57.36% of its invocations being cold starts), these are functions that you need to pay special attention to!

You can drill into each of these functions further and see how bad these cold starts are in terms of duration. After all, if the cold start duration is short then the cold starts have a much smaller impact on our user experience when they happen. The worst-case scenario is when the cold start duration is long and cold starts happen frequently!

Furthermore, you can set up alerts in Lumigo so you can be notified when your functions experience a high percentage of cold starts. This is a great way to keep an eye on those user-facing functions where you’re concerned about end-to-end latency. Maybe cold starts are not an issue for you today, but maybe they will be in the future if user behaviours and traffic patterns change. These alerts will tell you which functions are experiencing a high percentage of cold starts, and so you can prioritize your optimization efforts on those functions.

One way to eliminate cold starts altogether is to use Provisioned Concurrency, which you can read all about here. As I explained in the post, there are many caveats you need to consider when using Provisioned Concurrency and they do come with a certain amount of operational as well as cost overhead. Which is why they should be used as a last resort rather than your first step.

In most cases, it’s possible to optimize your function so that even the cold start durations fall within acceptable latency range (e.g. 99 percentile latency of 1 second). In this post, I’m going to explain the different factors that affect the cold start duration so you can formulate an optimization strategy that ACTUALLY works.

For instance, many people would you that you should use a higher memory size to improve cold start duration. Turns out, that doesn’t work, because of this one interesting caveat…

Module initialization is always run at full power

Kudos to Michael Hart for this nugget of information, even though he was clearly abusing it for fun & profit!

Because of this, adding more memory is NOT GOING TO improve your cold start time. This is further backed up by this analysis by Mikhail Shilkov, which also found that memory size has no meaningful impact on the cold start time for most language runtimes. However, .Net functions are the exceptions, and adding more memory does significantly improve their cold start times.

Another facet you need to consider is that…

There are two “types” of cold starts

Another wonderful insight from Michael Hart, who noticed that there are noticeable differences between:

  1. cold starts that happen immediately after a code change
  2. other cold starts (e.g. when Lambda needs to scale up the number of workers to match traffic demand)

Perhaps there are some additional steps that need to be performed during the first cold start after a code deployment. Hence why the first cold start after a code change takes longer than the other cold starts.

In practice, most of the cold starts you will see in the wild will be of the 2nd type and it’s where we should focus on. However, I was really intrigued by this discovery and ran several experiments myself.

The big fat experiment

In one such experiment, I measured the roundtrip duration for a few different functions:

  • control— a hello world function with no dependencies whatsoever.
module.exports.handler = async event => {
  return { 
    statusCode: 200, 
    body: '{}' 
  • AWS SDK is bundled but not required- the same function as control, but the deployment artifact includes the Node.js AWS SDK (even though the function doesn't actually require it), which results in a 9.5MB deployment artifact.
  • control with big assets - the same function as control, but the deployment artifact includes two large MP3 files, which results in a 60.2MB deployment artifact.
  • require bundled AWS SDK- a function that requires the AWS SDK duration module initialization. This function bundles the AWS SDK as part of its deployment artifact (9.5MB).
const AWS = require('aws-sdk')
module.exports.handler = async event => { 
  return { 
    statusCode: 200, 
    body: '{}' 
  • require AWS SDK via Layer- the same function as require bundled AWS SDK but the AWS SDK is not bundled in the deployment artifact. Instead, the AWS SDK is injected via a Lambda layer.
  • require built-in AWS SDK- the same function as require bundled AWS SDK but the AWS SDK is not bundled in the deployment artifact. Instead, it's using the AWS SDK that is included in the Lambda execution environment.

For each of these functions, I collected 200 data points for the post deploy cold starts (type 1) and 1000 data points for the other cold starts (type 2). The results are as follows.

There are a few things you can learn from these data.

Type 1 is slower

Type 1 cold starts (immediately after a code deployment) consistently take longer than type 2, especially as we look at the tail latencies (p99).

Deployment artifact size matters

The size of the artifact has an impact on cold start even if the function does not actively require them. The following three tests all have the same function code:

module.exports.handler = async event => { 
  return { 
    statusCode: 200, 
    body: '{}' 

The only difference is in the size of the deployment artifact. As you can see below, bundling the Node.js AWS SDK in the deployment artifact adds 20–60ms to the roundtrip latency for a cold start. But when that artifact gets much bigger, so too does the latency impact.

When the artifact is 60MB, this adds a whopping 250–450ms!

So, deployment size does impact cold start, but the impact is somewhat minimal if it’s just the AWS SDK.

Where the dependency is loaded from matters

Often times, the AWS SDK is an unavoidable dependency. But turns out where the AWS SDK comes from matters too. It’s fastest to use the AWS SDK that’s built into the Lambda execution environment. Interestingly, it’s also much faster to load the AWS SDK via Layers than it is when you bundle it in the deployment artifact! The difference is much more significant than the aforementioned 20–60ms, which suggests that there are additional factors at play.

Before you decide to never bundle the AWS SDK in your deployment artifacts, there are other factors to consider.

For example, if you use the built-in AWS SDK then you effectively lose immutable infrastructure. There have also been instances when people’s functions suddenly break when AWS upgraded the version of the AWS SDK. Read this post for more details.

If you use Lambda layers, then you must carry additional operational overhead since the Lambda layer requires a separate deployment and you still have to update every function that references this layer. Read this post for why Lambda layer is not a silver bullet and should be used sparingly.

That being said, for serverless framework users, there is a clever plugin called serverless-layers which sidesteps a lot of the operational issues with Lambda Layers. Effectively, it doesn’t use Layers as a way to share code but use it purely as an optimization. During each deployment it checks if your dependencies have changed, and if so, package and deploy the dependencies as a Lambda layer (just for that project) and update all the functions to reference the layer.

But wait! There’s more.

What you require matters

Just what is taking the time when this line of code runs during module initialization?

const AWS = require('aws-sdk')

Behind the scenes, the Node runtime must resolve the dependency and check if aws-sdk exists in any of the paths on the NODE_PATH. And when the module folder is found, it has to run the initialization logic on the aws-sdk module and resolve all of its dependencies and so on.

All these takes CPU cycles and filesystem IO calls, and that’s where we incur the latency overhead.

So, if your function just needs the DynamoDB client then you can save yourself a lot of cold start time by requiring ONLY the DynamoDB client.

const DynamoDB = require('aws-sdk/clients/dynamodb')

And since a lot of the cold start time is going towards resolving dependencies, what if we remove the need for runtime dependency resolution altogether?

The webpack effect

By using a bundler like webpack, we can resolve all the dependencies ahead of time and shake them down to only the code that we actually need.

This creates savings in two ways:

  • smaller deployment artifact
  • no runtime resolution

And the result is awesome!

So, if you’re running Node.js and want to minimize your Lambda cold start time. Then the most effective thing you can do is to be mindful of what you require in your code and then apply webpack. It addresses several of the contributing factors to cold time latency simultaneously.

For the Serverless framework users out there, you can use the serverless-webpack plugin to do this for you.

For Java functions, have a look at this post by CapitalOne on some tips for reducing cold starts.

Eliminating cold starts

If it’s not enough to make cold starts faster and you must eliminate them altogether, then you can use Provisioned Concurrency instead. This might be the case if you have a strict latency requirement, or maybe you have to deal with inter-microservice calls where cold starts can stack up. Whatever the case, you should check out this post to see how Provisioned Concurrency works and some caveats to keep in mind when you use it.

As mentioned before, Lumigo offers a lot of tools to help you identify functions that are worst affected by cold starts and are therefore the prime candidates for Provisioned Concurrency. The Lumigo dashboard is usually my first port-of-call when I want to find out which functions to use Provisioned Concurrency on and how many instances of Provisioned Concurrency I need.

Click here to register a free account with Lumigo and let us help you eliminate Lambda cold starts.

This post first appeared on on September 20, 2020.
Copyright © 2020 theburningmonk limited, All rights reserved.

Want to change how you receive these emails?
You can update your preferences or unsubscribe from this list.