
AJ Stuyvenberg

@ajs.bsky.social

AWS Hero Staff Eng @ Datadog Streaming at: twitch.tv/aj_stuyvenberg Videos at: youtube.com/@astuyve I write about serverless minutia at aaronstuyvenberg.com/

1,859 Followers  |  238 Following  |  320 Posts  |  Joined: 24.04.2023

Latest posts by ajs.bsky.social on Bluesky

The faster your cold starts are, the cheaper these will be!

05.08.2025 17:51 | 👍 1  🔁 0  💬 0  📌 0

Lambda now charges for init time, so it's useful to count sandboxes which are proactively initialized but never receive a request.

Here's what happens after a 10k request burst. Hundreds of sandbox shutdowns, along with 22 sandboxes which were spun up but never received a request.
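One way to count those never-invoked sandboxes is to log a structured line at init and another on first invocation, then diff the two in your log tool. A minimal sketch (field names are invented for illustration):

```typescript
// Runs once per sandbox, at init time — this is the part Lambda now bills.
const sandboxId = Math.random().toString(36).slice(2);
console.log(JSON.stringify({ event: "init", sandboxId }));

let invoked = false;

export const handler = async () => {
  if (!invoked) {
    invoked = true;
    // Sandboxes that logged "init" but never log "first-invoke" were
    // proactively initialized and never received a request.
    console.log(JSON.stringify({ event: "first-invoke", sandboxId }));
  }
  return { statusCode: 200 };
};
```

Counting `init` events without a matching `first-invoke` for the same `sandboxId` gives the number of sandboxes you paid to initialize for nothing.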

05.08.2025 17:51 | 👍 2  🔁 0  💬 1  📌 0

Happy Lambda Init Billing day to those who celebrate. Fix your cold starts!

01.08.2025 15:04 | 👍 3  🔁 0  💬 1  📌 0

Yeah!

01.08.2025 08:43 | 👍 1  🔁 0  💬 0  📌 0
AWS Lambda response streaming now supports 200 MB response payloads - AWS

aws.amazon.com/about-aws/wh...

01.08.2025 01:11 | 👍 0  🔁 0  💬 0  📌 0

NEW: Lambda can now send up to 200 MB payloads using response streaming! I assume this is mostly aimed at LLM inference workloads, where chatbots can stream large amounts of data over the wire as it becomes available.
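In the managed Node.js runtime, streaming handlers are wrapped with the `awslambda.streamifyResponse` global. A minimal sketch (the global is shimmed here so the snippet runs anywhere; the chunks are placeholders):

```typescript
// Shim for the `awslambda` global the managed runtime provides.
const awslambda = {
  streamifyResponse:
    <E>(fn: (event: E, stream: ResponseStream) => Promise<void>) => fn,
};

interface ResponseStream {
  write(chunk: string): void;
  end(): void;
}

// Chunks flush to the client as they're produced, instead of buffering
// the whole (now up to 200 MB) payload before responding.
export const handler = awslambda.streamifyResponse(
  async (_event: unknown, stream: ResponseStream) => {
    for (const token of ["data: 1\n", "data: 2\n", "data: 3\n"]) {
      stream.write(token); // e.g. LLM tokens, sent as they arrive
    }
    stream.end();
  }
);
```

This is why the feature pairs so well with chatbots: time-to-first-byte stays low no matter how large the final payload grows.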

01.08.2025 01:10 | 👍 3  🔁 0  💬 2  📌 0
Monitor Lambda-hosted web apps with the Lambda Web Adapter integration | Datadog
Learn how Datadog makes it easy to monitor legacy web apps running in AWS Lambda by automatically capturing logs, metrics, and traces through the Lambda Web Adapter.

www.datadoghq.com/blog/monitor...

30.07.2025 17:43 | 👍 0  🔁 0  💬 0  📌 0

That said, I'm excited to share that @Datadog's Serverless monitoring product now supports LWA!

Thanks to Harold and AWS Labs for collaborating with us on the PRs, and huge thanks to Alex Gallotta for driving this work.

30.07.2025 17:43 | 👍 0  🔁 0  💬 1  📌 0

I've long been an advocate for the Lambda Web Adapter project, which lets anyone easily ship an app to Lambda without learning the event model or API.

Honestly, AWS should simply support this natively.
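The appeal is that your code stays an ordinary HTTP server; the adapter translates Lambda events into plain HTTP requests against it. A sketch of what you'd deploy behind LWA:

```typescript
import { createServer } from "node:http";

// No Lambda event shapes anywhere — just a normal web server.
const server = createServer((req, res) => {
  res.writeHead(200, { "content-type": "text/plain" });
  res.end(`hello from ${req.url ?? "/"}`);
});

// In Lambda you'd listen on 8080 (LWA's default forwarding port);
// port 0 here picks a free port so the sketch runs anywhere.
server.listen(0);
```

The same code runs unchanged in a container, on a VM, or locally, which is exactly the portability the project is after.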

30.07.2025 17:43 | 👍 3  🔁 1  💬 1  📌 0
Operational Excellence Is the Moat with Sam Lambert
Today, Sam Lambert from PlanetScale is back for a third time. PlanetScale just announced PlanetScale Postgres, so we had to get Sam back to tell us how and why they decided to add support for Postgres. It's always great to have Sam on: he brings great stories about real customers and honest insight about the state of the database industry. In this episode, we talk about the road to Postgres and how operational excellence is the only true advantage in database providers. Sam walks us through the current PlanetScale Postgres offering, along with details on Nova, a new sharded Postgres project that PlanetScale is working on. Along the way, we get updates on PlanetScale Metal, how demand has been for PlanetScale Postgres, and future plans for PlanetScale.

Timestamps:
01:16 Start
06:37 The Timeline
15:15 Not Much IP in the Database Market
21:48 PSBouncer
24:17 Zonal affinity
27:38 Query Insights
29:34 How to sign up
32:02 Convex
34:37 Other data stores?
56:18 Acquisitions

I'm a big fan of continuous profiling and measuring your software against real-world use cases. This is also how I often learn about new system changes in AWS early, heh.

Great episode of Software Huddle w/ @alexbdebrie: www.youtube.com/watch?v=JAw9...

28.07.2025 14:59 | 👍 1  🔁 0  💬 0  📌 0

"We run benchmarks continually across all of our competitors, not just queries - even connections, ensuring we don't add any latency at all." @isamlambert

Performance is a competitive advantage that easily slips away if you're not constantly paying attention to it.

28.07.2025 14:59 | 👍 3  🔁 0  💬 1  📌 0

Datadog rewrote its AWS Lambda Extension from #Golang to #Rustlang with no prior Rust experience. @ajs.bsky.social will share how they achieved an 80% Lambda cold start improvement along with a 50% memory footprint reduction at our free and virtual #P99CONF. www.p99conf.io?latest_sfdc_...

#ScyllaDB

23.07.2025 14:17 | 👍 6  🔁 3  💬 0  📌 0


In our case the secret is a Datadog API key which isn't required until we actually flush data, so deferring it to that point saves us over 50ms.

22.07.2025 15:31 | 👍 2  🔁 1  💬 0  📌 0

Here's another 33% cold start reduction, which comes from deferring expensive decryption calls made to AWS Secrets Manager until the secret is actually needed.

Lazy loading is great!
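The lazy-loading pattern above can be sketched in a few lines: wrap the expensive call so it runs on first use instead of at init, and memoize the promise so racing callers share a single fetch. All names here are invented for illustration; `fetchSecret` stands in for a Secrets Manager `GetSecretValue` call and its KMS decryption:

```typescript
let fetches = 0;

// Stand-in for the expensive, KMS-backed secret fetch (~50 ms saved at init).
async function fetchSecret(): Promise<string> {
  fetches++;
  return "api-key-value"; // placeholder value
}

// Memoize the in-flight promise, not just the result, so concurrent
// callers deduplicate into one fetch.
function lazy<T>(fn: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= fn());
}

// Nothing is fetched at init — the cold start pays zero for this secret.
const getApiKey = lazy(fetchSecret);
```

At flush time you `await getApiKey()`; the first caller pays the latency once and every later call resolves from the cached promise.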

22.07.2025 15:31 | 👍 3  🔁 1  💬 1  📌 0

50% lol was not reading the profile carefully

18.07.2025 16:45 | 👍 0  🔁 0  💬 0  📌 0

By switching to a memory arena, we preallocate a slab of memory and virtually eliminate the linear growth of malloc calls (and the mmap/brk syscalls behind them), which cuts down kernel-mode switches and improves latency.

Thank you profiling (and jemalloc)!
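The arena idea is language-agnostic (the actual extension is Rust with jemalloc). A sketch of the core mechanism: one slab allocated up front, with a bump pointer handing out views into it instead of one allocation per span:

```typescript
class BumpArena {
  private buf: Buffer;
  private offset = 0;

  constructor(size: number) {
    this.buf = Buffer.allocUnsafe(size); // one big allocation, once
  }

  alloc(n: number): Buffer {
    if (this.offset + n > this.buf.length) throw new Error("arena exhausted");
    const view = this.buf.subarray(this.offset, this.offset + n);
    this.offset += n;
    return view; // a view into the slab — no new allocation
  }

  reset(): void {
    this.offset = 0; // reclaim the entire batch in O(1)
  }
}

const arena = new BumpArena(1 << 20); // 1 MiB slab
const span = arena.alloc(64);
span.write("span payload");
```

Resetting between batches is what flattens the allocation curve: per-span work never grows the heap again after warmup.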

18.07.2025 15:07 | 👍 0  🔁 0  💬 0  📌 0

Here's how to visualize a 100% memory allocation improvement!

A recent stress test revealed that malloc calls became a bottleneck when sending >100k spans through the API and aggregator pipelines in Lambda.

18.07.2025 15:07 | 👍 1  🔁 0  💬 2  📌 0

Now writing a job to a log and then using a subscription filter to run them async is deeply fucking cursed though omg

17.07.2025 17:01 | 👍 1  🔁 0  💬 1  📌 0

I think OP's intentions were pretty pure until they felt they were mistreated by AWS. So many people end up taking to social media in those instances so in my opinion it was mostly fine.

17.07.2025 17:01 | 👍 0  🔁 0  💬 1  📌 0

Thanks Corey!

17.07.2025 16:41 | 👍 0  🔁 0  💬 0  📌 0

cc @quinnypig.com, as I saw this post in your newsletter

17.07.2025 16:25 | 👍 1  🔁 0  💬 1  📌 0
Does AWS Lambda have a silent crash in the runtime?
Understanding what's happening in the "AWS Lambda Silent Crash" blog post, what went wrong, and how to fix it

aaronstuyvenberg.com/posts/does-l...

17.07.2025 15:29 | 👍 1  🔁 0  💬 1  📌 0

NEW: A recent blog post went viral in the AWS ecosystem, claiming there's a silent crash in AWS Lambda's Node.js runtime.

Today I'll walk through the actual Lambda runtime code that causes this confusing issue, and show how to safely perform async work in Lambda:
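The core of the issue can be sketched with invented names: the Node runtime reports the invocation done as soon as the handler's promise resolves, and the sandbox can be frozen immediately after, so fire-and-forget work may never run to completion:

```typescript
const completed: string[] = [];

// Stand-in for background work, e.g. flushing telemetry over the network.
async function flushTelemetry(): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 10));
  completed.push("telemetry");
}

export const unsafeHandler = async () => {
  void flushTelemetry(); // un-awaited: the sandbox may freeze mid-flight
  return { statusCode: 200 };
};

export const safeHandler = async () => {
  await flushTelemetry(); // completes before the runtime sees "done"
  return { statusCode: 200 };
};
```

The "silent crash" reports come from the unsafe pattern: the dangling promise resumes (or rejects) after the sandbox thaws for a later invocation, far from the code that created it.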

17.07.2025 15:29 | 👍 7  🔁 0  💬 3  📌 0


I assume this helps with capacity, especially since so many functions are triggered at the top of the hour or on other routine schedules.

I've long suspected that I could get faster cold starts/placements by scheduling a function at the 58th minute instead of the top of the hour.

14.07.2025 16:07 | 👍 1  🔁 0  💬 0  📌 0

Now after only ~4 invocation/shutdown cycles, Lambda shuts down the sandbox 1 minute after my request:

14.07.2025 16:07 | 👍 1  🔁 0  💬 1  📌 0

Lambda's fleet management shutdown algorithm is learning faster!

I'm calling this function every 8 or so minutes. At first the gap from invocation to shutdown is about 5-6 minutes, which was the fastest I've observed during previous experiments.

14.07.2025 16:07 | 👍 3  🔁 0  💬 1  📌 0

NEW: AWS is rolling out a new free tier beginning July 15th!!

New accounts get $100 in credits to start and can earn $100 more by exploring AWS resources. You can now explore AWS without worrying about incurring a huge bill. This is great!

docs.aws.amazon.com/awsaccountbi...

11.07.2025 15:23 | 👍 7  🔁 3  💬 0  📌 0

You should care about your p99! By improving the function cold start time, the service on the left:
handled 2x the RPS (and thus lower duration), and
cut its p99 from 1.52s to 949ms.

The code and functionality are identical; improving the cold start from 816ms to 301ms made all the difference.

10.07.2025 15:36 | 👍 1  🔁 0  💬 0  📌 0

Quick PSA: make sure you're using a DLQ and setting a max receive count for SQS, otherwise you may find yourself looking at a flamegraph like this.

Hundreds of attempts, multiple messages in the queue not burning down, and the average message age ticking up! Seems like common knowledge, but...
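The fix is a redrive policy on the source queue. A sketch of the attribute SQS expects (the DLQ ARN below is a placeholder):

```typescript
// After maxReceiveCount failed receives, SQS moves the message to the DLQ
// instead of redelivering it forever.
const redrivePolicy = {
  deadLetterTargetArn: "arn:aws:sqs:us-east-1:123456789012:my-dlq", // placeholder
  maxReceiveCount: 5,
};

// SQS takes the policy as a JSON string in the queue's attributes
// (e.g. via SetQueueAttributes or your IaC tool of choice).
const queueAttributes = { RedrivePolicy: JSON.stringify(redrivePolicy) };
```

Without it, a poison message loops until its retention period expires, burning invocations the whole time, which is exactly the flamegraph above.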

08.07.2025 15:55 | 👍 3  🔁 0  💬 0  📌 0

of client devices, or used over WebRTC.

Check it out!

03.07.2025 15:16 | 👍 0  🔁 0  💬 0  📌 0
