What do you do when you need to serve a completely custom, 7+ billion parameter model with sub-10-second cold start times, without writing a Dockerfile or managing scaling policies yourself? It sounds impossible, but Beam's serverless GPU platform provides performant, scalable AI infrastructure with minimal configuration. Your code already does the AI inference in a function; just add a decorator to get that function running somewhere in the cloud on whatever GPU you specify. It turns on when you need it and turns off when you don't, which can save you orders of magnitude in cost compared to running a persistent GPU in the cloud.
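To make that concrete, here is a minimal sketch of what an inference endpoint looks like with Beam's Python SDK. The decorator and import follow Beam's documented endpoint API, but treat the specific arguments, GPU type, and model-loading details as illustrative placeholders rather than a verbatim example:

```python
from beam import Image, endpoint

# Sketch only: the argument values and GPU type below are placeholders.
@endpoint(
    gpu="A10G",                                # whatever GPU the model needs
    cpu=4,
    memory="16Gi",
    image=Image(python_packages=["torch", "transformers"]),
)
def predict(prompt: str):
    # Load the model once per container, run inference per request.
    # Containers scale to zero when there is no traffic.
    ...
    return {"completion": "..."}
```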
Tigris tiger watching a beam from a ground satellite. Image generated with Flux [dev] from Black Forest Labs on fal.ai.
Fast iteration loops for hyper-productive developers
Beam was founded to reduce cost and iteration time when developing AI/ML applications. Using $bigCloud and Docker directly is too slow for iterative development: waiting for a new image to build and redeploy to a dev environment just takes too long, let alone the time it takes to set up and manage your own GPU cluster. Beam hot reloads your code to a live inference server for testing and provides single-command deployments with beam deploy. And they integrate with workflow tools like ComfyUI so development is even easier.
AI workloads are at the bleeding edge for basically every part of the stack. For the best results you need top-of-the-line hardware, drivers that should work (but are still being proven), the newest kernel and userland you can get, and management of things that your sysadmin/SRE teams are not yet the most familiar with. Much like putting CPU-bound workloads into the cloud, serverless GPUs offload the driver configuration, hardware replacement, and other operational details to the platform so you can focus on doing what you want to.
The big tradeoff with serverless GPUs is between cost and latency. If you want to use the cheapest GPU available, it's almost certainly not going to be close to your users or your data. The speed of light is only so fast; sending 50GB to another continent is going to be slow no matter what you do. Every time you have to do that you rack up longer cold start times, burn more cloud spend on expensive GPU minutes, and incur yet more egress fees.
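A quick back-of-the-envelope calculation shows why data locality dominates this tradeoff. The throughput and egress price below are assumptions for illustration, not measured figures:

```python
# Rough cost of pulling 50 GB of model weights across regions on every cold start.
# Both constants below are illustrative assumptions, not benchmarks.
MODEL_SIZE_GB = 50
EFFECTIVE_THROUGHPUT_GBIT_S = 2        # assumed cross-region throughput
EGRESS_PRICE_PER_GB = 0.09             # typical big-cloud internet egress rate (assumed)

transfer_seconds = MODEL_SIZE_GB * 8 / EFFECTIVE_THROUGHPUT_GBIT_S   # ~200 s
egress_cost = MODEL_SIZE_GB * EGRESS_PRICE_PER_GB                    # ~$4.50

print(f"~{transfer_seconds:.0f}s added to the cold start, ~${egress_cost:.2f} in egress per pull")
```

At numbers like these, a single cross-continent pull blows past a 10-second cold start budget before the GPU has done any useful work, and the egress fees repeat on every pull.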
Challenge
The challenge? Beam’s serverless GPU platform ran largely on one cloud, and they needed a cost-efficient way to spread their compute more flexibly across many clouds, all while offering a consistent developer experience: <10s cold starts, warm starts as fast as 50ms, and limitless horizontal scale. Luke Lombardi, CTO & Co-Founder of Beam, shares, “We’re a distributed GPU cloud: one of the core things we need is cross-region consistency. We don't want to think about it when we move to new regions. It's critical that all of our storage infrastructure is just there and the latency is predictable.”
Another notable challenge was strategic: Beam wanted to reduce lock-in to the major cloud providers so they could run their compute anywhere. And they wanted to open source much of their codebase so their platform interface could be used by anyone, anywhere, on any bare metal server. Flexibility was key. They needed to decouple compute from storage and build a highly flexible storage layer with cross-region replication and global distribution.
Solution
The Beam engineering team knew the architectural importance of separating the storage layer, and they knew egress costs were a non-starter. Their primary need was performant data distribution across clouds. However, none of the object storage solutions they tried made it easy to manage multi-cloud, or even multi-region, workloads without extensive configuration, coding around file type restrictions, or accounting for dips in availability and latency. Using Tigris eliminated this undifferentiated work and enabled Beam to ship faster.
Before finding Tigris, we were planning on implementing our own data distribution and replication for our training and inference workloads. Now we just point our code to a single endpoint and Tigris handles it all. We've saved months of work, not to mention the maintenance cost & peace of mind.
—Luke Lombardi, Co-founder & CTO, Beam
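In practice, "pointing the code at a single endpoint" looks like configuring one S3-compatible client, regardless of which cloud or region the container lands in. A minimal sketch, with a placeholder bucket and object names, and credentials supplied via the standard AWS environment variables:

```python
import boto3

# One global endpoint: the same client configuration works from any cloud or region.
# Bucket and object names are placeholders; credentials come from
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY in the environment.
s3 = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")

# Push a checkpoint from whichever region the job happened to run in;
# replication and routing to nearby readers is handled by the storage layer.
s3.upload_file(
    "/mnt/checkpoints/step-1000.safetensors",
    "example-models",
    "llama-7b/step-1000.safetensors",
)
```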
Choosing Flexibility
Tigris provided a reliable and performant storage layer to complement Beam’s compute layer, enabling them to spread their workloads across many public clouds to take advantage of cost and GPU availability. Luke shared, “We're mostly benefiting from moving our object storage outside of the major cloud providers and having it as a separate managed service. The egress costs are important, but even just the fact that we're not locked into S3 is a good thing for us too.”
As a result of re-platforming and separating their storage and compute, Beam was able to open source their platform, beta9. Now users can self-host Beam and add their own compute with the same CLI experience across any cloud, on-prem, or hybrid architecture. This flexibility is a key difference between Beam and other serverless GPU providers: traditionally serverless has been a high lock-in architecture, but with beta9 and the self-hosted CLI, your workloads are portable.
Designing the Storage Layer to be Object Storage Native
After using another filesystem solution that wasn’t backed by object storage, Luke said, “When we were switching to another filesystem, we knew it had to be backed by object storage.” For AI workloads especially, data expands rapidly, sometimes beyond the available RAM or even the available volume. Beam handles very large models and training data sets with ease, using Tigris to dynamically move the data based on access patterns. Since the objects are already in the region where you need them, it’s quick to load models and hydrate training data sets to fast-access volumes as needed.
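A hedged sketch of what that hydration step can look like: list a model's objects under a prefix and copy them onto a fast local volume before the framework loads them. The bucket, prefix, and paths here are hypothetical:

```python
import os
import boto3

s3 = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")

def hydrate(bucket: str, prefix: str, dest: str) -> None:
    """Copy every object under `prefix` to a local directory (e.g. NVMe scratch)."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):       # skip directory markers
                continue
            local_path = os.path.join(dest, os.path.relpath(obj["Key"], prefix))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(bucket, obj["Key"], local_path)

# Hypothetical bucket and prefix; because the objects are already nearby,
# this is a local-region copy rather than a cross-continent transfer.
hydrate("example-models", "llama-7b/", "/mnt/fast-volume/llama-7b")
```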
Do One Thing, and Do It Right
When designing their storage layer, Luke didn’t want to worry about troubleshooting intermittent 500s or hunting down unexpected bottlenecks. Storage should just work ™️ and be consistent across clouds. He said, “I felt much more comfortable knowing that we’re using a product where the team just does object storage all day.”
Want to try it out?
Make a global bucket with no egress fees