3 posts tagged with "performance"

How do large language models get so large?

January 23, 2025 · 8 min read

Senior Cloud Whisperer

A majestic blue tiger riding on a sailing ship. The tiger is very large. Image generated using PonyXL.

AI models can get pretty darn large. Larger models seem to perform better than smaller models, but we don’t quite know why. My work MacBook has 64 gigabytes of RAM and I’m able to use nearly all of it when I do AI inference. Somehow these 40+ gigabyte blobs of floating point numbers are able to take a question about the color of the sky and spit out an answer. At some level this is a miracle of technology, but how does it work?
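To get a feel for where a 40+ gigabyte blob comes from, here is a rough back-of-the-envelope sketch. It assumes the file size is dominated by the weights, so size is roughly the number of parameters times the bits stored per parameter. The 70-billion-parameter figure and the `model_size_gb` helper are illustrative assumptions, not numbers from any specific model.

```python
def model_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes).

    Ignores metadata, tokenizer files, and runtime memory such as the KV cache.
    """
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical 70-billion-parameter model stored at a few common precisions.
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{model_size_gb(70e9, bits):.0f} GB")
```

Under these assumptions, only the lower-precision variants of a model that size would fit in 64 gigabytes of RAM, and the real footprint during inference will be somewhat higher than the weights alone.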

Today I’m going to cover what an AI model really is and the parts that make it up. I’m not going to cover the linear algebra at play or any of the neural networks. Most people want to start with an off-the-shelf model, anyway.

Nomadic Infrastructure Design for AI Workloads

November 12, 2024 · 19 min read

Xe Iaso

Senior Cloud Whisperer

A nomadic server hunting down wild GPUs in order to save money on its cloud computing bill. Image generated with Flux [dev] from Black Forest Labs on fal.ai.

Taco Bell is a miracle of food preparation. They manage to have a menu of dozens of items that all boil down to permutations of six basic items: meat, cheese, beans, vegetables, bread, and sauces. Those fundamentals are combined in new and interesting ways to give you the Crunchwrap, the Chalupa, the Doritos Locos Tacos, and more. Just add hot water and they’re ready to eat.

Even though the results are exciting, the ingredients for them are not. They’re all really simple things. The best designed production systems I’ve ever used take the same basic idea: build exciting things out of boring components that are well understood across all facets of the industry (e.g., S3, Postgres, HTTP, JSON, YAML). This adds up to your pitch deck aiming at disrupting the industry-disrupting industry.

A bunch of companies want to sell you inference time for your AI workloads, or the results of running those workloads for you, but nobody really tells you how to build this yourself. That’s the special Mexican Pizza sauce that you can’t replicate at home no matter how much you want to be able to.

Today, we’ll cover how you, a random nerd who likes reading architectural articles, should design a production-ready AI system so that you can maximize effectiveness per dollar, reduce dependency lock-in, and separate concerns down to their cores. Buckle up, it’s gonna be a ride.

We're making our availability metrics public

October 2, 2024 · 4 min read

Katie Schilling

DevRel Enthusiast

Xe Iaso

Senior Cloud Whisperer

At Tigris Data, we provide object storage to our users. People put bytes into our servers with a name, and expect that, come hell or high water, when they put in the name, they get the exact same bytes back. This is a very high trust position to be in because when people ask themselves things like “Oh, what would happen if my object storage provider is unreliable?”, that conversation usually involves phrases like “Maybe we should have gone with The Big Cloud after all”.

Such conversations are rarely good for the business.

A battle rages on in the field, yet the strong oak tree remains unscathed
