A nomadic server hunting down wild GPUs in order to save money on its
cloud computing bill. Image generated with Flux [dev] from Black Forest
Labs on fal.ai.
Taco Bell is a miracle of food preparation. They manage to have a menu of dozens
of items that all boil down to permutations of six basic ingredients: meat, cheese,
beans, vegetables, bread, and sauces. Those fundamentals are combined in
new and interesting ways to give you the Crunchwrap, the Chalupa, the Doritos
Locos Tacos, and more. Just add hot water and they’re ready to eat.
Even though the results are exciting, the ingredients for them are not. They’re
all really simple things. The best designed production systems I’ve ever used
take the same basic idea: build exciting things out of boring components that
are well understood across all facets of the industry (e.g., S3, Postgres, HTTP,
JSON, YAML, etc.). This adds up to your pitch deck aiming at disrupting the
industry-disrupting industry.
A bunch of companies want to sell you inference time for your AI workloads, or
the results of running those workloads for you, but nobody really tells
you how to build this yourself. That’s the special Mexican Pizza sauce that you
can’t replicate at home no matter how much you want to.
Today, we’ll cover how you, a random nerd who likes reading architectural
articles, should design a production-ready AI system so that you can maximize
effectiveness per dollar, reduce dependency lock-in, and separate concerns down
to their cores. Buckle up, it’s gonna be a ride.