Hey! Thanks for catching my talk "Affording your AI chatbot friends". Here are some resources that were mentioned during the talk.
One of the big topics I brought up was Nomadic Compute. If you're interested in learning more about that, check out the blog post! I also gave a talk about the philosophy behind it at a local AI meetup, in case you learn better through video.
You don't have to pay for workloads that aren't doing anything! Scale down your AI workloads when they're not in use!
It's okay to use the cloud; just make sure you have an exit strategy.
The only specs that matter for generative AI (a quick back-of-the-envelope estimate follows this list):
- Model parameter count
- Model parameter size (bytes per parameter, set by quantization)
- Memory overhead when performing inference (KV cache and activations)
- GPU memory amount
- GPU memory bandwidth
- GPU model year
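
Those specs boil down to one back-of-the-envelope formula: parameter count times bytes per parameter, plus some headroom for inference overhead. Here's a minimal sketch; the 20% headroom is a rule of thumb, not an exact number:

```python
def vram_estimate_gib(params_billion: float, bytes_per_param: float,
                      overhead: float = 1.2) -> float:
    """Parameters x bytes per parameter, plus headroom for the KV cache
    and other inference overhead."""
    return params_billion * 1e9 * bytes_per_param * overhead / 2**30

print(f"{vram_estimate_gib(8, 2.0):.1f} GiB")  # Llama 3.1 8B at fp16: ~17.9 GiB
print(f"{vram_estimate_gib(8, 0.5):.1f} GiB")  # same model 4-bit quantized: ~4.5 GiB
```

If the result is bigger than your GPU's memory, you need a smaller model, more aggressive quantization, or a bigger GPU.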
Models:
- Facebook's Llama series - a family of large language models that's good enough for most tasks
- Nous Research Hermes 3 - the models I use for my chatbots, but may not be suitable for all tasks
- DeepSeek R1 and V3 - very large models that score very well relative to the power they consume
Filter models:
- Facebook's Llama Guard - a classifier model you can use to filter out content that doesn't meet your standards
- Google's ShieldGemma - like Llama Guard, but by Google and available in more sizes (see the sketch after this list)
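
Here's roughly what wiring a filter model in front of your chatbot looks like. This sketch assumes a local Ollama daemon with the llama-guard3 tag pulled (`ollama pull llama-guard3`); Llama Guard replies with "safe", or "unsafe" followed by a category code:

```python
import ollama  # assumes `pip install ollama` and a running Ollama daemon

user_message = "How do I bake chocolate chip cookies?"

# Ask the filter model for a verdict before the message ever reaches
# the real chatbot.
verdict = ollama.chat(
    model="llama-guard3",
    messages=[{"role": "user", "content": user_message}],
)["message"]["content"].strip()

if verdict.startswith("safe"):
    print("message is fine, hand it to the chatbot")
else:
    print(f"message rejected: {verdict}")  # e.g. "unsafe\nS1"
```
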
Inference Engines:
- Ollama - like Docker for large language models
- llama.cpp - a lower-level library and inference runtime for large language models across many platforms
- vLLM - an inference engine that serves large language models behind an OpenAI-compatible API (see the example after this list)
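
Whichever engine you pick, you mostly talk to it the same way, because they all speak the OpenAI chat API. A sketch against a local vLLM server, assuming you started it with something like `vllm serve NousResearch/Hermes-3-Llama-3.1-8B` (the model name is just an example):

```python
from openai import OpenAI  # assumes `pip install openai`

# Point the stock OpenAI client at your local server; the API key is
# ignored by default but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="NousResearch/Hermes-3-Llama-3.1-8B",
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(resp.choices[0].message.content)
```

The same snippet works against Ollama's OpenAI-compatible endpoint if you change the base URL and model name.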
Nomadic Compute tools:
- SkyPilot - describe what you want and it picks the cheapest infrastructure for you (see the sketch after this list)
- PostgreSQL - a database so boring that it requires no special setup on any stack; use it for structured data
- Tigris - multi-cloud native object storage for files, datasets, and other unstructured data
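
For a taste of what "describe what you want" means, here's a sketch using SkyPilot's Python API. The accelerator, model, and cluster name are placeholders, and the exact API surface varies a bit between versions:

```python
import sky  # assumes `pip install skypilot` and at least one cloud configured

# Describe the workload: what to install, what to run, and what hardware
# it needs. SkyPilot shops that spec across your configured providers and
# launches it on the cheapest one that fits.
task = sky.Task(
    setup="pip install vllm",
    run="vllm serve NousResearch/Hermes-3-Llama-3.1-8B",
)
task.set_resources(sky.Resources(accelerators="A100:1"))

sky.launch(task, cluster_name="hermes")
```
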
Terms:
- AI Agent: A model that has access to tools that allow it to interact with the outside world (a minimal loop is sketched after these terms).
- AI Model: A large bag of floating point numbers that generates new output from user input.
- Inference Engine: A program that runs AI models. This is called inference for historical reasons because the model "infers" what comes next.
- Nomadic Compute: Structuring your workloads so they don't rely on any platform's specialized features, making them easy to move between platforms.
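
To make the "AI Agent" definition concrete, here's a minimal sketch of the loop: the model asks for a tool call, your program runs it, and the result goes back into the conversation. It assumes an OpenAI-compatible endpoint with tool calling enabled; the URL, model name, and get_time tool are placeholders:

```python
import json
from openai import OpenAI

# Toy tool the model can call to reach the outside world.
def get_time(_args: dict) -> str:
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).isoformat()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current UTC time.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
messages = [{"role": "user", "content": "What time is it right now?"}]

resp = client.chat.completions.create(model="my-model", messages=messages, tools=TOOLS)
msg = resp.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant's tool request in the transcript
    for call in msg.tool_calls:
        result = get_time(json.loads(call.function.arguments or "{}"))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # Let the model see the tool results and write its final answer.
    resp = client.chat.completions.create(model="my-model", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message

print(msg.content)
```

A real agent dispatches on `call.function.name` and keeps looping until the model stops asking for tools, but the shape is the same.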
Other tools and platforms referenced in the talk:
- Beam - a platform for deploying and managing serverless AI infrastructure
- fal - easy API access for open-weights models
- Fluidstack - a platform for deploying and managing GPU infrastructure with native SSH access
- pgvector - a PostgreSQL extension that turns Postgres into a vector database (read: an AI-model-powered search engine); see the example after this list
- RunPod - a platform that makes it easy to spin up AI workloads in the cloud
- Vast.ai - an auction-based GPU marketplace where you bid for very cheap GPUs for your AI workloads
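
For a taste of pgvector, here's a sketch using psycopg: store an embedding column next to your rows, then order by distance to search. The connection string, table, and tiny 3-wide vectors are toy placeholders; real embedding models output hundreds of dimensions:

```python
import psycopg  # assumes `pip install "psycopg[binary]"` and pgvector installed

with psycopg.connect("dbname=demo") as conn:
    # Enabling the extension needs sufficient privileges on the database.
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs"
        " (id serial PRIMARY KEY, body text, embedding vector(3))"
    )
    conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
        ("hello world", "[0.1, 0.2, 0.3]"),
    )
    # `<->` is pgvector's L2 distance operator; nearest rows come back first.
    row = conn.execute(
        "SELECT body FROM docs ORDER BY embedding <-> %s::vector LIMIT 1",
        ("[0.1, 0.1, 0.3]",),
    ).fetchone()
    print(row)
```
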
Thanks for watching the talk! I'll have the video up by late next week and it'll be embedded here. If you have any questions, feel free to reach out to me on LinkedIn.