Hey! Thanks for catching my talk "Affording your AI chatbot friends". Here are some resources that were mentioned during the talk.
One of the big topics I brought up was Nomadic Compute. If you're interested in learning more about that, check out the blog post! I also gave a talk about the philosophy behind it at a local AI meetup, in case you learn better through video.
You don't have to pay for workloads that aren't doing anything! Scale down your AI workloads when they're not in use!
It's okay to use the cloud; just make sure you have an exit strategy.
The only specs that matter for generative AI (a quick back-of-the-envelope estimate follows this list):
- Model parameter count
- Model parameter size (bytes per parameter, set by quantization)
- Memory overhead when performing inference (KV cache and activations)
- GPU memory amount
- GPU memory bandwidth
- GPU model year
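
Those specs boil down to one back-of-the-envelope formula: parameter count times bytes per parameter, plus some headroom for inference overhead. Here's a minimal sketch; the 20% headroom is a rule of thumb, not an exact number:

```python
def vram_estimate_gib(params_billion: float, bytes_per_param: float,
                      overhead: float = 1.2) -> float:
    """Parameters x bytes per parameter, plus headroom for the KV cache
    and other inference overhead."""
    return params_billion * 1e9 * bytes_per_param * overhead / 2**30

print(f"{vram_estimate_gib(8, 2.0):.1f} GiB")  # Llama 3.1 8B at fp16: ~17.9 GiB
print(f"{vram_estimate_gib(8, 0.5):.1f} GiB")  # same model 4-bit quantized: ~4.5 GiB
```

If the result is bigger than your GPU's memory, you need a smaller model, more aggressive quantization, or a bigger GPU.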
Models:
- Facebook's Llama series - a family of large language models that's good enough for most tasks
- Nous Research Hermes 3 - the models I use for my chatbots, but may not be suitable for all tasks
- DeepSeek R1 and V3 - very large models that score very well relative to the power they consume
Filter models:
- Facebook's Llama Guard - a classifier model you can use to filter out content that doesn't meet your standards
- Google's ShieldGemma - like Llama Guard, but by Google and available in more sizes (see the sketch after this list)
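
Here's roughly what wiring a filter model in front of your chatbot looks like. This sketch assumes a local Ollama daemon with the llama-guard3 tag pulled (`ollama pull llama-guard3`); Llama Guard replies with "safe", or "unsafe" followed by a category code:

```python
import ollama  # assumes `pip install ollama` and a running Ollama daemon

user_message = "How do I bake chocolate chip cookies?"

# Ask the filter model for a verdict before the message ever reaches
# the real chatbot.
verdict = ollama.chat(
    model="llama-guard3",
    messages=[{"role": "user", "content": user_message}],
)["message"]["content"].strip()

if verdict.startswith("safe"):
    print("message is fine, hand it to the chatbot")
else:
    print(f"message rejected: {verdict}")  # e.g. "unsafe\nS1"
```
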
Inference Engines:
- Ollama - like Docker for large language models
- llama.cpp - a lower-level library and inference runtime for large language models across many platforms
- vLLM - an inference engine that serves large language models behind an OpenAI-compatible API (see the example after this list)
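
Whichever engine you pick, you mostly talk to it the same way, because they all speak the OpenAI chat API. A sketch against a local vLLM server, assuming you started it with something like `vllm serve NousResearch/Hermes-3-Llama-3.1-8B` (the model name is just an example):

```python
from openai import OpenAI  # assumes `pip install openai`

# Point the stock OpenAI client at your local server; the API key is
# ignored by default but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="NousResearch/Hermes-3-Llama-3.1-8B",
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(resp.choices[0].message.content)
```

The same snippet works against Ollama's OpenAI-compatible endpoint if you change the base URL and model name.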
Nomadic Compute tools:
- SkyPilot - describe what you want and it picks the cheapest infrastructure for you (see the sketch after this list)
- PostgreSQL - a database so boring that it requires no special setup on any stack; use it for structured data
- Tigris - multi-cloud native object storage for files, datasets, and other unstructured data
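
For a taste of what "describe what you want" means, here's a sketch using SkyPilot's Python API. The accelerator, model, and cluster name are placeholders, and the exact API surface varies a bit between versions:

```python
import sky  # assumes `pip install skypilot` and at least one cloud configured

# Describe the workload: what to install, what to run, and what hardware
# it needs. SkyPilot shops that spec across your configured providers and
# launches it on the cheapest one that fits.
task = sky.Task(
    setup="pip install vllm",
    run="vllm serve NousResearch/Hermes-3-Llama-3.1-8B",
)
task.set_resources(sky.Resources(accelerators="A100:1"))

sky.launch(task, cluster_name="hermes")
```
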
Terms:
- AI Agent: A model that has access to tools that allow it to interact with the outside world (a minimal loop is sketched after these terms).
- AI Model: A large bag of floating point numbers that generates new output from user input.
- Inference Engine: A program that runs AI models. This is called inference for historical reasons because the model "infers" what comes next.
- Nomadic Compute: Structuring your workloads so they don't rely on any platform's specialized features, making them easy to move between platforms.
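
To make the "AI Agent" definition concrete, here's a minimal sketch of the loop: the model asks for a tool call, your program runs it, and the result goes back into the conversation. It assumes an OpenAI-compatible endpoint with tool calling enabled; the URL, model name, and get_time tool are placeholders:

```python
import json
from openai import OpenAI

# Toy tool the model can call to reach the outside world.
def get_time(_args: dict) -> str:
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).isoformat()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current UTC time.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
messages = [{"role": "user", "content": "What time is it right now?"}]

resp = client.chat.completions.create(model="my-model", messages=messages, tools=TOOLS)
msg = resp.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant's tool request in the transcript
    for call in msg.tool_calls:
        result = get_time(json.loads(call.function.arguments or "{}"))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # Let the model see the tool results and write its final answer.
    resp = client.chat.completions.create(model="my-model", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message

print(msg.content)
```

A real agent dispatches on `call.function.name` and keeps looping until the model stops asking for tools, but the shape is the same.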
Other tools and platforms referenced in the talk:
- Beam - a platform for deploying and managing serverless AI infrastructure
- fal - easy API access for open-weights models
- Fluidstack - a platform for deploying and managing GPU infrastructure with native SSH access
- pgvector - a PostgreSQL extension that turns Postgres into a vector database (read: an AI-model-powered search engine); see the example after this list
- RunPod - a platform that makes it easy to spin up AI workloads in the cloud
- Vast.ai - an auction-based GPU marketplace where you bid for very cheap GPUs for your AI workloads
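
For a taste of pgvector, here's a sketch using psycopg: store an embedding column next to your rows, then order by distance to search. The connection string, table, and tiny 3-wide vectors are toy placeholders; real embedding models output hundreds of dimensions:

```python
import psycopg  # assumes `pip install "psycopg[binary]"` and pgvector installed

with psycopg.connect("dbname=demo") as conn:
    # Enabling the extension needs sufficient privileges on the database.
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs"
        " (id serial PRIMARY KEY, body text, embedding vector(3))"
    )
    conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
        ("hello world", "[0.1, 0.2, 0.3]"),
    )
    # `<->` is pgvector's L2 distance operator; nearest rows come back first.
    row = conn.execute(
        "SELECT body FROM docs ORDER BY embedding <-> %s::vector LIMIT 1",
        ("[0.1, 0.1, 0.3]",),
    ).fetchone()
    print(row)
```
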
Thanks for watching the talk! I'll have the video up by late next week and it'll be embedded here. If you have any questions, feel free to reach out to me on LinkedIn.