Using model weights in Tigris anywhere with Fly.io
The most common way to deploy AI models in production is by using “serverless” inference. This means that every time you get a request, you don’t know what state the underlying hardware is in. You don’t know if you have your models cached, and in the worst case you need to do a cold start and download your model weights from scratch.
A few fixable problems arise when running your models on serverless or any frequently changing infrastructure:
- Model distribution that's not optimized for latency causes needless GPU idle time as the model weights are downloaded to the machine on cold start. Tigris behaves like a content delivery network by default and is designed for low latency, saving idle time on cold start.
- Compliance restrictions like data sovereignty and GDPR increase complexity quickly. Tigris makes regional restrictions a one-line configuration; see the Tigris documentation for a guide.
- Reliance on third-party caches for distributing models creates an upstream dependency and leaves your system vulnerable to downtime. Tigris guarantees 99.99% availability with public availability data.
Fly.io
Fly.io is a modern public cloud with datacentres all over the world. Some datacentres include GPU machines. For more information about using GPUs on Fly.io, check out their GPU documentation.
Coincidentally, all of the regions that Fly.io has GPUs in also have Tigris in them, so pulling models is very fast (network line rate fast in many cases). You can use our discount code to complete this blueprint on us.
Use case
You can put AI model weights into Tigris so that they are cached and fast to fetch no matter where you're running inference. This makes cold starts faster and takes advantage of Tigris' globally distributed architecture, so your workloads start quickly wherever they are in the world.
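To make the cold start idea concrete, here is a minimal sketch of a cache-or-download helper. It is not the template's actual loading code. It assumes `s3` is a boto3 S3 client (or anything with a compatible `download_file` method) pointed at Tigris, and the function name, cache directory, and object key are hypothetical.

```python
import os


def fetch_weights(s3, bucket: str, key: str, cache_dir: str) -> str:
    """Return a local path for `key`, downloading it only on a cold start."""
    dest = os.path.join(cache_dir, key)
    if os.path.exists(dest):
        # Warm start: the weights are already on local disk.
        return dest

    os.makedirs(os.path.dirname(dest), exist_ok=True)

    # Cold start: download to a temporary name first so that an interrupted
    # transfer is never mistaken for a complete file on the next start.
    partial = dest + ".part"
    s3.download_file(bucket, key, partial)  # boto3: download_file(Bucket, Key, Filename)
    os.replace(partial, dest)
    return dest
```

The second call for the same key returns immediately from disk, which is the behaviour you want when a machine is reused between requests.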
For this example, we’ll set up SDXL Lightning by ByteDance for inference with the weights stored in Tigris.
Getting Started
Download the sdxl-in-tigris template from GitHub:
git clone https://github.com/tigrisdata-community/sdxl-in-tigris
Prerequisite tools
In order to run this example locally, you need these tools installed:
- Python 3.11
- pipenv
- The AWS CLI
Also be sure to configure the AWS CLI for use with Tigris: Configuring the AWS CLI.
To build a custom variant of the image, you need these tools installed:
- Mac/Windows: the Docker Desktop app. Alternatives such as Podman Desktop will not work.
- Linux: the Docker daemon. Alternatives such as Podman will not work.
- Replicate's cog tool
- jq
To install all of the tool dependencies at once, clone the template repo and run brew bundle.
Create a new bucket for generated images. It'll be called generated-images in this article.
aws s3api create-bucket --acl private --bucket generated-images
Optional: upload your own model
If you want to upload your own models, create a bucket for this. It'll be referred to as model-storage-demo in this tutorial.
Both of these buckets should be private.
Then activate the virtual environment with pipenv shell and install the dependencies for uploading a model:
pipenv shell --python 3.11
pip install -r requirements.txt
Run the prepare_model script to massage and upload a Stable Diffusion XL model or finetune to Tigris:
python scripts/prepare_model.py ByteDance/SDXL-Lightning model-storage-demo
Want differently styled images? Try finetunes like Kohaku XL! Pass the Hugging Face repo name to the prepare_model script like this:
python scripts/prepare_model.py KBlueLeaf/Kohaku-XL-Zeta model-storage-demo
Access keys
Create a new access key in the Tigris Dashboard. Don't assign any permissions to it.
Copy the access key ID and secret access key into either your notes or a password manager; you will not be able to see them again. These credentials will be used later to deploy your app in the cloud. This keypair will be referred to as the workload-keypair in this tutorial.
Limit the scope of this access key to only the model-storage-demo (or a custom bucket if you're uploading your own models) and generated-images buckets.
Deploying it to Fly.io
Optional: building your own image
To deploy your own variant, build the image with the cog tool. Log into a Docker registry and run this command to build and push it:
cog push your-docker-username/sdxl-tigris --use-cuda-base-image false
Replace yasomimi/sdxl-tigris in the examples below with your Docker repository.
Every Fly.io resource lives inside an "app". Create a new app for this to live in:
fly apps create your-app-name-here
As a reminder, this example is configured with environment variables. Set the following environment variables in your deployments:
Envvar name | Value |
---|---|
AWS_ACCESS_KEY_ID | The access key ID from the workload keypair |
AWS_SECRET_ACCESS_KEY | The secret access key from the workload keypair |
AWS_ENDPOINT_URL_S3 | https://fly.storage.tigris.dev |
AWS_REGION | auto |
MODEL_PATH | ByteDance/SDXL-Lightning |
MODEL_BUCKET_NAME | model-storage-demo (Optional: replace with your own bucket name) |
PUBLIC_BUCKET_NAME | generated-images (replace with your own bucket name) |
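Because a missing variable usually only shows up after the machine has already started, it can help to check the configuration early. The following is a minimal sketch of such a check and is not part of the template. The variable names come from the table above, while the function name `missing_env_vars` is hypothetical.

```python
import os

# Names taken from the environment variable table above.
REQUIRED_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ENDPOINT_URL_S3",
    "AWS_REGION",
    "MODEL_PATH",
    "MODEL_BUCKET_NAME",
    "PUBLIC_BUCKET_NAME",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]
```

Calling `missing_env_vars()` at startup and exiting with a clear message when the list is non-empty gives a faster failure than waiting for the first S3 request to be rejected.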
Then create a machine with an L40S GPU in Seattle (region sea):
fly machine run \
-a your-app-name-here \
--name sdxl-lightning \
-e AWS_ACCESS_KEY_ID=<workload-keypair-access-key-id> \
-e AWS_SECRET_ACCESS_KEY=<workload-keypair-secret-access-key> \
-e AWS_ENDPOINT_URL_S3=https://fly.storage.tigris.dev \
-e AWS_REGION=auto \
-e MODEL_BUCKET_NAME=model-storage-demo \
-e PUBLIC_BUCKET_NAME=generated-images \
-e MODEL_PATH=ByteDance/SDXL-Lightning \
--vm-gpu-kind l40s \
-r sea \
yasomimi/sdxl-tigris:latest \
-- python -m cog.server.http --host ::
This will print a machine IP like this:
Machine started, you can connect via the following private ip
fdaa:0:641b:a7b:165:347b:d972:2
Then proxy to the machine:
fly proxy -a your-app-name-here \
5001:5000 \
fdaa:0:641b:a7b:165:347b:d972:2
Then you need to wait a few minutes while the machine sets itself up. It's done when it prints this line in the logs:
{"logger": "cog.server.probes", "timestamp": "2024-10-22T17:36:06.651457Z", "severity": "INFO", "message": "Not running in Kubernetes: disabling probe helpers."}
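If you would rather detect that line from a script than watch the logs by hand, the sketch below shows one way to do it. It assumes each log line contains at most one JSON object, possibly preceded by a prefix added by the log viewer. The function name `is_ready_line` is hypothetical.

```python
import json

# The message that the log line above carries once the server is up.
READY_MESSAGE = "Not running in Kubernetes: disabling probe helpers."


def is_ready_line(line: str) -> bool:
    """Return True if `line` contains the JSON readiness record."""
    start = line.find("{")
    if start == -1:
        return False
    try:
        # raw_decode parses one JSON value starting at `start` and
        # ignores anything that follows it on the line.
        record, _end = json.JSONDecoder().raw_decode(line, start)
    except json.JSONDecodeError:
        return False
    return isinstance(record, dict) and record.get("message") == READY_MESSAGE
```

Feeding log lines through this function one at a time and stopping at the first `True` gives a simple readiness wait.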
Do a test generation with this curl command:
curl "http://localhost:5001/predictions/$(uuidgen)" \
-X PUT \
-H "Content-Type: application/json" \
--data-binary '{
"input": {
"prompt": "The space needle in Seattle, best quality, masterpiece",
"aspect_ratio": "1:1",
"guidance_scale": 3.5,
"num_inference_steps": 4,
"max_sequence_length": 512,
"output_format": "png",
"num_outputs": 1
}
}'
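The same request can be built from Python using only the standard library. This is a sketch that mirrors the curl command above rather than an official client. The function name `build_prediction_request` is hypothetical, and the base URL assumes the proxy from the earlier step is listening on local port 5001.

```python
import json
import urllib.request
import uuid


def build_prediction_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build the PUT request that the curl command above sends."""
    body = {
        "input": {
            "prompt": prompt,
            "aspect_ratio": "1:1",
            "guidance_scale": 3.5,
            "num_inference_steps": 4,
            "max_sequence_length": 512,
            "output_format": "png",
            "num_outputs": 1,
        }
    }
    # A fresh UUID plays the role of $(uuidgen) as the prediction ID.
    url = f"{base_url}/predictions/{uuid.uuid4()}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

With the proxy running, `urllib.request.urlopen(build_prediction_request("http://localhost:5001", "The space needle in Seattle, best quality, masterpiece"))` sends the request.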
If all goes well, you should get back a generated image of the Space Needle.
You can destroy the machine with this command:
fly machine destroy --force -a your-app-name-here sdxl-lightning