Using model weights in Tigris anywhere with Vast.ai
The most common way to deploy AI models in production is by using “serverless” inference. This means that every time you get a request, you don’t know what state the underlying hardware is in. You don’t know if you have your models cached, and in the worst case you need to do a cold start and download your model weights from scratch.
A few fixable problems arise when running your models on serverless or any other frequently changing infrastructure:
- Model distribution that's not optimized for latency causes needless GPU idle time as the model weights are downloaded to the machine on cold start. Tigris behaves like a content delivery network by default and is designed for low latency, saving idle time on cold start.
- Compliance restrictions like data sovereignty and GDPR increase complexity quickly. Tigris makes regional restrictions a one-line configuration, guide here.
- Reliance on third party caches for distributing models creates an upstream dependency and leaves your system vulnerable to downtime. Tigris guarantees 99.99% availability with public availability data.
Vast.ai
Vast.ai is a marketplace for buying and selling GPU compute time. You can request time on GPUs provided by the Vast.ai community and run them cheaply (typically pennies on the dollar compared to other cloud providers). There are tradeoffs in terms of data integrity and which GPUs are available, but it's one of the best places to get cheap compute on consumer-grade cards.
To get started, create an account on Vast.ai if you don't have one already and load it with credit. You'll need at least $5 of credit to complete this blueprint.
Use case
You can put AI model weights into Tigris so that they are cached and fast no matter where you're running inference from. This makes cold starts faster and lets you take advantage of Tigris' globally distributed architecture, so your workloads start quickly no matter where they are in the world.
For this example, we’ll set up SDXL Lightning by ByteDance for inference with the weights stored in Tigris.
Getting Started
Download the sdxl-in-tigris template from GitHub:
git clone https://github.com/tigrisdata-community/sdxl-in-tigris
Prerequisite tools
In order to run this example locally, you need these tools installed:
- Python 3.11
- pipenv
- The AWS CLI
Also be sure to configure the AWS CLI for use with Tigris: Configuring the AWS CLI.
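If you haven't set that up yet, the short version is to run aws configure with your Tigris access key and point the CLI at the Tigris S3 endpoint. This is a minimal sketch; see the linked guide for the full walkthrough:
# Enter a Tigris access key ID and secret when prompted; the region can be left as "auto"
aws configure
# Point the AWS CLI (and SDKs) at Tigris instead of AWS S3
export AWS_ENDPOINT_URL_S3=https://fly.storage.tigris.dev
export AWS_REGION=auto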
To build a custom variant of the image, you need these tools installed:
- Mac/Windows: the Docker Desktop app; alternatives such as Podman Desktop will not work.
- Linux: the Docker daemon; alternatives such as Podman will not work.
- Replicate's cog tool
- jq
To install all of the tool dependencies at once, clone the template repo and run brew bundle.
Create a new bucket for generated images; it'll be called generated-images in this article.
aws s3api create-bucket --acl private --bucket generated-images
Optional: upload your own model
If you want to upload your own models, create a bucket for this. It'll be referred to as model-storage-demo in this tutorial.
Both of these buckets should be private.
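If you're going this route, you can create the model bucket the same way as the generated images one (swap in your own bucket name if you prefer):
aws s3api create-bucket --acl private --bucket model-storage-demo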
Then activate the virtual environment with pipenv shell and install the dependencies for uploading a model:
pipenv shell --python 3.11
pip install -r requirements.txt
Run the prepare_model script to massage and upload a Stable Diffusion XL model or finetune to Tigris:
python scripts/prepare_model.py ByteDance/SDXL-Lightning model-storage-demo
Want differently styled images? Try finetunes like Kohaku XL! Pass the Hugging Face repo name to the prepare_model script like this:
python scripts/prepare_model.py KBlueLeaf/Kohaku-XL-Zeta model-storage-demo
Access keys
Create a new access key in the Tigris Dashboard. Don't assign any permissions to it.
Copy the access key ID and secret access key into your notes or a password manager; you will not be able to see them again. These credentials will be used later to deploy your app in the cloud. This keypair will be referred to as the workload-keypair in this tutorial.
Limit the scope of this access key to only the model-storage-demo (or a custom bucket if you're uploading your own models) and generated-images buckets.
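If it helps, you can stash the keypair in shell variables now so it's easy to paste into the deploy command later. The values below are placeholders, not real credentials:
# Workload keypair for the deployment (substitute your real values)
export WORKLOAD_ACCESS_KEY_ID=<workload-keypair-access-key-id>
export WORKLOAD_SECRET_ACCESS_KEY=<workload-keypair-secret-access-key>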
Deploying it to Vast.ai
Optional: building your own image
If you want to deploy a custom variant, build the image with the cog tool. Log into a Docker registry and run this command to build and push it:
cog push your-docker-username/sdxl-tigris --use-cuda-base-image false
Replace yasomimi/sdxl-tigris in the examples below with your Docker repository.
In the virtual environment you used to optimize your model, install the vastai CLI tool:
pip install --upgrade vastai
Follow Vast.ai's instructions on how to load your API key.
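At the time of writing, this amounts to copying your API key from the Vast.ai console and handing it to the CLI, roughly like so:
# Store the Vast.ai API key so later vastai commands are authenticated
vastai set api-key <your-vastai-api-key>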
Then you need to find an instance. This example requires a GPU with 12 GB of VRAM. Use this command to find a suitable host:
vastai search offers 'verified=true cuda_max_good>=12.1 gpu_ram>=12 num_gpus=1 inet_down>=850' -o 'dph+'
The first column of the output is the instance ID you'll pass to the launch command. The launch command is made up of the following:
- The Docker image name you pushed to Docker Hub (or another registry)
- The "environment" string with your exposed ports and environment variables
- A signal to the runtime that we need 48 GB of disk space to run this app
- The onstart command telling the runtime to start the cog process
As a reminder, this example is configured with environment variables. Set the following environment variables in your deployment:
| Envvar name | Value |
| --- | --- |
| AWS_ACCESS_KEY_ID | The access key ID from the workload keypair |
| AWS_SECRET_ACCESS_KEY | The secret access key from the workload keypair |
| AWS_ENDPOINT_URL_S3 | https://fly.storage.tigris.dev |
| AWS_REGION | auto |
| MODEL_PATH | ByteDance/SDXL-Lightning |
| MODEL_BUCKET_NAME | model-storage-demo (Optional: replace with your own bucket name) |
| PUBLIC_BUCKET_NAME | generated-images (replace with your own bucket name) |
Format all of your environment variables as you would in a docker run command, for example:
"-p 5000:5000 -e AWS_ACCESS_KEY_ID=<workload-keypair-access-key-id> -e AWS_SECRET_ACCESS_KEY=<workload-keypair-secret-access-key> -e AWS_ENDPOINT_URL_S3=https://fly.storage.tigris.dev -e AWS_REGION=auto -e MODEL_BUCKET_NAME=model-storage-demo -e MODEL_PATH=ByteDance/SDXL-Lightning -e PUBLIC_BUCKET_NAME=generated-images"
Then execute the launch command:
vastai create instance \
<id-from-search> \
--image yasomimi/sdxl-tigris:latest \
--env "-p 5000:5000 -e AWS_ACCESS_KEY_ID=<workload-keypair-access-key-id> -e AWS_SECRET_ACCESS_KEY=<workload-keypair-secret-access-key> -e AWS_ENDPOINT_URL_S3=https://fly.storage.tigris.dev -e AWS_REGION=auto -e MODEL_BUCKET_NAME=model-storage-demo -e MODEL_PATH=ByteDance/SDXL-Lightning -e PUBLIC_BUCKET_NAME=generated-images" \
--disk 48 \
--onstart-cmd "python -m cog.server.http"
It will report success with a message like this:
Started. {'success': True, 'new_contract': 13288520}
The new_contract field is your instance ID.
Give it a moment to download and start up. If you want to check on it, run this command:
vastai logs <instance-id>
It's done when it prints this line in the logs:
{"logger": "cog.server.probes", "timestamp": "2024-10-22T17:36:06.651457Z", "severity": "INFO", "message": "Not running in Kubernetes: disabling probe helpers."}
Then fetch the IP address and port for your app with this command:
vastai show instance <instance-id> --raw | jq -r '"\(.public_ipaddr):\(.ports["5000/tcp"][0].HostPort)"'
Finally, run a test generation with this curl command:
curl "http://ip:port/predictions/$(uuidgen)" \
-X PUT \
-H "Content-Type: application/json" \
--data-binary '{
"input": {
"prompt": "The space needle in Seattle, best quality, masterpiece",
"aspect_ratio": "1:1",
"guidance_scale": 3.5,
"num_inference_steps": 4,
"max_sequence_length": 512,
"output_format": "png",
"num_outputs": 1
}
}'
If all goes well, you should get an image like this:
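Generated images are written to your generated-images bucket. Assuming your AWS CLI is still pointed at Tigris as described earlier, one way to pull an output down locally looks like this (the object key depends on how the template names its outputs):
# See what the predictor wrote to the public bucket
aws s3 ls s3://generated-images/
# Copy one of the results locally (replace <object-key> with a real key from the listing)
aws s3 cp s3://generated-images/<object-key> ./output.png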
You can destroy the machine with this command:
vastai destroy instance <instance-id>