vLLM Use Cases
Updated on 05 Jun 2025

Step 1: Create a GPU Container using the vllm-openai template

In the Environment Variables field, set the API key (used to authorize inference requests) and your Hugging Face token (used to download the model from Hugging Face).

In this tutorial, we are using DeepSeek-R1-Distill-Qwen-1.5B. Replace the value of MODEL with any other model you prefer for inference.

Remember to set the HF_TOKEN field to your Hugging Face token.
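As a rough sketch, the environment variables described above might look like the following. The exact variable names depend on the vllm-openai template; the key and token values here are placeholders, not real credentials.

```
API_KEY=your-api-key                              # key used to authorize inference requests
HF_TOKEN=hf_your_token_here                       # your Hugging Face access token
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B   # model to download and serve
```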


Step 2: Test using Postman. Use the API key you added in Step 1 to authorize requests against:

{HTTP Endpoint}/v1/completions
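If you prefer a script to Postman, the same request can be sketched in Python. The endpoint URL and API key below are placeholders you must replace with your own values from Step 1; the request body follows the OpenAI-compatible completions format that vLLM serves.

```python
import json

# Placeholders -- substitute your container's HTTP endpoint and the API key from Step 1.
ENDPOINT = "https://your-container-endpoint"
API_KEY = "your-api-key"

# Bearer authorization, matching what Postman sends when you use the API key.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# OpenAI-compatible completions payload for the model configured in Step 1.
payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "prompt": "Explain what vLLM is in one sentence.",
    "max_tokens": 64,
    "temperature": 0.7,
}

# To actually send the request (requires the `requests` package and a running container):
# import requests
# resp = requests.post(f"{ENDPOINT}/v1/completions", headers=headers, json=payload)
# print(resp.json()["choices"][0]["text"])

print(json.dumps(payload, indent=2))
```

The response follows the standard completions schema, with the generated text under `choices[0].text`.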
