Managed – FPT Kubernetes Engine

    Deploy an Application with GPU Workload on K8s
    Updated on 29 Nov 2024

    Kubernetes manages and schedules GPU resources in much the same way as it manages CPU resources. You need to configure the GPU resources of applications running on Kubernetes according to the GPU specifications of the Worker Group.

    Note:

    • You can define GPU limits without defining GPU requests, because Kubernetes uses the limits value as the requests value by default.

    • You can define both GPU limits and requests, but the two values must be equal (see the snippet below).

    • You cannot define GPU requests without also defining limits.

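    For reference, here is a minimal resources fragment with equal requests and limits; this sketch assumes the nvidia.com/gpu resource name, i.e. the single or none strategy described later:

    resources:
      requests:
        nvidia.com/gpu: 1   # must equal the limit; may also be omitted entirely
      limits:
        nvidia.com/gpu: 1   # used as the request when requests are omitted
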
    You can view the GPU specs on Kubernetes by running this command:

    kubectl get node ||worker-node|| -o json | jq '.metadata.labels'

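    A trimmed, illustrative excerpt of the output (the label names follow the NVIDIA GPU Feature Discovery and MIG manager conventions; the exact labels and values depend on your GPU model and operator versions) might look like:

    {
      "nvidia.com/gpu.product": "NVIDIA-A30",
      "nvidia.com/mig.config": "all-balanced",
      "nvidia.com/mig.config.state": "success"
    }
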
    In this example, the worker node uses an NVIDIA A30 GPU, the MIG configuration is all-balanced, and the configuration state is success.

    You can view the GPU Instance specifications by running this command (SSH into the worker node, then execute the command):

    nvidia-smi mig -lgi
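
    If you prefer not to SSH into the node, you can also check which GPU resources the node exposes to Kubernetes by inspecting its allocatable resources (the resource names you see depend on the MIG strategy):

    kubectl get node ||worker-node|| -o json | jq '.status.allocatable'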

    Examples of deploying an application with a GPU workload:

    • With the strategy set to single, the GPU resources are declared as:
    nvidia.com/gpu: ||gpu-count||

    Example:

    nvidia.com/gpu: 1

    Note: With strategy: single, the GPU is divided into MIG instances of equal size, all of which are exposed as the nvidia.com/gpu resource.

    Example Deployment using a GPU with strategy: single:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-gpu-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          component: gpu-app
      template:
        metadata:
          labels:
            component: gpu-app
        spec:
          containers:
            - name: gpu-container
              securityContext:
                capabilities:
                  add:
                    - SYS_ADMIN
              resources:
                limits:
                  nvidia.com/gpu: 1
              image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
              command: ["/bin/sh", "-c"]
              args:
                - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
    • With the strategy set to mixed, the GPU resources are declared as:
    nvidia.com/||mig-profile||: ||gpu-count||

    Example:

    nvidia.com/mig-1g.6gb: 2

    Note: With strategy: mixed, the GPU is divided into two instance types, so you must explicitly specify the instance type (MIG profile) when declaring the resource.

    Example Deployment using a GPU with strategy: mixed:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-gpu-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          component: gpu-app
      template:
        metadata:
          labels:
            component: gpu-app
        spec:
          containers:
            - name: gpu-container
              securityContext:
                capabilities:
                  add:
                    - SYS_ADMIN
              resources:
                limits:
                  nvidia.com/mig-1g.6gb: 1
              image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
              command: ["/bin/sh", "-c"]
              args:
                - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
    • With the strategy set to none, the GPU resources are declared as:
    nvidia.com/gpu: 1

    Note: With strategy: none, MIG is not used and the entire GPU is allocated to the application pod.

    Example Deployment using a GPU with strategy: none:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-gpu-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          component: gpu-app
      template:
        metadata:
          labels:
            component: gpu-app
        spec:
          containers:
            - name: gpu-container
              securityContext:
                capabilities:
                  add:
                    - SYS_ADMIN
              resources:
                limits:
                  nvidia.com/gpu: 1
              image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
              command: ["/bin/sh", "-c"]
              args:
                - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
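
    To try one of the manifests above, save it to a file (for example gpu-app.yaml, a file name used here only for illustration), apply it, and check that the pod is running and that dcgmproftester is exercising the GPU in the container logs:

    kubectl apply -f gpu-app.yaml
    kubectl get pods -l component=gpu-app
    kubectl logs deployment/example-gpu-app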