Create a New Model Serving Deployment

Updated on 05 Feb 2025


Step 1: Select AI Platform → Model Serving → Deployment → New Deployment.

Step 2: Enter the Model Settings information, then click Next.

Model Information: Information about the AI model to deploy. Select a Model Type:
- Model included in Image: The AI model is packaged inside the container image.
- Model not included in Image: The AI model is not packaged in the container image.
- NVIDIA NGC Catalog: The AI model is deployed from the NVIDIA NGC Catalog.
If Model Type is Model not included in Image, select a Model Source:
- Model Source: Where the model is loaded from. Options:
  - Model Catalog: A centralized repository of public models shared with all users.
    - Model Name: Name of the model selected from the Model Catalog.
    - Model Version: Version of the model selected from the Model Catalog.
    - Model Token: Token used to authenticate with the Model Catalog during deployment. To create one, select Token → Create on the home page.
  - Private Model: Your organization's private model repository, for internal use.
    - Model Name: Name of the model selected from Private Model.
    - Model Version: Version of the model selected from Private Model.
    - Model Token: Token used to authenticate with Private Model during deployment. To create one, select Token → Create on the home page.
  - Custom Model: A custom model hosted on the Internet. Currently only Hugging Face models are supported.
    - Model URL: URL of the custom model.
    - Model Token: Your authentication token on the platform that hosts the custom model (for example, Hugging Face).
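
The Custom Model source expects a Hugging Face model page URL, normally of the form `https://huggingface.co/<organization>/<model>`. As a minimal sketch, assuming that URL form, the code below extracts the repository ID and builds, without sending, a request to the public Hugging Face Hub model-info endpoint authenticated with the Model Token. This is one optional way to sanity-check a URL and token before deploying; the function names, repository, and token are hypothetical, and the platform's own validation may work differently.

```python
from urllib.parse import urlparse
from urllib.request import Request

HUB_HOST = "huggingface.co"


def hf_repo_id(model_url: str) -> str:
    """Return '<organization>/<model>' from a Hugging Face model URL."""
    parsed = urlparse(model_url)
    if parsed.scheme != "https" or parsed.netloc != HUB_HOST:
        raise ValueError(f"not a Hugging Face model URL: {model_url}")
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError(f"expected https://{HUB_HOST}/<organization>/<model>: {model_url}")
    # Keep the first two path segments so links such as .../tree/main still work.
    return f"{parts[0]}/{parts[1]}"


def hf_model_info_request(model_url: str, token: str) -> Request:
    """Build (but do not send) a Hub API request authenticated with the Model Token."""
    repo_id = hf_repo_id(model_url)
    return Request(
        f"https://{HUB_HOST}/api/models/{repo_id}",
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical repository and token, for illustration only.
request = hf_model_info_request(
    "https://huggingface.co/example-org/example-model/tree/main", "hf_example_token"
)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it; the check is independent of the deployment form.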

If Model Type is Model included in Image or Model not included in Image, enter the Image Information:

Image Information: Details of the container image to deploy:
- Image Source: Public (no username or password required) or Private (requires a username and password).
- Image Registry: URL of the registry location where the container image is stored.
- Image Tag: Version tag of the container image.
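
Image Registry and Image Tag are entered separately, but a container runtime pulls a single reference of the form `<registry>/<repository>:<tag>`. A minimal sketch, assuming the Image Registry field holds the full repository path without a tag (the host and repository below are hypothetical), that validates the tag against the Docker/OCI tag grammar and joins the two fields:

```python
import re

# Tag grammar from the Docker/OCI distribution reference format:
# one word character, then up to 127 word characters, dots, or dashes.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def image_reference(registry_path: str, tag: str) -> str:
    """Join the Image Registry path and the Image Tag into one pullable reference."""
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"invalid image tag: {tag!r}")
    registry_path = registry_path.rstrip("/")
    last_segment = registry_path.rsplit("/", 1)[-1]
    # The tag belongs in its own field, so a ':' in the last segment means one is already set.
    # A ':' before the last '/' is allowed because it can be a registry port.
    if "@" in registry_path or ":" in last_segment:
        raise ValueError("Image Registry should not already include a tag or digest")
    return f"{registry_path}:{tag}"


# Hypothetical registry host and repository.
print(image_reference("registry.example.com:5000/ai-team/llm-server", "v1.2.0"))
```

Pinning an explicit version tag rather than `latest` makes it clear which image a deployment runs.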

If Model Type is NVIDIA NIM – NGC Catalog, enter the following deployment information:

- NIM Model: The NIM model to deploy. Refer to the Support matrix to choose a model that is compatible with your deployment infrastructure.
- NIM Helm Chart: The Helm chart used to deploy the model.
- NGC Personal Key: Your personal key for authenticating with the NGC Catalog. Refer to the NGC Catalog User Guide to generate one.
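
The NGC Personal Key is what allows the cluster to pull NIM container images from NVIDIA's registry, `nvcr.io`, which accepts the literal username `$oauthtoken` with the key as the password. The sketch below, assuming the deployment relies on a standard Kubernetes image-pull Secret (the guide does not say how the platform stores the key), builds such a Secret; the Secret name and key value are hypothetical.

```python
import base64
import json

NGC_REGISTRY = "nvcr.io"
NGC_USERNAME = "$oauthtoken"  # literal username NGC expects; the key is the password


def _b64(text: str) -> str:
    return base64.b64encode(text.encode()).decode()


def ngc_pull_secret(name: str, ngc_key: str) -> dict:
    """Kubernetes image-pull Secret that lets a cluster pull NIM images from nvcr.io."""
    docker_config = {
        "auths": {
            NGC_REGISTRY: {
                "username": NGC_USERNAME,
                "password": ngc_key,
                "auth": _b64(f"{NGC_USERNAME}:{ngc_key}"),
            }
        }
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        # Values under "data" must themselves be base64-encoded.
        "data": {".dockerconfigjson": _b64(json.dumps(docker_config))},
    }


secret = ngc_pull_secret("ngc-pull-secret", "nvapi-EXAMPLE")  # hypothetical name and key
print(json.dumps(secret, indent=2))
```

NIM containers also read the same key from the `NGC_API_KEY` environment variable when downloading model files, so treat it like a password.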

Step 3: Enter the Deployment Settings information, then click Next.

Deployment Information: General settings for the deployment.
- Serving Name: Name of the serving deployment.
- Choose Cluster: Select the Kubernetes (K8S) cluster to deploy to from the clusters in this VPC.
- Instance Replica: Number of instances (replicas) that run in this deployment.
- Resource Type: How resources are allocated. Two options are available:
  - Flavor: A preset combination of CPU, RAM, disk, and GPU.
  - Custom: Set CPU, RAM, disk, and GPU individually to fit your needs.
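
On a Kubernetes cluster, Instance Replica and a Custom resource choice correspond roughly to a Deployment's `replicas` count and each container's resource requests and limits. A minimal sketch, assuming the platform uses these standard Kubernetes fields (this guide does not confirm it); the container name and the mapping of disk to `ephemeral-storage` are hypothetical:

```python
from typing import Optional


def sizing_fragment(replicas: int, cpu: str, memory: str,
                    gpus: int = 0, disk: Optional[str] = None) -> dict:
    """Map Instance Replica and a Custom resource choice onto Deployment fields."""
    if replicas < 1:
        raise ValueError("Instance Replica must be at least 1")
    requests = {"cpu": cpu, "memory": memory}
    if disk:
        # Assumption: the disk setting maps to the container's ephemeral storage.
        requests["ephemeral-storage"] = disk
    limits = dict(requests)
    if gpus:
        # GPUs are an extended resource exposed by the NVIDIA device plugin.
        # Only the limit is set; Kubernetes defaults the request to the limit.
        limits["nvidia.com/gpu"] = gpus
    return {
        "replicas": replicas,
        "template": {"spec": {"containers": [
            # Hypothetical container name.
            {"name": "model-server", "resources": {"requests": requests, "limits": limits}},
        ]}},
    }


fragment = sizing_fragment(replicas=2, cpu="8", memory="32Gi", gpus=1, disk="100Gi")
print(fragment)
```

Resources are reserved per instance, so this example with 2 replicas of 8 CPUs, 32Gi of memory, and 1 GPU each reserves 16 CPUs, 64Gi of memory, and 2 GPUs on the cluster.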

Advanced Settings: Enter advanced configuration for the deployment. Click See More to expand this section.
- Deployment Strategy: How Kubernetes replaces instances when the deployment is updated. Available strategies:
  - Recreate: All existing instances are stopped before new ones are created, so downtime occurs during the update.
  - Rolling: Instances are replaced gradually, so there is no downtime, but the update needs extra resources equivalent to one instance.
- Startup Command: Configure the startup command for instances
  - Startup Command: The command executed when the instance starts
  - Arguments: Parameters passed to the startup command
- Environment Variable: Define environment variables for the instance
  - Key: The name of the environment variable
  - Value: The value assigned to the environment variable
- Nodes Selector: Run the deployment only on specific worker nodes or worker groups, matched by node label
  - Key: The label key assigned to the node
  - Value: The label value assigned to the node
- Tags: Assign tags to the Deployment
  - Key: The label key assigned to the Deployment
  - Value: The label value assigned to the Deployment
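
The Advanced Settings correspond to fields of a Kubernetes Deployment: the strategy block, the container's `command`, `args`, and `env`, the pod's `nodeSelector`, and labels. A minimal sketch, assuming that mapping (the guide does not state it, and treating Tags as Kubernetes labels is a guess); the container name, command, and label values are hypothetical. It also computes how many instances can run at once during an update, which is where the Rolling strategy's extra instance comes from:

```python
from typing import Dict, List, Optional


def strategy_spec(strategy: str) -> dict:
    """Translate the Deployment Strategy choice into a Kubernetes strategy block."""
    if strategy == "Recreate":
        return {"type": "Recreate"}
    if strategy == "Rolling":
        # Start at most one extra instance and never drop below the replica count,
        # so a rolling update needs spare capacity for one instance.
        return {"type": "RollingUpdate",
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0}}
    raise ValueError(f"unknown strategy: {strategy}")


def advanced_fragment(strategy: str,
                      command: Optional[List[str]] = None,
                      args: Optional[List[str]] = None,
                      env: Optional[Dict[str, str]] = None,
                      node_selector: Optional[Dict[str, str]] = None,
                      tags: Optional[Dict[str, str]] = None) -> dict:
    container: dict = {"name": "model-server"}  # hypothetical container name
    if command:
        container["command"] = list(command)  # replaces the image ENTRYPOINT
    if args:
        container["args"] = list(args)  # replaces the image CMD
    if env:
        container["env"] = [{"name": k, "value": v} for k, v in env.items()]
    pod_spec: dict = {"containers": [container]}
    if node_selector:
        pod_spec["nodeSelector"] = dict(node_selector)
    labels = dict(tags or {})  # assumption: Tags become Kubernetes labels
    return {
        "metadata": {"labels": labels},
        "spec": {
            "strategy": strategy_spec(strategy),
            "template": {"metadata": {"labels": dict(labels)}, "spec": pod_spec},
        },
    }


def peak_instances(replicas: int, strategy: str) -> int:
    """Most instances that may run at once during an update."""
    surge = strategy_spec(strategy).get("rollingUpdate", {}).get("maxSurge", 0)
    return replicas + surge


fragment = advanced_fragment(
    "Rolling",
    command=["python", "-m", "server"],  # hypothetical
    args=["--port", "8000"],
    env={"LOG_LEVEL": "info"},
    node_selector={"gpu-type": "h100"},  # hypothetical node label
    tags={"team": "ai"},
)
print(peak_instances(2, "Rolling"), peak_instances(2, "Recreate"))
```

In Kubernetes, `command` replaces the image's ENTRYPOINT and `args` replaces its CMD, so setting only Arguments keeps the image's own entrypoint.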

Step 4: Enter the Traffic Settings information, then click Next.

Traffic Information: Settings for how the deployment accepts connections.
- Services Type: The type of service used to expose the deployment:
  - Load Balancer: Expose the deployment through a load balancer.
  - Cluster IP: Allow access only from within the Kubernetes cluster.
  - Ingress: Route traffic to the deployment through a Kubernetes Ingress.
- Traffic Type: The connection type, either public or private.
- Port: The port exposed for connections to the deployment.
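
The three Services Type options correspond to standard Kubernetes objects: a `LoadBalancer` Service, a `ClusterIP` Service, or a `ClusterIP` Service behind an Ingress. A minimal sketch, assuming the platform generates equivalent objects; the selector label and the container port (`target_port`) are hypothetical, and Traffic Type is left out because how the platform applies public or private access is not specified.

```python
from typing import List

# Services Type option in the form -> Kubernetes Service type.
SERVICE_TYPES = {
    "Load Balancer": "LoadBalancer",
    "Cluster IP": "ClusterIP",
    "Ingress": "ClusterIP",  # the Ingress routes to an internal Service
}


def traffic_manifests(name: str, services_type: str, port: int, target_port: int) -> List[dict]:
    """Return the Service (and, for Ingress, the Ingress) exposing a deployment."""
    if services_type not in SERVICE_TYPES:
        raise ValueError(f"unknown Services Type: {services_type}")
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": SERVICE_TYPES[services_type],
            "selector": {"app": name},  # hypothetical pod label
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
        },
    }
    manifests = [service]
    if services_type == "Ingress":
        manifests.append({
            "apiVersion": "networking.k8s.io/v1",
            "kind": "Ingress",
            "metadata": {"name": name},
            "spec": {"rules": [{"http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": name, "port": {"number": port}}},
            }]}}]},
        })
    return manifests


for manifest in traffic_manifests("llm-serving", "Ingress", port=80, target_port=8000):
    print(manifest["kind"], manifest["spec"].get("type", ""))
```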

Step 5: Review the information you entered, then click Confirm to create the deployment.
