Set hyperparameters
Updated on 25 Apr 2025

  • For all trainers (Pre-training, SFT, DPO)
| Parameter | Description | Type | Supported values | Default value |
| --- | --- | --- | --- | --- |
| learning_rate | Learning rate for training. | float | [0.00001-0.001] | 0.00001 |
| batch_size | Batch size for training. In distributed training, this is the batch size on each device. | int | updating | 1 |
| epochs | Number of training epochs. | int | updating | 1 |
| gradient_accumulation_steps | Number of update steps to accumulate gradients for before performing a backward/update pass. | int | updating | 4 |
| checkpoint_steps | Number of training steps between two checkpoint saves if checkpoint_strategy="steps". | int | updating | 1000 |
| max_sequence_length | Maximum input length; longer sequences are truncated to this value. | int | updating | 2048 |
| finetuning_type | Which parameter mode to use. | enum[string] | lora/full | lora |
| distributed_backend | Backend to use for distributed training. | enum[string] | ddp/deepspeed | ddp |
| deepspeed_zero_stage | Stage of the DeepSpeed ZeRO algorithm to apply. Only applies when distributed_backend=deepspeed. | enum[int] | 1/2/3 | 1 |
| lr_scheduler_type | Learning rate scheduler to use. | enum[string] | linear/cosine/constant | linear |
| lr_warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate. | int | updating | 0 |
| disable_gradient_checkpointing | Whether or not to disable gradient checkpointing. | bool | true/false | false |
| eval_strategy | The evaluation strategy to adopt during training. | enum[string] | no/epoch/steps | epoch |
| eval_steps | Number of update steps between two evaluations if eval_strategy="steps". Defaults to the same value as logging_steps if not set. Should be an integer or a float in the range [0, 1); a value smaller than 1 is interpreted as a ratio of total training steps. Only applies when eval_strategy=steps. | int | updating | 1000 |
| mixed_precision | Type of mixed precision to use. | enum[string] | bf16/fp16/none | bf16 |
| optimizer | Optimizer to use for training. | enum[string] | adamw/sgd | adamw |
| lora_alpha | Alpha parameter for LoRA. | int | updating | 32 |
| lora_dropout | Dropout rate for LoRA. | float | updating | 0.05 |
| lora_rank | Rank of the LoRA matrices. | int | updating | 16 |
| quantization_bit | Number of bits for on-the-fly quantization of the model. Currently only applicable when finetuning_type=lora (QLoRA). | enum[string] | int4/int8/none | none |
| flash_attention_v2 | Whether to use flash attention version 2. | bool | true/false | false |
| logging_steps | Number of steps between logging events, including stdout logs and MLflow data points. logging_steps=-1 means log on every step. | int | updating | 10 |
| checkpoint_strategy | The checkpoint save strategy to adopt during training. "best" is only applicable when eval_strategy is not "no". | enum[string] | no/epoch/steps | epoch |
| max_grad_norm | Maximum norm for gradient clipping. | float | updating | 1 |
| number_of_checkpoints | If a value is passed, limits the total number of checkpoints. | int | updating | 5 |
| seed | Random seed for reproducibility. | int | updating | 1309 |
| full_determinism | Ensure reproducible results in distributed training. Important: this will negatively impact performance, so only use it for debugging. If true, setting seed will have no effect. | bool | true/false | false |
| weight_decay | Weight decay to apply to the optimizer. | float | updating | 0 |
| target_modules | Target modules for quantization or fine-tuning. | string | updating | all-linear |
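The sketch below pulls a handful of these parameters into a single LoRA fine-tuning configuration, using the names and supported values from the table above. The values chosen are illustrative only, and the final `print` is a stand-in for however the configuration is actually submitted to your trainer.

```python
import json

# Illustrative LoRA SFT configuration using the parameter names from the table
# above. Values are examples, not recommendations; check the supported ranges
# before submitting a job.
hyperparameters = {
    "learning_rate": 0.0001,           # within the supported [0.00001-0.001] range
    "batch_size": 2,                   # per-device batch size in distributed training
    "epochs": 3,
    "gradient_accumulation_steps": 4,  # effective batch = batch_size * devices * 4
    "max_sequence_length": 2048,
    "finetuning_type": "lora",
    "lora_rank": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "quantization_bit": "none",        # "int4"/"int8" enables QLoRA (lora only)
    "mixed_precision": "bf16",
    "optimizer": "adamw",
    "lr_scheduler_type": "cosine",
    "lr_warmup_steps": 100,
    "eval_strategy": "steps",
    "eval_steps": 500,
    "checkpoint_strategy": "steps",
    "checkpoint_steps": 1000,
    "number_of_checkpoints": 5,
    "logging_steps": 10,
    "seed": 1309,
    "distributed_backend": "ddp",
}

# deepspeed_zero_stage only takes effect with the deepspeed backend.
if hyperparameters["distributed_backend"] == "deepspeed":
    hyperparameters["deepspeed_zero_stage"] = 2

print(json.dumps(hyperparameters, indent=2))
```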
  • If the trainer is DPO, make sure to include the following extra parameters on top of the standard configuration (a sketch follows the table below)
| Parameter | Description | Type | Supported values | Default value |
| --- | --- | --- | --- | --- |
| pref_beta | The beta parameter in the preference loss. | float | [0,1] | 0.1 |
| pref_loss | The type of DPO loss to use. | enum[string] | sigmoid/hinge/ipo/kto_pair/orpo/simpo | sigmoid |
| pref_ftx | The supervised fine-tuning loss coefficient in DPO training. | float | updating | 0 |
| dpo_label_smoothing | The robust DPO label smoothing parameter in cDPO; should be between 0 and 0.5. | float | updating | 0 |
| simpo_gamma | The target reward margin term in the SimPO loss. Used only when pref_loss="simpo". | float | updating | 0.5 |
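As a minimal sketch of how the DPO-specific parameters layer on top of the standard configuration, the snippet below merges them into an abbreviated base config. The `base_hyperparameters` dict and all values are illustrative placeholders, not recommendations.

```python
# Abbreviated standard config (see the LoRA SFT sketch above); shortened here
# so this snippet runs on its own.
base_hyperparameters = {
    "learning_rate": 0.0001,
    "batch_size": 2,
    "epochs": 1,
    "finetuning_type": "lora",
}

# Extra parameters required when the trainer is DPO.
dpo_extras = {
    "pref_beta": 0.1,            # beta in the preference loss, range [0,1]
    "pref_loss": "sigmoid",      # sigmoid/hinge/ipo/kto_pair/orpo/simpo
    "pref_ftx": 0.0,             # supervised fine-tuning loss coefficient
    "dpo_label_smoothing": 0.0,  # cDPO label smoothing, between 0 and 0.5
}

# simpo_gamma is only read when pref_loss="simpo".
if dpo_extras["pref_loss"] == "simpo":
    dpo_extras["simpo_gamma"] = 0.5

dpo_hyperparameters = {**base_hyperparameters, **dpo_extras}
print(dpo_hyperparameters)
```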