Data sample: access GitHub (link) to get the sample data used in this guide.
Access the Model Fine-tuning service, open the Pipeline Management tab, and click the "Create pipeline" button.
Base Model | Description |
---|---|
Llama-3.1-8B | Base language model, 8B parameters, versatile, ideal for fine-tuning |
Llama-3.2-1B | Lightweight base model, 1B parameters, fast, efficient, suitable for edge use |
Llama-3.2-8B-Instruct | Instruction-tuned model, 8B parameters, optimized for dialogue and tasks |
Llama-3.2-11B-Vision-Instruct | Multimodal instruction-tuned model, 11B parameters, optimized for vision-language tasks |
Llama-3.3-70B-Instruct | Instruction-tuned LLaMA model, 70B parameters, excels at complex tasks |
Meta-Llama-3-8B-Instruct | Instruction-tuned LLaMA model, 8B parameters, optimized for conversational tasks |
Qwen2-0.5B-Instruct | Small instruction-tuned model, 0.5B parameters, lightweight and task-efficient |
Qwen2-VL-7B-Instruct | Multimodal instruction-tuned model, 7B parameters, efficient vision-language understanding |
Qwen2-VL-72B | Multimodal base model, 72B parameters, handles both vision and language |
Qwen2-VL-72B-Instruct | Multimodal instruction-tuned model, 72B parameters, vision-language understanding and generation |
Qwen2.5-0.5B-Instruct | Updated instruction-tuned model, 0.5B parameters, improved efficiency and task handling |
Qwen2.5-14B-Instruct | Instruction-tuned language model, 14B parameters, balanced power and efficiency |
Qwen2.5-32B-Instruct | Instruction-tuned language model, 32B parameters, strong at understanding tasks |
Qwen2.5-VL-72B-Instruct | Multimodal instruction-tuned model, 72B parameters, excels at vision-language tasks |
Mixtral-8x7B-v0.1 | Sparse Mixture-of-Experts model, 8 experts, high efficiency, strong performance |
Mixtral-8x22B-v0.1 | Large Mixture-of-Experts model, 8×22B experts, scalable, efficient, powerful reasoning |
Mixtral-8x22B-Instruct-v0.1 | Instruction-tuned MoE model, 8×22B experts, excels at following tasks |
DeepSeek-R1 | Reasoning-focused language model by DeepSeek, versatile, powerful, and open-source |
DeepSeek-R1-Distill-Llama-70B | Efficient language model, distilled from LLaMA 70B, optimized performance |
DeepSeek-R1-V3-0324 | Advanced multilingual model, latest DeepSeek version, optimized for diverse tasks |
Note: If you want to upload your models, please contact us!
Data Format | Description | Data Structure | File Format |
---|---|---|---|
Alpaca | Instruction-following format with input, output pairs for supervised fine-tuning tasks | {instruction, input, output} | json, zip |
Corpus | Large structured text collection, used for training and evaluating models | {text} | json, zip |
ShareGPT | Multi-turn conversation format based on the ShareGPT dataset, suited to conversational fine-tuning | multi-turn chats / {conversations [from, value]} | json, zip |
ShareGPT_Image | Multi-turn conversation format extended with images for multimodal (text & image) fine-tuning | multi-turn chats / {conversations [from, value]} + image_path | zip: train.json and an images folder |
For more details about data formats, see: https://fptcloud.com/en/documents/model-fine-tuning/?doc=select-data-format
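To make the Alpaca and ShareGPT structures above concrete, the Python sketch below builds one record of each and writes them to JSON files. The field names follow the table; the text content and the "human"/"gpt" role names are illustrative assumptions based on the common ShareGPT convention, not values confirmed by the service.

```python
import json

# Hypothetical Alpaca-format record: {instruction, input, output}
alpaca_record = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "FPT Cloud Model Fine-tuning lets you create pipelines that fine-tune base models on your own data.",
    "output": "The service provides pipelines for fine-tuning base models on custom data.",
}

# Hypothetical ShareGPT-format record: a multi-turn chat stored as a
# "conversations" list of {from, value} turns, as described in the table.
# The "human"/"gpt" role names follow the common ShareGPT convention and
# are an assumption here.
sharegpt_record = {
    "conversations": [
        {"from": "human", "value": "Which data formats does the fine-tuning service accept?"},
        {"from": "gpt", "value": "Alpaca, Corpus, ShareGPT, and ShareGPT_Image, uploaded as json or zip."},
    ]
}

# Training files are typically a JSON array of such records.
with open("alpaca_train.json", "w", encoding="utf-8") as f:
    json.dump([alpaca_record], f, ensure_ascii=False, indent=2)

with open("sharegpt_train.json", "w", encoding="utf-8") as f:
    json.dump([sharegpt_record], f, ensure_ascii=False, indent=2)
```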
You have two ways to upload the Training/Evaluation dataset:
Choose a connection: select an existing connection and enter the path to the object within the bucket.
Before selecting a connection, you need to access the Data Hub and create a connection by selecting a data source, entering the endpoint URL of the bucket, and providing the access key and secret key. You can also refer to the connection creation guide here: https://fptcloud.com/en/documents/data-hub/?doc=create-connection
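For reference, below is a minimal sketch of placing a training file into an S3-compatible bucket with boto3, assuming the same endpoint, bucket, and credentials that the Data Hub connection uses. The endpoint URL, bucket name, and object path are placeholders; the object key is the path you later enter when choosing the connection.

```python
import boto3

# Placeholder values: reuse the endpoint, bucket, and credentials that the
# Data Hub connection was created with.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object-storage.example.com",  # hypothetical endpoint URL
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# The object key below is the "path to the object within the bucket" that
# you enter in the pipeline form when choosing the connection.
s3.upload_file("sharegpt_train.json", "my-finetuning-bucket", "datasets/sharegpt_train.json")
```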
Trainer | Description | Supported Data Format |
---|---|---|
Pre-training | Initial training phase using large unlabeled data for language understanding | Corpus |
SFT | Supervised fine-tuning trainer, aligns model behavior using labeled data | Alpaca/ ShareGPT/ ShareGPT_Image |
DPO | Direct Preference Optimization trainer, aligns model with human preference signals directly | ShareGPT/ ShareGPT_Image |
Parameters | Description | Type | Supported values | Default value |
---|---|---|---|---|
learning_rate | Learning rate for training | float | [0.00001-0.001] | 0.00001 |
batch_size | Batch size for training. For distributed training, this is the batch size on each device | int | updating | 1 |
epochs | Number of training epochs | int | updating | 1 |
gradient_accumulation_steps | Number of update steps to accumulate gradients for before performing a backward/update pass | int | updating | 4 |
checkpoint_steps | Number of training steps between two checkpoint saves when save_strategy="steps" | int | updating | 1000 |
max_sequence_length | Maximum input length; longer sequences are truncated to this value | int | updating | 2048 |
finetuning_type | Fine-tuning mode: LoRA adapters or full-parameter fine-tuning | enum[string] | lora/full | lora |
distributed_backend | Backend to use for distributed training | enum[string] | ddp/deepspeed | ddp |
deepspeed_zero_stage | DeepSpeed ZeRO stage to apply. Only used when distributed_backend=deepspeed | enum[int] | 1/2/3 | 1 |
For more details about hyperparameters, see: https://fptcloud.com/en/documents/model-fine-tuning/?doc=set-hyperparameters
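For a quick overview, the snippet below gathers the hyperparameters from the table into a plain Python dictionary initialized to their default values. This is only an illustration of the available knobs and their types, not a configuration file format defined by the service.

```python
# Hyperparameter defaults from the table above (illustrative only; the
# service exposes these fields in the pipeline creation form).
hyperparameters = {
    "learning_rate": 0.00001,          # float, supported range [0.00001, 0.001]
    "batch_size": 1,                   # int, per-device batch size in distributed training
    "epochs": 1,                       # int, number of training epochs
    "gradient_accumulation_steps": 4,  # int, update steps to accumulate gradients over
    "checkpoint_steps": 1000,          # int, steps between checkpoint saves (save_strategy="steps")
    "max_sequence_length": 2048,       # int, longer sequences are truncated to this length
    "finetuning_type": "lora",         # "lora" or "full"
    "distributed_backend": "ddp",      # "ddp" or "deepspeed"
    "deepspeed_zero_stage": 1,         # 1, 2, or 3; used only when distributed_backend="deepspeed"
}
```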
Trigger | Description |
---|---|
Manual | User-initiated fine-tuning. |
Scheduled | Automated fine-tuning based on a set schedule. |
ft_[base model]_[timestamp]
ft_[base model]_[timestamp]_template
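The two patterns above combine the ft_ prefix, the base model name, and a timestamp. A minimal sketch of composing such a name is shown below, purely to illustrate the pattern; the exact timestamp format used by the service is not specified here, so the strftime format is an assumption.

```python
from datetime import datetime

def build_output_name(base_model: str, template: bool = False) -> str:
    """Compose a name following the ft_[base model]_[timestamp] pattern.

    The timestamp format is an assumption for illustration only; the
    service defines the exact format it actually uses.
    """
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    name = f"ft_{base_model}_{timestamp}"
    return f"{name}_template" if template else name

print(build_output_name("Llama-3.2-1B"))        # e.g. ft_Llama-3.2-1B_20250101120000
print(build_output_name("Llama-3.2-1B", True))  # e.g. ft_Llama-3.2-1B_20250101120000_template
```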
You can manually start the pipeline by clicking the Start button or set it to run on a scheduled basis.