Model Fine-Tuning

    FPT AI Factory Solution
    Updated on 05 Nov 2025

    JAIST's ambitious project to build a premier Japanese LLM required a partner that could provide not just raw computing power, but also a sophisticated platform to manage the entire model development lifecycle. FPT AI Factory, with its integrated FPT AI Studio and FPT AI Inference services, provided the end-to-end solution JAIST needed.

    • Data Discovery

    The collaboration began with a systematic search for the most effective training data combination. Using FPT AI Studio, JAIST’s researchers trained the Qwen3-0.6B model on 768 unique training data combinations, one training run per combination. This critical phase was accelerated by FPT AI Inference’s embedding models, which were used to analyze and classify text domains within the mixed training data.
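
    To picture the domain-classification step, the sketch below assigns each text to its nearest domain prototype using embeddings. It assumes an OpenAI-compatible embeddings endpoint; the base URL, API key, model name, and seed examples are placeholders, not FPT AI Inference's actual interface.

```python
"""Sketch: classify text domains with an embedding model (illustrative only)."""
import numpy as np
from openai import OpenAI

# Placeholder endpoint and credentials; substitute the real inference service.
client = OpenAI(base_url="https://api.example-inference.ai/v1",
                api_key="YOUR_API_KEY")

def embed(texts: list[str]) -> np.ndarray:
    """Return unit-normalized embedding vectors, one row per input text."""
    resp = client.embeddings.create(model="text-embedding-model", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def centroid(seeds: list[str]) -> np.ndarray:
    """Average the seed embeddings into a unit-length domain prototype."""
    c = embed(seeds).mean(axis=0)
    return c / np.linalg.norm(c)

# Hypothetical seed examples describing each candidate domain.
domains = {
    "news":    ["Breaking report on today's economic policy announcement."],
    "science": ["The experiment measured electron spin under a magnetic field."],
    "web":     ["Click here to subscribe to our newsletter for weekly deals."],
}
centroids = {name: centroid(seeds) for name, seeds in domains.items()}

def classify(text: str) -> str:
    """Pick the domain whose prototype has the highest cosine similarity."""
    v = embed([text])[0]
    return max(centroids, key=lambda d: float(v @ centroids[d]))

print(classify("Researchers observed a new particle decay channel."))  # "science"
```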

    • Training Phases

    Once the ideal data combination was identified, JAIST embarked on a massive continual pre-training effort using Qwen2.5-32B as the base model. This process was broken down into three distinct, computationally intensive phases, all managed within FPT AI Studio:

    • Phase 1: The base model was trained on a 100B-token dataset, utilizing a powerful cluster of 30 nodes, each equipped with 8 NVIDIA H100 GPUs.
    • Phase 2: Training was scaled up significantly, with the model learning from a 267B-token dataset. A faulty node was promptly detected and isolated, leaving this phase running on 29 nodes.
    • Phase 3: The final phase involved a 273B-token dataset: the 267B tokens from the previous phase, augmented with new instruction data generated by the Qwen3-235B-A22B model through FPT AI Inference services (see the sketch after this list). This phase again used a 30-node H100 GPU cluster for training.
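
    The Phase 3 instruction data can be pictured with the sketch below, which asks a large teacher model for instruction-response pairs through an OpenAI-compatible chat endpoint. The endpoint URL, API key, served model identifier, and prompt are illustrative assumptions, not FPT AI Inference's documented API.

```python
"""Sketch: generate synthetic instruction data with a teacher model."""
import json
from openai import OpenAI

# Placeholder endpoint and credentials; substitute the real inference service.
client = OpenAI(base_url="https://api.example-inference.ai/v1",
                api_key="YOUR_API_KEY")

PROMPT = ("Write one Japanese instruction-following example about the passage "
          "below, as JSON with keys 'instruction' and 'response'.\n\n"
          "Passage:\n{passage}")

def make_example(passage: str) -> dict:
    """Ask the teacher model for one instruction-response pair."""
    resp = client.chat.completions.create(
        model="Qwen3-235B-A22B",  # exact served identifier is an assumption
        messages=[{"role": "user", "content": PROMPT.format(passage=passage)}],
        temperature=0.7,
    )
    # A production pipeline would validate the JSON and retry on parse errors.
    return json.loads(resp.choices[0].message.content)

with open("instructions.jsonl", "a", encoding="utf-8") as out:
    example = make_example("日本の四季は文学に深い影響を与えてきた。")
    out.write(json.dumps(example, ensure_ascii=False) + "\n")
```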

    Throughout this complex process, FPT AI Factory's engineers provided close, dedicated support, ensuring the seamless execution of these large-scale training jobs.

    • Evaluation

    For evaluation, JAIST used the full breadth of FPT AI Studio. The continually pretrained models underwent LoRA fine-tuning (a minimal sketch follows) and were rigorously benchmarked against the Nejumi Leaderboard 3 using the Test Jobs feature. The Interactive Session feature also allowed JAIST researchers to serve the fine-tuned models and conduct their own internal, custom benchmarks.
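
    As a rough picture of the LoRA step, the following sketch attaches low-rank adapters to a causal language model with Hugging Face's peft library. The checkpoint name, dataset file, and hyperparameters are illustrative assumptions, not JAIST's actual configuration.

```python
"""Minimal LoRA fine-tuning sketch (illustrative settings, not JAIST's)."""
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "Qwen/Qwen2.5-32B"  # assumed Hugging Face Hub checkpoint name

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections; base weights stay frozen.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))
model.print_trainable_parameters()  # only the adapter parameters are trainable

# Any instruction dataset with a "text" column works for this sketch.
data = load_dataset("json", data_files="instructions.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
                remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=1e-4, bf16=True, logging_steps=10),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

    Because only the adapter weights are updated, fine-tuning a 32B-parameter model stays tractable after continual pre-training, and the resulting adapters can be served alongside the frozen base model.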