General - FPT Smart Cloud

About Us
Highlights FPT Cloud Server FPT AI Factory FPT Network FPT Cloud Backup & DR FPT Storage FPT Security FPT Container FPT Database FPT Cloud Monitoring FPT Data Suite FPT.AI

Show all

Object Storage

Secure, unlimited storage to ensures efficiency as well as high and continuous data access demand.

GPU Server

Virtual server integration for 3D Rendering, AI or ML

FPT Load Balancing

Enhance application capacity and availability.

FPT AI Factory

Access to an all-inclusive stack for AI development, driven by NVIDIA’s powerful technology!

Cloud WAF

FPT Web Application Firewall provides powerful protection for web applications

Cloud Server

Advanced virtual server with rapid scalability

Backup Service

Backup and restore data instantly, securely and maintain data integrity.

Cloud Server

Advanced virtual server with rapid scalability

FPT AI Factory

Access to an all-inclusive stack for AI development, driven by NVIDIA’s powerful technology!

FPT Load Balancing

Enhance application capacity and availability.

Backup Service

Backup and restore data instantly, securely and maintain data integrity.

Disaster Recovery Service

Recovery, ensuring quick operation for the business after all incidents and disasters.

Block Storage

Diverse throughput and capacity to meet various business workloads.

Object Storage

Secure, unlimited storage to ensures efficiency as well as high and continuous data access demand.

Cloud WAF

FPT Web Application Firewall provides powerful protection for web applications

FPT Cloud WAPPLES

Intelligent and Comprehensive Virtual Web Application Firewall - Security Collaboration between FPT Cloud and Penta Security.

Next-Gen Firewall

The Next generation firewall security service

Container Registry

Easily store, manage, deploy, and secure Container images

Kubernetes Engine

Safe, secure, stable, high-performance Kubernetes platform

FPT Database for MongoDB

Provided as a service to deploy, monitor, backup, restore, and scale MongoDB databases on cloud.

FPT Database for Redis

Provided as a service to deploy, monitor, backup, restore, and scale Redis databases on cloud.

PostgreSQL Database Engine

Provided as a service to deploy, monitor, backup, restore, and scale PostgreSQL databases on cloud.

Monitoring

System Monitoring Solution anywhere, anytime, anyplatform

FPT Data Suite

Helps reduce operational costs by up to 40% compared to traditional BI solutions, while improving efficiency through optimized resource usage and infrastructure scaling.
Pricing
Partner
- Tech news
- White Paper
Event

Service

Cloud Server

FPT AI Factory

FPT Load Balancing

Monitoring

FPT Data Suite

Cloud Insights

ENG

Tiếng Việt English 中文 (中国) 日本語

Blogs Tech

Categories

News

FPT launches FPT AI Factory to accelerate AI development in Japan, offering local companies pre-orders for its NVIDIA H200 Tensor Core GPUs Cloud Service

13:46 13/11/2024

FPT Cloud

News

Practical Guide to Distributed Training Large Language Models (LLMs) with Slurm and LLaMA-Factory on Metal Cloud

17:24 28/02/2025

FPT Cloud

News

FPT Empowers Developers to Fast-Track AI Innovation with AI Notebook Running On NVIDIA Accelerated Computing

00:09 06/11/2025

FPT Cloud

FPT AI Factory – The Launchpad for Next-Gen AI Startups

11:24 22/10/2025

In today’s fast-evolving AI landscape, startups and ventures face a common challenge: how to transform bold AI ideas into real-world solutions fast, efficiently, and securely. At IGNITE 2025, Mr. Pham Vu Hung, Solutions Architect of FPT Smart Cloud, FPT Corporation, shared how FPT AI Factory serves as a launchpad for the region’s AI ecosystem by providing startups with the infrastructure, technical expertise, and collaborative network they need to scale their innovations from concept to market. Image. Mr. Pham Vu Hung at IGNITE 2025 Empowering Startups to Build, Scale, and Thrive FPT AI Factory combines the best of global technology and local proximity, offering a powerful platform for startups to build and scale AI solutions closer to their markets. With the AI infrastructure powered by NVIDIA H100/H200 GPUs (ranked 36th and 38th in the TOP500 list, June 2025), FPT AI Factory delivers world-class performance while ensuring data sovereignty, security, and low latency for customers. More than that, FPT AI Factory offers a complete ecosystem designed to help startups and innovators accelerate AI development, deployment, and growth. 1. Flexible yet powerful AI infrastructure Startups can easily access GPU-powered environments tailored to their needs, from GPU Container for experimentation to Virtual Machine and FPT AI Studio for production workloads. This flexibility allows early-stage organizations to build, test, and deploy AI models quickly while keeping operational costs under control. 2. No-code/low-code platforms to build AI With FPT AI Studio, startups can develop and fine-tune AI models using a visual drag-and-drop interface, with no deep coding required. Once trained, these models can be deployed instantly via FPT AI Inference, enabling rapid market validation and iteration. 3. Expert guidance and technical support FPT AI Factory provides not only computing resources but also hands-on technical consultation, helping teams choose optimal architectures, fine-tune models, and design efficient AI workflows. For startups without large in-house AI teams, this means faster development with lower risk. 4. Scalable growth within a secure, regional ecosystem As startups scale, FPT AI Factory provides the solid foundation they need, from GPU Clusters and Kubernetes orchestration to secure storage and MLOps tools, all hosted in state-of-the-art data centers. With this setup, startups can build and deploy AI solutions with low latency and strong data protection. From Concept to Market: Real Impact Across the Region Mr. Hung showcased how FPT AI Factory has supported startups and enterprises in the region to achieve faster time-to-market. One standout case involves a Japanese IT company that fine-tuned a 300GB document understanding model using GPU Container and Object Storage on FPT AI Factory. The company is then able to achieve faster iteration cycles, optimized costs, and a more efficient path from prototype to production. Another highlight was a live demo of an AI Camera Agent capable of video search and summarization that is built in just one day using NVIDIA Blueprints and deployed on FPT AI Factory. This case demonstrates how startups can move from idea to working prototype with remarkable speed. Image. AI Camera Agent built on FPT AI Factory Why AI Startups Choose FPT AI Factory By offering the right foundation of infrastructure, expertise, and ecosystem connections, FPT is empowering a new generation of AI startups to grow beyond borders and lead the region’s digital transformation. Get started fast: Access GPU-powered AI environments within minutes. Scale flexibly: Choose the right compute option for each stage of growth. Reduce time-to-market: Build and deploy AI models rapidly with ready-to-use tools. Stay secure and compliant: Operate in top-tier, regionally hosted data centers. Collaborate across borders: Connect with FPT’s AI experts, partners, and investor network. Connect with our experts and explore FPT AI Factory now: https://aifactory.fptcloud.com

FPT AI Factory: Powering Scalable and Competitive AI Startup Growth

16:20 21/10/2025

Artificial intelligence (AI) is emerging as a powerful driver of transformation, reshaping economies and societies across the globe. Day by day, AI is accelerating innovation, enhancing productivity, and delivering solutions to complex challenges across diverse industries. For developing countries, AI represents a strategic opportunity to drive growth, strengthen competitiveness, and build a resilient, future-ready economy. Yet, while AI offers immense opportunities, startups often encounter a critical bottleneck when it comes to accessing the high-performance computing infrastructure needed to turn ambitious ideas into impactful solutions. The rapid growth of AI has created an unprecedented demand for GPUs and other computational resources. Startups seeking to develop cutting-edge AI solutions, from generative models to advanced analytics, must harness powerful infrastructure to train models efficiently, handle massive datasets, and iterate quickly. In this fiercely competitive landscape, success is determined not only by creativity, but also by speed, scalability, and the ability to deploy solutions reliably. Startups that can leverage the right resources gain a decisive edge, while those constrained by limited computing power risk falling behind. The Infrastructure Gap Hindering AI Startup Scalability Amid this rapid AI evolution, startups face intense competition where speed, efficiency, and differentiation often determine survival. Turning innovative ideas into impactful AI solutions requires more creativity and ambition; it demands access to high-performance computing infrastructure capable of handling complex workloads. Training and fine-tuning large-scale AI models, particularly in domains like generative AI, relies on powerful GPUs, scalable storage, and flexible systems that can swiftly adapt to rapidly changing demands. Yet, building such infrastructure independently is often out of reach for most startups. High upfront expenses, ongoing maintenance, and specialized expertise strain limited budgets, making it difficult to experiment at scale. Limited computing power can slow model training, restrict data processing, and prevent rapid iteration to meet customer or investor expectations. At a practical startup level, accessing sufficient compute represents a significant investment, reflecting the projected growth of the global high-performance computing market from 55.2 billion USD in 2024 to 101.48 billion USD by 2033. For example, a startup training a mid-size generative AI model may require a cluster of 8 - 16 high-end GPUs, which can cost tens of thousands of USD per month in cloud compute alone. These expenses often force startups to scale down experiments or prolong model development, creating a tangible infrastructure gap compared to better-funded competitors. Additionally, beyond hardware, startups also face human capital challenges. Recruiting and retaining AI engineers, data scientists, and ML operations specialists is highly competitive and costly. Even with talent in place, coordinating teams to handle complex AI pipelines efficiently demands robust operational processes, something many young companies have yet to fully establish. Emerging AI companies face many challenges, including high infrastructure costs, limited talent, and constant pressure to deliver fast results. These factors pose a significant barrier to entry and make it difficult for startups to compete effectively on a global scale. Balancing computational strength, cost efficiency, and human resources has therefore become one of the most pressing hurdles for startups striving to compete on a global stage. Without solutions to these constraints, even the most innovative ideas risk never reaching their full potential. Introducing FPT AI Factory: A Comprehensive Suite for End-to-End AI Product Lifecycle Recognizing the growing demands of AI development, FPT has partnered with NVIDIA to launch FPT AI Factory – a comprehensive platform designed to help organizations accelerate their AI journey with confidence. More than a toolset, FPT AI Factory is a robust ecosystem combining cutting-edge GPU infrastructure, pre-built AI applications, and a unified environment for model training, fine-tuning, and deployment. It provides businesses with speed, scalability, and flexibility to build, optimize, and operationalize AI solutions efficiently. Whether developing custom generative AI models, refining architectures, or deploying AI-driven services, FPT AI Factory delivers computational power and streamlined workflows to turn ideas into impactful innovations. The “Build Your Own AI” Philosophy A core philosophy of FPT AI Factory is “Build your own AI.” This enables startups and enterprises to create custom models tailored to their business needs. Success requires the right combination of infrastructure, tools, and applications, allowing companies to experiment freely, iterate quickly, and deploy with confidence. At its core, the platform leverages NVIDIA H100 and H200 GPUs, high-performance storage, and GPU containers. Complementing this, FPT provides AI Studio for model testing, fine-tuning, and data management, and AI Inference for flexible deployment. Live models can interact with users through AI agents and applications, generating tangible business value. Use Cases Across Industries Beyond philosophy, the true impact of FPT AI Factory is best seen through practical applications across different sectors: Banking & Finance Develop and deploy LLM-powered voicebots for customer service. Host image processing models for eKYC: ID verification, facial recognition, and deepfake detection. Build personal financial assistants capable of analyzing reports and synthesizing financial news. Healthcare Deploy AI models for early diagnosis of breast cancer and cytology analysis. Run image analysis workloads on GPU containers to interpret ultrasound scans more efficiently. Biotech Apply genetic code analysis solutions to accelerate biological research and drug discovery. Technology Develop chatbots for customer service and internal support. Train and fine-tune custom AI models based on business-specific data. Build large-scale visual AI models for multi-task processing, ensuring reliable system operations. Operate AI agents using models like DeepSeek to enhance sales processes and customer engagement. These examples illustrate how AI capabilities can be transformed into real-world solutions when supported by the right infrastructure and platforms. Empowering the Next Wave of AI Startups In today’s highly competitive AI ecosystem, the ability to develop, train, and deploy models efficiently can determine a startup’s success. FPT AI Factory provides not only the technological foundation but also practical pathways for innovation. By embracing the “Build Your Own AI” approach and leveraging real-world use cases across industries, startups and enterprises can accelerate their AI journey, moving from ideas to impactful applications faster, smarter, and more cost-effectively.

FPT at Tech Week Singapore 2025: Pioneering the Future of AI-Powered Business Transformation

10:41 14/10/2025

At Tech Week Singapore 2025 – one of the largest tech events in the APAC, FPT showcased its latest innovations in Artificial Intelligence, reinforcing its position as a leading technology partner for enterprises looking to drive intelligent transformation. With a strong focus on robust infrastructure, enterprise-ready AI platforms, and agentic AI capabilities, FPT attracted the attention of business leaders, technology professionals, and innovators from across the region. Spotlight on AI Innovation at the FPT Booth The FPT booth served as a gateway into the future of AI-powered enterprises, introducing a suite of advanced solutions designed to help businesses harness AI at scale and speed. Image. The Vietnamese Ambassador in Singapore visited FPT’s booth FPT AI Factory: End-to-End AI Development at Scale For developers, researchers, and AI engineers, FPT introduced the FPT AI Factory – a comprehensive solution that accelerates the full AI development lifecycle. Leveraging the latest NVIDIA H100/H200 GPUs, FPT AI Factory empowers organizations to train, fine-tune, and customize AI models using proprietary data, unlocking new levels of productivity and innovation. FPT AI Agents: Multilingual, Multi-Channel, Ready to Deploy Visitors experienced firsthand the FPT AI Agents platform, enabling organizations to create and operate multilingual AI Agents across multiple channels. With over 20 ready-to-use AI applications including Telesales Agents, Omni-channel AI Agents, Quality Control Agents, and Admin Agents, the platform offers immediate value by automating customer interactions, enhancing service delivery, and improving operational efficiency at a fraction of costs. Accelerating Solutions with NVIDIA-Powered Technologies FPT also highlighted how its AI solutions are built to take full advantage of cutting-edge NVIDIA technologies such as NVIDIA Blueprint and NVIDIA AI Enterprise. These tools allow FPT to optimize AI infrastructure and rapidly design scalable solutions across a variety of industries. From high-performance compute environments to integrative products and services, FPT is enabling businesses to go from concept to deployment faster than ever. Keynote Highlight: Unlocking the Future with AI Factory and Agentic AI A major highlight of FPT’s participation was the keynote by Mr. Mark Hall Andrew, Chief Revenue Officer of FPT Smart Cloud, FPT Corporation titled: "From AI Factory to Agentic AI: Building the Future of Intelligent Enterprises." In his presentation, Mr. Mark Hall painted a compelling picture of how AI is rapidly transforming the global and, specifically, Asia/Pacific economy, reshaping GDP growth, redefining workforce structures, and creating new competitive advantages. At the core of this transformation is the emergence of AI Agents, offering businesses a revolutionary new way to collaborate with intelligent systems. Image. FPT’s keynote presented by Mr. Mark Hall Andrew at the AI & Data in Practice Theatre He introduced FPT’s “Build Your Own AI” strategy, aiming to help organizations become AI-native enterprises by embedding intelligence into every layer of operations. According to Mr. Hall Andrew, FPT’s approach is structured around three foundational pillars: 1. Enterprise AI: Driving End-to-End Transformation With the Rapid AI Deployment Factory and AI Architecture Design, FPT empowers enterprises to integrate AI into customer engagement, product development, and internal processes. These offerings are designed for scalability, security, and quick time-to-value, aligning with business transformation goals. 2. Industrial AI: Enabling Smart Manufacturing In manufacturing, FPT leverages Agentic AI and NVIDIA technologies such as Omniverse, IsaacSim, and Gr00t to optimize processes, automate robotics, and simulate operations using digital twins. These solutions are not just theoretical; they are already helping manufacturers streamline operations and boost productivity. 3. AI Infrastructure: Building the Foundation for Scalable Intelligence Finally, FPT helps organizations modernize their data and AI infrastructure with enterprise-grade architecture. This includes maximizing hardware efficiency while ensuring robust security and scalability, which are critical requirements for sustained AI adoption. The backbone of these pillars is FPT AI Factory - a unified, end-to-end stack that brings together data, infrastructure, and AI models to accelerate innovation. FPT AI Factory serves as the foundation for developing and deploying AI Agents at scale, enabling enterprises to seamlessly move from experimentation to real-world adoption. By combining advanced computing power, domain-specific expertise, and an open, collaborative ecosystem, FPT is helping organizations across the APAC and beyond build the future of intelligent, AI-native enterprises. Charting the Future of AI Together Tech Week Singapore 2025 marked not just another milestone for FPT, but a clear statement of intent. AI is no longer a future aspiration. It is a present-day imperative. FPT stands ready to help businesses navigate this new era, offering scalable, secure, and intelligent AI solutions tailored to real-world needs. As AI continues to reshape industries, FPT is committed to being a trusted partner in helping organizations unlock their full potential with future-ready technologies. ------ Ready to accelerate your AI journey? Explore our AI solutions or get in touch to discover how FPT can support your transformation into an AI-native enterprise. Connect now: https://fptsmartcloud.vn/8USYu

What’s New on FPT AI Factory

16:39 30/09/2025

Welcome to FPT AI Factory Release Notes! Here we’ll provide regular updates on what’s happening across the FPT AI Factory ecosystem from new product features to infrastructure upgrades, billing improvements, and more. September, 2025 August, 2025

Enhancing the Power of Generative AI with Retrieval-Augmented Generation

18:38 29/09/2025

Artificial Intelligence (AI) is advancing rapidly, transforming industries and reshaping how organizations interact with technology. At the center of this evolution are Large Language Models (LLMs) such as OpenAI’s ChatGPT and Google Gemini. These models deliver impressive capabilities in understanding and generating natural language, making them valuable across multiple business domains. However, LLMs also have inherent limitations. Their knowledge is based solely on pre-trained data, which can become static, outdated, or incomplete. As a result, they may produce inaccurate or misleading outputs, and struggle with specialized or real-time queries. To overcome these challenges, Retrieval-Augmented Generation (RAG) has emerged. This approach combines the generative strengths of LLMs with the precision of external knowledge retrieval, enabling more accurate, reliable, and business-ready AI solutions. What Is Retrieval-Augmented Generation? Retrieval-Augmented Generation (RAG) is an AI approach built to improve how large language models (LLMs) generate responses. Instead of relying solely on the model’s pre-trained knowledge, RAG integrates a retriever component that sources information from external knowledge bases such as APIs, online content, databases, or document repositories. RAG was developed to improve the quality of feedback for LLMs The retriever can be tailored to achieve different levels of semantic precision and depth, commonly using: Vector Databases: User queries are transformed into dense vector embeddings (via transformer-based models like BERT) to perform similarity searches. Alternatively, sparse embeddings with TF-IDF can be applied, relying on term frequency. Graph Databases: Knowledge is structured through relationships among entities extracted from text. This ensures high accuracy but requires very precise initial queries. SQL Databases: Useful for storing structured information, though less flexible for semantic-driven search tasks. RAG is especially effective for handling vast amounts of unstructured data, such as the information scattered across the internet. While this data is abundant, it is rarely organized in a way that directly answers user queries. That is why RAG has become widely adopted in virtual assistants and chatbots (e.g., Siri, Alexa). When a user asks a question, the system retrieves relevant details from available sources and generates a clear, concise, and contextually accurate answer. For instance, if asked, “How do I reset the ABC remote?”, RAG can pull instructions from product manuals and deliver a straightforward response. By blending external knowledge retrieval with LLM capabilities, RAG significantly enhances user experiences, enabling precise and reliable answers even in specialized or complex scenarios. The RAG model is often applied in virtual assistants and chatbots Why is RAG important? Large Language Models (LLMs) like OpenAI’s ChatGPT and Google Gemini have set new standards in natural language processing, with capabilities ranging from comprehension and summarization to content generation and prediction. Yet, despite their impressive performance, they are not without limitations. When tasks demand domain-specific expertise or up-to-date knowledge beyond the scope of their training data, LLMs may produce outputs that appear fluent but are factually incorrect. This issue is commonly referred to as AI hallucination. The challenge becomes even more apparent in enterprise contexts. Organizations often manage massive repositories of proprietary information—technical manuals, product documentation, or knowledge bases—that are difficult for general-purpose models to navigate. Even advanced models like GPT-4, designed to process lengthy inputs, can still encounter problems such as the “lost in the middle” effect, where critical details buried in large documents fail to be captured. Retrieval-Augmented Generation (RAG) emerged as a solution to these challenges. By integrating a retrieval mechanism, RAG allows LLMs to pull information directly from external sources, including both public data and private enterprise repositories. This approach not only bridges gaps in the model’s knowledge but also reduces the risk of hallucination, ensuring responses are grounded in verifiable information. For applications like chatbots, virtual assistants, and question-answering systems, the combination of retrieval and generation marks a significant step forward—enabling accurate, up-to-date, and context-aware interactions that enterprises can trust. RAG enables LLMs to retrieve information from external sources, limiting AI hallucination Retrieval-Augmented Generation Pipeline Benefits of RAG RAG offers several significant advantages over standalone LLMs: Up-to-Date Knowledge: Dynamically retrieves the latest information without retraining the model. Reduced Hallucination: Grounded answers minimize the risk of fabricated content. Transparency: Provides source references, enabling users to verify claims. Cost Efficiency: Eliminates frequent re-training cycles, reducing computational and financial overhead. Scalability: Works across domains, from healthcare and finance to enterprise IT. Versatility: Powers applications such as chatbots, search systems, and intelligent summarization tools. Practical Use Cases Across Industries RAG is emerging as the key to helping Generative AI overcome the limitations of models like ChatGPT or Gemini, which rely solely on pre-trained data that can quickly become outdated or inaccurate. By combining the generative capabilities of language models with external data retrieval, RAG delivers clear, real-time answers, minimizes AI hallucination, and helps businesses optimize costs. In practice, RAG is already shaping the future of AI across multiple domains: Chatbots and Customer Service: Provide instant, accurate responses by retrieving answers directly from product manuals, FAQs, or knowledge bases. Healthcare: Deliver reliable medical insights by sourcing information from verified clinical guidelines and research databases. Finance: Equip analysts with real-time market updates and contextual insights drawn from live data feeds. Knowledge Management: Help employees interact with technical documentation and compliance materials in a natural, conversational way. These practical use cases illustrate how RAG makes AI more reliable, transparent, and truly valuable across industries. Future Outlook RAG represents a pivotal step toward trustworthy, authoritative AI. By bridging parameterized knowledge (learned during training) with retrieved knowledge (dynamic, external data), RAG overcomes one of the greatest limitations of LLMs. With advancements in agentic AI, where models orchestrate retrieval, reasoning, and generation autonomously, will push RAG even further. Combined with hardware acceleration (e.g., NVIDIA’s Grace Hopper Superchip) and open-source frameworks like LangChain, and supported by enterprise-ready infrastructures such as FPT AI Factory, which delivers high-performance GPUs for training and deploying complex RAG models, RAG will continue to evolve into the backbone of enterprise-grade generative AI. Ultimately, Retrieval-Augmented Generation is not just a solution to hallucinations and knowledge gaps, it is the foundation enabling intelligent assistants, advanced chatbots, and enterprise-ready AI systems across industries.

AI Factory Playbook: A Developer’s Guide to Building Secure, Accelerated Gen AI Applications

11:46 24/09/2025

At NVIDIA AI Day, Mr. Pham Vu Hung, Solutions Architect & Senior Consultant at FPT Smart Cloud, FPT Corporation delivered the keynote “AI Factory Playbook: A Developer's Guide to Building Secure, Accelerated Gen AI Applications.” Mr. Hung gives insights on how to achieve end-to-end AI development, from building generative AI models to deploying AI agents for your enterprise, on the NVIDIA H100/H200 GPU Cloud Platform using the domestically deployed AI factory. Specifically, the presentation touches on the benefits of the homegrown AI factory through rapid development and an optimized inference environment, with specific use cases. End-to-end AI development of a domestic AI factory: complete development from generation AI to AI agents in a secure environment at a data center. Acceleration with NVIDIA H100/H200 GPUs: Accelerate training and inference with the latest GPUs to significantly shorten development time. Generative AI construction and fine-tuning: Highly accurate models are realized through state-of-the-art model construction and fine-tuning with individual data. Building Up the AI/ML Stack FPT AI Factory provides a comprehensive AI/ML infrastructure stack built on NVIDIA-certified Tier 3 & 4 data centers, ranked 36th and 38th in the TOP500 list (June 2025). Among its wide range of offerings, standout services include GPU Container, GPU Virtual Machine, and FPT AI Studio. Developers can also leverage Bare Metal, GPU Cluster, AI Notebook, and FPT AI Inference. [caption id="attachment_67178" align="aligncenter" width="1972"] Image: The AI/ML stack architecture on FPT AI Factory[/caption] GPU Container: Designed for experimentation workloads with built-in monitoring, logging, and collaborative notebooks. Developers can easily share data, write code, unit test, and execute in a highly flexible environment. GPU Virtual Machine: Multi-purpose VMs optimized for both training and inference, with flexible configuration options (from 1 to 8 GPUs per VM, up to 141GB VRAM per GPU). GPU Cluster: Scalable infrastructure for distributed training and large-scale inference. Equipped with NVLink, MIG/MPS/Time-slice GPU sharing, and advanced security add-ons like audit logs and CIS benchmarks. AI Notebook: A managed JupyterLab environment preloaded with essential AI/ML libraries. Developers can start coding instantly on enterprise-grade GPUs without setup overhead, achieving up to 70% cost savings compared to traditional notebook environments. FPT AI Studio: A no-code/low-code MLOps platform that integrates data pipelines, fine-tuning strategies (SFT, DPO, continual training), experiment tracking, and model registry. Its drag-and-drop GUI enables developers to fine-tune models quickly and store them in a centralized model hub. FPT AI Inference: Ready-to-use APIs with competitive token pricing, enabling developers to deploy fine-tuned models quickly and cost-effectively. During the keynote, Mr. Hung emphasized not only the broad capabilities of AI Factory but also illustrated them through a concrete customer case. For instance, FPT collaborated with a Japanese IT company to fine-tune the Donut (Document Understanding Transformer) model on a dataset exceeding 300GB. By leveraging GPU Container in combination with FPT Object Storage, the customer was able to handle large-scale document data efficiently while optimizing costs - a practical example of how enterprises can take advantage of FPT AI Factory’s infrastructure for real-world workloads. [caption id="attachment_67179" align="aligncenter" width="1674"] Image: Fine-tuning pipeline of the Donut model on FPT AI Factory[/caption] Accelerating the Deployment of Real-World AI Solution One of the highlights was a live demo of an AI Camera Agent designed for video search and summarization. The workflow is simple yet powerful: select a video, provide a brief description of what you want to find, and the agent automatically identifies relevant segments and generates concise summaries in real time. What makes this possible is the integration of NVIDIA Blueprints, which provide pre-validated solution architectures and tools for rapid experimentation. Instead of spending months building a prototype from scratch, we were able to move from concept to a working demo in just a single day. This acceleration not only validates the feasibility of the solution but also gives enterprises a tangible way to envision how AI can be applied to their own video data challenges. [caption id="attachment_67180" align="aligncenter" width="1262"] Image: The architecture of the AI Camera Agent solution (NVIDIA)[/caption] In particular, FPT AI Factory delivers a full-stack environment, from infrastructure components such as GPU, VM, and Kubernetes to the developer tools required, to deploy AI solutions quickly and efficiently. With a flexible architecture and ready-to-use models, developers can even stand up complete solutions powered by just a single NVIDIA H100 GPU, balancing performance, scalability, and cost-effectiveness. For example, FPT AI Inference offers a library of ready-to-use models that developers can integrate instantly through simple API calls. With competitive per-token pricing, teams can run inference workloads faster while significantly reducing costs, enabling businesses to bring AI-powered applications to market more efficiently. Taking AI Model Fine-Tuning to the Next Level Developers can fine-tune models on GPU Container, but more for experimentation. For implemented solution, we need solutions that can automate the fine-tuning process. Introducing FPT AI Studio with popular components in the MLOps processes like AI Notebooks, Data Processing… FPT AI Studio allows users to integrate data, base model, different fine-tuning strategies such as continual training… The GUI is user-friendly, drag-and-drop interface. The fine-tuned model can be stored in the model hub. After that, we can transfer these models to FPT AI Inference. Developers today can fine-tune models directly on GPU container, which is great for experimentation and quick iteration. However, moving from one-off experiments to a production-ready solution requires more than just compute power. It needs automation, reproducibility, and integration into a full MLOps pipeline. FPT AI Studio provides the right environment to streamline fine-tuning and deployment. The platform is designed to be accessible, with a drag-and-drop GUI for building workflows quickly, while still allowing deep customization for advanced users. It comes with common MLOps components: AI Notebook for code-driven experimentation Data Processing pipelines to handle preprocessing and feature engineering. Fine-tuning strategies including continual training, domain adaptation, and transfer learning. Once a model is fine-tuned in AI Studio, it can be stored in the Model Hub - a central repository for versioning, sharing, and reuse. From there, models can be seamlessly transferred to FPT AI Inference for scalable, low latency serving in production environments. [caption id="attachment_67182" align="aligncenter" width="1312"] Image: The training pipeline of FPT AI Studio[/caption] For illustration, Mr. Hung walked through a case study of how FPT AI Studio can be applied to adapt a large language model for the Vietnamese healthcare domain. The base model chosen is Llama-3.1-8B, which provides a strong balance between capacity and efficiency. The task is to develop a model optimized for healthcare question answering, requiring domain-specific adaptation while retaining the general reasoning ability of the base model. The dataset consists of Vietnamese healthcare documents, and the goal is to enhance factual recall, domain precision, and response quality in clinical Q&A scenarios. The first approach relies on continual pre-training. Using 24 NVIDIA H100 GPUs across three nodes, the model is exposed to the healthcare dataset for three epochs. The entire pipeline takes approximately 31 hours to complete. The second approach applies supervised fine-tuning with LoRA adapters, which represents a more resource-efficient alternative. In this setting, only four NVIDIA H100 GPUs are used on a single node, and training is performed for five epochs. The total runtime of the pipeline is roughly 3 hours. While less computationally demanding, this strategy still delivers significant improvements for downstream Q&A tasks. [caption id="attachment_67183" align="aligncenter" width="922"] Image. Results of pre-training and SFT LLM with the healthcare dataset[/caption] Best Practices First, it’s important to select the right tools for the right workloads to maximize both performance and cost-efficiency. With FPT AI Factory, users are equipped with the necessary tools for any types of AI/ML workloads for faster, more efficient AI innovation. For early experimentation, GPU Container or AI Notebook provide developers with flexible environments for testing ideas and running quick prototypes. For deployment, the right choice depends on the workload: GPU Container are ideal for light-weight inferencing, whereas GPU Virtual Machine deliver the performance needed for real-time or batch inferencing. High-performance computing (HPC) workloads run best on Metal Cloud, which provides bare-metal performance for intensive tasks. Finally, organizations looking for ready-to-use models can turn to the AI Marketplace, which offers pre-trained LLMs and services to accelerate adoption without additional fine-tuning. [caption id="attachment_67184" align="aligncenter" width="941"] Image. FPT AI Factory solutions for different AI/ML workloads[/caption] Second, developers should optimize training workloads. Optimizing training workloads for large generative AI models requires a combination of hardware-aware techniques and workflow engineering. One key practice is to leverage mixed-precision training, using formats such as FP16 or BF16 to accelerate computation on NVIDIA GPUs while reducing memory usage by up to half. This not only shortens training time but also maintains accuracy with automatic scaling. Distributed training is equally important, where strategies like PyTorch DDP or pipeline parallelism allow workloads to scale across multiple GPUs or nodes, improving throughput and accelerating development cycles. In multi-node environments, optimizing cluster interconnects with NVLink or InfiniBand can further boost training speed by up to three times, ensuring efficient synchronization for large-scale AI tasks. Data pipelines and storage must also be optimized, employing NVIDIA DALI and scalable I/O to avoid bottlenecks. Finally, benchmarking tools such as FPT AI Factory’s GPU performance tests and NVIDIA’s MLPerf results help validate configurations, ensuring cost-effective scaling for fine-tuning. Third, it is crucial to optimize inference workloads for delivering scalable, low-latency generative AI services. One effective approach is applying quantization and lower precision with NVIDIA TensorRT, converting models to FP8 or INT8 for up to 1.4× higher throughput with minimal accuracy trade-offs. For large language models, managing the KV cache efficiently is equally important; techniques such as PagedAttention and chunked prefill can cut memory fragmentation and reduce time-to-first-token by as much as 2–5× in multi-user scenarios. Speculative decoding further boosts performance by pairing a smaller draft model with the main LLM to predict multiple tokens at once, yielding 1.9–3.6× throughput gains while minimizing latency, which is especially valuable in real-time applications like video summarization. Scaling with multi-GPU parallelism also plays a key role, enabling up to 1.5× gains on distributed inference tasks in high-volume clusters. Finally, model distillation and pruning help shrink models, cutting costs and latency by 20–30% without sacrificing output quality. Key Takeaways How to Architect a Secure, End-to-End AI Workflow: We will deconstruct the architecture of a production "AI factory," focusing on the design principles for creating a secure development lifecycle within a local data center. You'll learn the technical steps for ensuring data isolation, managing secure model hosting, and creating a reliable pathway from model fine-tuning to the deployment of enterprise-grade AI agents. Practical Techniques for GPU-Accelerated LLM Operations: Go beyond the specs and learn how to practically leverage high-performance GPUs (like the NVIDIA H100/H200). This session will cover specific, actionable best practices for optimizing both training and inference workloads to maximize throughput, minimize latency, and significantly reduce development cycles for demanding generative AI applications.

The Comprehensive Workflow of Agentic AI: How FPT AI Factory is Accelerating AI Agents Development

18:29 17/09/2025

As artificial intelligence continues to revolutionize industries, understanding the inner workings of AI systems becomes not just fascinating but crucial. Among the most intriguing innovations lies Agentic AI, a technology designed to mimic human-like decision-making, problem-solving, and even creativity. Rather than serving as static tools that merely respond to user prompts, agentic systems are built to operate with autonomy: they can interpret objectives, decompose them into actionable steps, and pursue outcomes through iterative reasoning and execution. This capability positions Agentic AI not just as an enhancement of existing models, but as a framework for orchestrating complex, multi-step processes with minimal human intervention. But what does the journey of an Agentic AI look like behind the scenes? How does it seamlessly process complex tasks, adapt to challenges, and improve over time? These are the primary steps of artificial intelligence agents, each crucial for creating adaptive, intelligent systems. 1. Perception AI agent perception refers to the ability of an AI system to gather and interpret information from its environment, whether that be through visual, auditory, textual, or other forms of data. This process enables the agent to sense the world, creating a foundational layer for decision-making and problem-solving. Just as humans rely on their senses to navigate their surroundings, AI agents depend on their perception capabilities to understand inputs, recognize patterns, and respond accordingly. Perception is not a passive process. It involves actively gathering data, processing it, and then using this information to form an understanding of the current situation. The types of data that AI agents perceive vary based on the system's design, and these can include everything from written text and spoken words to images, sounds, or even environmental changes. In essence, perception serves as the AI agent’s window to the world, providing it with the necessary information to act intelligently and adaptively. AI agents utilize various types of perception to understand and interpret their environment. Each type of perception allows an agent to interact with the world in distinct ways, enabling it to process different forms of data and make informed decisions. The key categories include: Textual Perception: Understanding and generating text through Natural Language Processing (NLP). Allowing AI systems to interact with textual data, such as articles, books, emails, and web pages. This is essential for applications like chatbots and virtual assistants. Predictive Perception: AI anticipates future events based on historical data, used in fields like finance and autonomous vehicles. Visual Perception: Using computer vision to interpret images and videos, crucial for tasks such as object detection and facial recognition. Environmental Perception: AI gathers information through sensors like GPS or motion detectors to adapt to dynamic environments. For example, robots use this to detect and avoid obstacles while moving. Auditory Perception: The ability to process and understand sound, particularly speech, enabling systems like voice assistants. 2. Reasoning and Decision-making Reasoning is the cognitive process that allows AI agents to make decisions, solve problems, and infer conclusions based on the information they perceive. It is a critical aspect of an AI agent's ability to act intelligently and adaptively in dynamic environments. While perception enables AI to gather data about the world, reasoning empowers the agent to interpret that data, draw logical conclusions, and make informed choices. In other words, perception is noticing the traffic light turning red; reasoning is realizing you need to stop the car to avoid danger. For AI agents, reasoning works in a similar way, it bridges raw input and purposeful action. In essence, reasoning involves using rules, heuristics, logic, and learned patterns to process the information provided by the perception system. It allows AI agents to not only understand the current state of their environment but also to predict outcomes, handle uncertainties, and devise strategies for achieving their goals. Reasoning can be divided into various types, each of which plays a unique role in enabling AI systems to operate effectively in different scenarios. Heuristic Reasoning: Simplifies decision-making using experience-based rules of thumb, ideal for real-time applications. For instance, when navigating a map, AI might choose the "best" route based on experience rather than calculating every possible path. ReWoo (Recursive World Optimization): A process of iterative refinement where AI improves its understanding and decisions over time. In practical terms, ReWoo allows an AI agent to adjust and optimize its strategies based on feedback and changing circumstances. ReAct (Reasoning and Acting): A hybrid approach where reasoning and acting occur simultaneously, beneficial in environments requiring immediate feedback such as autonomous driving or real-time strategy games. Self-reflection: The agent evaluates its past decisions to learn and improve. Conditional Logic: Decision-making based on specific conditions, often used in automation systems. For example, a smart thermostat might use conditional logic to adjust the temperature: "If the room temperature is below 70°F, then increase the heating." 3. Action The action module implements the agent’s decisions in the real world, allowing it to interact with users, digital systems, or even physical environments. After perceiving its environment and reasoning about the best course of action, the AI agent must execute its decisions in the real world. In the context of AI, action is not limited to physical movements or interactions but can also include processes such as data manipulation, decision execution, and the triggering of automated systems. Whether it involves a robot physically navigating an environment, a software system processing data, or an AI-powered virtual assistant responding to a command, action is the phase where the AI agent brings its reasoning and understanding to life. 4. Learn AI agent learning refers to the process through which an AI agent improves its performance over time by gaining knowledge from experience, data, or feedback. Instead of relying solely on pre-programmed instructions, an AI agent can adapt and evolve by learning from its environment and the outcomes of its actions. This ability to learn is what allows AI agents to handle new and unseen situations, make better decisions, and optimize their strategies in dynamic, real-world scenarios. AI agent learning is essential for creating intelligent systems capable of self-improvement. Just as humans learn from experience and apply that knowledge to future challenges, AI agents use various learning techniques to enhance their decision-making and problem-solving capabilities. Through continuous learning, AI agents can refine their behavior and better align with their goals over time. The methods of learning vary based on how the agent interacts with data, the feedback it receives, and the type of tasks it needs to perform. Below are the key learning approaches used by AI agents: Unsupervised Learning: Identifies patterns and structures in data without labeled examples. AI can group customers based on purchasing behavior without being given labels Supervised Learning: Trains AI on labeled data to predict outcomes based on known inputs. Reinforcement Learning: The agent learns through trial and error, receiving feedback in the form of rewards or penalties. Multiagent Learning: Involves collaboration and competition between agents to solve problems more effectively. Agentic AI represents more than just an upgrade to existing systems. It marks a shift toward truly adaptive, autonomous intelligence. By perceiving, reasoning, acting, and learning, these agents mirror essential aspects of human cognition while continuously improving through experience. However, building such agents is far from simple; organizations will need a strong and resilient infrastructure. From fast GPU resources to flexible model training environments and seamless model deployment, these capabilities are what transform theory into practice. 5. How FPT AI Factory Accelerates the Process of Developing AI Agents In response to this need, FPT has launched FPT AI Factory, providing a comprehensive solution for developing AI agents through three key services: FPT AI Infrastructure, FPT AI Studio, and FPT AI Inference. Data Processing Foundation (FPT AI Infrastructure) Every successful AI agent relies on a continuous data flywheel that drives improvement. FPT AI Factory’s NVIDIA H100/H200 GPU infrastructure powers this process by collecting diverse data (conversations, user interactions, sensor feeds), processing and labeling it for agent training, and deploying smarter AI agents. These agents generate new data from user interactions, feeding back into the system to enhance future iterations. This self-reinforcing cycle leads to increasingly intelligent and responsive AI systems as more agents are deployed, creating a continuous loop of improvement. AI Agent Development (FPT AI Studio) Once data is prepared, developers can use FPT AI Studio to build and train intelligent agents in a collaborative cloud environment. The platform supports the development of various AI agent types - from conversational assistants to decision-making systems - providing tools for model training, behavior fine-tuning, and agent performance optimization to ensure they respond accurately to real-world scenarios. AI Agent Deployment & Serving (FPT AI Inference) After development and testing, FPT AI Inference enables seamless deployment of AI agents into production environments. These deployed agents not only serve users reliably but also feed valuable interaction data back into the flywheel, creating a continuous improvement loop. Whether you're launching a customer service chatbot, deploying an autonomous vehicle system, or integrating a recommendation agent into an e-commerce platform, each user interaction becomes part of the data flywheel that makes your next generation of AI agents even smarter. From concept to production, FPT AI Factory provides the complete infrastructure backbone that transforms AI agent ideas into intelligent, self-improving systems through the power of the data flywheel effect.

Fine-Tuning Llama 3 in 30 Minutes on FPT AI Factory: Accelerating Enterprise AI Development

11:19 03/09/2025

Recently, FPT hosted a webinar titled “Fine-Tuning Llama 3 in 30 Minutes on FPT AI Factory”, featuring Mr. Donald Murataj, AI Expert at FPT. The session focused on practical techniques for efficiently fine-tuning the Llama 3 model on the FPT AI Factory platform. Generative AI – An Inevitable Trend for Enterprises In today’s landscape, artificial intelligence (AI) has become one of the key drivers of enterprise growth. In particular, Generative AI (GenAI) is emerging as a breakthrough technology that not only optimizes operational efficiency and enhances customer experience but also paves the way for entirely new business models. The greatest challenge for enterprises lies in how to personalize large language models such as Llama 3 with their own data and unique business context. This is precisely where fine-tuning becomes the key to unlocking the transformative value of GenAI. The webinar organized by FPT demonstrated that this otherwise complex process can be executed quickly, seamlessly, and effectively on the FPT AI Factory platform. Fine-Tuning Llama 3 in Just 30 Minutes The highlight of the webinar was a live demonstration, where an FPT expert successfully completed the entire fine-tuning process of Llama 3 in just 30 minutes, guiding participants step by step: Step 1: Preparing a training dataset tailored to real-world business needs, enabling the model to understand the specific context and language of the enterprise. Step 2: Initializing a GPU Container environment on FPT AI Factory to ensure high-speed processing, system stability, and seamless scalability. Step 3: Executing the fine-tuning process directly through an intuitive interface—simple to operate while providing full control over every stage. Step 4: Evaluating the results and comparing them with the baseline model to clearly demonstrate improvements in performance and accuracy. What impressed participants the most was the simplicity and accessibility of FPT AI Factory. Even technical teams with limited AI development experience could quickly build their own customized AI models. Whereas fine-tuning traditionally required several days, the entire process can now be completed in less than an hour—powerful evidence of the efficiency and optimization enabled by FPT AI Factory. This experience has transformed a traditionally complex process into one that is fast, practical, and easy to adopt, opening the door for enterprises to begin experimenting with AI from the very first steps. 👉 Watch the full webinar replay here: https://www.youtube.com/watch?v=6L1nQteXAnM&ab_channel=FPTAIFactory FPT AI Factory - A Comprehensive AI Development Platform for Enterprises All of this is made possible by FPT AI Factory – a comprehensive AI development platform built on state-of-the-art infrastructure, powered by NVIDIA H100/H200 GPUs and NVIDIA AI Enterprise software. Combined with FPT’s practical deployment expertise, FPT AI Factory enables enterprises to accelerate model development, optimize costs, and scale deployments with flexibility and security. The platform is comprised of four key components: FPT AI Infrastructure: High-performance, energy-efficient computing infrastructure for large language models (LLMs) and multimodal models. FPT AI Studio: A cost-efficient environment for fast fine-tuning, experimentation, and prototyping. FPT AI Inference: A high-performance, low-latency serving platform designed for production-ready AI applications. FPT AI Agents: A platform for building and operating intelligent, multilingual AI agents seamlessly integrated into enterprise workflows. In addition, FPT AI Factory offers more than 20 ready-to-use Generative AI products, enabling enterprises to quickly apply AI across customer experience, operations, human resources management, and cost optimization. 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months
cookielawinfo-checbox-functional	11 months
cookielawinfo-checbox-others	11 months
cookielawinfo-checkbox-necessary	11 months
cookielawinfo-checkbox-performance	11 months
viewed_cookie_policy	11 months