FPT Cloud knowledge-sharing blog

What’s New on FPT AI Factory

16:39 30/09/2025
   

Enhancing the Power of Generative AI with Retrieval-Augmented Generation

18:38 29/09/2025
Artificial Intelligence (AI) is advancing rapidly, transforming industries and reshaping how organizations interact with technology. At the center of this evolution are Large Language Models (LLMs) such as OpenAI’s ChatGPT and Google Gemini. These models deliver impressive capabilities in understanding and generating natural language, making them valuable across multiple business domains.

However, LLMs also have inherent limitations. Their knowledge comes solely from the data they were trained on, which can be static, outdated, or incomplete. As a result, they may produce inaccurate or misleading outputs and struggle with specialized or real-time queries.

To overcome these challenges, Retrieval-Augmented Generation (RAG) has emerged. This approach combines the generative strengths of LLMs with the precision of external knowledge retrieval, enabling more accurate, reliable, and business-ready AI solutions.

What Is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) is an AI approach built to improve how large language models (LLMs) generate responses. Instead of relying solely on the model’s pre-trained knowledge, RAG integrates a retriever component that sources information from external knowledge bases such as APIs, online content, databases, or document repositories.

Image: RAG was developed to improve the quality of LLM responses

The retriever can be tailored to achieve different levels of semantic precision and depth, commonly using:

Vector Databases: User queries are transformed into dense vector embeddings (via transformer-based models like BERT) to perform similarity searches. Alternatively, sparse representations such as TF-IDF, which weight terms by how often they occur, can be applied.

Graph Databases: Knowledge is structured through relationships among entities extracted from text. This ensures high accuracy but requires very precise initial queries.

SQL Databases: Useful for storing structured information, though less flexible for semantic-driven search tasks.
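To make the retriever idea concrete, here is a minimal sketch of the sparse (TF-IDF) option in plain Python, followed by the step that places the retrieved passage into a prompt. It is an illustration under stated assumptions, not a production retriever: the `MANUAL` passages, the function names, and the prompt wording are all hypothetical, and a real system would typically use a vector database and send the prompt to an LLM.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphanumeric tokens only.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query and return the top k.
    vectors, idf = tfidf_vectors(docs)
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]


def build_prompt(query, passages):
    # Ground the generation step in the retrieved passages.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base, invented for this example.
MANUAL = [
    "To reset the ABC remote, hold the power button for ten seconds.",
    "The ABC remote uses two AAA batteries.",
    "Warranty claims must be filed within one year of purchase.",
]

question = "How do I reset the ABC remote?"
prompt = build_prompt(question, retrieve(question, MANUAL, k=1))
```

The resulting `prompt` is what would be passed to the language model. Swapping the TF-IDF vectors for dense embeddings changes only how the vectors are produced; the ranking and prompt assembly stay the same.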
RAG is especially effective for handling vast amounts of unstructured data, such as the information scattered across the internet. While this data is abundant, it is rarely organized in a way that directly answers user queries.

That is why RAG has become widely adopted in virtual assistants and chatbots of the kind popularized by Siri and Alexa. When a user asks a question, the system retrieves relevant details from available sources and generates a clear, concise, and contextually accurate answer. For instance, if asked, “How do I reset the ABC remote?”, RAG can pull instructions from product manuals and deliver a straightforward response.

By blending external knowledge retrieval with LLM capabilities, RAG significantly enhances user experiences, enabling precise and reliable answers even in specialized or complex scenarios.

Image: The RAG model is often applied in virtual assistants and chatbots

Why Is RAG Important?

Large Language Models (LLMs) like OpenAI’s ChatGPT and Google Gemini have set new standards in natural language processing, with capabilities ranging from comprehension and summarization to content generation and prediction. Yet, despite their impressive performance, they are not without limitations. When tasks demand domain-specific expertise or up-to-date knowledge beyond the scope of their training data, LLMs may produce outputs that appear fluent but are factually incorrect. This issue is commonly referred to as AI hallucination.

The challenge becomes even more apparent in enterprise contexts. Organizations often manage massive repositories of proprietary information (technical manuals, product documentation, or knowledge bases) that are difficult for general-purpose models to navigate. Even advanced models like GPT-4, designed to process lengthy inputs, can still encounter problems such as the “lost in the middle” effect, where critical details buried in large documents fail to be captured.
Retrieval-Augmented Generation (RAG) emerged as a solution to these challenges. By integrating a retrieval mechanism, RAG allows LLMs to pull information directly from external sources, including both public data and private enterprise repositories. This approach not only bridges gaps in the model’s knowledge but also reduces the risk of hallucination, keeping responses grounded in verifiable information.

For applications like chatbots, virtual assistants, and question-answering systems, the combination of retrieval and generation marks a significant step forward, enabling accurate, up-to-date, and context-aware interactions that enterprises can trust.

Image: RAG enables LLMs to retrieve information from external sources, limiting AI hallucination

Image: Retrieval-Augmented Generation Pipeline

Benefits of RAG

RAG offers several significant advantages over standalone LLMs:

Up-to-Date Knowledge: Dynamically retrieves the latest information without retraining the model.

Reduced Hallucination: Grounded answers minimize the risk of fabricated content.

Transparency: Provides source references, enabling users to verify claims.

Cost Efficiency: Reduces the need for frequent retraining cycles, lowering computational and financial overhead.

Scalability: Works across domains, from healthcare and finance to enterprise IT.

Versatility: Powers applications such as chatbots, search systems, and intelligent summarization tools.

Practical Use Cases Across Industries

RAG is emerging as the key to helping Generative AI overcome the limitations of models like ChatGPT or Gemini, which rely solely on pre-trained data that can quickly become outdated or inaccurate.

By combining the generative capabilities of language models with external data retrieval, RAG delivers clear, real-time answers, minimizes AI hallucination, and helps businesses optimize costs.
In practice, RAG is already shaping the future of AI across multiple domains:

Chatbots and Customer Service: Provide instant, accurate responses by retrieving answers directly from product manuals, FAQs, or knowledge bases.

Healthcare: Deliver reliable medical insights by sourcing information from verified clinical guidelines and research databases.

Finance: Equip analysts with real-time market updates and contextual insights drawn from live data feeds.

Knowledge Management: Help employees interact with technical documentation and compliance materials in a natural, conversational way.

These practical use cases illustrate how RAG makes AI more reliable, transparent, and truly valuable across industries.

Future Outlook

RAG represents a pivotal step toward trustworthy, authoritative AI. By bridging parametric knowledge (learned during training) with retrieved knowledge (dynamic, external data), RAG overcomes one of the greatest limitations of LLMs.

Advancements in agentic AI, where models orchestrate retrieval, reasoning, and generation autonomously, will push RAG even further. Combined with hardware acceleration (e.g., NVIDIA’s Grace Hopper Superchip) and open-source frameworks like LangChain, and supported by enterprise-ready infrastructures such as FPT AI Factory, which delivers high-performance GPUs for training and deploying complex RAG models, RAG will continue to evolve into the backbone of enterprise-grade generative AI.

Ultimately, Retrieval-Augmented Generation is not just a solution to hallucinations and knowledge gaps; it is the foundation enabling intelligent assistants, advanced chatbots, and enterprise-ready AI systems across industries.

AI Factory Playbook: A Developer’s Guide to Building Secure, Accelerated Gen AI Applications

11:46 24/09/2025
At NVIDIA AI Day, Mr. Pham Vu Hung, Solutions Architect & Senior Consultant at FPT Smart Cloud, FPT Corporation, delivered the keynote “AI Factory Playbook: A Developer's Guide to Building Secure, Accelerated Gen AI Applications.”

Mr. Hung shared insights on how to achieve end-to-end AI development, from building generative AI models to deploying AI agents for the enterprise, on the NVIDIA H100/H200 GPU Cloud Platform using the domestically deployed AI factory. Specifically, the presentation touched on the benefits of the homegrown AI factory through rapid development and an optimized inference environment, with specific use cases:

End-to-end AI development on a domestic AI factory: Complete development from generative AI to AI agents in a secure environment at a data center.

Acceleration with NVIDIA H100/H200 GPUs: Accelerate training and inference with the latest GPUs to significantly shorten development time.

Generative AI construction and fine-tuning: Highly accurate models are realized through state-of-the-art model construction and fine-tuning with individual data.

Building Up the AI/ML Stack

FPT AI Factory provides a comprehensive AI/ML infrastructure stack built on NVIDIA-certified Tier 3 & 4 data centers, ranked 36th and 38th in the TOP500 list (June 2025). Among its wide range of offerings, standout services include GPU Container, GPU Virtual Machine, and FPT AI Studio. Developers can also leverage Bare Metal, GPU Cluster, AI Notebook, and FPT AI Inference.

Image: The AI/ML stack architecture on FPT AI Factory

GPU Container: Designed for experimentation workloads with built-in monitoring, logging, and collaborative notebooks. Developers can easily share data, write code, run unit tests, and execute in a highly flexible environment.
GPU Virtual Machine: Multi-purpose VMs optimized for both training and inference, with flexible configuration options (from 1 to 8 GPUs per VM, up to 141 GB VRAM per GPU).

GPU Cluster: Scalable infrastructure for distributed training and large-scale inference. Equipped with NVLink, MIG/MPS/Time-slice GPU sharing, and advanced security add-ons like audit logs and CIS benchmarks.

AI Notebook: A managed JupyterLab environment preloaded with essential AI/ML libraries. Developers can start coding instantly on enterprise-grade GPUs without setup overhead, achieving up to 70% cost savings compared to traditional notebook environments.

FPT AI Studio: A no-code/low-code MLOps platform that integrates data pipelines, fine-tuning strategies (SFT, DPO, continual training), experiment tracking, and model registry. Its drag-and-drop GUI enables developers to fine-tune models quickly and store them in a centralized model hub.

FPT AI Inference: Ready-to-use APIs with competitive token pricing, enabling developers to deploy fine-tuned models quickly and cost-effectively.

During the keynote, Mr. Hung not only emphasized the broad capabilities of AI Factory but also illustrated them through a concrete customer case. FPT collaborated with a Japanese IT company to fine-tune the Donut (Document Understanding Transformer) model on a dataset exceeding 300 GB. By leveraging GPU Container in combination with FPT Object Storage, the customer was able to handle large-scale document data efficiently while optimizing costs, a practical example of how enterprises can take advantage of FPT AI Factory’s infrastructure for real-world workloads.

Image: Fine-tuning pipeline of the Donut model on FPT AI Factory

Accelerating the Deployment of Real-World AI Solutions

One of the highlights was a live demo of an AI Camera Agent designed for video search and summarization.
The workflow is simple yet powerful: select a video, provide a brief description of what you want to find, and the agent automatically identifies relevant segments and generates concise summaries in real time. What makes this possible is the integration of NVIDIA Blueprints, which provide pre-validated solution architectures and tools for rapid experimentation. Instead of spending months building a prototype from scratch, we were able to move from concept to a working demo in just a single day. This acceleration not only validates the feasibility of the solution but also gives enterprises a tangible way to envision how AI can be applied to their own video data challenges.

Image: The architecture of the AI Camera Agent solution (NVIDIA)

In particular, FPT AI Factory delivers a full-stack environment, from infrastructure components such as GPU, VM, and Kubernetes to the developer tools required to deploy AI solutions quickly and efficiently. With a flexible architecture and ready-to-use models, developers can even stand up complete solutions powered by just a single NVIDIA H100 GPU, balancing performance, scalability, and cost-effectiveness.

For example, FPT AI Inference offers a library of ready-to-use models that developers can integrate instantly through simple API calls. With competitive per-token pricing, teams can run inference workloads faster while significantly reducing costs, enabling businesses to bring AI-powered applications to market more efficiently.

Taking AI Model Fine-Tuning to the Next Level

Developers today can fine-tune models directly on GPU Container, which is great for experimentation and quick iteration. However, moving from one-off experiments to a production-ready solution requires more than just compute power. It needs automation, reproducibility, and integration into a full MLOps pipeline.

FPT AI Studio provides the right environment to streamline fine-tuning and deployment. The platform is designed to be accessible, with a drag-and-drop GUI for building workflows quickly, while still allowing deep customization for advanced users. It comes with common MLOps components:

AI Notebook for code-driven experimentation.

Data Processing pipelines to handle preprocessing and feature engineering.

Fine-tuning strategies including continual training, domain adaptation, and transfer learning.

Once a model is fine-tuned in AI Studio, it can be stored in the Model Hub, a central repository for versioning, sharing, and reuse. From there, models can be seamlessly transferred to FPT AI Inference for scalable, low-latency serving in production environments.

Image: The training pipeline of FPT AI Studio

For illustration, Mr. Hung walked through a case study of how FPT AI Studio can be applied to adapt a large language model for the Vietnamese healthcare domain. The base model chosen is Llama-3.1-8B, which provides a strong balance between capacity and efficiency.
The task is to develop a model optimized for healthcare question answering, requiring domain-specific adaptation while retaining the general reasoning ability of the base model. The dataset consists of Vietnamese healthcare documents, and the goal is to enhance factual recall, domain precision, and response quality in clinical Q&A scenarios.

The first approach relies on continual pre-training. Using 24 NVIDIA H100 GPUs across three nodes, the model is exposed to the healthcare dataset for three epochs. The entire pipeline takes approximately 31 hours to complete.

The second approach applies supervised fine-tuning with LoRA adapters, which represents a more resource-efficient alternative. In this setting, only four NVIDIA H100 GPUs are used on a single node, and training is performed for five epochs. The total runtime of the pipeline is roughly 3 hours. In GPU-hours, that is roughly 12 (4 GPUs for 3 hours) against roughly 744 (24 GPUs for 31 hours) for continual pre-training. While less computationally demanding, this strategy still delivers significant improvements for downstream Q&A tasks.

Image: Results of pre-training and SFT LLM with the healthcare dataset

Best Practices

First, it’s important to select the right tools for the right workloads to maximize both performance and cost-efficiency. FPT AI Factory equips users with tools for any type of AI/ML workload, for faster, more efficient AI innovation. For early experimentation, GPU Container or AI Notebook provides developers with flexible environments for testing ideas and running quick prototypes. For deployment, the right choice depends on the workload: GPU Container is ideal for lightweight inference, whereas GPU Virtual Machine delivers the performance needed for real-time or batch inference. High-performance computing (HPC) workloads run best on Metal Cloud, which provides bare-metal performance for intensive tasks.
Finally, organizations looking for ready-to-use models can turn to the AI Marketplace, which offers pre-trained LLMs and services to accelerate adoption without additional fine-tuning.

Image: FPT AI Factory solutions for different AI/ML workloads

Second, developers should optimize training workloads. Optimizing training workloads for large generative AI models requires a combination of hardware-aware techniques and workflow engineering. One key practice is to leverage mixed-precision training, using formats such as FP16 or BF16 to accelerate computation on NVIDIA GPUs while reducing memory usage by up to half. This shortens training time, and automatic loss scaling helps maintain accuracy. Distributed training is equally important: strategies like PyTorch DDP or pipeline parallelism allow workloads to scale across multiple GPUs or nodes, improving throughput and accelerating development cycles. In multi-node environments, optimizing cluster interconnects with NVLink or InfiniBand can further boost training speed by up to three times, ensuring efficient synchronization for large-scale AI tasks. Data pipelines and storage must also be optimized, employing NVIDIA DALI and scalable I/O to avoid bottlenecks. Finally, benchmarking tools such as FPT AI Factory’s GPU performance tests and NVIDIA’s MLPerf results help validate configurations, ensuring cost-effective scaling for fine-tuning.

Third, it is crucial to optimize inference workloads for delivering scalable, low-latency generative AI services. One effective approach is applying quantization and lower precision with NVIDIA TensorRT, converting models to FP8 or INT8 for up to 1.4× higher throughput with minimal accuracy trade-offs.
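As a rough illustration of what INT8 conversion means numerically, the sketch below applies symmetric per-tensor quantization to a handful of values in plain Python. It shows only the underlying arithmetic; it is not TensorRT's API or calibration procedure, and the function names and sample weights are invented for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to make the rounding visible.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
# The worst-case error per value is about half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each value is stored in 8 bits instead of 16 or 32, which is where the memory and throughput savings come from; the price is the rounding error captured in `max_error`.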
For large language models, managing the KV cache efficiently is equally important; techniques such as PagedAttention and chunked prefill can cut memory fragmentation and reduce time-to-first-token by as much as 2–5× in multi-user scenarios. Speculative decoding further boosts performance by pairing a smaller draft model with the main LLM: the draft model proposes several tokens ahead and the main model verifies them together, yielding 1.9–3.6× throughput gains while minimizing latency, which is especially valuable in real-time applications like video summarization. Scaling with multi-GPU parallelism also plays a key role, enabling up to 1.5× gains on distributed inference tasks in high-volume clusters. Finally, model distillation and pruning help shrink models, cutting costs and latency by 20–30% without sacrificing output quality.

Key Takeaways

How to Architect a Secure, End-to-End AI Workflow: We will deconstruct the architecture of a production "AI factory," focusing on the design principles for creating a secure development lifecycle within a local data center. You'll learn the technical steps for ensuring data isolation, managing secure model hosting, and creating a reliable pathway from model fine-tuning to the deployment of enterprise-grade AI agents.

Practical Techniques for GPU-Accelerated LLM Operations: Go beyond the specs and learn how to practically leverage high-performance GPUs (like the NVIDIA H100/H200). This session will cover specific, actionable best practices for optimizing both training and inference workloads to maximize throughput, minimize latency, and significantly reduce development cycles for demanding generative AI applications.

The Comprehensive Workflow of Agentic AI: How FPT AI Factory is Accelerating AI Agents Development

18:29 17/09/2025
As artificial intelligence continues to revolutionize industries, understanding the inner workings of AI systems becomes not just fascinating but crucial. Among the most intriguing innovations lies Agentic AI, a technology designed to mimic human-like decision-making, problem-solving, and even creativity.

Rather than serving as static tools that merely respond to user prompts, agentic systems are built to operate with autonomy: they can interpret objectives, decompose them into actionable steps, and pursue outcomes through iterative reasoning and execution. This capability positions Agentic AI not just as an enhancement of existing models, but as a framework for orchestrating complex, multi-step processes with minimal human intervention.

But what does the journey of an Agentic AI look like behind the scenes? How does it seamlessly process complex tasks, adapt to challenges, and improve over time?

These are the primary steps of artificial intelligence agents, each crucial for creating adaptive, intelligent systems.

1. Perception

AI agent perception refers to the ability of an AI system to gather and interpret information from its environment, whether that be through visual, auditory, textual, or other forms of data. This process enables the agent to sense the world, creating a foundational layer for decision-making and problem-solving. Just as humans rely on their senses to navigate their surroundings, AI agents depend on their perception capabilities to understand inputs, recognize patterns, and respond accordingly.

Perception is not a passive process. It involves actively gathering data, processing it, and then using this information to form an understanding of the current situation. The types of data that AI agents perceive vary based on the system's design, and these can include everything from written text and spoken words to images, sounds, or even environmental changes.
In essence, perception serves as the AI agent’s window to the world, providing it with the necessary information to act intelligently and adaptively.

AI agents utilize various types of perception to understand and interpret their environment. Each type of perception allows an agent to interact with the world in distinct ways, enabling it to process different forms of data and make informed decisions. The key categories include:

Textual Perception: Understanding and generating text through Natural Language Processing (NLP), which allows AI systems to interact with textual data such as articles, books, emails, and web pages. This is essential for applications like chatbots and virtual assistants.

Predictive Perception: AI anticipates future events based on historical data, used in fields like finance and autonomous vehicles.

Visual Perception: Using computer vision to interpret images and videos, crucial for tasks such as object detection and facial recognition.

Environmental Perception: AI gathers information through sensors like GPS or motion detectors to adapt to dynamic environments. For example, robots use this to detect and avoid obstacles while moving.

Auditory Perception: The ability to process and understand sound, particularly speech, enabling systems like voice assistants.

2. Reasoning and Decision-making

Reasoning is the cognitive process that allows AI agents to make decisions, solve problems, and infer conclusions based on the information they perceive. It is a critical aspect of an AI agent's ability to act intelligently and adaptively in dynamic environments. While perception enables AI to gather data about the world, reasoning empowers the agent to interpret that data, draw logical conclusions, and make informed choices. In other words, perception is noticing the traffic light turning red; reasoning is realizing you need to stop the car to avoid danger. For AI agents, reasoning works in a similar way: it bridges raw input and purposeful action.
In essence, reasoning involves using rules, heuristics, logic, and learned patterns to process the information provided by the perception system. It allows AI agents to not only understand the current state of their environment but also to predict outcomes, handle uncertainties, and devise strategies for achieving their goals.

Reasoning can be divided into various types, each of which plays a unique role in enabling AI systems to operate effectively in different scenarios.

Heuristic Reasoning: Simplifies decision-making using experience-based rules of thumb, ideal for real-time applications. For instance, when navigating a map, AI might choose the "best" route based on experience rather than calculating every possible path.

ReWOO (Reasoning WithOut Observation): The agent plans its full sequence of steps and tool calls up front, executes them, and then combines the results into an answer. Separating planning from execution reduces the number of model calls needed compared with reasoning after every observation.

ReAct (Reasoning and Acting): A hybrid approach that interleaves reasoning steps with actions, using the result of each action to inform the next step. It is beneficial in environments requiring immediate feedback such as autonomous driving or real-time strategy games.

Self-reflection: The agent evaluates its past decisions to learn and improve.

Conditional Logic: Decision-making based on specific conditions, often used in automation systems. For example, a smart thermostat might use conditional logic to adjust the temperature: "If the room temperature is below 70°F, then increase the heating."

3. Action

The action module implements the agent’s decisions in the real world, allowing it to interact with users, digital systems, or even physical environments. After perceiving its environment and reasoning about the best course of action, the AI agent must execute its decisions in the real world.
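The sequence described so far (perceive, reason, act) can be sketched with the thermostat rule from the conditional logic example. This is a toy illustration, not how any particular agent framework is built: the class, the function names, and the action labels are hypothetical, and the "action" here simply records the decision instead of driving real hardware.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    room_temp_f: float  # temperature in degrees Fahrenheit


def perceive(sensor_reading: float) -> Observation:
    # Perception: turn a raw sensor value into a structured observation.
    return Observation(room_temp_f=sensor_reading)


def reason(obs: Observation, target_f: float = 70.0) -> str:
    # Reasoning via conditional logic: below the target, increase the heating.
    if obs.room_temp_f < target_f:
        return "increase_heating"
    return "hold"


def act(decision: str, log: list) -> None:
    # Action: a real agent would call a device or API; here we only record it.
    log.append(decision)


def run_agent(readings, target_f: float = 70.0) -> list:
    # One perceive -> reason -> act cycle per sensor reading.
    log = []
    for reading in readings:
        act(reason(perceive(reading), target_f), log)
    return log


decisions = run_agent([65.0, 70.0, 72.5])
```

Keeping the three stages as separate functions mirrors the module split described in the text and makes it straightforward to replace the rule in `reason` with a learned policy or an LLM call.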
In the context of AI, action is not limited to physical movements or interactions but can also include processes such as data manipulation, decision execution, and the triggering of automated systems. Whether it involves a robot physically navigating an environment, a software system processing data, or an AI-powered virtual assistant responding to a command, action is the phase where the AI agent brings its reasoning and understanding to life.

4. Learning

AI agent learning refers to the process through which an AI agent improves its performance over time by gaining knowledge from experience, data, or feedback. Instead of relying solely on pre-programmed instructions, an AI agent can adapt and evolve by learning from its environment and the outcomes of its actions. This ability to learn is what allows AI agents to handle new and unseen situations, make better decisions, and optimize their strategies in dynamic, real-world scenarios.

AI agent learning is essential for creating intelligent systems capable of self-improvement. Just as humans learn from experience and apply that knowledge to future challenges, AI agents use various learning techniques to enhance their decision-making and problem-solving capabilities. Through continuous learning, AI agents can refine their behavior and better align with their goals over time.

The methods of learning vary based on how the agent interacts with data, the feedback it receives, and the type of tasks it needs to perform. Below are the key learning approaches used by AI agents:

Unsupervised Learning: Identifies patterns and structures in data without labeled examples. For example, AI can group customers based on purchasing behavior without being given labels.

Supervised Learning: Trains AI on labeled data to predict outcomes based on known inputs.

Reinforcement Learning: The agent learns through trial and error, receiving feedback in the form of rewards or penalties.
Multiagent Learning: Involves collaboration and competition between agents to solve problems more effectively.

Agentic AI represents more than just an upgrade to existing systems. It marks a shift toward truly adaptive, autonomous intelligence. By perceiving, reasoning, acting, and learning, these agents mirror essential aspects of human cognition while continuously improving through experience.

However, building such agents is far from simple; organizations will need a strong and resilient infrastructure. From fast GPU resources to flexible model training environments and seamless model deployment, these capabilities are what transform theory into practice.

5. How FPT AI Factory Accelerates the Process of Developing AI Agents

In response to this need, FPT has launched FPT AI Factory, providing a comprehensive solution for developing AI agents through three key services: FPT AI Infrastructure, FPT AI Studio, and FPT AI Inference.

Data Processing Foundation (FPT AI Infrastructure)

Every successful AI agent relies on a continuous data flywheel that drives improvement. FPT AI Factory’s NVIDIA H100/H200 GPU infrastructure powers this process by collecting diverse data (conversations, user interactions, sensor feeds), processing and labeling it for agent training, and deploying smarter AI agents. These agents generate new data from user interactions, feeding back into the system to enhance future iterations. This self-reinforcing cycle leads to increasingly intelligent and responsive AI systems as more agents are deployed.

AI Agent Development (FPT AI Studio)

Once data is prepared, developers can use FPT AI Studio to build and train intelligent agents in a collaborative cloud environment.
The platform supports the development of various AI agent types, from conversational assistants to decision-making systems, providing tools for model training, behavior fine-tuning, and agent performance optimization to ensure they respond accurately to real-world scenarios.

AI Agent Deployment & Serving (FPT AI Inference)

After development and testing, FPT AI Inference enables seamless deployment of AI agents into production environments. These deployed agents not only serve users reliably but also feed valuable interaction data back into the flywheel, creating a continuous improvement loop. Whether you're launching a customer service chatbot, deploying an autonomous vehicle system, or integrating a recommendation agent into an e-commerce platform, each user interaction becomes part of the data flywheel that makes your next generation of AI agents even smarter.

From concept to production, FPT AI Factory provides the complete infrastructure backbone that transforms AI agent ideas into intelligent, self-improving systems through the power of the data flywheel effect.

Fine-Tuning Llama 3 in 30 Minutes on FPT AI Factory: Accelerating Enterprise AI Development

11:19 03/09/2025
Recently, FPT hosted a webinar titled “Fine-Tuning Llama 3 in 30 Minutes on FPT AI Factory”, featuring Mr. Donald Murataj, AI Expert at FPT. The session focused on practical techniques for efficiently fine-tuning the Llama 3 model on the FPT AI Factory platform.

Generative AI: An Inevitable Trend for Enterprises

In today’s landscape, artificial intelligence (AI) has become one of the key drivers of enterprise growth. In particular, Generative AI (GenAI) is emerging as a breakthrough technology that not only optimizes operational efficiency and enhances customer experience but also paves the way for entirely new business models.

The greatest challenge for enterprises lies in how to personalize large language models such as Llama 3 with their own data and unique business context. This is precisely where fine-tuning becomes the key to unlocking the transformative value of GenAI. The webinar organized by FPT demonstrated that this otherwise complex process can be executed quickly, seamlessly, and effectively on the FPT AI Factory platform.

Fine-Tuning Llama 3 in Just 30 Minutes

The highlight of the webinar was a live demonstration, where an FPT expert successfully completed the entire fine-tuning process of Llama 3 in just 30 minutes, guiding participants step by step:

Step 1: Preparing a training dataset tailored to real-world business needs, enabling the model to understand the specific context and language of the enterprise.

Step 2: Initializing a GPU Container environment on FPT AI Factory to ensure high-speed processing, system stability, and seamless scalability.

Step 3: Executing the fine-tuning process directly through an intuitive interface that is simple to operate while providing full control over every stage.

Step 4: Evaluating the results and comparing them with the baseline model to clearly demonstrate improvements in performance and accuracy.

What impressed participants the most was the simplicity and accessibility of FPT AI Factory.
Even technical teams with limited AI development experience could quickly build their own customized AI models. Whereas fine-tuning traditionally required several days, the entire process can now be completed in less than an hour, powerful evidence of the efficiency and optimization enabled by FPT AI Factory.  This experience has transformed a traditionally complex process into one that is fast, practical, and easy to adopt, opening the door for enterprises to begin experimenting with AI from the very first steps.  👉 Watch the full webinar replay here: https://www.youtube.com/watch?v=6L1nQteXAnM&ab_channel=FPTAIFactory  FPT AI Factory - A Comprehensive AI Development Platform for Enterprises  All of this is made possible by FPT AI Factory, a comprehensive AI development platform built on state-of-the-art infrastructure, powered by NVIDIA H100/H200 GPUs and NVIDIA AI Enterprise software. Combined with FPT’s practical deployment expertise, FPT AI Factory enables enterprises to accelerate model development, optimize costs, and scale deployments with flexibility and security.  The platform comprises four key components:  FPT AI Infrastructure: High-performance, energy-efficient computing infrastructure for large language models (LLMs) and multimodal models.  FPT AI Studio: A cost-efficient environment for fast fine-tuning, experimentation, and prototyping.  FPT AI Inference: A high-performance, low-latency serving platform designed for production-ready AI applications.  FPT AI Agents: A platform for building and operating intelligent, multilingual AI agents seamlessly integrated into enterprise workflows.  In addition, FPT AI Factory offers more than 20 ready-to-use Generative AI products, enabling enterprises to quickly apply AI across customer experience, operations, human resources management, and cost optimization.

Fine-Tuning OpenFlamingo on NVIDIA H100 GPUs

16:43 21/08/2025
1. Flamingo Introduction: Few-Shot Learning for Visual Language Models

[caption id="attachment_65686" align="aligncenter" width="800"] Dory. Credit: www.istockphoto.com[/caption]

Flamingo (original paper: https://arxiv.org/pdf/2204.14198) is a family of Visual Language Models (VLMs) designed by a team at Google DeepMind to solve the challenge of few-shot learning in multimodal machine learning. The model is built with three key architectural innovations:

- It bridges powerful, pre-trained vision-only and language-only models.
- It can handle sequences of arbitrarily interleaved visual and textual data.
- It can seamlessly ingest images or videos as input.

This flexibility allows Flamingo to be trained on large-scale web data with mixed images and text, which is crucial for its ability to learn new tasks with only a few examples. As a result, a single Flamingo model can achieve state-of-the-art performance on a wide range of tasks, including visual question-answering, captioning, and multiple-choice questions, simply by being prompted with task-specific examples. This few-shot approach often allows Flamingo to outperform models that have been fine-tuned on thousands of times more data.

2. How Flamingo Works

[caption id="attachment_65677" align="aligncenter" width="700"] Multimodal LLM. Credit: dataiku.com[/caption]

Flamingo operates through a multimodal interface, processing a combination of images, videos, and text to generate relevant textual responses. This design allows it to adapt seamlessly to different tasks, functioning similarly to large language models (LLMs), which use text-based prompts to tackle diverse language-related challenges.

Model Architecture

OpenFlamingo combines a pretrained vision encoder and a language model using cross-attention layers. The model architecture is shown below.

[caption id="attachment_65678" align="aligncenter" width="960"] Credit: Google DeepMind[/caption]

The architecture can be understood by following two main pathways:

a.
The Visual Pathway (Left Side)

This pathway is responsible for processing the visual data (images) and preparing it for the language model.

- Vision Encoder: This is a pre-trained model (marked as frozen by the snowflake icon) that extracts features from the input images. A key design choice is that this encoder’s weights are “frozen” and do not change during training.
- Perceiver Resampler: The output of the Vision Encoder is fed into the Perceiver Resampler. This module maps the variable-sized visual features to a small, fixed number of output tokens. This component is trained from scratch (indicated by the purple fill), learning to produce a concise summary of the visual data. In the Flamingo paper, the number of output visual tokens is fixed at 64.

b. The Language Pathway (Right Side)

This pathway processes the text and fuses it with the visual information to generate a final output.

- Interleaved Input: The model takes an input sequence of text mixed with image placeholders (<image>).
- LM Blocks: The core of this pathway is a large, pre-trained Language Model (LM), such as a Chinchilla model. Like the Vision Encoder, these blocks are “frozen,” meaning their vast knowledge of language is leveraged without needing to be retrained.
- Gated XATTN-DENSE: This is the key innovation that connects the two pathways. These are new modules, trained from scratch, that are inserted between the LM blocks. When the model encounters an <image> placeholder in the text stream, the Gated XATTN-DENSE layer performs a cross-attention operation: it uses the text information as queries to "look at" the visual tokens generated by the Perceiver Resampler. The "gated" part is a mechanism that controls how much visual information is allowed to influence the language generation, providing a dynamic way to fuse the two modalities.
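The gating idea described above can be illustrated with a small, dependency-free sketch. It is not the OpenFlamingo implementation: it assumes a single attention head, omits the learned query/key/value projections, the dense feed-forward sublayer, and the image-text masking, and all function names are hypothetical. What it does show is the tanh gate from the Flamingo paper: the gate parameter starts at zero, so at initialization the text representation passes through unchanged and the frozen language model behaves exactly as before.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_tokens, visual_tokens):
    """Simplified single-head attention (no learned projections, an assumption).

    Text tokens act as queries; visual tokens act as keys and values.
    """
    dim = len(visual_tokens[0])
    outputs = []
    for q in text_tokens:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in visual_tokens
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visual_tokens)) for d in range(dim)]
        )
    return outputs


def gated_xattn(text_tokens, visual_tokens, alpha):
    """Residual cross-attention scaled by tanh(alpha).

    alpha = 0 gives a gate of 0, so the output equals the text input.
    """
    gate = math.tanh(alpha)
    attended = cross_attention(text_tokens, visual_tokens)
    return [
        [x + gate * a for x, a in zip(tok, att)]
        for tok, att in zip(text_tokens, attended)
    ]


# Toy vectors chosen only for illustration.
text = [[1.0, 0.0], [0.0, 1.0]]
visual = [[0.5, 0.5], [2.0, -1.0]]

unchanged = gated_xattn(text, visual, alpha=0.0)  # gate closed: text passes through
mixed = gated_xattn(text, visual, alpha=1.0)      # gate partly open: visual info added
```

With `alpha=0.0` the result is identical to `text`; as training moves `alpha` away from zero, progressively more visual information is blended into the language stream.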
Setting a New Standard in Few-Shot Learning

Flamingo has been rigorously tested on 16 different tasks and has consistently outperformed previous few-shot learning models, even when provided with as few as four examples per task. In several cases, it has demonstrated superior performance over methods that rely on extensive fine-tuning and significantly larger datasets, highlighting its ability to generalize effectively. By minimizing the need for large-scale annotations and task-specific retraining, Flamingo represents a significant advancement in visual language model efficiency. Its ability to learn quickly from limited examples brings AI closer to human-like adaptability, enabling a wider range of real-world applications with greater ease and accuracy.

3. Why do we fine-tune it?

To validate the performance of our new H100 system, we’re testing its ability to run an LLM. For this evaluation, we’ve chosen to fine-tune a community-built implementation of the Flamingo model. This project serves a dual purpose:

- System Validation: We’re using this fine-tuning task to rigorously test our H100 infrastructure, ensuring it can handle the demanding computational requirements of training and running a large model.
- Code Verification: Since the original Flamingo model code wasn’t publicly released, we’re relying on a community-developed version. This process allows us to verify whether this open-source implementation is a faithful and runnable recreation of the model described in the research paper.

Please note, therefore, that our focus here is on system capability rather than on evaluating the model’s accuracy.

In this project, we used a Flamingo replica known as OpenFlamingo, developed by the ML Foundations group (mlfoundations), since the original Flamingo model has not been publicly released. The objective was to fine-tune OpenFlamingo on its original dataset and evaluate its performance under controlled conditions.
This experiment served two primary purposes: (1) assessing the model’s stability and reproducibility when fine-tuned on the same dataset, and (2) benchmarking its performance on an NVIDIA H100 GPU system to analyze computational efficiency, memory usage, and overall system capability for handling large-scale multimodal tasks. These insights help determine the feasibility of deploying OpenFlamingo in practical applications while optimizing hardware utilization.

4. How did we fine-tune it?

Installation

To install the package in an existing environment, run

[code lang="js"]
pip install open-flamingo
[/code]

or, to create a conda environment for running OpenFlamingo, run

[code lang="js"]
conda env create -f environment.yml
[/code]

To install training or eval dependencies, run one of the first two commands below. To install everything, run the third command.

[code lang="js"]
pip install open-flamingo[training]
pip install open-flamingo[eval]
pip install open-flamingo[all]
[/code]

There are three requirements files:

- `requirements.txt`
- `requirements-training.txt`
- `requirements-eval.txt`

Depending on your use case, you can install any of these with `pip install -r <requirements-file.txt>`. The base file contains only the dependencies needed for running the model.

Development

The OpenFlamingo authors use pre-commit hooks to align formatting with the checks in the repository.

To install pre-commit, run

[code lang="js"]
pip install pre-commit
[/code]

or use brew on macOS

[code lang="js"]
brew install pre-commit
[/code]

Check the installed version with

[code lang="js"]
pre-commit --version
[/code]

Then, at the root of the repository, run

[code lang="js"]
pre-commit install
[/code]

Then every time we run git commit, the checks are run.
If the files are reformatted by the hooks, run

[code lang="js"]
git add
[/code]

for your changed files and

[code lang="js"]
git commit
[/code]

again.

Training Procedure

To train OpenFlamingo, please ensure your environment matches that of environment.yml.

Data Processing

The codebase uses WebDataset to efficiently load .tar files containing image and text sequences. We recommend resampling shards with replacement during training using the --dataset_resampled flag.

LAION-2B Dataset

LAION-2B contains 2B web-scraped (image, text) pairs. Please use img2dataset to download this dataset into tar files.

Multimodal C4 Dataset

OpenFlamingo trains on the full version of Multimodal C4 (MMC4), which includes 103M documents of web-scraped, interleaved image-text sequences. During training, it truncates sequences to 256 text tokens and six images per sequence. The codebase expects .tar files containing .json files, which include raw images encoded in base64. Scripts are provided to convert MMC4 to this format:

(1) Download the MMC4 shards into .zip files using the MMC4-provided scripts (e.g., fewer_facesv2.sh).
(2) Download the MMC4 raw images into an image directory using the MMC4-provided scripts (e.g., download_images.py).
(3) Run scripts/convert_mmc4_to_wds.py to convert the downloaded items into the expected tar files.

Customized Dataset

It was recently reported that the MMC4 dataset download URLs have access issues. We therefore wrote a script that prepares a customized dataset by transforming it into MMC4’s format (we used the ADNI dataset as the target for this example, with a fixed sample base64 image).
You can modify this script to suit your own dataset:

[code lang="js"]
import json
import os
import tarfile


def compress_directory_to_tar(directory_path):
    # Pack the per-sample JSON files into shards of 20 files each.
    json_files = [f for f in os.listdir(directory_path) if f.endswith('.json')]
    os.makedirs('replicate_mmc4', exist_ok=True)
    for i in range(0, len(json_files), 20):
        batch_files = json_files[i:i+20]
        tar_file_path = os.path.join('replicate_mmc4', f"{i//20:09d}.tar")
        # Note: "w:gz" writes a gzip-compressed archive even though the file is named .tar.
        with tarfile.open(tar_file_path, "w:gz") as tar:
            for file in batch_files:
                tar.add(os.path.join(directory_path, file), arcname=file)
        print(f"Batch {i//20} compressed to {tar_file_path}")


def convert_adni_to_mmc4(input_json_path, output_folder):
    # Ensure the output folder exists
    os.makedirs(output_folder, exist_ok=True)

    # Load the large JSON file
    with open(input_json_path, 'r') as f:
        data = json.load(f)

    # Load the fixed sample image (base64 text) once; it is reused for every image
    with open('./sample_base64.txt', 'r') as f:
        sample_img_base64_data = f.read()

    matched_text_index = 0

    # Iterate over each item in the list and save it as a separate JSON file
    for idx, item in enumerate(data):
        # Ensure compatibility with the structure of f9773b9c866145c28fe0b701dde8dfbe.json
        # Handle text list:
        text_list = []
        conversations = item.get("conversations", None)
        if conversations is not None:
            for conversation in conversations:
                text_list.append(conversation["value"])

            # Check for <image> tag in the first element of conversations list
            first_convo = conversations[0]["value"]
            if "<image>" in first_convo:
                if first_convo.startswith("<image>"):
                    matched_text_index = 0
                elif first_convo.endswith("<image>"):
                    matched_text_index = 1

            item["text_list"] = text_list

        # Handle image info:
        img_info = []
        images_list = item.get("image", None)
        if images_list is not None:
            for img in images_list:
                img_obj = {}
                img_obj["image_name"] = img
                img_obj["raw_url"] = "https://example.com/{}".format(img)
                img_obj["matched_text_index"] = matched_text_index
                img_obj["matched_sim"] = 0.75
                img_obj["image_base64"] = sample_img_base64_data
                img_info.append(img_obj)

        # Create similarity_matrix: one row per image, one column per text entry
        similarity_matrix = []
        for _ in img_info:
            inner_list = [0] * len(text_list)
            if matched_text_index < len(text_list):
                inner_list[matched_text_index] = 1
            similarity_matrix.append(inner_list)

        output_item = {
            "id": item.get("id", None),
            "url": "https://example.com",
            "text_list": item.get("text_list", None),
            "image_info": img_info,
            "similarity_matrix": similarity_matrix,
            "could_have_url_duplicate": 0
        }

        # Save the item as a separate JSON file
        output_path = os.path.join(output_folder, f"{idx:05d}.json")
        with open(output_path, 'w') as out_f:
            json.dump(output_item, out_f)
[/code]

ChatGPT-generated sequences

A subset of the OpenFlamingo models (listed below) were also trained on experimental ChatGPT-generated (image, text) sequences, where images are pulled from LAION. The shards containing these sequences can be found at this CodaLab worksheet. The OpenFlamingo team is unable to distribute raw images in the released shards; images must be pre-downloaded from the URLs in the JSON files and converted to base64 before this data is used for training with the codebase.

Models trained with ChatGPT-generated sequences:

- OpenFlamingo-4B-vitl-rpj3b
- OpenFlamingo-4B-vitl-rpj3b-langinstruct

Training Command

A sample Slurm training script is provided in scripts/.
You can also modify the following command (the one we used in our case):

[code lang="js"]
torchrun --nnodes=1 --nproc_per_node=8 open_flamingo/train/train.py \
  --lm_path anas-awadalla/mpt-1b-redpajama-200b \
  --tokenizer_path anas-awadalla/mpt-1b-redpajama-200b \
  --cross_attn_every_n_layers 1 \
  --dataset_resampled \
  --batch_size_mmc4 2 \
  --train_num_samples_mmc4 1000 \
  --workers=4 \
  --run_name OpenFlamingo-3B-vitl-mpt1b \
  --num_epochs 20 \
  --warmup_steps 1875 \
  --mmc4_textsim_threshold 0.24 \
  --mmc4_shards "modifications/VLM_ADNI_DATA/replicate_mmc4/{000000000..000000040}.tar" \
  --report_to_wandb
[/code]

The MPT-1B base and instruct modeling code does not accept the `labels` kwarg or compute cross-entropy loss directly within `forward()`, as the OpenFlamingo codebase expects. We suggest using a modified version of the MPT-1B models found here and here.

Distributed Training

By default, train.py uses PyTorch’s DistributedDataParallel for training. To use FullyShardedDataParallel, use the --fsdp flag.

Some notes on FSDP from the OpenFlamingo team:

- We recommend using the --fsdp_use_orig_params flag. If --fsdp is on without this flag, all language model embeddings will be unfrozen during training. (In contrast, the default behavior is to only train the newly added <image> and <|endofchunk|> tokens.) Note: We’ve encountered issues using OPT with this flag. Other language models should be compatible.
- Our current FSDP wrapping strategy does not permit training language model embeddings that use tied weights (i.e., tied input/output embeddings). To train such models with FSDP, the language model embeddings must be frozen with the --freeze_lm_embeddings flag.

We also implement gradient checkpointing and mixed precision training. Use the --gradient_checkpointing and --precision arguments, respectively.

Initializing an OpenFlamingo model

OpenFlamingo supports pretrained vision encoders from the OpenCLIP package, which includes OpenAI’s pretrained models.
It also supports pretrained language models from the transformers package, such as MPT, RedPajama, LLaMA, OPT, GPT-Neo, GPT-J, and Pythia models.

[code lang="js"]
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="anas-awadalla/mpt-1b-redpajama-200b",
    tokenizer_path="anas-awadalla/mpt-1b-redpajama-200b",
    cross_attn_every_n_layers=1,
    cache_dir="PATH/TO/CACHE/DIR",  # Defaults to ~/.cache
)
[/code]

5. Results

Below are the results reported in our Weights & Biases (W&B) dashboards.

NVIDIA H100 GPUs

The NVIDIA H100 system that was employed:

- The system is equipped with 8 × NVIDIA H100 80GB HBM3 GPUs. For this training setting, however, 2 GPUs with distributed training are sufficient.
- Each NVIDIA H100 has 80 GB of high-bandwidth memory (HBM3), which makes the system well suited to high-performance computing (HPC) and AI training.
- The NVIDIA H100 GPUs are in the P0 performance state, which indicates they are in the highest available performance mode.

Model’s reported metrics

[caption id="attachment_65687" align="aligncenter" width="960"] Credit: Wandbs.com[/caption]

The training metrics indicate a well-functioning process with expected behaviors across various parameters. The loss curve shows a sharp initial drop before stabilizing, suggesting good convergence. The learning rate follows a linear warm-up schedule, which is a common practice to stabilize early training. Step time and data loading times remain mostly consistent, with occasional spikes that may be caused by system fluctuations, checkpointing, or data fetching delays. The global step progresses linearly, confirming steady training iteration increments. The samples per second per GPU metric remains stable, with a minor dip that does not appear to significantly impact performance.
Overall, these metrics suggest normal training behavior, though monitoring the occasional spikes in step time and data time could help optimize efficiency further.

System’s reported metrics (our main focus):

[caption id="attachment_65688" align="aligncenter" width="960"] Credit: Wandbs.com[/caption]

- GPU Uncorrected Memory Errors (top-left): The line remains at zero, indicating no uncorrected memory errors.
- GPU Corrected Memory Errors (top-middle): The plot is also flat at zero, meaning no corrected memory errors.
- GPU Memory Clock Speed (top-right): Normal; the consistent clock speed suggests no dynamic frequency scaling or throttling.
- GPU Streaming Multiprocessor (SM) Clock Speed (bottom-left): Normal; the stable clock speed suggests no thermal throttling.
- GPU Power Usage (W) (bottom-middle): Shows a cyclical pattern, indicating that GPU power consumption fluctuates during workload execution. This could be due to batch processing, workload scheduling, or dynamic power management.

[caption id="attachment_65689" align="aligncenter" width="960"] Credit: Wandbs.com[/caption]

- GPU Enforced Power Limit (W) (top-left): Normal; this indicates that the GPU is not exceeding its predefined power limit.
- GPU Memory Allocated (Bytes) (top-middle): Memory allocation remains stable and drops suddenly at the end, at the point where training finished.
- GPU Memory Allocated (%) (top-right): Normal; same as GPU Memory Allocated (Bytes).
- GPU Time Spent Accessing Memory (%) (bottom-left): Correlates with GPU Power Usage (W) above.
- GPU Temperature (°C) (bottom-middle): Correlates with GPU Power Usage (W) above.
- GPU Utilization (%) (bottom-right): Correlates with GPU Power Usage (W) above.
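For readers who want to spot-check the same system metrics outside the dashboard, the following is a minimal sketch that polls `nvidia-smi` and parses its CSV output. It assumes `nvidia-smi` is available on the PATH; the choice of fields and the helper names (`parse_gpu_line`, `query_gpus`, `gpus_not_in_p0`) are ours for illustration and are not part of OpenFlamingo or W&B. The sample row in the usage lines is made up for demonstration and is not a measurement from this experiment.

```python
import subprocess

# Subset of nvidia-smi query fields that mirrors the panels discussed above.
FIELDS = [
    "index",
    "pstate",
    "temperature.gpu",
    "utilization.gpu",
    "memory.used",
    "power.draw",
    "clocks.sm",
    "clocks.mem",
]


def parse_gpu_line(line):
    """Parse one row produced with --format=csv,noheader,nounits."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = {"index": int(values[0]), "pstate": values[1]}
    for key, raw in zip(FIELDS[2:], values[2:]):
        try:
            record[key] = float(raw)
        except ValueError:
            record[key] = None  # e.g. "[N/A]" when a field is unsupported
    return record


def query_gpus():
    """Run nvidia-smi once and return one record per GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


def gpus_not_in_p0(records):
    """Return the indices of GPUs that are not in the P0 performance state."""
    return [r["index"] for r in records if r["pstate"] != "P0"]


# Usage with a hypothetical output row (no GPU needed for this part):
sample = parse_gpu_line("0, P0, 54, 97, 41235, 512.30, 1980, 2619")
idle = parse_gpu_line("1, P2, 40, 0, 3, [N/A], 345, 2619")
flagged = gpus_not_in_p0([sample, idle])
```

On a machine with GPUs, calling `query_gpus()` periodically during training gives a quick cross-check of the dashboard: a GPU that leaves P0, or clocks that drop while utilization stays high, would be worth investigating.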

AI Factories Are Reshaping Data Infrastructure for an Intelligent Future

15:36 08/08/2025
From startups to global giants, the AI reasoning era is redefining how we build, think, and operate. Hyperscalers and innovation leaders are scaling AI factories globally, and soon every enterprise will depend on one to stay ahead.  The Rise of AI Factories: Manufacturing Intelligence for an AI-Native Era  The world is entering the age of AI Natives — a new generation of individuals, organizations, and economies born into environments where artificial intelligence is not just an enhancement, but a default operating layer.  Teenagers now grow up talking to AI assistants instead of typing search queries. Companies like Amazon rely on AI to manage logistics at machine speed. Tesla collects terabytes of real-world data every day to refine its autonomous driving models. Even governments are adopting AI copilots to streamline citizen services and policymaking.  In this new era, AI is embedded into products, into decisions, into every customer interaction. For AI-Native entities, intelligence must be continuous, generative, and scalable. To meet the demands of this fundamental shift, we need a new kind of infrastructure. This leads to the emergence of AI Factory - the next generation of data centers, designed not merely to store information but to produce intelligence at scale.  From Data Centers to Intelligence Manufacturing Hubs  Traditional data centers were built for general-purpose computing, capable of processing a wide range of workloads with relative flexibility. However, in today’s AI-driven economy, speed, scale, and specialization matter more than ever. Businesses and governments can no longer afford to wait months for fragmented AI initiatives to yield actionable insights. Instead, they require industrial-grade systems capable of managing the full AI lifecycle, from data ingestion to model training, fine-tuning, and high-volume inference in real-time.  AI Factories are purpose-built to meet this demand. 
They transform raw data into actionable intelligence with speed, continuity, and cost efficiency. Intelligence is no longer a byproduct. It is a product.  The key performance metric is AI token throughput, which measures how effectively an AI Factory produces reasoning and predictions to power decisions, enable automation, and unlock value.  AI Factories: Building the Backbone of the AI Economy  Around the world, governments and enterprises are accelerating efforts to build AI factories as strategic drivers of economic growth, innovation, and efficiency.  In Europe, the European High-Performance Computing Joint Undertaking has unveiled plans to develop seven AI factories in partnership with 17 EU member states, marking a significant step toward establishing AI infrastructure at scale.  This movement is part of a broader global wave, as countries and corporations invest heavily in AI factories to transform industries and power national competitiveness:  India: Yotta Data Services, in collaboration with NVIDIA, has introduced the Shakti Cloud Platform—democratizing access to advanced GPU computing. By combining NVIDIA AI Enterprise software with open-source tools, Yotta offers a streamlined platform for AI development and deployment.  Japan: Top cloud providers such as GMO Internet, Highreso, KDDI, Rutilea, and SAKURA Internet are building NVIDIA-powered AI infrastructure to revolutionize sectors ranging from robotics and automotive to healthcare and telecommunications.  Norway: Telenor has launched an AI factory leveraging NVIDIA technologies to drive AI adoption across the Nordic region, with a strong emphasis on workforce upskilling and sustainable development.  Together, these initiatives highlight a pivotal shift: AI factories are no longer optional; they are emerging as foundational infrastructure for the digital economy, much like telecommunications and energy grids once were.  
Inside an AI Factory: Where Intelligence Is Manufactured  At the core of every AI factory lies a set of vital ingredients: foundation models, trustworthy customer data, and a suite of powerful AI tools. These components come together in a purpose-built environment where models are fine-tuned, prototyped, and optimized for real-world deployment.  As these models enter production, they initiate a continuous learning cycle, drawing insights from new data, refining performance through feedback loops, and evolving with every iteration. This closed-loop system, often called a data flywheel, enables organizations to unlock ever-smarter AI, fueling enterprise growth through adaptability, precision, and scale.   Source: NVIDIA via blog.nvidia.com FPT AI Factory: Vietnam’s Pioneer, Japan’s Trusted Partner  Amid this global shift, FPT AI Factory stands at the forefront of the region’s AI transformation. It is the first of its kind in Vietnam and a trusted infrastructure partner for enterprise customers in Japan. Developed in strategic collaboration with NVIDIA, FPT AI Factory is purpose-built to accelerate the AI-native evolution of businesses and governments alike. FPT AI Factory provides an end-to-end infrastructure stack for the entire AI product lifecycle, integrating thousands of NVIDIA H100/H200 GPUs, the latest NVIDIA AI Enterprise software, and FPT’s AI ecosystem and deployment expertise. This combination empowers businesses to accelerate the development and deployment of advanced AI solutions, streamline resource and process management, and optimize total cost of ownership while ensuring speed, scalability, and sustainability.  FPT AI Factory enables enterprises to accelerate every stage of their AI journey through four integrated components: FPT AI Infrastructure: Built on NVIDIA H100/H200 GPUs, this infrastructure supports compute-intensive AI workloads with high performance and energy efficiency — ideal for training LLMs, multimodal models, and more.  
FPT AI Studio: A complete environment for experimentation, fine-tuning, and rapid prototyping that helps teams accelerate development and reduce costs.  FPT AI Inference: A scalable, cost-efficient serving platform optimized for low latency and high throughput, suited for production-grade applications with demanding SLAs.  FPT AI Agents: A GenAI-powered platform for creating intelligent, multilingual, multi-tasking AI agents that integrate seamlessly with enterprise workflows.  Source: FPT Smart Cloud  Additionally, FPT AI Factory is integrated with over 20 ready-to-use generative AI products, enabling rapid AI adoption and immediate impact across customer experience, operational excellence, workforce transformation, and cost optimization. Powering the Intelligent Enterprise Future  AI is no longer just a tool for innovation; it’s becoming the foundation for enterprise transformation and national competitiveness. Around the world, AI Factories are emerging as strategic infrastructure that empowers organizations to develop, deploy, and scale intelligent systems at speed. More than technical assets, they are catalysts for the next wave of productivity and long-term economic resilience.  FPT AI Factory marks Vietnam’s entry into this global movement. Designed to support enterprises across on-premises, cloud, and hybrid environments, it offers a full-stack platform that simplifies the entire AI lifecycle. With strategic investments spanning Vietnam and Japan, FPT is helping shape a future where intelligence is not an add-on, but a core layer of every business, every industry, and every nation ready to lead in the AI era.

A Deep Dive into the Global Artificial Intelligence Trends, Challenges, and Future Prospects

16:37 31/07/2025
Artificial Intelligence (AI) is becoming a core engine for countries and businesses to accelerate transformation and leap ahead in the smart era. With long-term vision, governments around the world are actively advancing AI through strategic policies, infrastructure investments, and innovation ecosystems to accelerate both business growth and national competitiveness.    Across the global AI race, countries are leveraging distinct national strengths to shape their trajectories. The United States leads with a market-driven model, fueled by Big Tech and a robust culture of private-sector innovation. China advances through a top-down national strategy, positioning AI as a core pillar of digital sovereignty and economic competitiveness. India is rapidly establishing itself as a digital powerhouse, leveraging a deep pool of tech talent and an increasingly dynamic innovation landscape. As a representative of Southeast Asia’s digital ascent, Indonesia demonstrates growing momentum in AI adoption and digital transformation. These four countries have been selected as representative markets to analyze AI trends from various perspectives. I. The USA: Strategic Investment and AI Policy Leadership 1. Emerging Trends in the United States Venture capital investment in AI surged to $55.6 billion in Q2/2025, marking the highest level in two years. This represents a 47% increase compared to the $37.8 billion raised in Q1, largely fueled by growing interest in AI startups, according to Reuters. While funding had previously declined from a peak of $97.5 billion in Q4/2021 to a low of $35.4 billion in Q2/2024 due to high interest rates, AI is now reversing that trend by becoming a top destination for new capital. 2. Government Initiatives and Policy Support As one of the world’s leading technology hubs, the United States is aggressively driving AI development through a comprehensive national strategy spanning infrastructure, policy, and talent. 
A key policy shift came under the Trump administration in early 2025, with the Executive Order titled “Removing Barriers to American Leadership in Artificial Intelligence,” focusing on reducing regulatory burdens and accelerating private-sector innovation to strengthen U.S. dominance in the AI race.  Altogether, the U.S. is building a robust AI ecosystem—backed by strong private capital, clear public policies, and effective public-private innovation models—laying a solid foundation to retain its leadership position in the AI era.  II. China: Centralised Planning and Technological Self-Reliance 1. Emerging Trends in China China is investing heavily in domestic AI chips, supercomputers, big data platforms, and autonomous robotics to reduce reliance on Western technologies. Tech giants such as Baidu, Alibaba, Tencent, and Huawei are leading the charge, working alongside startups to commercialize AI across sectors like transportation, healthcare, finance, and defense.   Aligned with its strategic ambition to lead the global AI race, China is channeling close to $100 billion into AI development by 2025—over half of which is driven by state-led initiatives. This investment is anchored in the “Next Generation AI Development Plan,” reinforcing its long-term vision to position AI as a pillar of national competitiveness by 2030. 2. Government Initiatives and Policy Support China is shaping a distinct AI development path through strong state coordination and institutional leadership. Recent national efforts focus on building a resilient AI supply chain, securing strategic technologies such as advanced chips, large language models, and sovereign data infrastructure.   In addition, this country also takes a pioneering role in global AI governance, having registered over 1,400 algorithms and introduced a suite of regulatory frameworks for generative AI, algorithmic accountability, and fair data usage.  
Rather than following existing models, China is actively exporting its regulatory approach and technical standards, a signal of its ambition to influence not just the pace but also the rules of global AI development.  III. India's Dominance in the AI Landscape: A Global Powerhouse for Innovation and Startups   In a world shifting toward smarter systems and automated processes, India, with its vast talent pool and thriving technology sector, has become one of the most prominent players in the digital revolution. With over 600,000 AI professionals and 700 million internet users, India contributes 16% of the global AI talent pool, second only to the United States. 1. Emerging Trends in India India is witnessing several significant trends in AI development, and the rise of AI-driven startups is particularly notable. According to Statista (2025), the Indian AI market is currently valued at $7-10 billion and is projected to reach $31.94 billion by 2031, reflecting a CAGR of 26.37%. This rapid expansion highlights the market’s exceptional growth, as it is expected to more than quadruple in just six years. The growth is largely driven by India’s robust talent base and the rise of AI-driven startups that are shaping this dynamic ecosystem.   In recent years, nearly 3,000 AI startups have been launched, contributing significantly to sectors like healthcare diagnostics, agricultural automation, fintech, and language processing. These startups are leveraging AI to create innovative solutions, such as personalized healthcare diagnostics and precision farming techniques. With a rapidly growing AI-driven startup ecosystem, India is on its way to becoming the third-largest startup ecosystem globally.  Furthermore, AI chatbots, virtual assistants, and automated customer service solutions are becoming a growing trend for businesses of all sizes, as they offer high efficiency while significantly reducing operational costs. 
This trend is expected to continue as companies increasingly aim to enhance customer experiences and optimize their operational processes.

2. Government Initiatives and Policy Support

With an ambitious plan to position India as a global AI leader in sectors such as healthcare, agriculture, and education, the government launched the IndiaAI initiative in 2024. This program aims to build a robust AI ecosystem by enhancing critical infrastructure, such as high-performance computing resources, to support advanced AI research and development. It also focuses on fostering innovation through funding for AI startups and establishing AI labs that serve as innovation hubs. Moreover, the initiative seeks to equip the future workforce with the necessary AI skills by integrating specialized training programs and educational resources, ensuring that India develops a highly skilled pool of AI professionals.

Additionally, the Indian government is fostering collaborations with global technology companies such as Google, Microsoft, and IBM. These collaborations help India access the latest AI technologies and apply them in local contexts such as agriculture, urban planning, and disaster management.

IV. The Rise of AI in Indonesia

1. Emerging Trends in Indonesia

Indonesia's AI ecosystem has gained substantial traction in recent years. Investment in AI is now more crucial than ever in shaping the country's technological future, because Indonesia still lacks a comprehensive regulatory framework and clear guidelines for the use of AI, and for mandatory AI training, across sectors. In 2024, the country witnessed significant foreign investment in its AI sector, with two major partnerships highlighting Indonesia’s rising role in global technological innovation.
Nvidia partnered with PT Indosat to invest $200 million in an AI factory and skills development program in Surakarta, aiming to build local expertise in AI technologies and provide the infrastructure needed to foster future innovations.

Following this, Microsoft made a landmark commitment to invest $1.7 billion in building cloud and AI infrastructure across Indonesia. As part of this investment, Microsoft plans to train 840,000 professionals, enhancing the country’s AI talent pool and equipping the workforce with crucial skills. These investments not only boost Indonesia’s technological growth and digital infrastructure, but also demonstrate growing international confidence in Indonesia’s AI potential.

2. Government Initiatives and Policy Support

In August 2020, the government launched Indonesia's Golden 2045 Vision, a pivotal initiative aimed at transforming Indonesia from a resource-based economy to an innovation-driven one. This ambitious strategy outlines a comprehensive roadmap for AI development across various sectors, focusing on five key policy pillars: Ethics & Policy, Infrastructure & Data, Talent Development, R&D & Industrial Innovation, and Sectoral Implementation.

Furthermore, the Indonesian government is actively driving the advancement of AI through a range of projects, partnering with large enterprises to implement cutting-edge solutions. Several government-led initiatives are being rolled out across various sectors, with the goal of enhancing AI technology and infrastructure. To support this vision, the government is engaging local enterprises to execute and deploy these AI projects, including those focused on research and development to enhance large language models (LLMs).
As part of this strategic push, the government is also promoting the development of Interactive Generative AI, which presents ample opportunities for businesses to lead the way in driving innovation and growth in the AI sector.

V. Trends and Government Policy Support in Other Countries

Across the globe, nations are increasingly focused on advancing AI technology, each taking a different approach depending on its specific needs and goals. For example, Japan’s AI Strategy is centered on integrating AI into society to address its aging population and related demographic issues, while Canada is dedicated to cultivating AI research excellence, particularly through initiatives like the Pan-Canadian Artificial Intelligence Strategy.

Countries such as Russia and South Korea, by contrast, view AI as a critical tool for enhancing national security and boosting economic power, channeling investments into defense technologies, robotics, and autonomous systems. Meanwhile, in the Middle East, countries like the United Arab Emirates and Saudi Arabia are leveraging AI to accelerate economic diversification, foster innovation, and implement smart city solutions as part of their broader modernization efforts.

VI. Navigating the Risks of AI

While Artificial Intelligence offers immense potential for economic advancement and societal benefit, it also introduces a series of complex challenges and risks. Chief among these are concerns related to data privacy, algorithmic bias, and the potential for mass surveillance. As AI continues to automate various tasks, there is growing concern about the displacement of workers, especially in sectors reliant on manual and repetitive tasks. Furthermore, the widespread use of AI in data processing raises critical concerns about privacy and security, as personal information is vulnerable to breaches, misuse, or unauthorized surveillance.
In addition, the advancement of deepfake technology presents significant risks to the credibility of information, as AI-generated content can be used to create misleading or entirely fabricated media.

VII. The Promising Horizons of AI in the Future

Artificial Intelligence is fast evolving from a disruptive tool into a foundational driver of global progress. In the coming decade, AI is poised to change how businesses operate, unleashing a wave of innovation, agility, and intelligence. According to a 2024 report from PwC, AI could contribute up to $15.7 trillion to the global economy by 2030, primarily through increased productivity, automation of routine tasks, and the creation of new markets and services.

Moreover, far from being a job destroyer, AI is proving to be a job enhancer. The World Economic Forum's “Future of Jobs” report (2025) predicts that while AI will automate certain tasks, it will also generate 78 million net new roles globally, especially in data science, AI governance, creative sectors, and digital infrastructure. Notably, industries with high AI exposure have reported faster wage growth and higher demand for skilled professionals.