Blogs Tech


FPT Cloud knowledge-sharing blog

FPT AI Factory: A Powerful AI Solution Suite with NVIDIA H100 and H200 Superchips

13:37 10/06/2025
In the booming era of artificial intelligence (AI), Vietnam is making a strong mark on the global technology map through the strategic collaboration between FPT Corporation and NVIDIA, the world's leading provider of high-performance computing solutions, to develop FPT AI Factory, a comprehensive suite for end-to-end AI. The solution is built on the world's most advanced AI technology: the NVIDIA H100 and NVIDIA H200 superchips.

Video: Mr. Truong Gia Binh (Chairman of FPT Corporation) discusses the strategic cooperation with NVIDIA in developing comprehensive AI applications for businesses.

According to the Government News (2024), Mr. Truong Gia Binh, Chairman of the Board and Founder of FPT Corporation, emphasized that FPT aims to strengthen its technology research and development capabilities while building a comprehensive ecosystem of advanced products and services based on AI and Cloud platforms. This ecosystem spans cutting-edge technological infrastructure, top-tier experts, and deep domain knowledge in specialized fields. "We are committed to making Vietnam a global hub for AI development."

1. Overview of the Two Superchips NVIDIA H100 & H200: A New Leap in AI Computing

1.1 Information about the NVIDIA H100 Chip (NVIDIA H100 Tensor Core GPU)

The NVIDIA H100 Tensor Core GPU is a groundbreaking processor built on the Hopper™ architecture, NVIDIA's next-generation GPU design. It is not an ordinary graphics chip but an engine specially optimized for deep learning and artificial intelligence (AI) applications.

Figure: The NVIDIA H100 chip (NVIDIA H100 Tensor Core GPU)

The NVIDIA H100 superchip is manufactured on TSMC's advanced N4 process and integrates up to 80 billion transistors. Its processing power comes from a maximum of 144 Streaming Multiprocessors (SMs), purpose-built to handle complex AI tasks. Notably, the NVIDIA Hopper H100 delivers optimal performance when deployed via the SXM5 socket. Thanks to the enhanced memory bandwidth provided by the SXM5 standard, the H100 offers significantly higher performance than implementations using conventional PCIe sockets, an especially critical advantage for enterprise applications that demand large-scale data handling and high-speed AI processing.

Figure: NVIDIA H100 Tensor Core GPUs deliver up to 9x faster AI training and up to 30x faster AI inference on large language models than the previous-generation A100

NVIDIA offers the H100 in two form factors, the H100 SXM and the H100 NVL, designed to meet the diverse needs of today's enterprise market. Their typical use cases are as follows:

H100 SXM: Designed for specialized systems, supercomputers, and large-scale AI data centers that aim to fully harness the GPU's potential with maximum NVLink scalability. This version is ideal for training large AI models (LLMs, Transformers), AI-integrated High Performance Computing (HPC) applications, and exascale-level scientific, biomedical, and financial simulations.

H100 NVL: Optimized for standard servers, this version is easily integrated into existing infrastructure with lower cost and complexity than dedicated SXM systems.
It is well suited for enterprises deploying real-time AI inference, big data processing, natural language processing (NLP), computer vision, or AI applications in hybrid cloud environments.

| Product Specifications | H100 SXM | H100 NVL |
| --- | --- | --- |
| FP64 | 34 teraFLOPS | 30 teraFLOPS |
| FP64 Tensor Core | 67 teraFLOPS | 60 teraFLOPS |
| FP32 | 67 teraFLOPS | 60 teraFLOPS |
| TF32 Tensor Core* | 989 teraFLOPS | 835 teraFLOPS |
| BFLOAT16 Tensor Core* | 1,979 teraFLOPS | 1,671 teraFLOPS |
| FP16 Tensor Core* | 1,979 teraFLOPS | 1,671 teraFLOPS |
| FP8 Tensor Core* | 3,958 teraFLOPS | 3,341 teraFLOPS |
| INT8 Tensor Core* | 3,958 TOPS | 3,341 TOPS |
| GPU Memory | 80GB | 94GB |
| GPU Memory Bandwidth | 3.35TB/s | 3.9TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Max Thermal Design Power (TDP) | Up to 700W (configurable) | 350-400W (configurable) |
| Multi-Instance GPUs (MIG) | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 12GB each |
| Form Factor | SXM | PCIe dual-slot air-cooled |
| Interconnect | NVIDIA NVLink™: 900GB/s; PCIe Gen5: 128GB/s | NVIDIA NVLink: 600GB/s; PCIe Gen5: 128GB/s |
| Server Options | NVIDIA HGX H100 Partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1-8 GPUs |
| NVIDIA AI Enterprise | Add-on | Included |

* With sparsity.

Table 1.1: Specifications of the two H100 form factors, H100 SXM and H100 NVL

1.2 Information about the NVIDIA H200 Chip (NVIDIA H200 Tensor Core GPU)

Figure: The NVIDIA H200 chip (NVIDIA H200 Tensor Core GPU), available in two form factors: H200 SXM and H200 NVL

Building upon and advancing the Hopper™ architecture, the NVIDIA H200 Tensor Core GPU is a powerful upgrade of the H100. Introduced by NVIDIA in November 2023 as the world's most powerful AI chip at the time, it delivers results up to twice as fast as the H100. The H200 is designed to handle even larger and more complex AI models, especially generative AI models and large language models (LLMs).

As with the H100, NVIDIA offers the H200 Tensor Core GPU in two enterprise-oriented form factors: the H200 SXM and the H200 NVL.

NVIDIA H200 SXM: Designed to accelerate generative AI and high-performance computing (HPC) workloads, especially those processing massive amounts of data. It is the ideal choice for dedicated systems, supercomputers, and large AI data centers aiming to fully leverage the GPU's potential with maximum NVLink scalability. Enterprises should choose the H200 SXM for scenarios such as training extremely large AI models, HPC applications requiring large memory, and enterprise-level generative AI deployment.

NVIDIA H200 NVL: Optimized to bring AI acceleration to standard enterprise servers and integrate easily into existing infrastructure. This version is particularly suitable for enterprises with space constraints that need air-cooled rack designs with flexible configurations, delivering acceleration for AI and HPC workloads at any scale. Enterprise use cases for the H200 NVL include real-time AI inference, AI deployment in hybrid cloud environments, big data processing, and natural language processing (NLP).
| Product Specifications | H200 SXM | H200 NVL |
| --- | --- | --- |
| FP64 | 34 TFLOPS | 30 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS | 60 TFLOPS |
| FP32 | 67 TFLOPS | 60 TFLOPS |
| TF32 Tensor Core² | 989 TFLOPS | 835 TFLOPS |
| BFLOAT16 Tensor Core² | 1,979 TFLOPS | 1,671 TFLOPS |
| FP16 Tensor Core² | 1,979 TFLOPS | 1,671 TFLOPS |
| FP8 Tensor Core² | 3,958 TFLOPS | 3,341 TFLOPS |
| INT8 Tensor Core² | 3,958 TFLOPS | 3,341 TFLOPS |
| GPU Memory | 141GB | 141GB |
| GPU Memory Bandwidth | 4.8TB/s | 4.8TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Confidential Computing | Supported | Supported |
| TDP | Up to 700W (configurable) | Up to 600W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @ 18GB each | Up to 7 MIGs @ 16.5GB each |
| Form Factor | SXM | PCIe dual-slot air-cooled |
| Interconnect | NVIDIA NVLink™: 900GB/s; PCIe Gen5: 128GB/s | 2- or 4-way NVIDIA NVLink bridge: 900GB/s per GPU; PCIe Gen5: 128GB/s |
| Server Options | NVIDIA HGX™ H200 Partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs | NVIDIA MGX™ H200 NVL Partner and NVIDIA-Certified Systems with up to 8 GPUs |
| NVIDIA AI Enterprise | Add-on | Included |

² With sparsity.

Table 1.2: Technical specifications of the two H200 form factors, H200 SXM and H200 NVL

1.3 Detailed Comparison Between the NVIDIA H100 and NVIDIA H200 Superchips

Figure: Differences between the H100 and H200 superchips across the SXM and NVL form factors, especially for building enterprise AI infrastructure and applications

Based on the information above about the NVIDIA H100 (H100 SXM and H100 NVL) and H200 (H200 SXM and H200 NVL) provided by FPT Cloud, here is a detailed comparison table between the NVIDIA H100 and H200 for your reference:

| Features | NVIDIA H100 (SXM) | NVIDIA H100 (NVL) | NVIDIA H200 (SXM) | NVIDIA H200 (NVL) |
| --- | --- | --- | --- | --- |
| Architecture | Hopper™ | Hopper™ | Inherits and evolves from Hopper™ | Inherits and evolves from Hopper™ |
| Manufacturing Process | TSMC N4 (80 billion transistors) | TSMC N4 (80 billion transistors) | Upgraded version of the H100 | Upgraded version of the H100 |
| FP64 | 34 TFLOPS | 30 TFLOPS | 34 TFLOPS | 30 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS | 60 TFLOPS | 67 TFLOPS | 60 TFLOPS |
| FP32 | 67 TFLOPS | 60 TFLOPS | 67 TFLOPS | 60 TFLOPS |
| TF32 Tensor Core | 989 TFLOPS | 835 TFLOPS | 989 TFLOPS | 835 TFLOPS |
| BFLOAT16 Tensor Core | 1,979 TFLOPS | 1,671 TFLOPS | 1,979 TFLOPS | 1,671 TFLOPS |
| FP16 Tensor Core | 1,979 TFLOPS | 1,671 TFLOPS | 1,979 TFLOPS | 1,671 TFLOPS |
| FP8 Tensor Core | 3,958 TFLOPS | 3,341 TFLOPS | 3,958 TFLOPS | 3,341 TFLOPS |
| INT8 Tensor Core | 3,958 TOPS | 3,341 TOPS | 3,958 TOPS | 3,341 TOPS |
| GPU Memory | 80GB | 94GB | 141GB | 141GB |
| GPU Memory Bandwidth | 3.35TB/s | 3.9TB/s | 4.8TB/s | 4.8TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Confidential Computing | No information available | No information available | Supported | Supported |
| Max Thermal Design Power (TDP) | Up to 700W (configurable) | 350-400W (configurable) | Up to 700W (configurable) | Up to 600W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 12GB each | Up to 7 MIGs @ 18GB each | Up to 7 MIGs @ 16.5GB each |
| Form Factor | SXM | PCIe dual-slot air-cooled | SXM | PCIe dual-slot air-cooled |
| Interconnect | NVIDIA NVLink™: 900GB/s; PCIe Gen5: 128GB/s | NVIDIA NVLink: 600GB/s; PCIe Gen5: 128GB/s | NVIDIA NVLink™: 900GB/s; PCIe Gen5: 128GB/s | 2- or 4-way NVIDIA NVLink bridge: 900GB/s per GPU; PCIe Gen5: 128GB/s |
| Server Options | NVIDIA HGX H100 Partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1-8 GPUs | NVIDIA HGX™ H200 Partner and NVIDIA-Certified Systems with 4 or 8 GPUs | NVIDIA MGX™ H200 NVL Partner and NVIDIA-Certified Systems with up to 8 GPUs |
| NVIDIA AI Enterprise | Add-on | Included | Add-on | Included |

Table 1.3: Detailed comparison between the NVIDIA H100 (SXM and NVL) and NVIDIA H200 (SXM and NVL)

2. FPT Strategically Partners with NVIDIA to Develop the First AI Factory in Vietnam

The strategic synergy between NVIDIA, a leading technology company, and FPT's extensive experience in deploying enterprise solutions has forged a powerful alliance for developing pioneering AI products for the Vietnamese market. NVIDIA not only supplies its cutting-edge H100 and H200 GPU superchips but also shares deep expertise in AI architecture. Within FPT Corporation, FPT Smart Cloud will be the trailblazing entity providing cloud computing and AI services built on this AI factory, enabling Vietnamese enterprises, businesses, and startups to easily access and leverage the immense power of AI.

Figure: FPT Corporation is a strategic partner of NVIDIA in building and developing the FPT AI Factory solutions: FPT AI Infrastructure, FPT AI Studio, and FPT AI Inference

Notably, FPT will concentrate on developing generative AI models, offering capabilities for content creation, process automation, and solving complex problems that were previously difficult to address. In the era of burgeoning AI technologies, B2B enterprises across all sectors, from Finance, Securities, and Insurance to Manufacturing and Education, face a pressing need for a reliable partner to achieve digital transformation breakthroughs. FPT AI Factory from FPT Cloud is the optimal solution, offering your business the following outstanding advantages:

Leading AI Infrastructure: By directly utilizing the NVIDIA H100 and H200 superchips, FPT AI Factory delivers a powerful AI computing platform, ensuring superior performance and speed for all AI tasks.

Diverse Service Ecosystem: FPT AI Factory is not just hardware but a comprehensive ecosystem designed to support businesses throughout the entire AI solution lifecycle, from development and training to deployment.

Cost Optimization: Instead of investing millions of dollars in complex AI infrastructure, businesses can leverage FPT AI Factory as a cloud service, optimizing both initial investment and operational costs.

Security, Compliance, and Integration: FPT is committed to providing a secure AI environment that meets international security standards while enabling seamless integration with existing enterprise systems.

Figure: The superior advantages of the FPT AI Factory solution for businesses across industries

3. Building a Comprehensive FPT AI Factory Ecosystem (FPT AI Infrastructure, FPT AI Studio, and FPT AI Inference) Powered by NVIDIA H100 & H200 Superchips

FPT AI Factory currently offers a trio of AI solutions for enterprises, developed on the core technology of the NVIDIA H100 and NVIDIA H200 superchips:

FPT AI Infrastructure: the product group covering enterprise infrastructure.
FPT AI Studio: the product group covering the platform of tools and services for enterprises.

FPT AI Inference: the product group covering the platform for serving AI (Artificial Intelligence) and ML (Machine Learning) models for enterprises.

Video: FPT's trio of AI solutions, FPT AI Infrastructure, FPT AI Studio, and FPT AI Inference, enables businesses to build, train, and operate AI solutions simply, easily, and effectively.

3.1 FPT AI Infrastructure Solution

Figure: The FPT AI Infrastructure solution enables businesses to deploy high-performance computing infrastructure, develop AI solutions, and easily scale on demand

FPT AI Infrastructure is a robust cloud computing infrastructure platform, specially optimized for AI workloads. It provides superior computing power from NVIDIA H100 and H200 GPUs, enabling enterprises to build supercomputing infrastructure, easily access and utilize resources to train AI models rapidly, and scale flexibly according to their needs using technologies such as Metal Cloud, GPU Virtual Machine, Managed CPU Cluster, and GPU Container.

Register for FPT AI Infrastructure today to build and develop powerful infrastructure for your business!

3.2 The FPT AI Studio Product

Figure: The FPT AI Studio product helps businesses process data and develop, train, evaluate, and deploy artificial intelligence and machine learning models based on their specific needs

Once a business has established an infrastructure system with advanced GPU technology, the next step is to build and develop its own artificial intelligence and machine learning models tailored to specific operational and application needs. FPT AI Studio is the optimal solution for this. It is a comprehensive AI development environment offering a full suite of tools and services that support businesses throughout the entire process, from data processing, model development, training, and evaluation to the deployment of real-world AI/ML models, using technologies such as Data Hub, AI Notebook, Model Pre-training, Model Fine-tuning, and Model Hub.

Register now to start building and deploying AI and Machine Learning models for your business today!

3.3 The FPT AI Inference Service

Figure: The FPT AI Inference service enhances the inference capabilities of enterprises' AI and Machine Learning models

Once an enterprise's AI or Machine Learning model has been trained on internal and other crucial data, deploying and operating it in a real-world environment demands an efficient solution. FPT AI Inference is the intelligent choice for your business. This solution is optimized to deliver high inference speed and low latency, ensuring your AI models operate quickly and accurately in real-world applications such as virtual assistants, customer consultation services, recommendation systems, image recognition, and natural language processing, powered by technologies like Model Serving and Model-as-a-Service. It is the final piece of the FPT AI Factory solution suite, helping enterprises put AI into practical use and deliver immediate business value.

Enhance the inference capabilities and real-world applications of your enterprise AI models today with FPT AI Inference!
4. Exclusive Offer for Customers Registering to Experience FPT AI Factory on FPT Cloud

Figure: Special benefits for businesses that register early to use FPT AI Factory services

Exclusive incentives from FPT Cloud await when you register early to experience the comprehensive AI Factory solution trio of FPT AI Infrastructure, FPT AI Studio, and FPT AI Inference:

Priority access to FPT AI Infrastructure services at preferential pricing: Significantly reduce costs while accessing world-class AI infrastructure, tools, and applications, right here in Vietnam.

Early access to premium features of FPT AI Factory: Keep your business ahead by being among the first to adopt the latest AI technologies and tools in the digital transformation era.

Cloud credits to explore a diverse AI & Cloud ecosystem: Experience other powerful FPT Cloud solutions that enhance operational efficiency, such as FPT Backup Services, FPT Disaster Recovery, and FPT Object Storage.

Expert consultation from seasoned AI & Cloud professionals: FPT's AI and Cloud specialists will support your business in applying and operating the FPT AI Factory solution suite effectively, driving immediate business impact.

Register now to receive in-depth consultation on the FPT AI Factory solution from FPT Cloud's team of experienced AI & Cloud experts!

Figure: Registration form for expert AI & Cloud consultation on FPT AI Factory's trio of solutions for enterprises

LandingAI – Agentic Vision Technologies Leader from Silicon Valley – Leverages FPT AI Factory to Accelerate Visual AI Platform

17:09 03/06/2025
LandingAI, a Silicon Valley-based leader in agentic vision technologies founded by Dr. Andrew Ng, is leveraging FPT AI Factory services to accelerate the development of its tools, including Agentic Document Extraction, Agentic Object Detection, and VisionAgent. Through this partnership, LandingAI utilizes Metal Cloud, powered by NVIDIA H100 Tensor Core GPUs, to meet the growing demand for high-performance computing, scalability, and operational efficiency.

LandingAI is redefining visual intelligence with its tools, applying an agentic AI framework designed to help users solve complex visual tasks using unstructured data such as images and documents. The system intelligently selects and orchestrates vision models and generates deployable code to automate similar tasks in the future.

A key challenge in developing the Visual AI platform lies in the need for substantial computing resources to fine-tune the agents, run reinforcement learning loops, and drive continuous performance improvement, while ensuring rapid iteration speed to keep pace with innovation.

Tackling Computational Challenges with Metal Cloud

FPT AI Factory offers the critical infrastructure needed to fast-track the development of the Visual AI platform and address performance complexities. Through the partnership with FPT, LandingAI gains access to Metal Cloud, a high-performance AI infrastructure fueled by NVIDIA H100 GPUs, backed by high SLAs and continuous support from FPT's experts.

The cutting-edge GPUs deliver the computational power necessary for supervised fine-tuning and reinforcement learning at scale, enabling rapid and efficient model development. The seamless integration and minimal setup friction further allow LandingAI to quickly incorporate the H100s into its training pipeline and iterate on model architectures and agent behaviors at unprecedented speed and efficiency. In addition, LandingAI is able to expand its computing capacity while optimizing resource consumption thanks to the competitive pricing of FPT AI Factory services.

Key benefits achieved:

Significant improvements in visual task generalization

3x faster deployment of customer-facing features

"As LandingAI expands our agentic vision technology offerings, FPT AI Factory has provided us with a solid and flexible infrastructure for our large-scale AI development and deployment," said Mr. Dan Maloney, CEO of LandingAI. "Their system's reliability and flexibility have streamlined our Visual AI workflows, significantly reducing iteration time. We have seen improved operational stability in production and cost savings. Their responsive support has made integration seamless."

Figure: Agentic Document Extraction Playground

A Solid Foundation for Agentic AI Innovation

FPT AI Factory is a full-stack ecosystem for end-to-end AI development, designed to make AI accessible, scalable, and tailored to each business's unique goals. Powered by thousands of NVIDIA Hopper H100/H200 GPUs, combined with the latest NVIDIA AI Enterprise software platform, FPT AI Factory provides robust infrastructure, foundational models, and the necessary tools for businesses to build and advance AI applications from the ground up, with faster time-to-market and enterprise-grade performance at a fraction of traditional costs.
As global demand for agentic AI systems gains momentum to transform business task automation with minimal effort, LandingAI's integration of FPT AI Factory demonstrates the potential of high-performance, flexible AI infrastructure to drive innovation in this fast-growing domain. These agentic systems, designed to perform complex tasks using natural language prompts, are not only reshaping automation and collaboration but also making advanced AI capabilities more approachable for developers, engineers, and business users alike.

The AI Agent market is projected to reach $52.62 billion by 2030, with a CAGR of 46.3% from 2025 to 2030. Built on low-code or no-code platforms, AI Agents are fostering faster AI adoption and more dynamic human-AI collaboration across various sectors. The computing power and agility provided by FPT AI Factory emerge as critical enablers for businesses to enter and lead in the next era of intelligent automation.

"FPT and LandingAI share a mutual vision to democratize AI and make its powerful capabilities accessible to all. This collaboration marks another milestone in our long-term partnership to establish a strong foundation for developing next-generation AI technologies, such as Agentic AI, driving innovation and bringing tangible value across multiple industries," shared Mr. Le Hong Viet, CEO of FPT Smart Cloud, FPT Corporation.

Looking ahead, FPT is committed to continuously enhancing FPT AI Factory to further eliminate infrastructure barriers and simplify AI development, empowering businesses to innovate faster, smarter, and more efficiently.

About FPT Corporation

FPT Corporation (FPT) is a global leading technology and IT services provider headquartered in Vietnam. FPT operates in three core sectors: Technology, Telecommunications, and Education. With AI as a key focus, FPT has been integrating AI across its products and solutions to drive innovation and enhance user experiences within its Made by FPT ecosystem. FPT is actively expanding its AI capabilities through investments in human resources, R&D, and partnerships with leading organizations such as NVIDIA, Mila, AITOMATIC, and LandingAI. These efforts are aligned with FPT's ambitious goal to solidify its status among the world's top billion-dollar IT companies. For more information, please visit https://fpt.com/en.

About LandingAI

LandingAI™ delivers cutting-edge agentic vision technologies that empower customers to unlock the value of visual data. With LandingAI's solutions, companies realize the value of AI and move AI projects from proof-of-concept to production.

Guided by a data-centric AI approach, LandingAI's flagship product, LandingLens™, enables users to build, iterate, and deploy Visual AI solutions quickly and easily. LandingAI is a pioneer in agentic vision technologies, including Agentic Document Extraction and Agentic Object Detection, which enhance the ability to process and understand visual data at scale, making sophisticated Visual AI tools more accessible and efficient.

Founded by Andrew Ng, co-founder of Coursera, founding lead of Google Brain, and former chief scientist at Baidu, LandingAI is uniquely positioned to lead the development of Visual AI that benefits all. For more information, visit https://landing.ai/.

FPT announces partner ecosystem with global tech giants, promoting AI Factory development and operations in Vietnam and Japan

10:30 13/05/2025
Global leading IT firm FPT has announced a partner ecosystem of pioneering global technology organizations, including NVIDIA, SCSK, ASUS, Hewlett Packard Enterprise, VAST Data, and DDN Storage. This cooperative endeavor aims to expedite AI factory development and operations in Vietnam and Japan.

Figure: Dr. Truong Gia Binh and senior leaders of FPT, along with representatives of NVIDIA, SCSK, ASUS, Hewlett Packard Enterprise, VAST Data, and DDN Storage, announce a partner ecosystem to promote the AI Factory in Vietnam and Japan

This partner ecosystem commits to combining expertise, resources, and networks to unlock FPT AI Factory's potential as a powerhouse for ever-growing AI innovation while reinforcing sovereign AI in Vietnam and Japan. To this end, the collaboration focuses on four key objectives: 1) promoting the development and operations of AI Factories in Vietnam and Japan following global standards; 2) diversifying the portfolio of AI products and services; 3) enriching human technical capabilities; and 4) guarding data security and autonomy.

FPT also revealed the launch of FPT AI Factory in Vietnam and Japan, enabling businesses of all sizes to expedite AI innovation with priority access to premium solutions and features through an exclusive pre-order. FPT AI Factory offers an all-inclusive stack for end-to-end AI development that leverages thousands of NVIDIA H200 and H100 Tensor Core GPUs with the NVIDIA AI Enterprise software platform, which includes NVIDIA NeMo. FPT AI Factory grants organizations, researchers, and innovators scalable GPU supercomputing to cultivate sophisticated AI solutions with faster time-to-market while safeguarding sensitive information and maintaining sovereignty. The suite also enables clients to manage resources and processes expeditiously for large-scale AI and machine learning workloads, achieving up to 45% better total cost of ownership.

Figure: Mr. Le Hong Viet, CEO of FPT Smart Cloud, FPT Corporation, unveiled the future of digital autonomy empowered by FPT AI Factory

This flagship stack consists of four main product groups:

FPT AI Infrastructure offers enterprise accelerated computing cloud services with the latest technology, top performance, flexibility, and scalability to accelerate model development. Enterprises can enjoy first-rate infrastructure performance at massive scale for the most compute-intensive AI tasks. A unified management system with built-in security allows complete control over the AI computing environment and data throughout the development process.

FPT AI Studio is a trusted and inclusive platform that streamlines the AI creation process in a fast and safe manner. It provides a comprehensive set of smart tools to effortlessly explore, develop, evaluate, and deploy custom models enriched and differentiated with corporations' large-scale data. This helps businesses create cutting-edge AI applications from scratch without requiring deep expertise, securely simplifying operations and improving AI efficiency.

FPT AI Inference is a robust platform that augments AI capabilities with a broad collection of high-performing models for immediate use. Businesses can leverage numerous foundational and FPT-developed models to rapidly fine-tune and deploy models tailored to industry requirements. They can also scale these models in size and usage for unique applications hosted on NVIDIA-Certified Systems.
FPT AI Agents is a state-of-the-art platform for creating and operating multilingual AI agents fueled by the business knowledge base and custom models for specific tasks in customer service, corporate training, internal operations, and more. Developed on the powerful FPT AI Factory infrastructure with advanced generative AI cores, FPT AI Agents will enable businesses to unlock unprecedented productivity, take service quality to new heights, transform the workforce, and achieve borderless innovation.

FPT AI Factory is integrated with more than 20 ready-to-use AI products built on generative AI for rapid adoption and instant results in elevating customer experience, achieving operational excellence, transforming the human workforce, and optimizing operating expenses.

FPT is now accepting exclusive advance orders for FPT AI Factory, letting corporate clients utilize the diverse AI and cloud product and service offerings, earn cloud credit, and gain early access to premium features. Combined with customized consultation from seasoned AI and Cloud experts, enterprises in any industry can reinforce successful AI journeys with practical, high-value solutions.

Since announcing its $200-million investment plan in April 2024, FPT has been working closely with NVIDIA to mobilize resources and create groundbreaking products, aiming to ignite profound changes in areas including customer service and workforce development. With the launch of FPT AI Factory, FPT is completing an end-to-end ecosystem comprising superior infrastructure, intelligent platforms, beneficial applications, and professional services, all designed to meet the dynamic demands of AI evolution.

Figure: Dr. Truong Gia Binh, Founder of FPT Corporation, reaffirmed the shared vision of co-creating the future with AI

Dr. Truong Gia Binh, Chairman and Founder of FPT Corporation: "AI factories are emerging as an essential foundation for the human and AI agent economy, representing a new transformative force in the digital landscape. Through the collaborative efforts to establish the omnipotent FPT AI Factory, we empower organizations, researchers, developers, and adopters to seize the full potential of AI, forming thousands of intelligent agents enriched and differentiated by the data, knowledge, and culture of every business and nation. Step by step, we redefine and perfect production relations between AI and humans, enhancing the competitiveness of every economy while promoting international integration."

Mr. Dennis Ang, Senior Director, Enterprise Business of ASEAN and ANZ Region, NVIDIA: "AI is reshaping nations and industries. Leveraging the full-stack NVIDIA AI platform, FPT is set to more efficiently scale its AI factory to support enterprises throughout the region."

Figure: Mr. Dennis Ang, Senior Director at NVIDIA, emphasized the role of the AI Factory in shaping the new era of technology

Mr. Masaki Komine, Managing Executive Officer, General Manager, Products & Services Business Group, SCSK Corporation: "We believe that we share the goal of fundamentally changing the digital environment with FPT. We value the spirit of co-creation and hope to build a long-term cooperative relationship with a passionate partner like FPT. Furthermore, we would like to utilize our cutting-edge AI infrastructure technology and knowledge gained from many years of system operation to tackle the challenges faced by customers and societies around the world, including Japan and Vietnam."
Mr. Jason Chung, Regional Director of East Asia and Indochina, ASUS: "ASUS is one of the main partners of the NVIDIA Cloud Partner Program. With our experience in large-scale AI server deployment and operations, ASUS is excited to partner with FPT and NVIDIA in this pioneering venture. Together, we're empowering businesses in Vietnam and Japan to harness the power of AI. By fostering innovation and collaboration, we're building an ecosystem that will ensure that no business is left behind in the AI era."

Mr. Narinder Kapoor, Senior Vice President and Managing Director of APAC, Hewlett Packard Enterprise: "The era of AI has brought with it limitless possibility. HPE is committed to staying at the forefront of innovation to help customers embrace this new era. This launch marks a significant milestone in our collaboration with FPT and NVIDIA to enable enterprises in Vietnam, Japan and across Asia Pacific to harness the full potential of AI. We expect this initiative to accelerate AI adoption and bolster data sovereignty and security, which are critical for our customers in this digital age. We are excited to support this transformative journey and look forward to being part of this initiative supporting Vietnam as it transforms into a hub of AI innovation."

Mr. Sunil Chavan, Vice President of Asia-Pacific, VAST Data: "We're thrilled to support FPT AI Factory's launch, which will help redefine how enterprises in Vietnam and Japan harness the power of AI. Today's enterprises are looking at how AI can help them extract real business value. To realize these benefits, they need scalable, reliable solutions that deliver clear returns on their capital investments. With VAST Data's robust infrastructure, we're helping companies build flexible, compliant AI ecosystems that integrate seamlessly with existing environments and support a wide range of AI initiatives. This partnership allows businesses to make strategic, data-driven decisions with the confidence that they have a solution tailored to their complex needs."

Mr. Robert Triendl, SVP International and General Manager, DDN Storage: "We congratulate FPT on the launch of FPT AI Factory, and we are very excited to have been selected as the performance data solution for this innovative GPU cloud service. DDN's AI solutions combine superior performance with minimal data center footprint to support the most scalable AI workloads today. We look forward to collaborating closely with FPT to build powerful new services for next-generation AI workloads, and we wish FPT great success in Vietnam, Japan, and the global market."

About FPT Corporation

FPT Corporation (FPT) is a global leading technology and IT services provider headquartered in Vietnam. FPT operates in three core sectors: Technology, Telecommunications, and Education. With AI as a key focus, FPT has been integrating AI across its products and solutions to drive innovation and enhance user experiences within its Made by FPT ecosystem. FPT is actively expanding its AI capabilities through investments in human resources, R&D, and partnerships with leading organizations such as NVIDIA, Mila, AITOMATIC, and Landing AI. These efforts are aligned with FPT's ambitious goal to solidify its status among the world's top billion-dollar IT companies. For more information, please visit https://fpt.com/en.

FPT AI Factory Hands-on: A Guide to Deploying GPU Notebooks and Experimenting with AI Models

14:01 08/05/2025
Jupyter Notebook is a browser-based interface that allows users to interact directly with code and data through a user-friendly web UI. It is commonly used in AI tasks such as data exploration, feature extraction, model building, and experimentation.

This guide provides a quick walkthrough for deploying GPU Notebooks on FPT AI Factory, from infrastructure setup to accessing and running AI notebooks for tasks like data analysis, feature engineering, model training, and inference.

I. Service Requirements

To deploy a GPU Notebook on FPT AI Factory, users need to:

Register an account at https://id.fptcloud.com

Contact the sales team to subscribe to the FPT AI Factory – AI Infrastructure service.

Once registered, the technical team will provision the necessary resources for service access.

II. Setting Up and Accessing the GPU Notebook

The environment setup involves two virtual machines within the same VPC:

Jump Server: acts as an SSH gateway for external access.

GPU VM: the main virtual machine for running the notebook and handling AI workloads.

Step 1: Create the GPU VM

Create a GPU VM with an H100 configuration using the recommended template (16 CPUs, 192 GB RAM, 80 GB GPU RAM). Reference: https://fptcloud.com/en/documents/gpu-virtual-machine-en/?doc=quick-start

Network configuration: assign a public IP, open the notebook ports, and configure access permissions via the Security Group.

Step 2: Set Up the Environment

Update the system and install the GPU driver:

[code lang="bash"]
sudo apt update && sudo apt upgrade -y
sudo apt install -y nvidia-driver-565
nvidia-smi  # check GPU status
[/code]

Install Docker following the official guide: https://docs.docker.com/engine/install/ubuntu/

Install the NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Step 3: Launch the Jupyter Notebook Container

[code lang="bash"]
image="quay.io/jupyter/tensorflow-notebook:cuda-python-3.11"
docker run -p 8888:8888 \
  -v ~/work:/home/jovyan/work \
  --detach \
  --name notebook \
  --gpus all \
  $image
[/code]

Step 4: Retrieve the Access Token

[code lang="bash"]
docker ps            # get the container ID
docker logs -f <ID>  # find the token in the logs
[/code]

Step 5: Access the Notebook via an SSH Tunnel

Create an SSH tunnel through the Jump Server to the GPU VM:

[code lang="bash"]
ssh -L 13888:127.0.0.1:8888 -J <user_jump>@<jump_ip> <user_vm>@<vm_ip>
[/code]

Then open a browser and go to http://localhost:13888, using the token retrieved in Step 4.

III. Running Basic Notebooks

After successfully accessing Jupyter Notebook, users can run notebooks to validate the setup:

1. Check the GPU with TensorFlow

[code lang="python"]
import tensorflow as tf
tf.config.list_physical_devices()
[/code]

2. Test the GPU driver directly

Run the mnist-example notebook (a minimal sketch is included at the end of this guide).

3. Try Stable Diffusion (optional)

https://github.com/nebuly-ai/learning-hub/blob/main/notebooks/notebooks/stable-diffusion.ipynb

Conclusion

This guide outlines a step-by-step process for deploying a GPU Notebook environment on FPT Smart Cloud's AI Factory infrastructure. It enables users to easily spin up virtual machines, configure the environment, and run basic AI models such as TensorFlow workloads or GPU-based inference.

The deployment model using a Jump Server ensures secure external access while offering flexibility for scaling and experimenting with more advanced AI workloads. This platform is ideal for research teams, product development, or enterprises aiming to rapidly prototype and test AI models without upfront hardware investment.
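For reference, below is a minimal sketch of the kind of MNIST training cell mentioned in Section III above. It uses standard TensorFlow/Keras APIs and is illustrative only; it is not the exact contents of the referenced mnist-example notebook.

[code lang="python"]
# Minimal MNIST sanity check for the GPU notebook (illustrative sketch)
import tensorflow as tf

# Confirm the GPU is visible inside the container
print(tf.config.list_physical_devices("GPU"))

# Load and normalize the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Small fully connected classifier; training runs on the GPU when one is available
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
[/code]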

FPT Announces Strategic Partnership and Investment with Sumitomo and SBI Holdings

13:54 22/04/2025
Hanoi, April 22, 2025 – FPT Corporation announced a strategic partnership with Sumitomo Corporation and SBI Holdings, Japan's leading conglomerates in the finance and industrial sectors, to accelerate artificial intelligence (AI) adoption through the FPT AI Factory ecosystem, contributing to the advancement of sovereign AI in Japan. Under this partnership, Sumitomo Corporation and SBI Holdings will each acquire a 20% stake in FPT Smart Cloud Japan, a subsidiary of FPT Corporation.

This partnership lays a critical foundation for delivering cutting-edge AI solutions to organizations and enterprises in Japan, expediting AI integration across all aspects of society and supporting the nation's ambition to become a global AI leader. Combining FPT's technological capabilities with the extensive networks and expertise of Sumitomo Corporation and SBI Holdings across various industries, the three parties are committed to scaling the AI and Cloud business in Japan. Together, they aim to build a diversified product and service ecosystem that meets the unique and increasingly complex demands of the Japanese market.

Figure: FPT Chairman Mr. Truong Gia Binh, SBI Holdings Representative Director, Chairman, President & CEO Mr. Yoshitaka Kitao, and Sumitomo Corporation Director and Vice Chairman Mr. Toshikazu Nambu sign the investment and joint venture agreement in Japan

Mr. Truong Gia Binh, Founder and Chairman of FPT Corporation, emphasized: "Sharing a common vision for the transformative potential of AI, we are working closely with our strategic partners to expand the global application of AI technologies. This partnership also contributes to fostering innovation, strengthening organizational competitiveness, and maintaining technology autonomy, supporting Japan's goal of becoming an AI nation."

With the core philosophy of "Build Your Own AI," FPT AI Factory aims to make AI more accessible and easily deployable for every business, organization, and individual. Leveraging FPT's robust AI infrastructure, powered by thousands of latest-generation GPUs, pre-packaged models, and deployment frameworks, alongside a comprehensive service ecosystem and the proven experience of FPT and its investors in the Japanese market, FPT AI Factory enables organizations to harness their proprietary data, knowledge, and identity. This empowers them to rapidly develop tailored AI applications, unlock breakthrough performance, and create sustainable competitive advantages.

About SBI Group

Established in 1999, SBI Group is one of Japan's pioneers in online financial services. The Group operates a wide range of financial businesses, offering user-friendly products and services via the Internet, primarily in the areas of securities, banking, and insurance. In addition to its core operations, SBI is also active in asset management and various global investment ventures.

About Sumitomo Corporation

Sumitomo Corporation (TYO: 8053) is an integrated trading and business investment company with a strong global network comprising 125 offices in 64 countries and regions. The Sumitomo Corporation Group consists of approximately 900 companies and 80,000 employees on a consolidated basis. The Group's business activities span the following nine groups: Steel, Automotive, Transportation & Construction Systems, Diverse Urban Development, Media & Digital, Lifestyle Business, Mineral Resources, Chemicals Solutions, and Energy Transformation Business.
Sumitomo Corporation is committed to creating greater value for society under the corporate message of "Enriching lives and the world," based on Sumitomo's business philosophy passed down for over 400 years.

About FPT Corporation

FPT Corporation (FPT) is a global leading technology and IT services provider headquartered in Vietnam. FPT operates in three core sectors: Technology, Telecommunications, and Education. With AI as a key focus, FPT has been integrating AI across its products and solutions to drive innovation and enhance user experiences within its Made by FPT ecosystem. FPT is actively expanding its AI capabilities through investments in human resources, R&D, and partnerships with leading organizations such as NVIDIA, Mila, AITOMATIC, and Landing AI. These efforts are aligned with FPT's ambitious goal to reach 5 billion USD in IT services revenue from global markets by 2030 and solidify its status among the world's top billion-dollar IT companies.

After nearly two decades in Japan, FPT has become one of the largest foreign-invested technology firms in the country by human resource capacity. The company delivers services and solutions to over 450 clients globally, with over 4,000 employees across 17 local offices and innovation hubs in Japan, and nearly 15,000 professionals supporting this market worldwide.

With Japan as a strategic focus for the company's global growth, FPT has been actively expanding its business and engaging in M&A deals, such as the joint venture with Konica Minolta, the strategic investment in LTS Inc., and most recently, the acquisition of NAC, its first M&A deal in the market. As digital transformation, particularly legacy system modernization, is viewed as a key growth driver in the Japanese market, the company is committed to providing end-to-end solutions and seamless services, utilizing advanced AI technologies as a primary accelerator. For more information, please visit https://fpt.com/en.

Use Cases for Training Large Language Models (LLMs) with Slurm on Metal Cloud

14:38 21/04/2025
I. Introduction

Large Language Models (LLMs) are pushing the boundaries of artificial intelligence, enabling human-like text generation and the understanding of complex concepts. However, training these powerful models requires immense computational resources. This document explores distributed training, empowering you to leverage multiple GPUs efficiently to train LLMs using Slurm on Metal Cloud.

1. Purpose

This document presents a proof of concept (PoC) for developing and training Large Language Models (LLMs) using open-source tools. The setup is designed to adapt easily to various frameworks that support distributed training and aims to streamline the debugging process.

2. Context: Why Training LLMs Requires a Multi-Node (Cluster) Setup

Large Language Models (LLMs) have significantly advanced artificial intelligence, particularly in the field of natural language processing. Recent models such as GPT-2, GPT-3, and LLaMA 2 can understand and generate human-like text with impressive accuracy.

Training LLMs is a highly resource-intensive task that requires substantial hardware resources. Distributed training on GPU clusters, such as NVIDIA H100 systems, has become essential for accelerating the training process and efficiently handling large datasets.

Although training LLMs on a single node is technically feasible, several limitations make this approach impractical:

Extended Training Time: Training on a single node significantly increases the duration of each training cycle, making it inefficient for large-scale models.

Hardware Limitations: Single-node systems often lack the memory and processing power necessary to handle extremely large models. For instance, models exceeding 70 billion parameters or datasets with over 37,000 samples may exceed the available GPU memory and storage capacity of a single machine.

Scalability Issues: As model size and dataset complexity increase, single-node training struggles to utilize resources efficiently, leading to bottlenecks and suboptimal performance.

These challenges are effectively addressed by a multi-node (cluster) training setup, which distributes computational workloads across multiple GPUs and accelerates training while ensuring scalability. This approach enables:

Parallel Processing: Distributing model training across multiple nodes reduces processing time and optimizes resource utilization.

Handling Large Models & Datasets: Multi-node setups can accommodate LLMs with billions of parameters by splitting the workload across multiple GPUs and nodes.

Improved Fault Tolerance & Flexibility: Cluster computing provides redundancy and enables better handling of system failures, ensuring training stability.

By leveraging a multi-node Slurm cluster, organizations and researchers can efficiently train LLMs while overcoming the constraints of single-node training.

3. SLURM - The Backbone of High-Performance Computing for AI

As AI projects continue to grow in complexity and scale, the demand for high-performance computing (HPC) environments is increasing rapidly. This expansion requires efficient resource management, a challenge that SLURM (Simple Linux Utility for Resource Management) is designed to address effectively.

SLURM acts as the central nervous system of an HPC environment, enabling AI engineers to maximize computing cluster performance and tackle the most demanding AI workloads. It ensures:

Optimized Task Distribution: Workloads are efficiently allocated across computing nodes to maintain performance balance.
Intelligent Resource Management: Critical resources such as CPU cores, memory, and specialized hardware like GPUs are dynamically assigned to maximize efficiency.

Scalability & Adaptability: SLURM reallocates resources as needed, ensuring smooth scalability and efficient workload execution.

By leveraging SLURM, AI researchers and engineers can harness the full power of distributed computing, enabling faster and more efficient training of Large Language Models (LLMs) and other complex AI applications.

4. Why Deploy SLURM on Kubernetes?

SLURM (Simple Linux Utility for Resource Management) is a widely used job scheduler for High-Performance Computing (HPC), while Kubernetes (K8s) is the leading container orchestration platform for managing distributed workloads. Combining SLURM with Kubernetes offers several advantages:

Enhanced Scalability & Dynamic Resource Allocation: Kubernetes enables auto-scaling of compute resources based on workload demand, dynamically provisioning or deallocating nodes as needed. Unlike traditional SLURM clusters, which are often static, running SLURM on K8s allows for on-demand scaling, optimizing resource utilization.

Improved Containerized Workflows & Portability: AI/ML and HPC workloads increasingly rely on containerized environments (e.g., Docker, Singularity). Kubernetes provides native support for containers, making it easier to package and deploy SLURM workloads across multi-cloud and hybrid environments.

Efficient Multi-Tenancy & Isolation: Kubernetes supports namespace-based isolation, enabling multiple teams to run SLURM jobs securely on shared infrastructure. Resource quotas and limits in K8s help ensure fair allocation of CPU, GPU, and memory among different workloads.

Integration with the Cloud-Native Ecosystem: Running SLURM on K8s allows integration with cloud-native tools like Prometheus (monitoring), Grafana (visualization), and Argo Workflows (pipeline automation). This enables a modern, observability-driven approach to HPC workload management.

Cost Optimization for Cloud-Based HPC: Traditional SLURM clusters often require dedicated hardware, leading to underutilization when workloads are low. With Kubernetes, organizations can dynamically spin up and terminate cloud-based nodes, reducing unnecessary costs while ensuring peak performance during intensive computational workloads.

Deploying SLURM on Kubernetes combines the strengths of HPC job scheduling and cloud-native orchestration, providing scalability, flexibility, and cost efficiency. This approach is ideal for AI/ML training, large-scale simulations, and enterprise-level scientific computing.

II. Implementation

1. Specifications

Each node in the setup is equipped with the following:

GPUs: 8 NVIDIA H100 GPUs, each with 80GB HBM3 memory and 700W power consumption

OS: Ubuntu 22.04 LTS

Driver Version: 550.44.15+

CUDA Version: 12.2.1+

Docker Version: 26.1.2+

NVIDIA Container Toolkit: 1.15.0-1+

Containers built with Docker using the NVIDIA toolkit

2. Steps for Training Large Language Models (LLMs)

a. Upload Data & Model to High Performance Storage (HPS)

Objective:

The first step is to prepare and upload the training data and model to High Performance Storage (HPS), ensuring easy access from the computing nodes. A minimal staging sketch is shown below, followed by the detailed steps.
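The following sketch illustrates staging the model and dataset onto the shared HPS mount. The /mnt/data-hps paths follow this guide, while the huggingface-cli download command and the local source of the dataset file are assumptions for illustration.

[code lang="bash"]
# Illustrative staging sketch (paths follow this guide; the huggingface-cli usage is an assumption)

# 1. Download the base model from Hugging Face directly into HPS
huggingface-cli download Qwen/Qwen2.5-72B-Instruct \
    --local-dir /mnt/data-hps/models/Qwen2.5-72B-Instruct/

# 2. Copy the SFT dataset from a local workstation or staging server into HPS
rsync -avP ./stem_sft.json /mnt/data-hps/data/stem_sft.json
[/code]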
Steps:

Training Library Reference: For training the model, we utilize the LLaMA-Factory library, which is available at the following repository:

GitHub Repository: LLaMA-Factory

Additional training guidelines and example configurations can be found in the official documentation:

Examples & Tutorials: LLaMA-Factory Examples

These resources provide detailed instructions on configuring and fine-tuning LLMs, ensuring an efficient training process.

Storing the Model and Data in HPS for Training

The model and dataset are stored in High-Performance Storage (HPS) under the following paths:

Model Storage Path: /mnt/data-hps/models/

Dataset Storage Path: /mnt/data-hps/data/

For this experiment, we use the Qwen/Qwen2.5-72B-Instruct model from Hugging Face (Qwen2.5-72B-Instruct). The model is stored in:

[code lang="bash"]
/mnt/data-hps/models/Qwen2.5-72B-Instruct/
[/code]

For the dataset, we use an SFT dataset named stem_sft, which is stored as:

[code lang="bash"]
/mnt/data-hps/data/stem_sft.json
[/code]

Registering the Dataset with LLaMA-Factory

To train the model using LLaMA-Factory, we must register this dataset by defining its metadata in the dataset_info.json file. The file should be structured as follows:

[code lang="js"]
{
  "stem_sft": {
    "file_name": "stem_sft.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant",
      "system_tag": "system"
    }
  }
}
[/code]

b. Set Up the Slurm Cluster (Infrastructure & Configuration)

Objective:

Set up the Slurm cluster to manage resources and schedule tasks for model training.

Steps:

Install and Configure the Slurm Controller (slurmctld):

Install the Slurm Controller on a central server to manage compute nodes.

Edit slurm.conf to define node resources (e.g., CPU, GPU) and configure job parameters.

Set Up the Slurm Daemon (slurmd) on Compute Nodes:

Install and configure the Slurm Daemon on each compute node.

Ensure communication between the compute nodes and the Slurm Controller for job distribution.

c. Create the Training Configuration & Training Script (LLaMA-Factory)

Objective:

Define the training parameters and write the training script using frameworks like LLaMA-Factory.

Steps:

Create the Training Configuration File:

Define hyperparameters such as the learning rate, batch size, and number of epochs in a configuration file (e.g., config.json or train_config.yaml). An illustrative configuration sketch is shown after this section.

Write the Training Script:

Develop the training script (train.py) using frameworks such as PyTorch or TensorFlow. The script includes the model definition, loss functions, optimizers, and training logic.

Integrate LLaMA-Factory:

Use LLaMA-Factory to streamline model configuration and training, optimizing the process for LLMs.
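To make the configuration step concrete, below is a hedged sketch of what such a training configuration might look like. The key names follow LLaMA-Factory's published example configs and should be checked against the version in use; the output path and hyperparameter values are placeholders, while the model path, dataset name, context length, batch size, and epoch count follow this document.

[code lang="yaml"]
# Illustrative SFT configuration sketch (key names follow LLaMA-Factory's published examples;
# output_dir and hyperparameters are placeholders, not tuned settings)
model_name_or_path: /mnt/data-hps/models/Qwen2.5-72B-Instruct/

stage: sft                      # supervised fine-tuning
do_train: true
finetuning_type: full

dataset: stem_sft               # name registered in dataset_info.json
template: qwen
cutoff_len: 2560                # context length used in the SFT experiment below

output_dir: /mnt/data-hps/outputs/qwen2.5-72b-stem-sft/
per_device_train_batch_size: 4  # matches bs/d = 4 in the results section
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 5
bf16: true
[/code]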
d. Create the Slurm Job File (Resource Allocation & Script)

Objective:

Prepare the Slurm job file to specify resource requirements and the job configuration for training.

Steps:

Create the Slurm Job Script:

Write a Slurm job file (train_llm.slurm) to define resource requirements (e.g., CPU, GPU, memory) and specify the commands that run the training script.

Example train_llm.slurm file:

[code lang="bash"]
#!/bin/bash
#SBATCH --job-name=train_llm
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=48:00:00
#SBATCH --mem=64GB

module load cuda/11.2
python train.py --config config.json
[/code]

Define Resource Requirements: Specify the necessary resources (e.g., GPU, CPU, RAM) based on the model's training demands.

e. Submit the Slurm Job (Run the Script to Start the Job)

Objective:

Submit the job to the Slurm scheduler and begin the training process.

Steps:

Submit the Job: Use the sbatch command to send the job to Slurm for execution:

[code lang="bash"]
sbatch train_llm.slurm
[/code]

Check the Job Status: Use the squeue command to monitor the status of the job and confirm it is running as expected.

f. Monitor Metrics (GPU Usage, Logs, etc.)

Objective:

Monitor the performance of the job during training, especially resource usage such as GPU utilization, and the logs.

Steps:

Track GPU Usage: Use the nvidia-smi command to check GPU utilization:

[code lang="bash"]
nvidia-smi
[/code]

Monitor Job Logs: View logs and metrics using:

scontrol show job <job_id> for job details.

tail -f slurm-<job_id>.out for real-time log monitoring.

g. Retrieve the Trained Model from the Output Path (HPS)

Objective:

After training is complete, retrieve the trained model from High Performance Storage (HPS).

Steps:

Identify the Output Path: Check the Slurm job script or config file to locate the output path where the trained model is saved.

Download the Trained Model: Use scp, rsync, or an API to fetch the model from HPS:

[code lang="bash"]
scp user@server:/path/to/output_model/model_checkpoint.pth .
[/code]

Verify the Model: After downloading, verify the model's integrity and performance.

3. Some Execution Results

In the results below, bs/d denotes the per-device (per-GPU) batch size.

3.1 Pre-training stage

Data size: 48.74 GB, context length: 4096, model size: 32B, epochs: 1

1 node: bs/d = 1: 31 days 7:59:33 (~31.3 days)

32 nodes: bs/d = 1: 70h (~2.9 days); bs/d = 4: 31h (~1.3 days); bs/d = 8: OOM (Out of Memory)

3.2 Post-training stage (SFT)

Data size: 37.66 MB (~37,123 samples), context length: 2560, model size: 72B, epochs: 5

1 node: bs/d = 4: 5h22m

32 nodes: bs/d = 4: 22m
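For multi-node runs like the 32-node results above, the single-node job file can be extended so that each node launches its share of GPU workers. The sketch below is illustrative only: the node count, rendezvous port, and launch command are assumptions, not the exact scripts used in these experiments.

[code lang="bash"]
#!/bin/bash
# Illustrative multi-node variant of train_llm.slurm (assumed values; adjust to your cluster)
#SBATCH --job-name=train_llm_multinode
#SBATCH --nodes=32
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:8
#SBATCH --time=48:00:00

# Use the first allocated node as the rendezvous host for torchrun
head_node=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

# One torchrun launcher per node, 8 GPU workers per node
srun torchrun \
  --nnodes="$SLURM_NNODES" \
  --nproc_per_node=8 \
  --rdzv_backend=c10d \
  --rdzv_id="$SLURM_JOB_ID" \
  --rdzv_endpoint="${head_node}:29500" \
  train.py --config config.json
[/code]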
For more information and consultancy about FPT AI Factory, please contact:   Hotline: 1900 638 399   Email: [email protected]   Support: m.me/fptsmartcloud    

Vision-Language Model (VLM) Use Cases for Insurance Companies on NVIDIA H100 GPUs

21:19 11/04/2025
As the demand for more intelligent and context-aware AI grows, Vision-Language Models (VLMs) have emerged as a powerful class of models capable of understanding both images and text. These models power applications such as AI assistants, medical document analysis, and automated insurance claim processing.

This article provides practical, experience-based best practices for training large VLMs using Metal Cloud, a scalable, high-performance AI infrastructure powered by NVIDIA H100 GPUs. Whether you're an AI engineer, data scientist, or IT decision-maker looking to scale multimodal AI systems efficiently, this guide walks you through the architectural choices, training pipelines, and optimization strategies proven to deliver real-world impact.

1. Real-World Applications and Deployment Outcomes

VLMs are transforming multiple industries:

Document Understanding & Intelligent Document Processing (IDP): Extracting insights from unstructured formats and images.

Medical & Insurance Analysis: Automating claims processing, including data entry and the adjustment process, detecting fraudulent claims, and summarizing medical documents.

Example of medical documents (figure)

AI-Powered Assistants: Enabling AI chatbots with multimodal reasoning and contextual awareness.

Business Impact of PDF Data Extraction:

Reduced manual data entry time from 15 minutes to under 2 minutes.

Faster adaptation to new datasets, reducing training duration from months to weeks.

Scaled processing capacity without increasing reliance on human resources.

Enhanced fraud detection capabilities through AI-driven analysis.

Based on the NVIDIA VSS Blueprint architecture, FPT AI Factory has applied it to automated vehicle insurance claim video processing, using the following architecture:

Figure. High-level architecture of the summarization vision AI agent

Business Impact of accessing car information and damage assessments:

Automated Damage Evaluation: Use a VLM to analyze claim descriptions and video for automated damage assessment. The VLM categorizes cases into severe damage, minor damage, or no damage, directing them to the appropriate processing streams and experts. This approach enables automation of up to 80% of minor damage claims, reducing claim processing time from 20 minutes to just 2 minutes.

Enhancing Claims Processing Efficiency: Minimize human intervention and expedite claim settlements through AI-powered assessments.

Detecting and Preventing Fraud: Identify anomalies and inconsistencies in claim reports to mitigate fraud.

Optimizing Operational Costs: Reduce expenses associated with manual inspections and assessment processes.

Example of car damage assessment with VLM (figure)

ROI of H100 Over A100:

Higher initial cost, but lower total expenditure due to efficiency.

Shorter training cycles, leading to faster model deployment.

Estimated 43% reduction in overall training cost compared to A100.

2. VLM Architecture, Data Processing Pipeline, and Hardware Requirements

2.1 VLM Architecture

A standard VLM consists of three key components:

Vision Encoder: Uses CNN- or transformer-based models such as ViT, the CLIP vision encoder, or Swin Transformer to extract image features.

Language Decoder: Uses LLMs such as GPT, LLaMA, or Qwen to generate textual outputs conditioned on the visual features.

Multimodal Fusion Module: Integrates image and text embeddings for cohesive output generation.
2.2 Data Processing Pipeline

The pipeline for processing image and text data in VLM training follows these key steps:

Training Phase:

Image data is passed through the Vision Encoder, which extracts the relevant visual features.

Text data is processed by a Text Embedder, which converts it into vector representations.

The vision and text embeddings are then fused and passed into the language model with self-attention layers, enabling multimodal learning.

Testing Phase:

Zero-shot Visual Question Answering (VQA): The trained model can answer questions about new images it has never encountered before.

Few-shot Image Classification: By leveraging the learned embeddings, the model can classify new images with minimal labeled examples.

2.3 NVIDIA Software Stack in Use

Training:

We use NVIDIA NeMo for our fine-tuning task. NVIDIA NeMo is an open-source framework designed for training and fine-tuning large-scale AI models, including vision-language models (VLMs), speech models, and NLP models.

The NVIDIA NeMo framework provides many utilities:

Pretrained Foundation Models: Optimized foundation models that can be fine-tuned for specific applications.

Model Parallelism: Tensor, pipeline, and sequence parallelism, enabling the training of extremely large models across multiple GPUs or nodes.

LoRA and QLoRA Support: Parameter-efficient fine-tuning methods that reduce compute and memory costs while maintaining accuracy.

Integration with NVIDIA HGX Cloud: Seamless cloud-based training on clusters powered by H100 GPUs.

Performance Gains with NVIDIA NeMo on H100 GPUs:

✅ 2-3x faster training with FP8 precision and optimized kernels
✅ 50% lower memory usage using mixed precision and memory-efficient optimizers
✅ Seamless multi-GPU scaling with tensor & pipeline parallelism

By leveraging NVIDIA NeMo on H100 GPUs, we can fine-tune VLMs efficiently at scale, reducing both compute cost and time to deployment.

Inferencing:

To maximize VLM performance under low-latency, high-throughput requirements, we use TensorRT-LLM as the inference optimizer. With TensorRT-LLM, we achieve significantly lower overall latency as well as a lower time to first token (TTFT). TensorRT-LLM also supports a wide range of quantization methods, including INT8, SmoothQuant, FP8, INT4, GPTQ, and AWQ.

2.4 Hardware Considerations

For effective training, key hardware factors include:

Batch Size and Sequence Length: Optimized for maximum GPU utilization without memory bottlenecks.

Memory Management: Leveraging the H100's high-bandwidth memory for efficient data processing.

Parallelization Strategies: Using tensor parallelism, pipeline parallelism, and distributed training techniques to optimize large-scale models; a generic multi-node launch sketch is shown below.
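As an illustration of how such distributed training is typically launched on a Slurm-managed H100 cluster, the sketch below starts one torchrun process per node, which in turn spawns one worker per GPU. This is a generic launch pattern, not our exact production setup: the job name, the script name (finetune_vlm.py), its config file, and the node/GPU counts are hypothetical placeholders.

[code lang="js"]
#!/bin/bash
#SBATCH --job-name=vlm_finetune
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:8

# Use the first allocated node as the rendezvous endpoint
MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

# One torchrun launcher per node; each launcher starts 8 workers (one per GPU)
srun torchrun \
  --nnodes="$SLURM_NNODES" \
  --nproc_per_node=8 \
  --rdzv_backend=c10d \
  --rdzv_endpoint="${MASTER_ADDR}:29500" \
  finetune_vlm.py --config vlm_config.yaml
[/code]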
3. Benchmarking NVIDIA H100 vs. A100 GPUs for VLM Training

While the NVIDIA H100 GPU has a higher hourly operational cost than the A100, it significantly reduces overall training expenses due to shorter training times. Case studies indicate that training on the H100 reduces costs by approximately 43% and accelerates training by a factor of 3.5x compared to the A100.

Performance comparisons highlight the H100's superior efficiency:

| Metric | 2 x H100 (HBM3-80GB) | 2 x A100 (PCIe-80GB) | Higher is Better? |
| --- | --- | --- | --- |
| Epoch Time (Qwen2.5VL-7B, batch_size=2, num_sample=200k) | ~24 h | ~84 h | No |
| Inference Throughput (Qwen2.5VL-3B, token/sec, PyTorch) | ~410 | ~150 | Yes |
| Power Consumption (100% GPU utilization, per card) | 480W | 250W | No |
| Hourly Cost | 1.5 x A100 | Lower | No |
| Total Training Cost | 0.57 x A100 | Higher | No |

4. Lessons Learned and Optimization Strategies

Resource Optimization

Maximizing GPU Utilization: Proper tuning of batch size, sequence length, and caching mechanisms.

Parallel Processing Strategies: Implementing FSDP, ZeRO, and NCCL to improve training speed.

Distributed Training Challenges

Data Synchronization: Efficient GPU communication to avoid bottlenecks.

Infrastructure Readiness: Ensuring power and cooling support for high-energy-consuming H100 clusters.

System Integration & Stability

Software Stack Compatibility: Ensuring seamless operation with PyTorch/XLA, Triton, and TensorRT.

Continuous Performance Monitoring: Regular fine-tuning to maintain optimal efficiency.

5. Future Trends in VLM Training Optimization

To address increasing model complexity and computational demands, several trends are shaping the optimization of VLM training:

Scalability and Efficiency: FP8 precision, quantization techniques, and FlashAttention optimize memory utilization, ensuring fast processing.

Advanced Training Pipelines: Techniques like ZeRO (DeepSpeed) and Fully Sharded Data Parallel (FSDP) reduce memory overhead and improve scalability.

High-Performance Multi-GPU Training: The H100's NVLink 4.0 and PCIe 5.0 enable faster inter-GPU communication, minimizing bottlenecks.

Efficient Fine-Tuning Techniques: Methods such as LoRA and QLoRA allow efficient parameter tuning while reducing computational costs.

Domain-Specific Optimization: Future VLMs will be fine-tuned for specialized domains like medical imaging, legal document processing, and technical analysis, requiring tailored datasets and optimized training strategies.

6. Conclusion & Recommendations

When to Choose H100

Training large-scale VLMs (7B+ parameters) requiring high batch sizes and long sequence lengths.

Deploying multi-GPU clusters with NVLink 4.0 for enhanced interconnect speeds.

Use cases demanding real-time inference with minimal latency.

When A100 is Sufficient

Smaller-scale GenAI models (under 4B parameters) with relaxed training time constraints.

Cost-sensitive projects where training duration is less critical.

Single-task models requiring less computational complexity.

Final Thoughts

With increasing demands for more sophisticated VLMs, optimizing hardware, algorithms, and training strategies remains essential. The NVIDIA H100 GPU stands out as the preferred choice for large-scale, high-performance VLM training, driving advancements in multimodal AI and accelerating real-world applications.

Learn more about FPT AI Factory's services HERE. For more information and consultancy about FPT AI Factory, please contact: Hotline: 1900 638 399 Email: [email protected] Support: m.me/fptsmartcloud

LLaMA Factory: A Feature-Rich Toolkit for Accessible LLM Customization

18:21 11/04/2025
As large language models (LLMs) and vision-language models (VLMs) become increasingly essential in modern AI applications, the ability to fine-tune these models on custom datasets has never been more important. However, for many developers, especially those without a deep background in machine learning, existing frameworks can be overwhelming, requiring heavy coding and complex configurations.

LLaMA Factory is an open-source toolkit designed to make LLM fine-tuning accessible to everyone. Whether you're a beginner, a non-technical professional, or an organization seeking an efficient model customization solution, LLaMA Factory simplifies the entire process with an intuitive web interface and support for dozens of fine-tuning strategies.

In this article, we'll explore what makes LLaMA Factory stand out, who can benefit from it, and how it compares with other popular frameworks.

Who should use LLaMA Factory?

LLaMA Factory is ideal for:

🧑‍💻 Beginner developers experimenting with LLMs

📊 Data analysts and researchers without ML expertise

🧠 AI enthusiasts working on personal or community projects

🏢 Small teams or startups without ML engineering bandwidth

If you want to fine-tune powerful open-source models like LLaMA or Mistral on your own dataset without writing a line of code, this tool is built for you.

What Does LLaMA Factory Offer?

LLaMA-Factory is an open-source project that provides a comprehensive set of tools and scripts for fine-tuning, serving, and benchmarking LLMs and VLMs. It enables flexible, code-free fine-tuning of 100+ LLMs and VLMs through its built-in web UI, LlamaBoard.

The LLaMA-Factory repository makes it easy to get started with large models by providing:

Scripts for data preprocessing and tokenization tasks

Training pipelines for fine-tuning models

Inference scripts for generating text with trained models

Benchmarking tools to evaluate model performance

A Gradio web UI for interactive testing and training

LLaMA Factory is designed specifically for beginners and non-technical professionals who want to fine-tune open-source LLMs on their custom datasets without learning complex AI concepts. Users simply select a model, upload their dataset, and adjust a few parameters to initiate the training process.

Once training is complete, the same web application can be used to test the model before exporting it to Hugging Face or saving it locally. This provides a fast and efficient way to fine-tune LLMs in a local environment.

Figure: LLaMA Factory Architecture

Comparing Feature Support Across LLM Training Frameworks

Here's how LLaMA Factory stacks up against other popular LLM fine-tuning frameworks such as FastChat, LitGPT, and LMFlow:

| Feature | LLaMA Factory | FastChat | LitGPT | LMFlow | Open-Instruct |
| --- | --- | --- | --- | --- | --- |
| LoRA | ✓ | ✓ | ✓ | ✓ | ✓ |
| QLoRA | ✓ | ✓ | ✓ | ✓ | ✓ |
| DoRA | ✓ | | | | |
| LoRA+ | ✓ | | | | |
| PiSSA | ✓ | | | | |
| GaLore | ✓ | ✓ | | ✓ | ✓ |
| BAdam | ✓ | | | | |
| Flash attention | ✓ | ✓ | ✓ | ✓ | ✓ |
| S2 attention | ✓ | | | | |
| Unsloth | ✓ | | ✓ | | |
| DeepSpeed | ✓ | ✓ | ✓ | ✓ | ✓ |
| SFT | ✓ | ✓ | ✓ | ✓ | ✓ |
| RLHF | ✓ | | | ✓ | |
| DPO | ✓ | | | | ✓ |
| KTO | ✓ | | | | |
| ORPO | ✓ | | | | |

Table: Comparison of features in LLaMA Factory with popular LLM fine-tuning frameworks

Note: While most frameworks are built on PyTorch and have similar hardware requirements, LLaMA Factory differentiates itself through its ease of use, wide feature support, and strong community.

It stands out with extensive support for multiple fine-tuning techniques, including LoRA, QLoRA, DoRA, PiSSA, and more, giving users the flexibility to optimize models based on their specific needs.

Fine-Tuning Techniques Supported

| | Freeze-tuning | GaLore | LoRA | DoRA | LoRA+ | PiSSA |
| --- | --- | --- | --- | --- | --- | --- |
| Mixed precision | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Checkpointing | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Flash attention | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| S2 attention | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Quantization | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Unsloth | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ |

Table 2: Compatibility between the fine-tuning techniques featured in LLaMA Factory

Quick Overview of Techniques

Freeze-tuning: Freezes the majority of parameters while fine-tuning the remaining parameters in a small subset of decoder layers.

GaLore (gradient low-rank projection): Projects gradients into a lower-dimensional space, enabling full-parameter learning in a memory-efficient manner.

LoRA (low-rank adaptation): Freezes all pre-trained weights and introduces a pair of trainable low-rank matrices into the designated layers.

QLoRA: Combines LoRA with quantization to reduce memory usage.

DoRA (Weight-Decomposed Low-Rank Adaptation): Decomposes pre-trained weights into magnitude and direction components and updates the directional components for enhanced performance.

LoRA+: An improved LoRA variant proposed to overcome the sub-optimality of LoRA.

PiSSA (Principal Singular Values and Singular Vectors Adaptation): Initializes adapters with the principal components of the pre-trained weights for faster convergence.

Quick Start to LLaMA-Factory

1. Installing Dependencies

The workspace and environment can be set up easily by cloning the LLaMA Factory repository:

[code lang="js"]
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
conda create --name llama-factory python=3.11
conda activate llama-factory
pip install -e ".[torch,liger-kernel,metrics]"
pip install deepspeed==0.14.4
[/code]

2. Preparing Dataset

LLaMA Factory supports multiple data formats for various training methods (e.g., SFT, DPO, RLHF, ...) and ships sample datasets that illustrate the expected structure.

All data is stored in the /data directory. Users can prepare a customized dataset and register its information in the dataset_info.json file, which is also located in the /data directory.

For example, suppose the image paths, prompt information, and model responses are stored in the file C:/User/annotations.json, with each record holding a list of image paths and a messages array of user/assistant turns (the sharegpt layout). At this point, you register the dataset by adding a corresponding entry to the dataset_info.json file, as sketched below.
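A minimal registration entry might look like the following. The key names mirror the mllm_demo multimodal example shipped with LLaMA-Factory, while the dataset name and file path here are illustrative; in practice the dataset file is usually placed in the /data directory and referenced by its file name.

[code lang="js"]
{
  "my_vlm_dataset": {
    "file_name": "annotations.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  }
}
[/code]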
3. Finetuning

You can choose to fine-tune via LLaMA Factory's WebUI by running the following command in the terminal:

[code lang="js"]
cd LLaMA-Factory
GRADIO_SHARE=1 llamafactory-cli webui
[/code]

The web interface will then open in your browser. You can adjust the required training configurations, specify the path to the output directory, and click 'Start' to begin the training process.

Alternatively, you can fine-tune via the command line by preparing a training_config.yaml file that includes the required training configurations. You can find examples of different training configurations in the /examples directory.

Then, run the following command to start the training process:

[code lang="js"]
llamafactory-cli train training_config.yaml
[/code]

4. Merge LoRA

In the case of LoRA training, the adapter weights need to be merged with the original model to obtain the fine-tuned model. This process can be executed via the WebUI by selecting the 'Export' tab.

For the command line, you also need to prepare a configuration file in YAML format, as shown in the following example.
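A merge configuration along these lines should work. The key names follow the merge_lora examples in the LLaMA-Factory repository, while the base model, adapter checkpoint, and export paths below are illustrative placeholders, not the paths from a specific run.

[code lang="js"]
### model
model_name_or_path: Qwen/Qwen2.5-7B-Instruct      # base model used for training (placeholder)
adapter_name_or_path: saves/qwen2.5-7b/lora/sft   # LoRA adapter checkpoint (placeholder)
template: qwen
finetuning_type: lora

### export
export_dir: output/qwen2.5-7b-sft-merged
export_size: 2            # shard size in GB
export_device: cpu
export_legacy_format: false
[/code]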
Then run the following command to merge:

[code lang="js"]
llamafactory-cli export merge_config.yaml
[/code]

5. Resource Monitoring

After setting up the environment successfully, you can check the running processes with the 'nvidia-smi' command, for example while training an LLM with LLaMA Factory on an H100 node.

Conclusion

LLaMA Factory is a powerful and user-friendly framework designed to lower the barrier to entry for individuals interested in customizing large language models. It offers a comprehensive suite of state-of-the-art fine-tuning techniques, intuitive UI controls, and compatibility with popular deployment tools while eliminating the need for coding.

Whether you're an ML novice or just want a faster way to experiment with LLMs, LLaMA Factory is definitely worth checking out.

Learn more about FPT AI Factory's services HERE. For more information and consultancy about FPT AI Factory, please contact: Hotline: 1900 638 399 Email: [email protected] Support: m.me/fptsmartcloud