Augment Computer Vision Applications with Agentic AI
Today’s computer vision systems are highly effective at detecting what happens in physical environments: identifying objects, anomalies, or events. However, they still struggle to explain why those events matter, articulate fine-grained scene details, or reason about what could happen next.
Agentic intelligence powered by vision language models (VLMs) can help bridge this gap, giving teams fast, easy access to key insights and analyses that connect text descriptions with spatiotemporal information across the billions of visual data points their systems capture every day.
There are three practical ways organizations can upgrade their existing computer vision systems with agentic AI: transforming visual data into searchable insights through dense captioning, verifying alerts and adding context with VLM reasoning, and scaling to multimodal agentic reasoning.
Traditional video search tools built on convolutional neural networks (CNNs) often lack context and semantic depth. They are optimized for narrow tasks such as object detection but cannot describe scenes or convert vision into text. As a result, teams still spend significant time manually reviewing footage to extract insights.
By embedding VLMs into existing applications, businesses can automatically produce highly detailed captions for both images and videos. These captions transform raw visual data into rich, searchable metadata, enabling flexible search beyond simple filenames or labels.
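As a minimal sketch, the pattern can look like the following Python, assuming a VLM served behind an OpenAI-compatible endpoint; the base URL, model name, and prompt are placeholders, not a specific product's API:

```python
import base64
from pathlib import Path

from openai import OpenAI

# Placeholder endpoint and model name for a locally hosted VLM.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def caption_image(path: Path) -> str:
    """Ask the VLM for a dense, search-friendly caption of one frame."""
    image_b64 = base64.b64encode(path.read_bytes()).decode()
    response = client.chat.completions.create(
        model="example-vlm",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Describe this image in detail: objects, actions, "
                    "visible text, and any anomalies.")},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Build searchable metadata: filename -> dense caption.
index = {p.name: caption_image(p) for p in Path("frames").glob("*.jpg")}

# Naive keyword search; a production system would embed the captions
# into a vector database for semantic search instead.
hits = {name: cap for name, cap in index.items() if "forklift" in cap.lower()}
```

Because the captions are plain text stored alongside the original files, any existing search stack can index them; upgrading to semantic search changes only the final retrieval step.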
This approach is already proving its value. For example, advanced inspection platforms have used VLM-powered understanding to transform millions of images into structured reports, dramatically improving accuracy and reducing manual effort. Systems enhanced with agentic AI have achieved up to 96% defect-detection accuracy, compared with roughly 24% using manual inspection, reducing downtime and improving overall quality control.
For enterprises in manufacturing, transportation, and public services, dense captioning enables transparent, consistent insights essential for compliance, safety, and operational excellence.
CNN-based computer vision systems often generate binary detection alerts: yes or no, true or false. Without the deeper reasoning that VLMs provide, these alerts may trigger false positives, overlook key details, or lack context. This can lead to unnecessary operational costs, reduced trust in automation, and poor decision-making in safety-critical environments.
Instead of replacing existing infrastructure, organizations can layer VLMs on top of current CV systems to create an intelligent review mechanism. When an incident is detected, the VLM adds context: clarifying where it happened, how it occurred, and why it matters.
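A rough sketch of what that review layer could look like follows, with `detect_incident`, `ask_vlm`, and `escalate` as hypothetical stand-ins for the existing detector feed, a VLM call (such as the client shown earlier), and a dispatch hook:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    confirmed: bool
    context: str  # where it happened, how it occurred, why it matters

def verify_alert(frame_b64: str, alert_type: str) -> Verdict:
    """Give the VLM the flagged frame and ask for a second opinion."""
    prompt = (
        f"A detector flagged a possible '{alert_type}' in this frame. "
        "Reply CONFIRMED or FALSE_POSITIVE on the first line, then "
        "explain where it happened, how it occurred, and why it matters."
    )
    answer = ask_vlm(frame_b64, prompt)  # hypothetical VLM helper
    first_line, _, rest = answer.partition("\n")
    return Verdict(confirmed=first_line.strip() == "CONFIRMED",
                   context=rest.strip())

# Only alerts the VLM confirms reach operators, with context attached.
for frame_b64, alert_type in detect_incident("cam-042"):  # hypothetical feed
    verdict = verify_alert(frame_b64, alert_type)
    if verdict.confirmed:
        escalate(alert_type, verdict.context)  # hypothetical dispatch hook
```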
Smart-city applications have shown the power of this approach. For instance, Linker Vision uses VLMs to verify critical city alerts, such as traffic accidents, flooding, or poles and trees downed by storms. This reduces false positives and adds vital context to each event, improving real-time municipal response.

Linker Vision’s agentic AI architecture automates event analysis across more than 50,000 diverse smart-city camera streams to enable cross-department remediation, coordinating actions among teams such as traffic control, utilities, and first responders when incidents occur. The ability to query all camera streams simultaneously lets the system quickly and automatically turn observations into insights and trigger recommendations for next-best actions.
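A generic sketch of that fan-out query pattern (not Linker Vision's actual implementation) might run one question across many streams concurrently, where `ask_vlm_about_stream` is a hypothetical async helper that runs the VLM over recent frames from a stream:

```python
import asyncio

async def query_all_streams(question: str, stream_ids: list[str]):
    """Ask every camera stream the same question and keep positive hits."""
    async def one(stream_id: str):
        # Hypothetical helper: answers the question for this stream,
        # or returns "NO_MATCH" if nothing relevant is visible.
        return stream_id, await ask_vlm_about_stream(stream_id, question)

    results = await asyncio.gather(*(one(s) for s in stream_ids))
    return [(sid, ans) for sid, ans in results if ans != "NO_MATCH"]

# e.g., hits = asyncio.run(query_all_streams("Is flooding visible?", cameras))
# Each hit can then be routed to the relevant department for action.
```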
As organizations expand their sensor networks, spanning video, audio, text logs, and IoT devices, they need AI that can reason across all modalities, not just vision. This is possible by combining VLMs with reasoning models, large language models (LLMs), retrieval-augmented generation (RAG), computer vision, and speech transcription.
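One way to wire these pieces together is a small retrieval loop like the sketch below, where `caption_clip` (VLM), `transcribe` (speech-to-text), `embed` (an embedding model returning a NumPy vector), and `ask_llm` (the reasoning LLM) are all hypothetical helpers rather than a specific library's API:

```python
import numpy as np

store: list[tuple[np.ndarray, str]] = []  # (embedding, timestamped record)

def index_clip(clip_path: str, t_start: float, t_end: float) -> None:
    """Fuse what was seen and heard in one clip into a retrievable record."""
    record = (
        f"[{t_start:.0f}s-{t_end:.0f}s] "
        f"VISION: {caption_clip(clip_path)} "  # hypothetical VLM helper
        f"AUDIO: {transcribe(clip_path)}"      # hypothetical ASR helper
    )
    store.append((embed(record), record))      # hypothetical embedder

def answer(question: str, k: int = 5) -> str:
    """Simple RAG step: retrieve the top-k clip records, then ask the LLM."""
    q = embed(question)
    ranked = sorted(store, key=lambda rec: -float(np.dot(q, rec[0])))
    context = "\n".join(text for _, text in ranked[:k])
    return ask_llm(                            # hypothetical LLM helper
        f"Using these timestamped clip notes:\n{context}\n\n"
        f"Answer the question: {question}"
    )
```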
A simple VLM integration is sufficient for verifying short clips, but a standalone model can only process a limited number of visual tokens at once. Over longer time periods, this produces shallow, surface-level answers that lack temporal context and cannot draw on external knowledge.
In contrast, architectures built around agentic AI enable scalable, accurate processing of lengthy, multichannel video archives, yielding deeper and more reliable insights that go beyond surface-level understanding. Agentic systems can perform root-cause analysis or work through long inspection videos to generate reports with timestamped insights.
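For long-form footage, a common workaround for the token limit is hierarchical summarization: caption fixed-length chunks, then have the LLM merge the per-chunk notes into one timestamped report. The sketch below reuses the hypothetical `caption_clip` and `ask_llm` helpers from above; the 60-second chunk length is an arbitrary choice:

```python
CHUNK_SECONDS = 60  # arbitrary; tune to the VLM's visual-token budget

def timestamped_report(chunk_paths: list[str]) -> str:
    """Caption each video chunk, then merge the notes into one report."""
    notes = []
    for i, clip_path in enumerate(chunk_paths):
        t = i * CHUNK_SECONDS
        notes.append(f"[{t // 60:02d}:{t % 60:02d}] {caption_clip(clip_path)}")
    return ask_llm(
        "Merge these per-chunk inspection notes into a single report. "
        "Keep the timestamps and flag likely root causes:\n" + "\n".join(notes)
    )
```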
Source: NVIDIA